A Fine-Grained Spatial-Temporal Attention Model for Video Captioning
The attention mechanism has been extensively used in video captioning tasks, enabling further development of deeper visual understanding. However, most existing video captioning methods apply the attention mechanism at the frame level, which only models the temporal structure and generated words, but...
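To make the abstract's distinction concrete, the sketch below illustrates plain frame-level (temporal) attention, where each frame gets a single scalar weight and any spatial detail within a frame is averaged away. This is a minimal, generic illustration, not the paper's actual model; all names, shapes, and dimensions here are assumptions for demonstration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical setup: T video frames, each encoded as a d-dim feature vector.
T, d = 8, 16
rng = np.random.default_rng(0)
frame_feats = rng.standard_normal((T, d))   # stand-in for per-frame CNN features
decoder_state = rng.standard_normal(d)      # stand-in for the caption decoder's state

# Frame-level temporal attention: one score per frame, so the model can
# decide *when* to look, but not *where* within a frame.
scores = frame_feats @ decoder_state        # (T,) relevance of each frame
weights = softmax(scores)                   # attention distribution over time
context = weights @ frame_feats             # (d,) weighted visual context vector
```

A fine-grained spatial-temporal model, by contrast, would attend over regions within each frame in addition to attending over frames.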
Main Authors: An-An Liu, Yurui Qiu, Yongkang Wong, Yu-Ting Su, Mohan Kankanhalli
Format: Article
Language: English
Published: IEEE, 2018-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8523661/
Similar Items
- Sequential Dual Attention: Coarse-to-Fine-Grained Hierarchical Generation for Image Captioning
  by: Zhibin Guan, et al. Published: (2018-11-01)
- Video captioning with stacked attention and semantic hard pull
  by: Md. Mushfiqur Rahman, et al. Published: (2021-08-01)
- Video Caption Based Searching Using End-to-End Dense Captioning and Sentence Embeddings
  by: Akshay Aggarwal, et al. Published: (2020-06-01)
- Variational Autoencoder-Based Multiple Image Captioning Using a Caption Attention Map
  by: Boeun Kim, et al. Published: (2019-07-01)
- Video Captioning Based on Channel Soft Attention and Semantic Reconstructor
  by: Zhou Lei, et al. Published: (2021-02-01)