DVC‐Net: A deep neural network model for dense video captioning
Abstract: Dense video captioning (DVC) detects multiple events in an input video and generates natural language sentences to describe each event. Previous studies predominantly used convolutional neural networks to extract visual features from videos but failed to employ high-level semantics to effec...
| Main Authors: | Sujin Lee, Incheol Kim |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2021-02-01 |
| Series: | IET Computer Vision |
| Online Access: | https://doi.org/10.1049/cvi2.12013 |
Similar Items
- Multimodal Feature Learning for Video Captioning
  by: Sujin Lee, et al.
  Published: (2018-01-01)
- Video Caption Based Searching Using End-to-End Dense Captioning and Sentence Embeddings
  by: Akshay Aggarwal, et al.
  Published: (2020-06-01)
- Image Captioning Based on Deep Neural Networks
  by: Liu Shuang, et al.
  Published: (2018-01-01)
- Multilayer Dense Attention Model for Image Caption
  by: Ke Wang, et al.
  Published: (2019-01-01)
- Fully Convolutional CaptionNet: Siamese Difference Captioning Attention Model
  by: Ariyo Oluwasanmi, et al.
  Published: (2019-01-01)