DVC‐Net: A deep neural network model for dense video captioning
Abstract: Dense video captioning (DVC) detects multiple events in an input video and generates natural language sentences to describe each event. Previous studies predominantly used convolutional neural networks to extract visual features from videos but failed to employ high‐level semantics to effec...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2021-02-01 |
Series: | IET Computer Vision |
Online Access: | https://doi.org/10.1049/cvi2.12013 |