Multimodal Feature Learning for Video Captioning

Video captioning refers to the task of generating a natural language sentence that explains the content of the input video clips. This study proposes a deep neural network model for effective video captioning. Apart from visual features, the proposed model learns additionally semantic features that...

Full description

Bibliographic Details
Main Authors: Sujin Lee, Incheol Kim
Format: Article
Language:English
Published: Hindawi Limited 2018-01-01
Series:Mathematical Problems in Engineering
Online Access:http://dx.doi.org/10.1155/2018/3125879