Transforming Visual Attention into Video Summarization

Bibliographic Details
Main Authors: Yen-Ting Liu, 劉彥廷
Other Authors: Yu-Chiang Wang
Format: Others
Language: en_US
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/qvk4m5
id ndltd-TW-107NTU05435025
record_format oai_dc
spelling ndltd-TW-107NTU054350252019-11-16T05:27:55Z http://ndltd.ncl.edu.tw/handle/qvk4m5 Transforming Visual Attention into Video Summarization 藉由視覺注意力來處理視頻摘要 Yen-Ting Liu 劉彥廷 Master's thesis, National Taiwan University, Graduate Institute of Communication Engineering, 107. Video summarization is among the most challenging tasks in computer vision, aiming to identify highlight frames or shots in lengthy video inputs. In this paper, we propose an attention-based model for video summarization that handles complex video data. A novel deep learning framework of multi-head multi-layer video self-attention (M2VSA) is presented to identify informative regions across spatial and temporal video features, jointly exploiting context diversity over space and time for summarization purposes. Together with the visual concept consistency enforced in our framework, both video recovery and summarization are preserved. More importantly, our model can be realized in both supervised and unsupervised settings. Finally, quantitative and qualitative experimental results demonstrate the effectiveness of our model and its superiority over state-of-the-art approaches. Yu-Chiang Wang 王鈺強 2019 Degree thesis ; thesis 32 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Communication Engineering === 107 === Video summarization is among the most challenging tasks in computer vision, aiming to identify highlight frames or shots in lengthy video inputs. In this paper, we propose an attention-based model for video summarization that handles complex video data. A novel deep learning framework of multi-head multi-layer video self-attention (M2VSA) is presented to identify informative regions across spatial and temporal video features, jointly exploiting context diversity over space and time for summarization purposes. Together with the visual concept consistency enforced in our framework, both video recovery and summarization are preserved. More importantly, our model can be realized in both supervised and unsupervised settings. Finally, quantitative and qualitative experimental results demonstrate the effectiveness of our model and its superiority over state-of-the-art approaches.
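
The abstract describes M2VSA only at a high level. As a rough, hypothetical sketch (not the authors' code), the Python snippet below illustrates the general idea of stacking multi-head self-attention layers over per-frame features and mapping the attended features to frame-level importance scores; the class name, layer sizes, and the top-k selection step are illustrative assumptions.

import torch
import torch.nn as nn

class FrameSelfAttentionScorer(nn.Module):
    """Illustrative sketch: stacked multi-head self-attention over per-frame features."""
    def __init__(self, feat_dim=1024, num_heads=8, num_layers=2):
        super().__init__()
        # "Multi-layer" part: a stack of multi-head self-attention blocks.
        self.attn_layers = nn.ModuleList(
            [nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
             for _ in range(num_layers)])
        self.norms = nn.ModuleList([nn.LayerNorm(feat_dim) for _ in range(num_layers)])
        # Map each attended frame feature to a scalar importance score.
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, frames):                      # frames: (batch, T, feat_dim)
        x = frames
        for attn, norm in zip(self.attn_layers, self.norms):
            attended, _ = attn(x, x, x)             # self-attention across time
            x = norm(x + attended)                  # residual connection + layer norm
        return torch.sigmoid(self.score(x)).squeeze(-1)   # (batch, T) scores in [0, 1]

# Usage: score 120 frames of 1024-d features, then keep the top 15% as the summary.
scores = FrameSelfAttentionScorer()(torch.randn(1, 120, 1024))
summary_idx = scores.topk(int(0.15 * 120), dim=1).indices

The residual-plus-norm structure and sigmoid scoring head are common choices for attention-based frame scoring; the actual M2VSA model additionally attends over spatial regions and enforces visual concept consistency, which is not reflected in this sketch.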
author2 Yu-Chiang Wang
author_facet Yu-Chiang Wang
Yen-Ting Liu
劉彥廷
author Yen-Ting Liu
劉彥廷
spellingShingle Yen-Ting Liu
劉彥廷
Transforming Visual Attention into Video Summarization
author_sort Yen-Ting Liu
title Transforming Visual Attention into Video Summarization
title_short Transforming Visual Attention into Video Summarization
title_full Transforming Visual Attention into Video Summarization
title_fullStr Transforming Visual Attention into Video Summarization
title_full_unstemmed Transforming Visual Attention into Video Summarization
title_sort transforming visual attention into video summarization
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/qvk4m5
work_keys_str_mv AT yentingliu transformingvisualattentionintovideosummarization
AT liúyàntíng transformingvisualattentionintovideosummarization
AT yentingliu jíyóushìjuézhùyìlìláichùlǐshìpínzhāiyào
AT liúyàntíng jíyóushìjuézhùyìlìláichùlǐshìpínzhāiyào
_version_ 1719292369327620096