Transforming Visual Attention into Video Summarization
Master's === National Taiwan University === Graduate Institute of Communication Engineering === 107 === Video summarization is among the most challenging tasks in computer vision; it aims at identifying highlight frames or shots in lengthy video inputs. In this paper, we propose an attention-based model for video summarization that handles complex video data. A novel deep learning framework, multi-head multi-layer video self-attention (M2VSA), is presented to identify informative regions across spatial and temporal video features, jointly exploiting context diversity over space and time for summarization purposes. Together with the visual concept consistency enforced in our framework, both video recovery and summarization can be achieved. More importantly, our model can be realized in both supervised and unsupervised settings. Finally, quantitative and qualitative experimental results demonstrate the effectiveness of our model and its superiority over state-of-the-art approaches.
Main Authors: Yen-Ting Liu (劉彥廷)
Other Authors: Yu-Chiang Wang (王鈺強)
Format: Others
Language: en_US
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/qvk4m5
id: ndltd-TW-107NTU05435025
record_format: oai_dc
spelling: ndltd-TW-107NTU05435025 2019-11-16T05:27:55Z http://ndltd.ncl.edu.tw/handle/qvk4m5 Transforming Visual Attention into Video Summarization 藉由視覺注意力來處理視頻摘要 Yen-Ting Liu 劉彥廷 Master's, National Taiwan University, Graduate Institute of Communication Engineering, 107. Video summarization is among the most challenging tasks in computer vision; it aims at identifying highlight frames or shots in lengthy video inputs. In this paper, we propose an attention-based model for video summarization that handles complex video data. A novel deep learning framework, multi-head multi-layer video self-attention (M2VSA), is presented to identify informative regions across spatial and temporal video features, jointly exploiting context diversity over space and time for summarization purposes. Together with the visual concept consistency enforced in our framework, both video recovery and summarization can be achieved. More importantly, our model can be realized in both supervised and unsupervised settings. Finally, quantitative and qualitative experimental results demonstrate the effectiveness of our model and its superiority over state-of-the-art approaches. Yu-Chiang Wang 王鈺強 2019 degree thesis ; thesis 32 en_US
collection: NDLTD
language: en_US
format: Others
sources: NDLTD
description: Master's === National Taiwan University === Graduate Institute of Communication Engineering === 107 === Video summarization is among the most challenging tasks in computer vision; it aims at identifying highlight frames or shots in lengthy video inputs. In this paper, we propose an attention-based model for video summarization that handles complex video data. A novel deep learning framework, multi-head multi-layer video self-attention (M2VSA), is presented to identify informative regions across spatial and temporal video features, jointly exploiting context diversity over space and time for summarization purposes. Together with the visual concept consistency enforced in our framework, both video recovery and summarization can be achieved. More importantly, our model can be realized in both supervised and unsupervised settings. Finally, quantitative and qualitative experimental results demonstrate the effectiveness of our model and its superiority over state-of-the-art approaches.
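The record describes the M2VSA architecture only at a high level and does not include the thesis implementation. As a rough illustration of the kind of multi-head self-attention over frame features that the abstract refers to, the following PyTorch sketch scores each frame by attending over the whole sequence; the feature dimension, head count, single attention layer, and sigmoid scoring head are assumptions for illustration only, not the actual M2VSA design.

```python
# Minimal sketch (not the thesis code): one multi-head self-attention layer over
# per-frame features, followed by a per-frame importance score.
import torch
import torch.nn as nn

class FrameSelfAttentionScorer(nn.Module):
    def __init__(self, feat_dim=1024, num_heads=8):
        super().__init__()
        # Self-attention lets every frame attend to every other frame,
        # capturing long-range temporal context across the video.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)
        # Map each attended frame feature to a scalar importance score in (0, 1).
        self.score = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, frames):                     # frames: (batch, num_frames, feat_dim)
        attended, _ = self.attn(frames, frames, frames)
        attended = self.norm(frames + attended)    # residual connection + layer norm
        return self.score(attended).squeeze(-1)    # (batch, num_frames) importance scores

# Usage: score 120 frames of 1024-d features and keep the top 15% as the summary.
feats = torch.randn(1, 120, 1024)                  # e.g., pre-extracted CNN features, one per frame
scores = FrameSelfAttentionScorer()(feats)
summary_idx = scores.topk(k=int(0.15 * 120), dim=1).indices
```

In a summarization setting, the highest-scoring frames (or the shots containing them) would be kept as the summary; the thesis additionally enforces visual concept consistency and supports unsupervised training, which this sketch does not attempt to reproduce.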
author2: Yu-Chiang Wang
author_facet: Yu-Chiang Wang; Yen-Ting Liu (劉彥廷)
author: Yen-Ting Liu (劉彥廷)
spellingShingle: Yen-Ting Liu (劉彥廷); Transforming Visual Attention into Video Summarization
author_sort: Yen-Ting Liu
title: Transforming Visual Attention into Video Summarization
title_short: Transforming Visual Attention into Video Summarization
title_full: Transforming Visual Attention into Video Summarization
title_fullStr: Transforming Visual Attention into Video Summarization
title_full_unstemmed: Transforming Visual Attention into Video Summarization
title_sort: transforming visual attention into video summarization
publishDate: 2019
url: http://ndltd.ncl.edu.tw/handle/qvk4m5
work_keys_str_mv: AT yentingliu transformingvisualattentionintovideosummarization; AT liúyàntíng transformingvisualattentionintovideosummarization; AT yentingliu jíyóushìjuézhùyìlìláichùlǐshìpínzhāiyào; AT liúyàntíng jíyóushìjuézhùyìlìláichùlǐshìpínzhāiyào
_version_: 1719292369327620096