Image classification by combining key term extraction and spoken term detection
Master's === National Taiwan University === Graduate Institute of Communication Engineering === 105 === Children usually learn objects or concepts from visual and auditory input without being explicitly taught about them. We hope machines can do something similar, i.e., learn from unlabeled video and audio automatically. In the Internet...
Main Authors: | Hsien-Chin Lin 林賢進 |
---|---|
Other Authors: | 李琳山 |
Format: | Others |
Language: | zh-TW |
Published: | 2017 |
Online Access: | http://ndltd.ncl.edu.tw/handle/z3j88q |
id |
ndltd-TW-105NTU05435090 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-105NTU05435090 2019-05-15T23:39:40Z http://ndltd.ncl.edu.tw/handle/z3j88q Image classification by combining key term extraction and spoken term detection 結合關鍵用語擷取與口述詞彙偵測之影像辨識 Hsien-Chin Lin 林賢進 Master's National Taiwan University Graduate Institute of Communication Engineering 105 Children usually learn objects or concepts from visual and auditory input without being explicitly taught about them. We hope machines can do something similar, i.e., learn from unlabeled video and audio automatically. In the Internet era, abundant resources are available online, for example, instructional and training videos about cooking, dancing and the environment on YouTube, and we wish to be able to use them. Most such videos on YouTube are not labeled and are thus difficult to use for training machines, and human annotation of these videos is expensive. This research therefore proposes a direction and develops a system that performs key term extraction and spoken term detection over the audio and uses the detected key terms to label the frames of the video automatically. It can also discover the important concepts in the videos, treating them as classes of images. We then use these labeled data to train an image classification model, and reasonably good results can be obtained. A novel key term extraction approach based on the location of the terms and their context in the sentences is also proposed, which was shown to be domain independent. In other words, once trained, it can be used to extract key terms in unseen domains. 李琳山 2017 thesis 72 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University === Graduate Institute of Communication Engineering === 105 === Children usually learn objects or concepts from visual and auditory input without being explicitly taught about them. We hope machines can do something similar, i.e., learn from unlabeled video and audio automatically. In the Internet era, abundant resources are available online, for example, instructional and training videos about cooking, dancing and the environment on YouTube, and we wish to be able to use them.
Most such videos on YouTube are not labeled and are thus difficult to use for training machines, and human annotation of these videos is expensive. This research therefore proposes a direction and develops a system that performs key term extraction and spoken term detection over the audio and uses the detected key terms to label the frames of the video automatically. It can also discover the important concepts in the videos, treating them as classes of images. We then use these labeled data to train an image classification model, and reasonably good results can be obtained. A novel key term extraction approach based on the location of the terms and their context in the sentences is also proposed, which was shown to be domain independent. In other words, once trained, it can be used to extract key terms in unseen domains.
|
author2 |
李琳山 |
author_facet |
李琳山 Hsien-Chin Lin 林賢進 |
author |
Hsien-Chin Lin 林賢進 |
author_sort |
Hsien-Chin Lin |
title |
Image classification by combining key term extraction and spoken term detection |
publishDate |
2017 |
url |
http://ndltd.ncl.edu.tw/handle/z3j88q |