Food Image Captioning with Verb-Noun Pairs Empowered by Joint Correlation

Bibliographic Details
Main Authors: LIN,JIA-HSING, 林家興
Other Authors: CHU,WEI-TA
Format: Others
Language: en_US
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/21674221727413201079
Description
Summary: Master's === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 103 === Studies of image captioning have emerged explosively in the past two years. Although many elegant approaches have been proposed for general-purpose image captioning, incorporating domain knowledge or a domain-specific description structure remains unexplored. In this thesis, we concentrate on food image captioning, where a food image is better described not only by what the food is but also by how it was cooked. We propose neural networks that jointly consider multiple factors, i.e., food recognition, ingredient recognition, and cooking method recognition, and verify that recognition performance improves when multiple factors are taken into account. With these three factors, food image captions composed of verb-noun pairs (usually a cooking method followed by ingredients) can be generated. We demonstrate the effectiveness of the proposed methods from various viewpoints, and believe this is a better way to describe food images than general-purpose image captioning.
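
The verb-noun caption structure mentioned in the summary can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `compose_caption` and the example labels are hypothetical, and the sketch assumes the cooking method and ingredients have already been predicted by some recognizer.

```python
from typing import List


def compose_caption(cooking_method: str, ingredients: List[str]) -> str:
    """Build a verb-noun caption: cooking method (verb) followed by ingredients (nouns).

    Hypothetical helper for illustration only; the labels are assumed to come
    from separate recognition steps that are not shown here.
    """
    if not ingredients:
        # No ingredient recognized: fall back to the verb alone.
        return cooking_method
    if len(ingredients) == 1:
        nouns = ingredients[0]
    else:
        # Join several ingredients as "a, b and c".
        nouns = ", ".join(ingredients[:-1]) + " and " + ingredients[-1]
    return f"{cooking_method} {nouns}"


# Example labels are made up for illustration.
print(compose_caption("fried", ["rice"]))
print(compose_caption("grilled", ["chicken", "onion"]))
```

The sketch covers only the final composition step; the joint recognition of food, ingredients, and cooking method described in the summary is outside its scope.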