Exploiting Multimedia Content: A Machine Learning Based Approach

Bibliographic Details
Main Author:	Ehtesham Hassan
Format:	Article
Language:	English
Published:	Computer Vision Center Press 2014-06-01
Series:	ELCVIA Electronic Letters on Computer Vision and Image Analysis
Online Access:	https://elcvia.cvc.uab.es/article/view/598

id	doaj-b3b01c8e54c9442fb70144118c119cf9
doi	10.5565/rev/elcvia.598
affiliation	Innovation Labs, TCS
collection	DOAJ
issn	1577-5097
description	This thesis explores the use of machine learning for multimedia content management involving single and multiple features, modalities and concepts. We introduce a shape-based feature for binary patterns and apply it to recognition and retrieval applications in single-feature and multiple-feature architectures. The multiple-feature recognition and retrieval frameworks are based on the theory of multiple kernel learning (MKL). A binary pattern recognition framework is presented that combines binary MKL classifiers using a decision directed acyclic graph (DDAG). It is evaluated on Indian script character recognition and MPEG-7 shape symbol recognition. A word-image-based document indexing framework is presented using distance-based hashing (DBH) defined on learned pivot centres. We use a new multiple kernel learning scheme based on a genetic algorithm to develop a kernel DBH-based document image retrieval system. Experimental evaluation is presented on document collections in Devanagari, Bengali and English scripts.
Next, methods for document retrieval using multi-modal information fusion are presented. A text/graphics segmentation framework is presented for documents with complex layouts, and we present a novel multi-modal document retrieval framework that uses the segmented regions; this approach is evaluated on English magazine pages. A document script identification framework is presented using decision-level aggregation of page-, paragraph- and word-level predictions. Latent Dirichlet Allocation (LDA) based topic modelling with a modified edit distance is introduced for retrieving documents with recognition inaccuracies. A multi-modal indexing framework for such documents is presented through a learning-based combination of text-based and image-based properties. Experimental results are shown on Devanagari script documents.
Finally, we investigate concept-based approaches for multimedia analysis. A multi-modal document retrieval framework is presented that combines generative and discriminative modelling to exploit the cross-modal correlation between modalities. This combination is also explored for semantic concept recognition using multi-modal components of the same document, and of different documents across a collection. The framework is evaluated experimentally on semantic event detection in sports videos and on semantic labelling of components of multi-modal document images.
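The abstract states that binary MKL classifiers are combined into a multi-class recognizer through a decision directed acyclic graph (DDAG). The following is a minimal sketch of that combination step only, using the usual list-based formulation of a DDAG: test the first remaining class against the last, discard the rejected class, and repeat, so an N-class decision costs N - 1 binary evaluations. Assumptions: the pairwise classifiers are toy nearest-centre rules standing in for trained MKL classifiers, and `ddag_predict`, the class labels and the centres are hypothetical; the abstract does not specify the node ordering the thesis uses.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# A pairwise classifier for the class pair (a, b) returns True when it
# prefers class a over class b for the input x.
PairwiseClassifier = Callable[[Sequence[float]], bool]


def ddag_predict(
    x: Sequence[float],
    classes: List[Hashable],
    pairwise: Dict[Tuple[Hashable, Hashable], PairwiseClassifier],
) -> Hashable:
    """Decision DAG over one-vs-one classifiers.

    Each node tests the first remaining class against the last one and removes
    the class the classifier rejects, so exactly len(classes) - 1 binary
    classifiers are evaluated per input.
    """
    candidates = list(classes)  # keeps the original class order
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        if pairwise[(first, last)](x):
            candidates.pop()    # "not last": drop the last class
        else:
            candidates.pop(0)   # "not first": drop the first class
    return candidates[0]


# Hypothetical stand-ins for trained binary (e.g. MKL) classifiers:
# nearest class centre on a 1-D feature.
centres = {"ka": 0.0, "kha": 5.0, "ga": 10.0}
labels = list(centres)
classifiers: Dict[Tuple[Hashable, Hashable], PairwiseClassifier] = {}
for i, a in enumerate(labels):
    for b in labels[i + 1:]:
        # Default arguments bind a and b at definition time.
        classifiers[(a, b)] = (
            lambda x, a=a, b=b: abs(x[0] - centres[a]) <= abs(x[0] - centres[b])
        )

print(ddag_predict([6.0], labels, classifiers))  # kha
```

Because the candidate list keeps the original class order, every node only needs the classifier keyed `(earlier, later)`, so N(N - 1)/2 binary classifiers suffice.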
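The word-image indexing framework relies on distance-based hashing (DBH) with a kernel-induced distance. As a rough illustration of the general idea, following the line-projection hash functions of the original DBH formulation by Athitsos et al., each hash bit projects an object onto the "line" through two pivots using distances alone and sets the bit when the projection falls inside an interval holding about half of the indexed objects. Several bits form a bucket key, several tables are probed, and the candidates are re-ranked by true distance. Assumptions, all hypothetical: pivots are sampled at random rather than learned as pivot centres; the kernel is a fixed weighted sum of an RBF and a linear kernel (`KERNEL_WEIGHTS`, `GAMMA`), whereas the thesis learns its multi-kernel combination with a genetic algorithm; thresholds sit at the 25th and 75th percentiles; `DistanceBasedHash` and its methods are illustrative names.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical fixed weights for a two-kernel combination (RBF + linear).
# The thesis learns its multi-kernel combination with a genetic algorithm instead.
KERNEL_WEIGHTS = (0.7, 0.3)
GAMMA = 0.5


def combined_kernel(x: Vector, y: Vector) -> float:
    rbf = math.exp(-GAMMA * sum((a - b) ** 2 for a, b in zip(x, y)))
    linear = sum(a * b for a, b in zip(x, y))
    return KERNEL_WEIGHTS[0] * rbf + KERNEL_WEIGHTS[1] * linear


def kernel_distance(x: Vector, y: Vector) -> float:
    # Distance in the kernel's feature space: d^2 = K(x,x) + K(y,y) - 2K(x,y).
    d2 = combined_kernel(x, x) + combined_kernel(y, y) - 2.0 * combined_kernel(x, y)
    return math.sqrt(max(d2, 0.0))


# One hash bit: pivots p1, p2, their distance d12, and the interval [t1, t2].
HashBit = Tuple[Vector, Vector, float, float, float]


class DistanceBasedHash:
    def __init__(self, data: List[Vector], num_bits: int = 4,
                 num_tables: int = 3, seed: int = 0) -> None:
        self.data = list(data)
        rng = random.Random(seed)
        self.tables: List[Tuple[List[HashBit], Dict[Tuple[int, ...], List[int]]]] = []
        for _ in range(num_tables):
            bits = [self._make_bit(rng) for _ in range(num_bits)]
            buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
            for idx, x in enumerate(self.data):
                buckets[self._key(bits, x)].append(idx)
            self.tables.append((bits, buckets))

    @staticmethod
    def _project(x: Vector, p1: Vector, p2: Vector, d12: float) -> float:
        # Position of x along the "line" through p1 and p2, using distances only.
        return (kernel_distance(x, p1) ** 2 + d12 ** 2
                - kernel_distance(x, p2) ** 2) / (2.0 * d12)

    def _make_bit(self, rng: random.Random) -> HashBit:
        # Pivots are sampled at random here (the thesis learns pivot centres).
        for _ in range(100):
            p1, p2 = rng.sample(self.data, 2)
            d12 = kernel_distance(p1, p2)
            if d12 > 1e-12:
                break
        else:
            raise ValueError("need at least two distinct points")
        proj = sorted(self._project(x, p1, p2, d12) for x in self.data)
        # Interval between the 25th and 75th percentiles: about half the points.
        t1, t2 = proj[len(proj) // 4], proj[(3 * len(proj)) // 4]
        return (p1, p2, d12, t1, t2)

    def _key(self, bits: List[HashBit], x: Vector) -> Tuple[int, ...]:
        return tuple(int(t1 <= self._project(x, p1, p2, d12) <= t2)
                     for p1, p2, d12, t1, t2 in bits)

    def query(self, q: Vector, k: int = 1) -> List[int]:
        # Union of the query's buckets over all tables, re-ranked by true distance.
        candidates = set()
        for bits, buckets in self.tables:
            candidates.update(buckets.get(self._key(bits, q), []))
        return sorted(candidates, key=lambda i: kernel_distance(q, self.data[i]))[:k]


rng = random.Random(1)
points = [[rng.uniform(0.0, 3.0), rng.uniform(0.0, 3.0)] for _ in range(40)]
index = DistanceBasedHash(points, num_bits=3, num_tables=4)
print(index.query(points[5], k=1))  # [5]: an indexed point retrieves itself
```

Like other hashing schemes, this trades exactness for speed: a true nearest neighbour that lands in none of the query's buckets is missed, which is why several tables are probed and their candidates merged.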

Similar Items