Model-based identification of Oriental documents

Bibliographic Details
Main Author:	Yacoub Said, Rita A
Format:	Others
Published:	1999
Online Access:	http://spectrum.library.concordia.ca/877/1/MQ43669.pdf
Citation:	Yacoub Said, Rita A (1999) Model-based identification of Oriental documents. Masters thesis, Concordia University.

id	ndltd-LACETR-oai-collectionscanada.gc.ca-QMG.877
record_format	oai_dc
collection	NDLTD
sources	NDLTD
description	Computers capable of identifying the languages printed in documents can support many applications, including document classification for character recognition, translation, and language understanding. Language identification is normally done manually. However, the high volume and variety of languages encountered make manual identification impractical, so an automatic approach becomes necessary. Language identification is therefore a key step in the automatic processing of document images. This thesis is concerned with the model-based classification of Oriental documents into Chinese, Japanese, and Korean. A model-based approach locates, in an image, an object for which the computer has a model. In this work, the objects to be located are some of the most frequently occurring characters in each of the three Oriental languages, and the images searched for these objects are the Oriental documents fed to the system. A major part of the work is locating instances of the character models in an Oriental document, which is done using the Hausdorff distance, a similarity measure defined between two sets of points. One point set represents the model of an Oriental character to look for, and the other represents each character in the document image to be identified. Since Oriental documents are complex in structure, a portion of the text is extracted from the input document for further processing.
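The abstract does not spell out the thesis's matching procedure, so the following is only a minimal sketch under stated assumptions. It assumes characters have already been segmented into binary bitmaps aligned at a common origin, computes the Hausdorff distance H(A, B) = max(h(A, B), h(B, A)), where h(A, B) is the largest distance from a point of A to its nearest point of B, and uses it for a simple nearest-model vote across the three languages. The function names, the `threshold` parameter, the voting rule, and the toy 3x3 "characters" are hypothetical illustrations, not the thesis's models. A real locator would also search over positions (translations) of the model in the image rather than compare glyphs at fixed positions.

```python
import math
from collections import Counter
from typing import Optional

Point = tuple[int, int]


def to_points(bitmap: list[str]) -> list[Point]:
    """Return the (row, col) coordinates of foreground ('#') pixels."""
    return [(r, c) for r, row in enumerate(bitmap)
            for c, ch in enumerate(row) if ch == "#"]


def directed_hausdorff(a: list[Point], b: list[Point]) -> float:
    """h(A, B): largest distance from a point of A to its nearest point of B.
    Both point sets must be non-empty."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: list[Point], b: list[Point]) -> float:
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def classify(glyphs: list[list[Point]],
             models: list[tuple[str, list[Point]]],
             threshold: float = 1.0) -> Optional[str]:
    """Hypothetical voting rule: each glyph votes for the language of its
    closest model, but only if that distance is <= threshold. The language
    with the most votes wins; None if no glyph matched any model."""
    votes = Counter()
    for glyph in glyphs:
        best_lang, best_d = None, math.inf
        for lang, model in models:
            d = hausdorff(glyph, model)
            if d < best_d:
                best_lang, best_d = lang, d
        if best_lang is not None and best_d <= threshold:
            votes[best_lang] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy 3x3 shapes standing in for real character models (hypothetical).
TOP_BAR = ["###", "...", "..."]
LEFT_BAR = ["#..", "#..", "#.."]
DOT = ["...", ".#.", "..."]

models = [("chinese", to_points(TOP_BAR)),
          ("japanese", to_points(LEFT_BAR)),
          ("korean", to_points(DOT))]

page = [to_points(TOP_BAR),                  # exact match, H = 0
        to_points(["##.", "...", "..."]),    # one pixel missing, H = 1
        to_points(DOT),                      # exact match, H = 0
        to_points(["..#", "..#", "..#"])]    # closest model at H ~ 1.41, rejected

print(classify(page, models))  # → chinese
```

The brute-force distance above costs O(|A|·|B|) per comparison; for realistic glyph sizes, precomputing a distance transform of one point set is a common way to speed it up.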

Similar Items