Historical Document Reconstruction

Master's === Tamkang University === Department of Computer Science and Information Engineering === 89 === Abstract: Historical documents have played an important role in human civilization. They include old books, newspapers, magazines, and handwritten texts, as well as documents kept as copies, photos, or microfilms. As...


Bibliographic Details
Main Authors: Ming-Ching Shih, 石明金
Other Authors: Shwu-Huey Yen
Format: Others
Language: zh-TW
Published: 2001
Online Access: http://ndltd.ncl.edu.tw/handle/83822457672266952011
id ndltd-TW-089TKU00392036
record_format oai_dc
spelling ndltd-TW-089TKU00392036 2015-10-13T12:14:41Z http://ndltd.ncl.edu.tw/handle/83822457672266952011 Historical Document Reconstruction 歷史文件重現 Ming-Ching Shih 石明金 Master's Tamkang University Department of Computer Science and Information Engineering 89 Shwu-Huey Yen 顏淑惠 2001 thesis 73 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Tamkang University === Department of Computer Science and Information Engineering === 89 === Abstract: Historical documents have played an important role in human civilization. They include old books, newspapers, magazines, and handwritten texts, as well as documents kept as copies, photos, or microfilms. As important assets of our collective memory, they deserve to be preserved in the best condition possible. However, they are often degraded or even unreadable because of poor paper quality, illumination gradients, corrugation, or dark backgrounds, all of which cause text and background to blend together. Reconstructing such documents in a clear, easily readable form is therefore a meaningful task. Our approach rests on the observation that most pixels in a scanned document belong to the background. Background pixels are connected (with some exceptions, such as the enclosed holes in letters like D, P, or R), homogeneous, and of higher gray level than the neighboring text pixels. If the background pixels can be identified, the text is extracted as a consequence. Our algorithm therefore first applies partial histogram equalization to the image to enhance contrast. An agent growing technique then produces two images: one that is clean, with most noise eliminated but some text information lost, and another that preserves most of the information, including noise. Finally, conditional dilation is used to eliminate isolated noise and obtain the final result. The proposed algorithm has proved successful in cleaning the noisy backgrounds of severely degraded documents. It is robust to font, size, language, and degradation errors in the text. In a comparison with three other well-known binarization algorithms on several degraded documents, our algorithm also produced better results.
author2 Shwu-Huey Yen
author_facet Shwu-Huey Yen
Ming-Ching Shih
石明金
author Ming-Ching Shih
石明金
spellingShingle Ming-Ching Shih
石明金
Historical Document Reconstruction
author_sort Ming-Ching Shih
title Historical Document Reconstruction
publishDate 2001
url http://ndltd.ncl.edu.tw/handle/83822457672266952011
_version_ 1716854828839206912
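
The abstract names conditional dilation as the final step but does not say how the two intermediate images are combined. The sketch below shows one common reading, offered only as an illustration: the clean image acts as the seed (marker), the information-preserving image acts as the constraint (mask), and the seed is grown inside the mask until nothing changes. The function name `conditional_dilation`, the 0/1 encoding with 1 for text, and the 8-connected neighborhood are assumptions made for this example, not details taken from the thesis.

```python
from collections import deque


def conditional_dilation(marker, mask):
    """Grow `marker` inside `mask` until stable, using 8-connectivity.

    marker, mask: lists of rows holding 0/1 values, same shape; 1 = text pixel.
    Assumption for this sketch: `marker` is the clean image and `mask` is the
    image that keeps most information (including noise).
    """
    rows, cols = len(mask), len(mask[0])
    # Seed with marker pixels that the mask also allows.
    out = [[1 if marker[r][c] and mask[r][c] else 0 for c in range(cols)]
           for r in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols) if out[r][c])
    while queue:
        r, c = queue.popleft()
        # Visit the 3x3 neighborhood; only mask pixels may be switched on.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and mask[nr][nc] and not out[nr][nc]):
                    out[nr][nc] = 1
                    queue.append((nr, nc))
    return out


# Toy example: the clean image kept one pixel of a stroke; the noisy image
# holds the full stroke plus an isolated speck in the bottom-right corner.
clean = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
noisy = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
restored = conditional_dilation(clean, noisy)
for row in restored:
    print(row)
```

In this toy case the stroke is recovered in full because it touches a seed pixel, while the isolated speck is dropped because no seed reaches it. Growing with a queue gives the same result as repeating "dilate by a 3x3 element, then intersect with the mask" until convergence, but visits each pixel only once.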