Summary: | Master's === Tamkang University === Department of Computer Science and Information Engineering === 89 === Abstract: Old historical documents have played an important role in human civilization. They include old books, newspapers, magazines, and handwritten texts, as well as documents kept as copies, photographs, or microfilm. As important assets of our collective memory, they deserve to be preserved in the best possible condition. However, they are often degraded or even unreadable because of poor paper quality, illumination gradients, corrugation, or a dark background that causes text and background to blend together. Reconstructing such documents in a clear, easily readable form is therefore a meaningful task. Our idea rests on the observation that most pixels in a scanned document belong to the background. Background pixels tend to be connected (with some exceptions, such as the enclosed holes in letters like D, P, or R), homogeneous, and of higher gray level than the neighboring text pixels. If the background pixels can be identified, the text is extracted as a consequence. In our algorithm, partial histogram equalization is therefore applied to the image first to enhance the contrast. An agent growing technique then produces two images: a clean one in which most of the noise is eliminated but some text information is lost as well, and one that preserves most of the information, including the noise. Finally, conditional dilation is used to eliminate the isolated noise and obtain the final result. The proposed algorithm proved successful at cleaning the noisy backgrounds of severely degraded documents. It is robust to font, size, and language, and to degradation in the text. Compared with three other well-known binarization algorithms on several degraded documents, our algorithm also produced better results.
|
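The final step of the abstract, conditional dilation over the two intermediate images, can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `conditional_dilation`, the 8-connected neighborhood, and the choice to iterate until nothing changes are all assumptions made for illustration, since the abstract does not specify them. The clean image acts as the seed (marker) and the information-preserving image acts as the limit (mask); text pixels are encoded as 1 and background as 0.

```python
def conditional_dilation(marker, mask):
    """Grow marker pixels into neighbouring pixels, but only where mask is set.

    marker: 2D list of 0/1, the clean image (little noise, some text lost).
    mask:   2D list of 0/1, the image that keeps most text but also noise.
    Assumption: 8-connected growth repeated until stable; the source
    does not state the structuring element or the stopping rule.
    """
    rows, cols = len(mask), len(mask[0])
    # Seed the result with marker pixels that the mask also marks as text.
    result = [[1 if marker[r][c] and mask[r][c] else 0 for c in range(cols)]
              for r in range(rows)]
    pending = [(r, c) for r in range(rows) for c in range(cols) if result[r][c]]
    while pending:
        r, c = pending.pop()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                # Spread only into mask pixels that are not yet in the result.
                if inside and mask[nr][nc] and not result[nr][nc]:
                    result[nr][nc] = 1
                    pending.append((nr, nc))
    return result
```

Under these assumptions, a stroke that is only partly present in the clean image is restored in full from the noisy image, while a speck in the noisy image that touches no seed pixel is never reached and therefore disappears.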