Summary: | Master's === 明道管理學院 (Mingdao College of Management) === Graduate Institute of Management === 93 === With the rapid growth of the Internet, digital libraries are becoming increasingly important. According to Dr. Daniel Greenstein, digital libraries at the current stage tend to build digital collections and to integrate all kinds of digital data so that they can be provided to users. How to digitize the holdings of a traditional library is therefore an important question. Document Image Analysis (DIA) technology can accomplish this task. Within DIA, document image segmentation is an important step: its goal is to separate the background, text, and pictures in a document image and to recognize them.
In this thesis, we propose a document image segmentation system based on several kinds of document image features. We present a reliable system for edge detection, localization, extraction, and binarization of text in document images. The system can extract background information, text, and pictures from document images of different colors. Because a document image combines many image features, such as background, text, and pictures, we employ several image feature extraction methods to extract them, including statistical characteristic measures, edge detection, projection, and Gaussian mixture models.
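As an illustration of the projection step named above, the following is a minimal sketch and not the thesis's actual implementation. It assumes the page has already been binarized into rows of 0 (background) and 1 (text) pixels; the function names `horizontal_projection` and `find_text_bands` and the `min_count` threshold are hypothetical and chosen only for this example.

```python
def horizontal_projection(binary):
    """Count foreground (1) pixels in each row of a binarized image."""
    return [sum(row) for row in binary]


def find_text_bands(profile, min_count=1):
    """Return (start, end) row ranges, end exclusive, where the
    projection profile is at least min_count (assumed threshold)."""
    bands = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_count and start is None:
            start = y                  # a band of text rows begins
        elif count < min_count and start is not None:
            bands.append((start, y))   # the band ended on the previous row
            start = None
    if start is not None:              # band runs to the bottom edge
        bands.append((start, len(profile)))
    return bands


# Toy 6x6 "page": two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
]
profile = horizontal_projection(page)
bands = find_text_bands(profile)
```

Applying the same idea to columns within each band would give candidate word or character boundaries; on real scans the threshold would likely need tuning against noise.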
Experimental results on an extensive set of document images demonstrate the effectiveness and superiority of the proposed method and show that the system performs well.
Keywords: Digital library, Document image analysis, Document image segmentation, Color background, Feature extraction, Statistical measure, Edge detection, Projection, Gaussian mixture model.
|