Summary: | Ph.D. === National Chiao Tung University === Department of Electrical and Control Engineering === 93 === Traditional image compression methods are not suitable for compound document images because such images contain a large amount of text. Text is made up largely of high-frequency components, many of which are lost in compression, so the text becomes blurred and cannot be recognized easily by the human eye or by a computer. Because the text carries most of the information, separating the text from a compound document image is one of the most significant areas of research into document images. Document image segmentation, which separates text from a monochromatic background, has been studied for over ten years, but segmenting compound document images is still an open research field. Many techniques have been developed to segment document images. However, they are insufficient when the background includes sharply varying contours or overlaps with the text. Finding a text segmentation method for complex compound documents remains a great challenge, and the research field is still young. This dissertation presents three segmentation algorithms for compressing both color and monochromatic compound document images at a high compression ratio. The proposed algorithms greatly outperform the well-known image compression methods JPEG and DjVu, and they enable effective extraction of text from a complex background.
|