Summary: | Optical character recognition requires that data acquired from a camera device first be reconditioned into a suitable form. This thesis presents a design study of a preprocessor intended to accomplish this task under the constraints of real-time, autonomous operation as part of a reading machine for the blind. The preprocessor contains three components: a filter to reduce the influence of noise and enhance the text seen; a binarization stage to label each pixel as character or background with a single bit; and a segmentation system which isolates individual characters for delivery to the recognizer in proper causal order. Filtering of the acquired video information is performed with the Laplacian of a Gaussian, ∇²g, edge detection operator developed by D. Marr. This filter is shown to locate the character edges and to optimally control the amount of detail seen. Developed from models of the human visual system, this filter promises that the preprocessor could attain a human text resolution capability. Binarization is also reduced to a simple thresholding of the filter's output. To achieve optimal enhancement of print structures of known dimensions, a filter design strategy is presented incorporating two periodic edge models. Since the filter must be digitized, the design method is further extended to include an analysis of the effects of sampling the continuous filter and quantizing its coefficients for a direct-form finite impulse response implementation. To validate the claim that this filter's performance is superior to other edge detection methods, a chapter is devoted to quantitative evaluation of ∇²g filtered test images. The results are compared with other published results and found to be superior, as well as remarkably noise immune. Segmentation of the binarized text is accomplished with a technique adapted from the binary-image description method of C.T. Zahn. 
The text images are reduced to a hybrid chain-code-and-coordinate description of all internal and external boundaries, built concurrently with the incoming raster-acquired data. No more than two image scan lines need be stored. All closed borders within the text are detected as soon as they occur by monitoring the local changes in the image's Euler number during a scan, and verifying global closure by following pointers through a linked-list data structure. A simulation of an implementation of the segmentation system, operating on fifty ∇²g filtered test images, indicated that the real-time performance objective can be achieved through a combined serial and parallel architecture. === Applied Science, Faculty of === Electrical and Computer Engineering, Department of === Graduate
|
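As an illustration of the filtering and binarization stages summarized above, the following is a minimal sketch, not the thesis's own design. It samples a ∇²g kernel directly, applies it as a direct-form FIR filter, and thresholds the output. The function names, the choice of sigma and kernel radius, the zero threshold, the mean subtraction used to remove the residual DC gain, and the edge-replicating border are all assumptions made for the example. The abstract's coefficient quantization and periodic edge models are not modelled.

```python
import math


def log_kernel(sigma, radius):
    """Sample the Laplacian of a Gaussian on a (2*radius+1) x (2*radius+1) grid.

    The 1/(2*pi*sigma^2) normalization is dropped (assumption): only the sign
    of the filter output matters for the thresholding below.
    """
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            row.append((r2 - 2 * sigma ** 2) / sigma ** 4
                       * math.exp(-r2 / (2 * sigma ** 2)))
        kernel.append(row)
    # Sampling and truncation leave a small DC gain; subtract the mean so that
    # uniform regions produce zero output (assumed correction, not from the source).
    mean = sum(map(sum, kernel)) / (2 * radius + 1) ** 2
    return [[v - mean for v in row] for row in kernel]


def filter_image(image, kernel):
    """Direct-form 2-D FIR filtering; border pixels are replicated (assumption)."""
    height, width = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for dy in range(-r, r + 1):
                yy = min(max(i + dy, 0), height - 1)
                for dx in range(-r, r + 1):
                    xx = min(max(j + dx, 0), width - 1)
                    # The kernel is symmetric, so correlation equals convolution.
                    acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[i][j] = acc
    return out


def binarize(filtered, threshold=0.0):
    """One bit per pixel: 1 where the filter output exceeds the threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in filtered]
```

With this sign convention, dark print on a light background gives positive filter output inside a stroke and negative output just outside it, so `binarize(filter_image(image, log_kernel(1.0, 3)))` marks character pixels with 1. Light print on a dark background would need the opposite comparison.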