Number Recognition of Real-world Images in the Forest Industry : a study of segmentation and recognition of numbers on images of logs with color-stamped numbers
Main Author: | |
---|---|
Format: | Others |
Language: | English |
Published: | Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2020 |
Subjects: | |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-39365 |
Summary: | Analytics such as machine learning are of great interest in many industries. Optical character recognition is essentially a solved problem, whereas number recognition on real-world images, one application of machine learning, remains a more challenging obstacle. The purpose of this study was to implement a system that can detect and read numbers in a given dataset from the forest industry: images of color-stamped logs. The study evaluated the accuracy of segmentation and number recognition on these images when using a model pre-trained on the Street View House Numbers (SVHN) dataset. The general approach to preprocessing was based on car number plate segmentation, because that problem is similar: an object is identified first, and the individual digits are then located within it. Color segmentation was the biggest asset in preprocessing, because the digits are a distinct red compared to the rest of the image. Number recognition accuracy with the pre-trained model was significantly lower on color-stamped logs (26%) than on street view house numbers (95%), but for some image classes the per-digit accuracy still exceeded 80% when segmentation accuracy was excluded. Segmentation accuracy ranged from 32% for the lowest-scoring class to 93% for the highest. From the results it was concluded that unclear digits in the images reduced number recognition accuracy the most. There is much to consider for future work, but the most obvious and impactful change would be to train a more accurate model on the dataset of color-stamped logs itself. |
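The summary credits color segmentation of the red stamped digits, modeled on number plate segmentation, as the key preprocessing step. The sketch below shows one way such a step can work, assuming images are given as nested lists of (R, G, B) tuples: each pixel is classified as red by an HSV hue, saturation and value threshold, and 4-connected red regions are grouped into bounding boxes ordered left to right as digit candidates. The function names (`is_red`, `red_mask`, `digit_boxes`), the threshold values and the sample colors are hypothetical illustrations, not code or parameters from the thesis; a production pipeline would more likely use an image library such as OpenCV.

```python
import colorsys
from collections import deque


def is_red(pixel, hue_tol=0.05, min_sat=0.5, min_val=0.3):
    """Return True if an (R, G, B) pixel with 0-255 channels looks like red stamp ink.

    The thresholds are hypothetical and would need tuning on real log images.
    """
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Red sits at both ends of the hue circle, so accept hues near 0.0 or 1.0.
    return (h <= hue_tol or h >= 1.0 - hue_tol) and s >= min_sat and v >= min_val


def red_mask(image):
    """Binary mask: True where the pixel passes the red threshold."""
    return [[is_red(p) for p in row] for row in image]


def digit_boxes(mask, min_area=4):
    """Bounding boxes (top, left, bottom, right) of 4-connected red regions.

    Regions smaller than min_area pixels are discarded as noise, and the
    remaining boxes are sorted left to right, the reading order of the digits.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected region.
            queue = deque([(y, x)])
            seen[y][x] = True
            ys, xs = [], []
            while queue:
                cy, cx = queue.popleft()
                ys.append(cy)
                xs.append(cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(ys) >= min_area:
                boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return sorted(boxes, key=lambda box: box[1])


if __name__ == "__main__":
    BARK, INK = (150, 110, 70), (200, 20, 20)  # made-up brown and red colors
    image = [[BARK] * 9 for _ in range(5)]
    for y in range(1, 4):
        for x in (1, 2, 5, 6, 7):
            image[y][x] = INK
    image[4][4] = INK  # isolated speck, removed by min_area
    print(digit_boxes(red_mask(image)))  # [(1, 1, 3, 2), (1, 5, 3, 7)]
```

Treating each digit as a separate red blob mirrors the plate-style idea of locating an object and then its individual characters. Stamped digits that bleed into each other or break into pieces would need extra handling, such as morphological closing or splitting wide boxes, which this sketch omits.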
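The summary contrasts an overall recognition accuracy of 26% with per-digit accuracy above 80% for some classes once segmentation errors are excluded. The metric definitions in this sketch are hypothetical and not taken from the thesis: an image counts as segmented when the prediction has as many digits as the true number, whole-number accuracy is measured over all images, and per-digit accuracy only over the segmented ones. The `evaluate` function and the sample data are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

# (true_number, predicted_number); predicted is None when no digits were found.
Sample = Tuple[str, Optional[str]]


def evaluate(samples: List[Sample]) -> Dict[str, float]:
    """Hypothetical metrics: segmentation, whole-number and per-digit accuracy."""
    if not samples:
        raise ValueError("no samples to evaluate")
    total = len(samples)
    # Treat an image as correctly segmented if the digit count matches.
    segmented = [(t, p) for t, p in samples if p is not None and len(p) == len(t)]
    # Whole-number accuracy counts every image, so segmentation failures are errors.
    number_correct = sum(1 for t, p in samples if p == t)
    # Per-digit accuracy is computed only over correctly segmented images.
    digit_total = sum(len(t) for t, _ in segmented)
    digit_correct = sum(a == b for t, p in segmented for a, b in zip(t, p))
    return {
        "segmentation_accuracy": len(segmented) / total,
        "number_accuracy": number_correct / total,
        "per_digit_accuracy": digit_correct / digit_total if digit_total else 0.0,
    }


if __name__ == "__main__":
    samples = [("123", "123"), ("456", "450"), ("789", None), ("12", "1")]
    print(evaluate(samples))
```

With these invented samples, only one of four numbers is read completely (25%) and half the images are segmented, yet 5 of the 6 digits in the segmented images are correct (about 83%). This shows how excluding segmentation failures can lift a per-digit rate far above the whole-number rate.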