Weighted combination of per-frame recognition results for text recognition in a video stream

The range of applications of automated document recognition has expanded, and as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of particular interest. However, it is not always possible to ensure...

Bibliographic Details
Main Authors: O. Petrova, K. Bulatov, V.V. Arlazarov, V.L. Arlazarov
Format: Article
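Language: English
Published: Samara National Research University 2021-02-01
Series: Компьютерная оптика (Computer Optics)
Subjects: mobile ocr, video stream, anytime algorithms, weighted combination, ensemble methods
Online Access: http://computeroptics.ru/KO/PDF/KO45-1/450110.pdf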
id doaj-b83a7be21b0e4757841898ca56219f31
record_format Article
spelling doaj-b83a7be21b0e4757841898ca56219f31
Record timestamp: 2021-02-27T14:42:47Z
Language: eng
Publisher: Samara National Research University
Journal: Компьютерная оптика
ISSN: 0134-2452, 2412-6179
Published: 2021-02-01
Volume 45, issue 1, pages 77-89
DOI: 10.18287/2412-6179-CO-795
Title: Weighted combination of per-frame recognition results for text recognition in a video stream
Authors and affiliations:
O. Petrova (FRC CSC RAS, Moscow, Russia; Smart Engines Service LLC, Moscow, Russia)
K. Bulatov (FRC CSC RAS, Moscow, Russia; Smart Engines Service LLC, Moscow, Russia; Moscow Institute of Physics and Technology (State University), Moscow, Russia)
V.V. Arlazarov (FRC CSC RAS, Moscow, Russia; Smart Engines Service LLC, Moscow, Russia)
V.L. Arlazarov (FRC CSC RAS, Moscow, Russia; Smart Engines Service LLC, Moscow, Russia; Moscow Institute of Physics and Technology (State University), Moscow, Russia)
Full text: http://computeroptics.ru/KO/PDF/KO45-1/450110.pdf
Keywords: mobile ocr, video stream, anytime algorithms, weighted combination, ensemble methods
collection DOAJ
language English
format Article
sources DOAJ
author O. Petrova
K. Bulatov
V.V. Arlazarov
V.L. Arlazarov
title Weighted combination of per-frame recognition results for text recognition in a video stream
publisher Samara National Research University
series Компьютерная оптика
issn 0134-2452
2412-6179
publishDate 2021-02-01
description The range of applications of automated document recognition has expanded, and as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of particular interest. However, it is not always possible to ensure controlled capture conditions and, consequently, high-quality input images. Unlike specialized scanners, mobile cameras can use a video stream as input, yielding several images of the recognized object captured with varying characteristics. This raises the problem of combining information from multiple input frames. In this paper, we propose a weighting model for combining per-frame recognition results, two approaches to the weighted combination of text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested on datasets of identity documents captured with a mobile device camera under different conditions, including perspective distortion of the document image and low lighting. The experimental results show that weighted combination can improve the quality of text recognition in a video stream, and that the per-character weighting method with input image focus estimation as the base criterion achieves the best results on the datasets analyzed.
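The abstract names per-character weighting with a focus-based criterion but does not spell out the formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each frame yields one score distribution per character position, that results are already aligned and of equal length, and that a mean squared Laplacian response can stand in for the focus estimate. The names `focus_score`, `combine_per_character`, and `CharDist` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Hypothetical representation: character label -> recognizer score.
CharDist = Dict[str, float]


def focus_score(gray: Sequence[Sequence[float]]) -> float:
    """Mean squared 4-neighbour Laplacian response over interior pixels.

    Assumed stand-in for a focus estimate: sharper images score higher.
    """
    h, w = len(gray), len(gray[0])
    total, count = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1]
                   + gray[y][x + 1] - 4.0 * gray[y][x])
            total += lap * lap
            count += 1
    return total / count if count else 0.0


def combine_per_character(frames: List[List[CharDist]],
                          weights: List[float]) -> str:
    """Weighted average of per-position distributions, then argmax."""
    if not frames or len(frames) != len(weights):
        raise ValueError("need one weight per frame result")
    length = len(frames[0])
    if any(len(f) != length for f in frames):
        raise ValueError("sketch assumes pre-aligned results of equal length")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    out = []
    for pos in range(length):
        acc = defaultdict(float)
        for frame, w in zip(frames, weights):
            for ch, p in frame[pos].items():
                acc[ch] += w * p / total_w
        out.append(max(acc, key=acc.get))
    return "".join(out)


# Toy data (invented for illustration): one sharp and one blurry frame.
sharp = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 9, 0], [0, 0, 0, 0]]
blurry = [[1, 1, 1, 1], [1, 2, 1, 1], [1, 1, 2, 1], [1, 1, 1, 1]]
frames = [
    [{"A": 0.6, "4": 0.4}, {"B": 0.7, "8": 0.3}],  # result from sharp frame
    [{"A": 0.1, "4": 0.9}, {"B": 0.2, "8": 0.8}],  # result from blurry frame
]
weights = [focus_score(sharp), focus_score(blurry)]
print(combine_per_character(frames, [1.0, 1.0]))  # 48
print(combine_per_character(frames, weights))     # AB
```

In this toy case, equal weights let the confident but blurry frame win, while focus-based weights favor the sharper frame. A real pipeline would also need to align strings of different lengths before combining them, which this sketch leaves out.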
topic mobile ocr
video stream
anytime algorithms
weighted combination
ensemble methods
url http://computeroptics.ru/KO/PDF/KO45-1/450110.pdf