Digitization Decisions: Comparing OCR Software for Librarian and Archivist Use

This paper is intended to help librarians and archivists who are involved in digitization work choose optical character recognition (OCR) software. The paper provides an introduction to OCR software for digitization projects, and shares the method we developed for easily evaluating the effectiveness...


Bibliographic Details
Main Authors: Leanne Olson, Veronica Berry
Format: Article
Language: English
Published: Code4Lib 2021-09-01
Series: Code4Lib Journal
Online Access: https://journal.code4lib.org/articles/16132
Collection: DOAJ
ISSN: 1940-5758
Description: This paper is intended to help librarians and archivists who are involved in digitization work choose optical character recognition (OCR) software. The paper provides an introduction to OCR software for digitization projects, and shares the method we developed for easily evaluating the effectiveness of OCR software on resources we are digitizing. We tested three major OCR programs (Adobe Acrobat, ABBYY FineReader, Tesseract) for accuracy on three different digitized texts from our archives and special collections at the University of Western Ontario. Our test was divided into two parts: a word accuracy test (to determine how searchable the final documents were), and a test with a screen reader (to determine how accessible the final documents were). We share our findings from the tests and make recommendations for OCR work on digitized documents from archives and special collections.