Digitization Decisions: Comparing OCR Software for Librarian and Archivist Use
This paper is intended to help librarians and archivists who are involved in digitization work choose optical character recognition (OCR) software. The paper provides an introduction to OCR software for digitization projects, and shares the method we developed for easily evaluating the effectiveness...
Main Authors: | Leanne Olson, Veronica Berry |
---|---|
Format: | Article |
Language: | English |
Published: | Code4Lib, 2021-09-01 |
Series: | Code4Lib Journal |
Online Access: | https://journal.code4lib.org/articles/16132 |
id |
doaj-9fa148ca8abb4fb191d95a14866d52e2 |
---|---|
record_format |
Article |
spelling |
doaj-9fa148ca8abb4fb191d95a14866d52e2 2021-09-22T16:21:46Z eng Code4Lib Code4Lib Journal 1940-5758 2021-09-01 52 16132 Digitization Decisions: Comparing OCR Software for Librarian and Archivist Use Leanne Olson Veronica Berry This paper is intended to help librarians and archivists who are involved in digitization work choose optical character recognition (OCR) software. The paper provides an introduction to OCR software for digitization projects, and shares the method we developed for easily evaluating the effectiveness of OCR software on resources we are digitizing. We tested three major OCR programs (Adobe Acrobat, ABBYY FineReader, Tesseract) for accuracy on three different digitized texts from our archives and special collections at the University of Western Ontario. Our test was divided into two parts: a word accuracy test (to determine how searchable the final documents were), and a test with a screen reader (to determine how accessible the final documents were). We share our findings from the tests and make recommendations for OCR work on digitized documents from archives and special collections. https://journal.code4lib.org/articles/16132 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Leanne Olson; Veronica Berry |
author_sort |
Leanne Olson |
title |
Digitization Decisions: Comparing OCR Software for Librarian and Archivist Use |
title_sort |
digitization decisions: comparing ocr software for librarian and archivist use |
publisher |
Code4Lib |
series |
Code4Lib Journal |
issn |
1940-5758 |
publishDate |
2021-09-01 |
description |
This paper is intended to help librarians and archivists who are involved in digitization work choose optical character recognition (OCR) software. The paper provides an introduction to OCR software for digitization projects, and shares the method we developed for easily evaluating the effectiveness of OCR software on resources we are digitizing.
We tested three major OCR programs (Adobe Acrobat, ABBYY FineReader, Tesseract) for accuracy on three different digitized texts from our archives and special collections at the University of Western Ontario. Our test was divided into two parts: a word accuracy test (to determine how searchable the final documents were), and a test with a screen reader (to determine how accessible the final documents were). We share our findings from the tests and make recommendations for OCR work on digitized documents from archives and special collections. |
url |
https://journal.code4lib.org/articles/16132 |