Working with batches of PDF files

Learn how to perform OCR and text extraction with free command-line tools like Tesseract and Poppler, and how to get an overview of large numbers of PDF documents using topic modeling.

Bibliographic Details
Main Author: Moritz Mähr
Format: Article
Language: English
Published: Editorial Board of the Programming Historian 2020-01-01
Series: The Programming Historian
Online Access: https://programminghistorian.org/en/lessons/working-with-batches-of-pdf-files