Working with batches of PDF files
Learn how to perform OCR and text extraction with free command-line tools like Tesseract and Poppler, and how to get an overview of large numbers of PDF documents using topic modeling.
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | Editorial Board of the Programming Historian, 2020-01-01 |
Series: | The Programming Historian |
Online Access: | https://programminghistorian.org/en/lessons/working-with-batches-of-pdf-files |