A Green Form-Based Information Extraction System for Historical Documents
Many historical documents are rich in genealogical facts. Extracting these facts by hand is tedious and almost impossible considering the hundreds of thousands of genealogically rich family-history books currently scanned and online. As one approach for helping to make the extraction feasible, we...
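Main Author: | Kim, Tae Woo |
---|---|
Format: | Others |
Published: | BYU ScholarsArchive 2017 |
Subjects: | green systems; self-improving systems; data extraction; regular-expression generation; Computer Sciences |
Online Access: | https://scholarsarchive.byu.edu/etd/6375 https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=7375&context=etd |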
id |
ndltd-BGMYU2-oai-scholarsarchive.byu.edu-etd-7375 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-BGMYU2-oai-scholarsarchive.byu.edu-etd-7375; 2019-05-16T03:31:32Z; A Green Form-Based Information Extraction System for Historical Documents; Kim, Tae Woo; 2017-05-01T07:00:00Z; text; application/pdf; https://scholarsarchive.byu.edu/etd/6375; https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=7375&context=etd; http://lib.byu.edu/about/copyright/; All Theses and Dissertations; BYU ScholarsArchive; green systems; self-improving systems; data extraction; regular-expression generation; Computer Sciences |
collection |
NDLTD |
format |
Others |
sources |
NDLTD |
topic |
green systems; self-improving systems; data extraction; regular-expression generation; Computer Sciences |
description |
Many historical documents are rich in genealogical facts. Extracting these facts by hand is tedious and almost impossible considering the hundreds of thousands of genealogically rich family-history books currently scanned and online. As one approach for helping to make the extraction feasible, we propose GreenFIE—a "Green" Form-based Information-Extraction tool which is "green" in the sense that it improves with use toward the goal of minimizing the cost of human labor while maintaining high extraction accuracy. Given a page in a historical document, the user's task is to fill out given forms with all facts on a page in a document called for by the forms (e.g. to collect the birth and death information, marriage information, and parent-child relationships for each person on the page). GreenFIE has a repository of extraction patterns that it applies to fill in forms. A user checks the correctness of GreenFIE's form filling, adds any missed facts, and fixes any mistakes. GreenFIE learns based on user feedback, adding new extraction rules to its repository. Ideally, GreenFIE improves as it proceeds so that it does most of the work, leaving little for the user to do other than confirm that its extraction is correct. We evaluate how well GreenFIE performs on family history books in terms of "greenness"—how much human labor diminishes during form filling, while simultaneously maintaining high accuracy. |
author |
Kim, Tae Woo |
title |
A Green Form-Based Information Extraction System for Historical Documents |
publisher |
BYU ScholarsArchive |
publishDate |
2017 |
url |
https://scholarsarchive.byu.edu/etd/6375 https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=7375&context=etd |
_version_ |
1719186841696993280 |
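
The description above outlines a loop in which a repository of extraction patterns fills in a form, the user adds any missed facts, and new rules are learned from that feedback. The following is a minimal Python sketch of that idea and is not GreenFIE's actual implementation. The names `PatternRepository`, `generalize`, and `learn_from_correction`, the sample page text, and the rule-generalization strategy (a short literal left context followed by digit and letter runs) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PatternRepository:
    """Hypothetical rule store: form field name -> list of compiled regexes."""
    rules: dict = field(default_factory=dict)

    def add_rule(self, field_name, pattern):
        self.rules.setdefault(field_name, []).append(re.compile(pattern))

    def fill_form(self, page_text):
        """Apply every stored rule and return {field name: [extracted values]}."""
        form = {}
        for field_name, patterns in self.rules.items():
            values = []
            for pattern in patterns:
                for match in pattern.finditer(page_text):
                    if match.group(1) not in values:
                        values.append(match.group(1))
            form[field_name] = values
        return form


def generalize(value):
    """Turn a concrete value into a regex: digit runs and letter runs become
    character classes; every other character is kept as an escaped literal."""
    parts = []
    for token in re.findall(r"[0-9]+|[A-Za-z]+|.", value, flags=re.S):
        if re.fullmatch(r"[0-9]+", token):
            parts.append(r"[0-9]+")
        elif re.fullmatch(r"[A-Za-z]+", token):
            parts.append(r"[A-Za-z]+")
        else:
            parts.append(re.escape(token))
    return "".join(parts)


def learn_from_correction(repo, field_name, page_text, missed_value, context_chars=3):
    """Add a rule for a fact the user entered by hand (assumed strategy).

    Simplification: only the first occurrence of the value on the page is used.
    Returns False if the value does not appear in the page text.
    """
    start = page_text.find(missed_value)
    if start < 0:
        return False
    left_context = page_text[max(0, start - context_chars):start]
    repo.add_rule(field_name, re.escape(left_context) + "(" + generalize(missed_value) + ")")
    return True


# Invented sample pages, not taken from any family-history book.
page1 = "John Smith, b. 12 Mar 1801, d. 4 Jan 1870."
page2 = "Mary Jones, b. 3 Apr 1825, d. 19 Oct 1899."

repo = PatternRepository()
learn_from_correction(repo, "birth_date", page1, "12 Mar 1801")  # user adds a missed fact
print(repo.fill_form(page2))  # {'birth_date': ['3 Apr 1825']}
```

In this sketch, a single correction on the first page is enough for the birth date on the second page to be filled in automatically, which is the kind of reduction in user effort the description calls "greenness". A real system would also need a way to retire or refine rules that produce wrong extractions, which this sketch omits.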