Using Gazetteers to Extract Sets of Keywords from Free-Flowing Texts

Bibliographic Details
Main Author: Adam Crymble
Format: Article
Language: English
Published: Editorial Board of the Programming Historian, 2015-12-01
Series: The Programming Historian
Online Access: http://programminghistorian.org/lessons/extracting-keywords
Description
Summary: If you have a copy of a text in electronic format stored on your computer, it is relatively easy to keyword search for a single term. Often you can do this by using the built-in search features in your favourite text editor. However, scholars increasingly need to find instances of many terms within a text or texts. For example, a scholar may want to use a gazetteer to extract all mentions of English placenames within a collection of texts so that those places can later be plotted on a map. Alternatively, they may want to extract all male given names, all pronouns, all stop words, or any other set of words. Using those same built-in search features to achieve this more complex goal is time-consuming and clunky. This lesson will teach you how to use Python to extract a set of keywords quickly and systematically from a set of texts. Once you have completed this lesson, you should be able to generalise these skills to extract custom sets of keywords from any set of locally saved files.
ISSN: 2397-2068
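The summary describes matching a whole list of terms in one pass rather than searching for each term by hand. What follows is a minimal sketch of one way to do that in Python, not the code from the lesson itself. It assumes a gazetteer stored as a plain-text file with one entry per line and a folder of UTF-8 .txt files; the function names (load_gazetteer, build_pattern, extract_keywords, extract_from_folder) and the sample placenames are hypothetical. It swaps in a single compiled regular expression for the whole gazetteer, which lets multi-word entries match and avoids hits inside longer words.

```python
import re
from pathlib import Path


def load_gazetteer(path):
    """Read a gazetteer file (hypothetical layout: one keyword or phrase per line)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def build_pattern(keywords):
    """Compile one case-insensitive regex that matches any gazetteer entry."""
    entries = sorted({k.strip() for k in keywords if k.strip()}, key=len, reverse=True)
    if not entries:
        raise ValueError("gazetteer is empty")
    # Longest entries first, so "Newcastle upon Tyne" is preferred over "Newcastle".
    # Words inside an entry may be separated by any whitespace, including line breaks.
    alternation = "|".join(r"\s+".join(map(re.escape, e.split())) for e in entries)
    # The lookarounds stop "York" from matching inside "Yorkshire".
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def extract_keywords(text, pattern):
    """Return every match in order of appearance, spelled as it is in the text."""
    return [m.group(0) for m in pattern.finditer(text)]


def extract_from_folder(folder, pattern, suffix=".txt"):
    """Map each matching file name in a folder to the keywords found in that file."""
    return {
        path.name: extract_keywords(path.read_text(encoding="utf-8"), pattern)
        for path in sorted(Path(folder).glob(f"*{suffix}"))
    }


if __name__ == "__main__":
    # Hypothetical sample gazetteer and text.
    gazetteer = ["London", "York", "Newcastle", "Newcastle upon Tyne"]
    pattern = build_pattern(gazetteer)
    sample = "She sailed from Newcastle upon Tyne to London, then rode to York."
    print(extract_keywords(sample, pattern))
    # → ['Newcastle upon Tyne', 'London', 'York']
```

Calling extract_from_folder("texts", build_pattern(load_gazetteer("gazetteer.txt"))) would return a dictionary from file name to matches; both paths are placeholders. Because matching ignores case, an entry such as "Bath" will also match the ordinary word "bath", so results from a real gazetteer may still need manual review.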