Using Gazetteers to Extract Sets of Keywords from Free-Flowing Texts

If you have a copy of a text in electronic format stored on your computer, it is relatively easy to run a keyword search for a single term. Often you can do this by using the built-in search features in your favourite text editor. However, scholars increasingly need to find instances of many terms...


Bibliographic Details
Main Author: Adam Crymble
Format: Article
Language: English
Published: Editorial Board of the Programming Historian 2015-12-01
Series: The Programming Historian
Subjects: gazetteer; python; data manipulation; data mining
Online Access: http://programminghistorian.org/lessons/extracting-keywords
id doaj-09232f7ba4e74f0eaea10d2482277b65
record_format Article
spelling doaj-09232f7ba4e74f0eaea10d2482277b65 | 2020-11-24T23:38:18Z | eng | Editorial Board of the Programming Historian | The Programming Historian | 2397-2068 | 2015-12-01 | Using Gazetteers to Extract Sets of Keywords from Free-Flowing Texts | Adam Crymble (University of Hertfordshire) | http://programminghistorian.org/lessons/extracting-keywords | gazetteer | python | data manipulation | data mining
collection DOAJ
language English
format Article
sources DOAJ
author Adam Crymble
author_facet Adam Crymble
author_sort Adam Crymble
title Using Gazetteers to Extract Sets of Keywords from Free-Flowing Texts
title_sort using gazetteers to extract sets of keywords from free-flowing texts
publisher Editorial Board of the Programming Historian
series The Programming Historian
issn 2397-2068
publishDate 2015-12-01
description If you have a copy of a text in electronic format stored on your computer, it is relatively easy to run a keyword search for a single term. Often you can do this by using the built-in search features in your favourite text editor. However, scholars increasingly need to find instances of many terms within a text or texts. For example, a scholar may want to use a gazetteer to extract all mentions of English placenames within a collection of texts so that those places can later be plotted on a map. Alternatively, they may want to extract all male given names, all pronouns, stop words, or any other set of words. Using those same built-in search features to achieve this more complex goal is time-consuming and clunky. This lesson will teach you how to use Python to extract a set of keywords quickly and systematically from a set of texts. Once you have completed this lesson, you should be able to generalise the skills to extract custom sets of keywords from any set of locally saved files.
topic gazetteer
python
data manipulation
data mining
url http://programminghistorian.org/lessons/extracting-keywords
_version_ 1725517097948151808
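
The description above outlines the general idea of matching a gazetteer (a list of keywords) against a set of texts. As a rough illustration only, and not the code taught in the lesson itself, the idea might be sketched in Python as follows. The function names, the sample placenames, and the sample texts are all hypothetical, and the texts are held in memory rather than read from locally saved files to keep the sketch self-contained.

```python
import re


def build_pattern(keywords):
    """Compile one case-insensitive, whole-word pattern for all keywords."""
    cleaned = {k.strip() for k in keywords if k.strip()}
    if not cleaned:
        raise ValueError("the gazetteer contains no keywords")
    # Longest keywords first, so "Bury St Edmunds" is tried before "Bury".
    ordered = sorted(cleaned, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def extract_keywords(text, pattern):
    """Return every gazetteer match in the text, in order of appearance."""
    return [match.group(0) for match in pattern.finditer(text)]


# Hypothetical gazetteer and texts, for illustration only.
gazetteer = ["Oxford", "Cambridge", "Bury St Edmunds", "Bury"]
texts = {
    "letter1.txt": "He left Oxford for Bury St Edmunds in May.",
    "letter2.txt": "No places are named here.",
    "letter3.txt": "From cambridge to Bury, then back to Oxford.",
}

pattern = build_pattern(gazetteer)
results = {name: extract_keywords(body, pattern) for name, body in texts.items()}

for name, found in results.items():
    print(name, found)
```

Matches are returned as they are spelled in the text, so a lower-case "cambridge" is reported in lower case. Compiling a single pattern means each text is scanned once regardless of how many keywords the gazetteer holds; for locally saved files, the `texts` dictionary could instead be filled by reading each file's contents.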