Is there a text in my data? (Part 1): on counting words

This essay is the first in a two-part series. This first installment invites readers to consider a few very basic questions: what does it mean to count words in a text? What happens to the text, and to our understanding of it, when we decompose it into a series of word counts? What relation exists b...

Bibliographic Details
Main Author:	Michael Gavin
Format:	Article
Language:	English
Published:	Department of Languages, Literatures, and Cultures at McGill University
Series:	Journal of Cultural Analytics
Online Access:	http://culturalanalytics.scholasticahq.com/article/11830-is-there-a-text-in-my-data-part-1-on-counting-words.pdf

id	doaj-d90cc8d78d9c45c0902320401c6f96f8
record_format	Article
spelling	doaj-d90cc8d78d9c45c0902320401c6f96f8 2020-11-25T01:32:04Z eng
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Michael Gavin
title	Is there a text in my data? (Part 1): on counting words
publisher	Department of Languages, Literatures, and Cultures at McGill University
series	Journal of Cultural Analytics
issn	2371-4549
description	This essay is the first in a two-part series. This first installment invites readers to consider a few very basic questions: What does it mean to count words in a text? What happens to the text, and to our understanding of it, when we decompose it into a series of word counts? What relation exists between the textual domain and its numerical image? Or, to restate this question with a nod to literary critic Stanley Fish, "Is there a text in my data?" Following one document through a series of typical transformations -- first into a simple list of words and their frequencies, then to a vector of elements in a matrix, and from there through the processes of normalization, dimensionality reduction, and analysis -- this essay argues against the commonly held notion that counting words reduces complexity, suggesting instead that semantic models embed textual objects in highly complex structures that are extremely sensitive to historical context and subtle nuances in meaning. Word frequencies aren't static, given things that simply exist in a text. They're produced through the act of modeling, and the mathematical structures they imply dissolve both words and texts into elaborate systems of mutual interrelation.
url	http://culturalanalytics.scholasticahq.com/article/11830-is-there-a-text-in-my-data-part-1-on-counting-words.pdf

Similar Items