Psycholinguistic dataset on language use in 1145 novels published in English and Dutch

This dataset includes psycholinguistic data on 694 English-language and 451 Dutch-language novels, acquired with computerised analysis of digitised novels published mainly between 1800 and 2018. The English-language novels have a total word count of 66.9 million words, while the Dutch-language novel...

Bibliographic Details
Main Authors: Severi Luoto, Andreas van Cranenburgh
Format: Article
Language: English
Published: Elsevier 2021-02-01
Series: Data in Brief
Subjects: Stylometry, Literature, LIWC, Psycholinguistics, Corpus linguistics, Digital humanities
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340920315353
id doaj-fd69deb082c44410ad12370ab36ad3f2
record_format Article
spelling doaj-fd69deb082c44410ad12370ab36ad3f2
2020-12-25T05:10:39Z
eng
Elsevier
Data in Brief
2352-3409
2021-02-01
34
106655
Psycholinguistic dataset on language use in 1145 novels published in English and Dutch
Severi Luoto 0
Andreas van Cranenburgh 1
English, Drama and Writing Studies, University of Auckland, 1010 Auckland, New Zealand; School of Psychology, University of Auckland, 1010 Auckland, New Zealand
Department of Information Science, University of Groningen, Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, the Netherlands; Corresponding author.
This dataset includes psycholinguistic data on 694 English-language and 451 Dutch-language novels, acquired with computerised analysis of digitised novels published mainly between 1800 and 2018. The English-language novels have a total word count of 66.9 million words, while the Dutch-language novels comprise 49.6 million words, therefore offering large, representative samples for both languages. The data provided in this article include 93 linguistic and psycholinguistic outcome variables for the English-language novels, acquired using Linguistic Inquiry and Word Count (LIWC) version 2015, and 68 linguistic and psycholinguistic outcome variables for the Dutch-language novels, acquired using Linguistic Inquiry and Word Count (LIWC) version 2001. The dataset also includes word frequencies (unigram and bigram) for each novel. The metadata for each novel include year of publication, authors’ nationality, sex, age at publication, and sexual orientation (the latter only in the English-language dataset), making it possible for researchers to study the data along these parameters. The use of these data can help researchers illuminate how word use reflects psychological processes in more than two centuries of literary art in English and in contemporary Dutch novels.
http://www.sciencedirect.com/science/article/pii/S2352340920315353
Stylometry
Literature
LIWC
Psycholinguistics
Corpus linguistics
Digital humanities
collection DOAJ
language English
format Article
sources DOAJ
author Severi Luoto
Andreas van Cranenburgh
spellingShingle Severi Luoto
Andreas van Cranenburgh
Psycholinguistic dataset on language use in 1145 novels published in English and Dutch
Data in Brief
Stylometry
Literature
LIWC
Psycholinguistics
Corpus linguistics
Digital humanities
author_facet Severi Luoto
Andreas van Cranenburgh
author_sort Severi Luoto
title Psycholinguistic dataset on language use in 1145 novels published in English and Dutch
title_short Psycholinguistic dataset on language use in 1145 novels published in English and Dutch
title_full Psycholinguistic dataset on language use in 1145 novels published in English and Dutch
title_fullStr Psycholinguistic dataset on language use in 1145 novels published in English and Dutch
title_full_unstemmed Psycholinguistic dataset on language use in 1145 novels published in English and Dutch
title_sort psycholinguistic dataset on language use in 1145 novels published in english and dutch
publisher Elsevier
series Data in Brief
issn 2352-3409
publishDate 2021-02-01
description This dataset includes psycholinguistic data on 694 English-language and 451 Dutch-language novels, acquired with computerised analysis of digitised novels published mainly between 1800 and 2018. The English-language novels have a total word count of 66.9 million words, while the Dutch-language novels comprise 49.6 million words, therefore offering large, representative samples for both languages. The data provided in this article include 93 linguistic and psycholinguistic outcome variables for the English-language novels, acquired using Linguistic Inquiry and Word Count (LIWC) version 2015, and 68 linguistic and psycholinguistic outcome variables for the Dutch-language novels, acquired using Linguistic Inquiry and Word Count (LIWC) version 2001. The dataset also includes word frequencies (unigram and bigram) for each novel. The metadata for each novel include year of publication, authors’ nationality, sex, age at publication, and sexual orientation (the latter only in the English-language dataset), making it possible for researchers to study the data along these parameters. The use of these data can help researchers illuminate how word use reflects psychological processes in more than two centuries of literary art in English and in contemporary Dutch novels.
topic Stylometry
Literature
LIWC
Psycholinguistics
Corpus linguistics
Digital humanities
url http://www.sciencedirect.com/science/article/pii/S2352340920315353
work_keys_str_mv AT severiluoto psycholinguisticdatasetonlanguageusein1145novelspublishedinenglishanddutch
AT andreasvancranenburgh psycholinguisticdatasetonlanguageusein1145novelspublishedinenglishanddutch
_version_ 1724371047317241856