Korpusbasierte Wörterbucharbeit mit den Daten des Projekts Deutscher Wortschatz

The corpus project Deutscher Wortschatz (German Vocabulary) at Leipzig University has been collecting and processing textual data for 15 years. It now consists of approx. 2 billion running words in 160 million sentences. The dictionary is available online at www.wortschatz.uni-leipzig.de and, moreover, co...

Bibliographic Details
Main Author:	Quasthoff, Uwe
Format:	Article
Language:	deu
Published:	Bern Open Publishing 2009-01-01
Series:	Linguistik Online
Online Access:	http://www.linguistik-online.de/39_09/quasthoff.pdf

id	doaj-903af5660a9d4677aeeb91ace385d9cb
record_format	Article
collection	DOAJ
language	deu
format	Article
sources	DOAJ
author	Quasthoff, Uwe
title	Korpusbasierte Wörterbucharbeit mit den Daten des Projekts Deutscher Wortschatz
publisher	Bern Open Publishing
series	Linguistik Online
issn	1615-3014
publishDate	2009-01-01
description	The corpus project Deutscher Wortschatz (German Vocabulary) at Leipzig University has been collecting and processing textual data for 15 years. It now consists of approx. 2 billion running words in 160 million sentences. The dictionary is available online at www.wortschatz.uni-leipzig.de and also contains word co-occurrence data. The pre-processing of the data relied mainly on language-independent methods, which were also applied to corpora in other languages. The paper describes the production process for three dictionaries for which these corpus data were used: a thesaurus, a dictionary of neologisms, and a collocation dictionary. In all cases, the raw data for the dictionary entries were produced automatically, and the final entries were written using only these pre-selections. In the case of the thesaurus, the preprocessing consisted of a corpus-based detection of semantically similar words. For the neologism dictionary, yearly frequency information was used, and for the collocation dictionary, word co-occurrences and part-of-speech information were combined.
url	http://www.linguistik-online.de/39_09/quasthoff.pdf

Similar Items