Olomouc Corpus of Spoken Czech: characterization and main features of the project

This study presents the results of the author's research project, the Olomouc Corpus of Spoken Czech (OCSC). The paper focuses on the current state and the individual phases of constructing the corpus, its methodology, and its annotation. Within the OCSC we use a so-called dual system of transcription, which m...

Bibliographic Details
Main Author: Petr Pořízka
Format: Article
Language: deu
Published: Bern Open Publishing 2009-04-01
Series: Linguistik Online
Online Access: https://bop.unibe.ch/linguistik-online/article/view/505
id doaj-89f4fc1a8bd84f21aa336b15d91ed821
record_format Article
spelling doaj-89f4fc1a8bd84f21aa336b15d91ed821; 2021-09-13T12:53:28Z; deu; Bern Open Publishing; Linguistik Online; 1615-3014; 2009-04-01; 38; 2; 10.13092/lo.38.505; Olomouc Corpus of Spoken Czech: characterization and main features of the project; Petr Pořízka; This study presents the results of the author's research project called Olomouc Corpus of Spoken Czech (OCSC). The paper is focused on the state and partial phases of constructing the corpora, its methodology and annotation. Within the OCSC we use so called dual system of transcription, which means (1) an orthographic one with the purpose of linguistic (morphological) analysis and tagging and (2) a phonetic version of transcript which consists of three layers of the text: first the real transcription and further various types of the metatexts as a second and third layer, including communication aspects of the texts. The criteria of selection of speakers are also listed here and the highly important statistical analysis of the sociolinguistic categories (gender, age, type of education, types of recordings) is presented as well. This analysis can serve as a base for a partial correction of possible non-balance among those sociolinguistic parameters. The annotation rules and principles are mentioned at the end of this study.; https://bop.unibe.ch/linguistik-online/article/view/505
collection DOAJ
language deu
format Article
sources DOAJ
author Petr Pořízka
spellingShingle Petr Pořízka
Olomouc Corpus of Spoken Czech: characterization and main features of the project
Linguistik Online
author_facet Petr Pořízka
author_sort Petr Pořízka
title Olomouc Corpus of Spoken Czech: characterization and main features of the project
title_short Olomouc Corpus of Spoken Czech: characterization and main features of the project
title_full Olomouc Corpus of Spoken Czech: characterization and main features of the project
title_fullStr Olomouc Corpus of Spoken Czech: characterization and main features of the project
title_full_unstemmed Olomouc Corpus of Spoken Czech: characterization and main features of the project
title_sort olomouc corpus of spoken czech: characterization and main features of the project
publisher Bern Open Publishing
series Linguistik Online
issn 1615-3014
publishDate 2009-04-01
description This study presents the results of the author's research project, the Olomouc Corpus of Spoken Czech (OCSC). The paper focuses on the current state and the individual phases of constructing the corpus, its methodology, and its annotation. Within the OCSC we use a so-called dual system of transcription, which comprises (1) an orthographic transcript intended for linguistic (morphological) analysis and tagging, and (2) a phonetic transcript consisting of three layers of text: the transcription proper as the first layer, and various types of metatext as the second and third layers, including communicative aspects of the texts. The criteria for selecting speakers are also listed, and the highly important statistical analysis of the sociolinguistic categories (gender, age, type of education, types of recordings) is presented as well. This analysis can serve as a basis for partially correcting a possible imbalance among those sociolinguistic parameters. The annotation rules and principles are described at the end of the study.
url https://bop.unibe.ch/linguistik-online/article/view/505
work_keys_str_mv AT petrporizka olomouccorpusofspokenczechcharacterizationandmainfeaturesoftheproject
_version_ 1717380697376686080