MDMA. Un modèle pour l’identification et l’annotation des marqueurs discursifs « potentiels » en contexte [MDMA. A model for the identification and annotation of “potential” discourse markers in context]

Bibliographic Details
Main Authors: Catherine T. Bolly, Ludivine Crible, Liesbeth Degand, Deniz Uygur-Distexhe
Format: Article
Language: English
Published: Presses universitaires de Caen, 2015-09-01
Series: Discours
Online Access: http://journals.openedition.org/discours/9009
ISSN: 1963-1723
DOI: 10.4000/discours.9009
Keywords: discourse markers; annotation model; corpus-based; multivariate analysis; spoken French

Abstract
Starting from the common observation that there is no recognized closed class of Discourse Markers (DMs) and that their definition may vary from one theoretical framework to another, the aim of the MDMA project (“Model for Discourse Marker Annotation”) is to establish an empirical method for the identification and annotation of DMs in spoken French. Central to our proposal is that DMs may be described as clusters of features that, in specific patterns of combination, make it possible to distinguish between more or less prototypical uses of DMs in context. We proceeded in three steps: (i) manual identification of all so-called “potential” DMs in a balanced corpus of spoken French (5,000 words; Belgium and France); (ii) automatic extraction from the corpus of every token corresponding to the candidate DMs previously identified (1,181 tokens); and (iii) parameter analysis of a random sample of 200 potential DMs (syntactic, formal and semantic-pragmatic variables). The hypothesis is that the statistical analysis, based on the distributional constraints of the potential DMs at stake, should uncover a certain hierarchy between the different features under scrutiny, regarding their relevance, reliability, and generalizability (or even specificity). In the present paper, we first present the annotation procedure, then discuss several aspects of inter-rater agreement, and finally discuss the results from the in-depth corpus-based and statistical analyses.