Aspects théoriques et méthodologiques de la représentativité des corpus

Bibliographic Details
Main Authors: Najib Arbach, Saandia Ali
Format: Article
Language: English
Published: Cercle linguistique du Centre et de l'Ouest - CerLICO 2014-05-01
Series: Corela
Online Access: http://journals.openedition.org/corela/3029
Description
Summary: In 1982, Francis (1991: 17) defined a corpus as: ‘A collection of texts assumed to be representative of a given language, dialect, or other subset of a language, to be used for linguistic analysis.’ Since then, the representativeness of a corpus has been taken into account by most of the main publications dealing with corpus linguistics. This paper aims to define the concept of representativeness in corpus design and to illustrate its main features, as well as the various methods used to achieve it, including a discussion of the issues of categorization, sampling, and the required size of a corpus. We seek a better understanding of the concept of representativeness through a review of the related literature on corpus linguistics. The various methods proposed and implemented to achieve representativeness in corpus design are discussed and contrasted. The two main methods examined are Biber’s stratification techniques (1993a, 1993b) on the one hand, and the methods represented by Sinclair’s "monitor corpus" (1991, 1996, 2004) on the other. Finally, we address the issue of the required size of a corpus and provide a brief review of the current situation regarding corpus design, along with some recommendations for corpus building.
ISSN: 1638-573X