Summary: In 1982, Francis (1991: 17) defined a corpus as "a collection of texts assumed to be representative of a given language, dialect, or other subset of a language, to be used for linguistic analysis." The representativeness of a corpus was subsequently taken into account by most of the major publications dealing with corpus linguistics.

This paper aims to define the concept of representativeness in corpus design and to illustrate its main features, as well as the various methods used to achieve it, including a discussion of categorization, sampling, and the required size of a corpus.

We seek a better understanding of the concept of representativeness through a review of the literature on corpus linguistics. The various methods proposed and implemented to achieve representativeness in corpus design are discussed and contrasted. The two main methods examined are Biber's stratification techniques (1993a, 1993b) on the one hand, and the approach embodied in Sinclair's "monitor corpus" (1991, 1996, 2004) on the other. Finally, we address the question of the required size of a corpus and provide a brief review of the current state of corpus design, along with some recommendations for corpus building.