A Corpus-based Approach to Academic Vocabulary

Bibliographic Details
Main Authors: Mei-Hung Lin, 林美宏
Other Authors: Chih-Hua Kuo
Format: Others
Language: en_US
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/08421601202783719833
Description
Summary: Master's thesis === National Chiao Tung University === Graduate Institute of Teaching English to Speakers of Other Languages === 95 === English for Academic Purposes (EAP) has been attracting more attention than before because of the predominant role of English in the research world and the increasing number of students in higher education. Research articles (RAs), among all the genres in EAP, have been widely studied as a result of their wide distribution and promotional nature. Studies of RAs have examined various aspects of this genre, especially textual organization, rhetorical functions, and linguistic features. The RA Introduction, in particular, has become the most studied section, following Swales’ seminal CARS model. On the other hand, vocabulary learning has regained momentum in recent years. Some studies have focused on providing learners with specific vocabulary learning goals by developing wordlists for different purposes. Some have extended the study of vocabulary to word combinations such as collocations or lexical bundles. Still others have investigated how words are used in various discourse contexts. Most vocabulary studies nowadays are based on the analysis of target corpora. The corpus-based approach exploits large amounts of authentic language-use data, often using NLP tools to facilitate efficient analysis. However, in the field of genre analysis, little research has been devoted to the generic nature of specialized vocabulary, that is, to relating vocabulary use to the rhetorical functions of a genre. This study, therefore, aims to explore vocabulary use in RAs, particularly in the Introduction section, in relation to its rhetorical functions. A corpus-based, genre-informed approach is used to examine how rhetorical functions, or moves, are realized through move-signaling words. We construct a specialized corpus consisting of 60 RAs in the field of computer science (CS). All the RAs are coded with a self-developed coding scheme.
Then, the text samples are analyzed quantitatively with the help of readily available or self-developed NLP tools. To explore the nature of the words used in RAs in this particular field, we compile a frequency list of the corpus and analyze the coverage of the GSL (28.20%), the AWL (12.75%), and technical words (as generally represented by off-list words) (59.05%) in the list. As these figures show, technical vocabulary accounts for more than half of the list, suggesting that the vocabulary learning goal of learners in CS could be directed towards words other than those in the GSL or AWL. Word frequency profiles further reveal that a very small number of word forms occur very frequently, while low-frequency words account for more than half of the vocabulary of the corpus. It can then be inferred that the low-frequency words form a very wide vocabulary repertoire that RA writers need to use. As a result, we further develop a CS wordlist for pedagogical purposes. It consists of 1402 word families and covers 95% of the vocabulary (types) in the corpus. Next, our focus turns to identifying rhetorical functions, or moves, in RA Introductions in order to further investigate move-signaling words. The major and optional moves are identified based on frequency and range. We then analyze common move patterns for each of the major moves, including 3-move and 4-move patterns. To explore how the moves are realized through vocabulary, we extend our examination from words to word combinations (or lexical bundles), since each register has its own set of lexical bundles that can represent its typical rhetorical functions. Lexical bundles in the Introduction as a whole, as well as in each major move, are identified. Two types of meaningful bundles are observed. One type signals the rhetorical functions of a specific move, while the other reflects general academic discourse functions and is categorized in this study as general bundles.
General bundles are further categorized into stance bundles, discourse organizers, and referential bundles based on the discourse functions they perform in texts. Among them, referential bundles are the most frequently used. Finally, pedagogical applications and implications, such as the use of concordancing tools in the learning of academic vocabulary, are discussed on the basis of the research results.
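
The two quantitative steps described in the summary, wordlist coverage and lexical-bundle extraction, can be illustrated with a minimal Python sketch. This is not the thesis's own tooling: the function names, the tokenizer, the tiny stand-in wordlists, and the thresholds (`n`, `min_freq`, `min_range`) are all assumptions made for illustration. Coverage is computed here over word types, and word-family grouping is not handled; the summary does not spell out either detail.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return alphabetic word tokens (a simplification)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage_profile(tokens, gsl, awl):
    """Percentage of distinct word types found in the GSL, the AWL, or neither.

    `gsl` and `awl` are plain sets of word forms supplied by the caller
    (hypothetical stand-ins for the real lists).
    """
    types = set(tokens)
    if not types:
        return {"gsl": 0.0, "awl": 0.0, "off_list": 0.0}
    in_gsl = sum(1 for t in types if t in gsl)
    in_awl = sum(1 for t in types if t not in gsl and t in awl)
    off_list = len(types) - in_gsl - in_awl
    total = len(types)
    return {
        "gsl": 100.0 * in_gsl / total,
        "awl": 100.0 * in_awl / total,
        "off_list": 100.0 * off_list / total,
    }


def lexical_bundles(docs, n=4, min_freq=2, min_range=2):
    """Return n-word sequences meeting a frequency and a range threshold.

    `docs` is a list of token lists, one per text. Frequency is the total
    number of occurrences; range is the number of texts containing the bundle.
    The thresholds are illustrative, not those used in the thesis.
    """
    freq = Counter()
    doc_range = Counter()
    for tokens in docs:
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        doc_range.update(set(grams))  # count each text at most once
    return {
        " ".join(g): (freq[g], doc_range[g])
        for g in freq
        if freq[g] >= min_freq and doc_range[g] >= min_range
    }


if __name__ == "__main__":
    sample = ["In this paper we propose a method.",
              "In this paper we present results."]
    docs = [tokenize(s) for s in sample]
    print(coverage_profile(docs[0], gsl={"in", "this", "we", "a"}, awl={"method"}))
    print(lexical_bundles(docs))
```

Requiring a minimum range as well as a minimum frequency keeps a sequence that is repeated many times in a single article from being treated as a bundle typical of the whole corpus.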