Summary: | This thesis applies a word embedding mapping approach to compare corpora lexically from the perspective of academic word usage. We aim to demonstrate the differences in academic word usage between a corpus of student writing and a corpus of academic English, and between the same student writing corpus and a corpus of social media text. The VecMap mapping algorithm, commonly used for cross-lingual embedding mapping, was used to map the academic English vector space and the social media text vector space into the common student writing vector space, so that word representations from the different corpora could be compared directly and the comparison results visualized. An average distance was defined to measure differences in the usage of 420 typical academic words between each pair of corpora, and principal component analysis was applied to visualize those differences. A rank-biased overlap approach was adopted to evaluate the results of the proposed approach. The experimental results show that academic word usage in the student writing corpus is more similar to that in the academic English corpus than to that in the social media text corpus.
|