Summary: Master's thesis, Graduate Institute of Linguistics, National Taiwan University, academic year 107. As the number of Mandarin Chinese speakers continues to grow, and because these speakers do not all live in one place, variation inevitably emerges. This variation can stem from internal factors or from external ones such as culture or location. Although corpora exist for studying variation in Mandarin Chinese, they offer little insight into colloquial registers. Subtitles for TV shows, movies, and videos in general are a good source of material that more reliably reflects everyday speech. Because subtitles are meant to reproduce the dialogue heard on screen, they capture colloquial speech more closely. The goal of this thesis is to build a parallel corpus from movie subtitles and TED Talks that allows researchers to study language variation between Taiwan Mandarin and Mainland Mandarin.