Text-based language identification for the South African languages

We investigate the factors that determine the performance of text-based language identification, with a particular focus on the 11 official languages of South Africa. Our study uses n-gram statistics as features for classification. In particular, we compare support vector machines, Naïve Bayesian an...

Bibliographic Details
Main Author: Botha, Gerrit Reinier
Other Authors: Prof E Barnard
Published: 2013
Online Access: http://hdl.handle.net/2263/27725
http://upetd.up.ac.za/thesis/available/etd-09042008-133715/