Text-based language identification for the South African languages

We investigate the factors that determine the performance of text-based language identification, with a particular focus on the 11 official languages of South Africa. Our study uses n-gram statistics as features for classification. In particular, we compare support vector machines, Naïve Bayesian an...

Bibliographic Details
Main Author:	Botha, Gerrit Reinier
Other Authors:	Prof E Barnard
Published:	2013
Subjects:	Naïve Bayesian classification; Support vector machine; N-gram statistics; Text-based language identification; Difference-in-frequency classification; UCTD
Online Access:	http://hdl.handle.net/2263/27725 http://upetd.up.ac.za/thesis/available/etd-09042008-133715/
