Text-based language identification for the South African languages
We investigate the factors that determine the performance of text-based language identification, with a particular focus on the 11 official languages of South Africa. Our study uses n-gram statistics as features for classification. In particular, we compare support vector machines, Naïve Bayesian an...
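
As an illustration of how n-gram statistics can serve as classification features, the following is a minimal sketch of a naive Bayes language identifier. It assumes character trigrams and add-one smoothing. Those choices, the toy training sentences, and the names `char_ngrams` and `NgramNaiveBayes` are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased, space-padded string."""
    padded = " " + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-gram counts (add-one smoothing)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.docs = Counter()               # language -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, samples):
        for text, lang in samples:
            grams = char_ngrams(text, self.n)
            self.counts[lang].update(grams)
            self.docs[lang] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_lang, best_score = None, -math.inf
        for lang, counter in self.counts.items():
            total = sum(counter.values())
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.docs[lang] / total_docs)
            for gram in grams:
                score += math.log((counter[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


# Toy training data (illustrative only, far too small for real use).
training = [
    ("the cat sits on the mat", "english"),
    ("i have a dog and a cat", "english"),
    ("die kat sit op die mat", "afrikaans"),
    ("ek het 'n hond en 'n kat", "afrikaans"),
]
model = NgramNaiveBayes(n=3).fit(training)
print(model.predict("the dog"))
print(model.predict("die kat"))
```

The same n-gram counts could be passed to a different classifier, such as a support vector machine, which is the kind of comparison the abstract describes.
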
Main Author: | |
---|---|
Other Authors: | |
Published: | 2013 |
Subjects: | |
Online Access: | http://hdl.handle.net/2263/27725 http://upetd.up.ac.za/thesis/available/etd-09042008-133715/ |