Summary: | This thesis examines how unlabeled data can improve supervised classifiers across settings that range from scarce to abundant labels. The aim is to address the limitations of supervised learning with regard to label availability. Extending the training set with unlabeled data can overcome issues such as selection bias, noise, and insufficient data. Based on the overall data distribution and the initial set of labels, semi-supervised methods assign labels to additional data points. The semi-supervised approaches considered in this thesis belong to one of the following categories: transductive SVMs, Cluster-then-Label, and graph-based techniques. Further, we evaluate the behavior of logistic regression, the single-layer perceptron, SVMs, and decision trees. By learning on the extended training set, supervised classifiers are able to generalize better. Based on the results, this thesis recommends data-processing and algorithmic solutions appropriate to real-world situations.
|