Text Mining with Semi-Supervised Learning


Bibliographic Details
Main Authors: Hsaio, Wen-Hoar, 蕭文豪
Other Authors: Lee, Chia-Hoang
Format: Others
Language: en_US
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/40666046315556759662
Description
Summary: Doctoral === National Chiao Tung University === Institute of Computer Science and Engineering === 103 === As the Internet grows, an overwhelming number of information sources, including documents and blog articles, have become available on the web. These sources carry a great deal of semantic information, since they were originally created to convey information to people. How to organize these articles and documents effectively and automatically has been an attractive research topic for the machine learning community. Semi-supervised learning, which learns from a combination of labeled and unlabeled data, is a machine learning approach that lies between unsupervised and supervised learning. It has become an active research area in machine learning and has received considerable attention over the last decade. In addition, sparse representations have proven to be an extremely powerful tool for acquiring, representing, and compressing high-dimensional objects in signal processing and computer vision. Moreover, learning with Universum, which uses examples drawn from distributions different from that of the target data to estimate prior model information, is a popular research subject in machine learning. This thesis focuses on text mining with semi-supervised learning and proposes four semi-supervised learning algorithms: Constrained-PLSA, SSS-MF, Semi-LDC, and USemi-AdaBoost.MH. Experiments are conducted on four well-known real-world data sets, comparing the proposed algorithms against several state-of-the-art semi-supervised learning algorithms. The experimental results indicate that the proposed methods generally outperform the compared semi-supervised learning methods on the given data sets.