Summary: | Master's thesis === National Chiao Tung University === Institute of Computer Science and Engineering === 102 === The rise of big data gives enterprises an opportunity to use data analytics to gain a competitive advantage, but it also brings challenges in processing, managing, and analyzing large data sets. One typical challenge is processing large volumes of streaming data in real time. Online machine learning lets a model learn one instance at a time: the model is updated according to its prediction and the true label of each instance. Compared with batch machine learning algorithms, online machine learning is better suited to streaming data, because it can adjust the learning model as new, unseen data arrive. Besides online processing, parameter selection is an important part of model selection in machine learning, but it is generally performed with heuristic rules or by cross-validation on a validation set. In big data processing, parameters should adapt to the data rather than remain fixed. Nonparametric Bayesian models provide a means for a model to adapt its parameters to the data. This study proposes an online Chinese Restaurant Process algorithm, an extension of the Chinese Restaurant Process (CRP). The proposed algorithm is both online and nonparametric, so it can process streaming data efficiently while adapting its parameters to the data. Unlike the original CRP, the proposed algorithm operates online: we use regret theory to design a new prior and likelihood function based on the consistency between the true label and the prediction result. In the experiments, the proposed algorithm performs well on large data sets and generally outperforms other online machine learning algorithms.
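The online learning protocol described in the abstract (predict one instance, see its true label, update the model only from that pair) can be illustrated with a minimal sketch. It uses the classic perceptron purely as a stand-in mistake-driven online learner; it is not the thesis's algorithm, and the function name `perceptron_online`, the learning rate `lr`, and the ±1 label encoding are assumptions for illustration.

```python
from typing import Iterable, List, Tuple


def perceptron_online(stream: Iterable[Tuple[List[float], int]],
                      dim: int, lr: float = 1.0) -> Tuple[List[float], int]:
    """Hypothetical illustration: process (features, label) pairs one at a time.

    Labels are assumed to be +1 or -1. Returns the final weight vector and
    the number of mistakes made while predicting each instance before its
    label was revealed.
    """
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        score = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if score >= 0 else -1  # predict before the label is revealed
        if y_hat != y:  # update only when prediction and true label disagree
            mistakes += 1
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w, mistakes
```

For the stream `[([1.0, 1.0], 1), ([1.0, -1.0], -1), ([1.0, 1.0], 1)]` with `dim=2`, only the second instance is mispredicted, so the call returns `([-1.0, 1.0], 1)`. The mistake count is the quantity that regret-style analyses of online learners typically bound.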
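The proposed method builds on the Chinese Restaurant Process. As a reference point, a minimal sketch of the standard CRP seating rule follows: with $n$ customers already seated and $n_k$ of them at table $k$, the next customer joins table $k$ with probability $n_k/(n+\alpha)$ or opens a new table with probability $\alpha/(n+\alpha)$. This is the textbook CRP prior only, not the thesis's online variant with its regret-based prior and likelihood. The concentration parameter name `alpha` and the helper names are assumptions for illustration.

```python
import random
from typing import List


def crp_seating_probs(table_counts: List[int], alpha: float) -> List[float]:
    """Return standard CRP probabilities for the next customer.

    Entry k (k < len(table_counts)) is the probability of joining existing
    table k; the last entry is the probability of opening a new table.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = sum(table_counts)
    denom = n + alpha
    probs = [count / denom for count in table_counts]
    probs.append(alpha / denom)
    return probs


def sample_crp(num_customers: int, alpha: float, seed: int = 0) -> List[int]:
    """Seat customers one at a time and return the final table sizes."""
    rng = random.Random(seed)
    table_counts: List[int] = []
    for _ in range(num_customers):
        probs = crp_seating_probs(table_counts, alpha)
        choice = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        if choice == len(table_counts):
            table_counts.append(1)  # open a new table
        else:
            table_counts[choice] += 1  # join an existing table
    return table_counts
```

For example, `crp_seating_probs([2, 1], alpha=1.0)` returns `[0.5, 0.25, 0.25]`. Because the number of tables is not fixed in advance but grows with the data, the CRP is a common way to let model complexity adapt to the data, which is the nonparametric property the abstract relies on.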