Document Latent Topic Mining By Cloud Computing

Bibliographic Details
Main Authors: Yu-Hsin Li, 李侑鑫
Other Authors: Frank S.C. Tseng
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/66899068919106258976
Description
Summary: Master's === National Kaohsiung First University of Science and Technology === Graduate Institute of Information Management === 100 === With the rapid growth in demand for cloud computing, users now organize their documents and files directly through cloud storage services for flexible access. This inspires service providers to offer more versatile functionality to attract more users. Over time, however, the volume of stored data grows dramatically. Helping users cope with information overload and filter mismatched search results out of a huge amount of information therefore requires a more effective computing method. Because files are created by users with different intentions, the system cannot fully infer these intentions from occasionally provided keywords. Although full-text indexing is the most appropriate method for users, it only makes sense for those who already know exactly what they want. Fortunately, statistical topic models provide complementary functionality by identifying semantically coherent ‘topics’, so that semantically related documents can be easily recognized and meaningful results returned to users. In this paper, we investigate a topic modeling approach, Latent Dirichlet Allocation (LDA), implemented with MapReduce in a cloud computing environment, where the subsets of the document set are processed independently by different sites. The experimental results suggest that our approach offers linear scalability and flexibility in identifying semantically coherent topics even when the size of the target document set increases drastically. We believe this will help ordinary users and even enterprises extract the documents they need and integrate with more applications on a pay-as-you-go basis.
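
The summary does not say how LDA is split across sites, so the following is only a minimal sketch of one common way to do it, not the thesis's implementation. It assumes collapsed Gibbs sampling: each map step sweeps one document shard against a copy of the global topic-word counts, and the reduce step sums the shards' counts into new global counts. All function names, parameters, and the toy data are hypothetical, and the shards are processed sequentially here where a real deployment would run them on separate workers.

```python
import random
from collections import Counter


def init_shard(docs, num_topics, rng):
    """Assign a random initial topic to every token in one shard."""
    return [[rng.randrange(num_topics) for _ in doc] for doc in docs]


def count_topic_words(docs, assignments, num_topics):
    """Count how often each word is assigned to each topic in one shard."""
    counts = [Counter() for _ in range(num_topics)]
    for doc, topics in zip(docs, assignments):
        for word, topic in zip(doc, topics):
            counts[topic][word] += 1
    return counts


def map_gibbs_sweep(docs, assignments, global_counts, num_topics,
                    vocab_size, alpha, beta, rng):
    """Map step: one collapsed Gibbs sweep over a shard.

    Works on a private copy of the global topic-word counts, updates the
    shard's assignments in place, and returns the shard's new counts.
    """
    topic_word = [Counter(c) for c in global_counts]  # private copy
    topic_total = [sum(c.values()) for c in topic_word]
    for doc, topics in zip(docs, assignments):
        doc_topic = Counter(topics)
        for i, word in enumerate(doc):
            old = topics[i]
            # Remove the token's current assignment from all counts.
            doc_topic[old] -= 1
            topic_word[old][word] -= 1
            topic_total[old] -= 1
            # Conditional probability of each topic, up to a constant.
            weights = [
                (doc_topic[k] + alpha) * (topic_word[k][word] + beta)
                / (topic_total[k] + vocab_size * beta)
                for k in range(num_topics)
            ]
            new = rng.choices(range(num_topics), weights=weights)[0]
            topics[i] = new
            doc_topic[new] += 1
            topic_word[new][word] += 1
            topic_total[new] += 1
    return count_topic_words(docs, assignments, num_topics)


def reduce_counts(shard_counts, num_topics):
    """Reduce step: sum the per-shard topic-word counts."""
    merged = [Counter() for _ in range(num_topics)]
    for counts in shard_counts:
        for k in range(num_topics):
            merged[k].update(counts[k])
    return merged


def run_lda(shards, num_topics, vocab_size, iterations=20,
            alpha=0.1, beta=0.01, seed=0):
    """Alternate map and reduce steps; return global topic-word counts."""
    rng = random.Random(seed)
    assignments = [init_shard(docs, num_topics, rng) for docs in shards]
    global_counts = reduce_counts(
        [count_topic_words(d, a, num_topics)
         for d, a in zip(shards, assignments)],
        num_topics,
    )
    for _ in range(iterations):
        shard_counts = [
            map_gibbs_sweep(d, a, global_counts, num_topics,
                            vocab_size, alpha, beta, rng)
            for d, a in zip(shards, assignments)
        ]
        global_counts = reduce_counts(shard_counts, num_topics)
    return global_counts


# Toy usage: two shards of tokenized documents (made-up data).
shards = [
    [["cloud", "storage", "cloud", "server"], ["server", "storage", "cloud"]],
    [["topic", "model", "topic", "word"], ["word", "model", "topic"]],
]
topic_word_counts = run_lda(shards, num_topics=2, vocab_size=6, iterations=5)
```

Because each shard samples against counts that other shards are changing at the same time, this scheme is an approximation of sequential Gibbs sampling; the only data exchanged per iteration is the topic-word count table, which is what lets the work grow with the number of shards.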