Analyzing DBpedia Linked Open Data (LOD) on Spark:Movie Box Office Prediction as an Example
碩士 === 國立政治大學 === 資訊科學學系 === 104 === Recent years, Linked Open Data (LOD) has been identified as containing large amount of potential value. How to collect and integrate multiple LOD contents for effective analytics has become a research challenge. LOD is represented as a Resource Description Framew...
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Others |
Language: | zh-TW |
Published: |
2016
|
Online Access: | http://ndltd.ncl.edu.tw/handle/jsfxcs |
Summary: | 碩士 === 國立政治大學 === 資訊科學學系 === 104 === Recent years, Linked Open Data (LOD) has been identified as containing large amount of potential value. How to collect and integrate multiple LOD contents for effective analytics has become a research challenge. LOD is represented as a Resource Description Framework (RDF) format, which can be queried through SPARQL language. But large amount of RDF data is lack of a high performance and scalable storage analysis system. Moreover, big RDF data analytics pipeline is far from perfect. The purpose of this study is to exploit the above research issue. A movie box office sale prediction scenario is demonstrated by using DBpedia with external IMDb movie database. We perform the DBpedia big graph analytics on the Apache Spark platform. The movie box office prediction for optimal model selection is first evaluated by BIC. Then, Naïve Bayes and Bayesian Network optimal model’s ROC and AUC values are obtained to justify our approach.
|
---|