Summary: | Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 103 === Wikipedia is a multilingual, content-rich online encyclopedia. Built on the Web 2.0 concept, Wikipedia allows anyone to share and edit its content, which also makes it easy to vandalize. Wikipedians therefore devote sustained, long-term effort to maintaining the quality of Wikipedia content. Past research on Wikipedia vandalism detection focused on text semantics, statistical features, and machine learning. Current research focuses on language-independent feature analysis and on analyzing the continuity and correlation between content and context. WikiSERM extracts key items based on Wikipedia edit tags, which are used across the various language editions of Wikipedia. Key items extracted this way are therefore language-independent, which allows WikiSERM to be applied to vandalism detection in different language versions of Wikipedia. WikiSERM takes the full revision history of an article as evidence for judging risk trends and analyzes the usage status of each key item in the article (e.g., whether it keeps being used or is completely deleted). By analyzing the continuity of each key item's usage status, we obtain the risk status of each key item in each corresponding revision. WikiSERM stores the risk results of previous revisions in a two-dimensional array for fast lookup, so it can handle incremental data and provide risk assessment results immediately. By analyzing the key item transactions in a Wikipedia revision (e.g., adding a high-risk key item, adding a low-risk key item, deleting a high-risk key item, deleting a low-risk key item), WikiSERM treats revisions that exceed a threshold as high-risk revisions. Our approach can help Wikipedia administrators quickly find vandalized revisions and identify which key items in a vandalized revision are high-risk.
|
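The incremental scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the WikiSERM implementation: the class name `RiskTracker`, the transaction weights, the threshold value, and the rule "a key item that was deleted earlier and then reappears is high-risk" are all assumptions chosen for illustration, since the abstract does not specify them. The sketch keeps one row of risk labels per revision (a revision-by-key-item table) so that each new revision is scored using only the stored results.

```python
from typing import Dict, List, Set

HIGH, LOW = "high", "low"

# Hypothetical weights per key-item transaction (not taken from the thesis).
WEIGHTS = {
    ("add", HIGH): 2.0,
    ("add", LOW): 0.5,
    ("delete", HIGH): 0.0,
    ("delete", LOW): 1.5,
}


class RiskTracker:
    """Illustrative incremental risk scoring over a revision history."""

    def __init__(self, threshold: float = 2.0) -> None:
        self.threshold = threshold  # assumed value, for illustration only
        # risk_table[r][item] is the risk label of a key item in revision r.
        self.risk_table: List[Dict[str, str]] = []
        self.previous_items: Set[str] = set()
        self.removed_before: Set[str] = set()

    def add_revision(self, items: Set[str]) -> bool:
        """Score one new revision; return True if it exceeds the threshold."""
        added = items - self.previous_items
        deleted = self.previous_items - items
        last_row = self.risk_table[-1] if self.risk_table else {}

        # Assumed rule: an item that was deleted in the past is high-risk.
        row = {
            item: HIGH if item in self.removed_before else LOW
            for item in items
        }
        score = sum(WEIGHTS[("add", row[item])] for item in added)
        # Deleted items are scored with the label stored for the prior revision.
        score += sum(WEIGHTS[("delete", last_row[item])] for item in deleted)

        self.risk_table.append(row)
        self.removed_before |= deleted
        self.previous_items = set(items)
        return score > self.threshold
```

Calling `add_revision` once per revision, oldest first, returns whether that revision would be flagged under these assumed weights, and `risk_table` can then be queried to see which key items in a flagged revision carry the high-risk label.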