Summary: | Master's === National Chiao Tung University === College of Management, Information Management Program === 107 === Fake news often pairs sensational headlines with fabricated content, generally to earn advertising revenue or to deepen political division. The ubiquity of social media lets such stories spread very quickly, so that fake news has become pervasive.
At present there is no established comparative approach for predicting and identifying fake news. This study therefore uses Kaggle's Fake News dataset as its data source. For feature selection, sentiment analysis captures the emotional tone of each article; LDA groups the articles to derive topic-model features; TF-IDF extracts word features; and the author feature is taken directly from the original dataset. The study then applies four classification methods, Random Forests, XGBoost (eXtreme Gradient Boosting), Naïve Bayes, and Logistic Regression, and compares how well they predict fake news.
The experimental results show that logistic regression achieves an accuracy of 96.32%, the best fake news prediction among the four classifiers.
|
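The TF-IDF word features mentioned in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `tf_idf`, the whitespace tokenizer, the unsmoothed weighting (term frequency times `log(N / df)`), and the sample headlines are all assumptions chosen for illustration. Library implementations typically use smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Assumed weighting: tf = term count / document length,
    idf = log(N / df), where df is the number of documents containing the term.
    """
    # Naive lowercase whitespace tokenization (an assumption for this sketch).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in counts.items()}
        )
    return weights


# Made-up example headlines, not taken from the Kaggle dataset.
docs = [
    "shocking secret revealed",
    "senate passes budget",
    "shocking budget",
]
features = tf_idf(docs)
```

In this sketch a term that appears in fewer documents (such as "secret") receives a higher weight than one shared across documents (such as "shocking"), which is the property that makes TF-IDF useful as a word feature for a classifier.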