Intelligent News Recommender System
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Others |
Language: | en_US |
Published: | 2001 |
Online Access: | http://ndltd.ncl.edu.tw/handle/39079058399225210614 |
Summary: | Master's thesis === National Cheng Kung University === Institute of Computer Science and Information Engineering === 89 === With the growth of full-text documents, recommender systems have received increasing attention. Many existing recommender systems use collaborative filtering methods, which base recommendations on other users’ preferences. This research focuses on the Intelligent News Recommender System (INRA), a news recommending system based on hierarchical text categorization, which helps people find articles they will like in the huge stream of available documents. This approach has the advantage of being able to recommend previously unrated documents to users with unique interests. We propose a news recommending system that uses information retrieval and a machine-learning algorithm for text categorization. In addition, we analyze the utility of several feature selection methods (i.e., methods of choosing the representation of a document that the learning algorithm actually uses). Experimental results demonstrate that this approach can produce useful recommendations.
ABSTRACT
FIGURE LISTING
TABLE LISTING
CHAPTER 1 INTRODUCTION
1.1 THE INFORMATION OVERLOAD PROBLEM
1.2 RESEARCH MOTIVATIONS
1.3 THE APPROACH
1.4 THESIS ORGANIZATION
CHAPTER 2 LITERATURE REVIEW AND RELATED WORKS
2.1 WHAT IS A RECOMMENDER SYSTEM
2.2 SEVERAL APPROACHES TO RECOMMENDER SYSTEM
2.2.1 Collaborative Filtering
2.2.2 Content-Based Filtering
2.2.3 Knowledge-Based Filtering
2.2.4 Combination With Some Approaches In Filtering
2.3 BACKGROUND ON TEXT CATEGORIZATION IN MACHINE LEARNING
2.4 FEATURE SELECTION METHODS
2.4.1 Document Frequency Thresholding
2.4.2 Information Gain
2.4.3 Mutual Information
2.4.4 χ² Statistic
2.4.5 Term Strength
2.5 CLASSIFIER-BASED TEXT CATEGORIZATION METHODOLOGIES
2.5.1 Rocchio’s Classifier
2.5.2 Naïve Bayes Classifier
2.5.3 k-Nearest Neighbor Classifier
2.5.4 Neural Network Classifier
CHAPTER 3 INTELLIGENT NEWS RECOMMENDER AGENT
3.1 INRA ARCHITECTURE
3.1.1 User Interface
3.1.2 System Databases
3.1.3 Lexical Analysis
3.1.4 INRA Engine
3.1.5 HTC Engine
3.2 HIERARCHICAL TEXT CATEGORIZATION MODEL
3.2.1 Part-Of-Speech Tagger
3.2.2 Stemming
3.2.3 Stop Words Filter
3.2.4 Hierarchical Neural-Based Classifier
3.3 FEATURE SELECTION APPROACH
CHAPTER 4 EXPERIMENTAL DESIGN AND ANALYSIS
4.1 THE CORPUS
4.1.1 Documents
4.1.2 File Format
4.1.3 Document Internal Tags
4.1.4 Categories
4.1.5 The Hierarchical Structure of Categories
4.2 FEATURE SELECTION ANALYSIS
4.3 HIERARCHY-BASED AUTOMATIC TEXT CATEGORIZATION
4.4 INRA SYSTEM PERFORMANCE
4.5 COMPARISON WITH OTHER RECOMMENDER SYSTEMS
CHAPTER 5 CONCLUSIONS AND FUTURE WORKS
5.1 CONCLUSIONS
5.2 FUTURE WORKS
REFERENCES
APPENDIX A STOP-WORD LIST
APPENDIX B PART-OF-SPEECH TAGS