Practical Inverted Index Based on Elias-Fano Encoding: A Case Study of Versioned Documents

Bibliographic Details
Main Authors: Yu, Jia-Hong, 余家鴻
Other Authors: Hon, Wing-Kai
Format: Others
Language: en_US
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/3a4nqn
Description
Summary: Master's === National Tsing Hua University === Department of Computer Science === 105 === The inverted index is an important and well-known method for document retrieval. However, as document collections grow rapidly, extending the basic inverted index to support commonly needed functions such as document listing, time-travel queries, top-$k$ retrieval, and occurrence reporting for phrase queries becomes costly in space. For example, supporting occurrence reporting requires around 1.5 GB of on-disk index space for only around 300 MB of data. Many index compression techniques have been studied to address this space problem. In this thesis, we propose a practical, space-efficient inverted index framework based on the recently proposed partitioned Elias-Fano encoding, and conduct experiments on real data sets. Our index supports these query functions correctly using only around 150 MB of index space for 300 MB of real input data. We develop two different methods for querying our index and, based on our experimental results, discuss which kinds of data are better suited to each method.
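
For readers unfamiliar with the encoding named in the summary, the following is a minimal sketch of plain (non-partitioned) Elias-Fano encoding applied to one sorted posting list. It is not the thesis's implementation: the function names `ef_encode` and `ef_decode`, the example posting list, and the use of Python lists in place of packed bit vectors are all assumptions made for illustration. Each value is split into low bits, stored verbatim, and high bits, stored in a unary-coded bit array.

```python
def ef_encode(values, universe):
    """Encode a sorted list of integers in [0, universe) with plain Elias-Fano.

    Hypothetical helper for illustration; lists stand in for packed bit vectors.
    """
    n = len(values)
    if n == 0:
        return 0, [], []
    # Number of low bits per value: floor(log2(universe / n)), never negative.
    low_bits = max(0, (universe // n).bit_length() - 1)
    mask = (1 << low_bits) - 1
    # Low parts are stored as-is, low_bits bits each.
    lows = [v & mask for v in values]
    # High parts are unary-coded: the i-th value sets bit (high part + i).
    high = [0] * (n + (universe >> low_bits) + 1)
    for i, v in enumerate(values):
        high[(v >> low_bits) + i] = 1
    return low_bits, lows, high


def ef_decode(low_bits, lows, high):
    """Recover the original sorted list from its Elias-Fano representation."""
    out = []
    i = 0  # index of the next value to decode
    for pos, bit in enumerate(high):
        if bit:
            # Position of the i-th set bit minus i gives the high part.
            out.append(((pos - i) << low_bits) | lows[i])
            i += 1
    return out


# Example posting list (made-up document IDs), universe size 44.
postings = [3, 4, 7, 13, 14, 15, 21, 43]
low_bits, lows, high = ef_encode(postings, 44)
decoded = ef_decode(low_bits, lows, high)
```

In this sketch the encoded size is `len(postings) * low_bits + len(high)` bits when the lists are packed. The partitioned variant mentioned in the summary additionally splits each list into chunks and encodes them separately, which is not shown here.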