Summary: | <p> The rapid adoption of smart phones and the social media boom have increased interest in location-based services. A new set of applications and popular online services that utilize users' locations has been created, and many ordinary people interact with these services daily through their smart phones, tablets, cameras, and similar devices, most of which come equipped with GPS sensors. The complex new features provided by those applications and the scale of the data they handle impose new and interesting challenges for spatial databases. In this thesis, we present spatial indexing and query processing techniques in response to some of these challenges. </p><p> First, we study how to support approximate keyword search on spatial data. Many popular websites support keyword search on their spatial data, such as business listings and photos. In these systems, users may have difficulty finding the entities they are looking for if they do not know the exact spelling, for example of a restaurant's name. We develop three algorithms for constructing a specialized index that can answer location-based approximate keyword queries, successively improving the time and space efficiency by exploiting the textual and spatial properties of the data. We experimentally demonstrate the efficiency of our techniques on real, large datasets. </p><p> Second, we introduce a framework for converting an in-place update, disk-based data structure to a deferred-update, append-only data structure. We show that converting an R-tree index (or any other non-totally ordered index) to an LSM index is non-trivial if the resultant index is expected to have performant read and write operations. Our framework enables the "LSM-ification" of any kind of index structure that supports certain primitive operations, allowing the index to ingest data efficiently.
We have implemented our framework in the context of the AsterixDB system as a way to extend both the R-tree and the inverted keyword index to LSM-based indexes. Our results show that an LSM-based version of the R-tree can significantly outperform its conventional counterpart in <i>both</i> ingestion and query speed. </p><p> Third, we study how to optimize the performance of query workloads that favor recent data. In many use cases, users of a database system are mostly interested in querying recent data. We propose a solution that exploits the natural partitioning that LSM-based indexes provide for their components, allowing us to filter out many components when answering queries. Our solution generalizes to any LSM-based index structure, including LSM R-trees, and has been implemented in the context of the AsterixDB system. Our experiments show that we can reduce query times by up to 99% for selective range predicates.</p>
|