Efficient Indexing for Structured and Unstructured Data
The collection of digital data is growing at an exponential rate. Data originates from a wide range of sources, such as text feeds, biological sequencers, internet traffic over routers, sensors, and many others. To extract useful information from these sources, users have to query...
Main Author: | Patil, Manish Madhukar |
---|---|
Other Authors: | Shah, Rahul |
Format: | Others |
Language: | en |
Published: | LSU, 2014 |
Subjects: | Computer Science |
Online Access: | http://etd.lsu.edu/docs/available/etd-08182014-125357/ |
id | ndltd-LSU-oai-etd.lsu.edu-etd-08182014-125357 |
---|---|
record_format | oai_dc |
datestamp | 2014-09-05T03:45:13Z |
dc:contributor | Shah, Rahul; Park, Seung-Jong; Chen, Jianhua; Moldovan, Dorel |
dc:date | 2014-09-04 |
dc:type | text |
dc:format | application/pdf |
dc:rights | unrestricted; I hereby certify that, if appropriate, I have obtained and attached herein a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to LSU or its agents the non-exclusive license to archive and make accessible, under the conditions specified below and in appropriate University policies, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. |
collection | NDLTD |
language | en |
format | Others |
sources | NDLTD |
topic | Computer Science |
description | The collection of digital data is growing at an exponential rate. Data originates from a wide range of sources, such as text feeds, biological sequencers, internet traffic over routers, sensors, and many others. To extract useful information from these sources, users have to query the data. Indexing techniques aim to reduce query time by preprocessing the data. The diversity of real-world data sources makes it imperative to develop application-specific indexing solutions based on the data to be queried. Data can be structured (e.g., relational tables) or unstructured (e.g., free text). Moreover, a growing number of applications need to analyze both kinds of data seamlessly, which makes data integration a central issue. Integrating text with structured data must account for missing values, errors in the data, and similar imperfections. Probabilistic models have recently been proposed for this purpose. These models are also useful for applications where uncertainty is inherent in the data, such as sensor networks. This dissertation proposes efficient indexing solutions for several problems that lie at the intersection of databases and information retrieval, such as joining ranked inputs and full-text document search. The well-known problems of ranked retrieval and pattern matching are also studied in probabilistic settings. For each problem, worst-case theoretical bounds are established for the proposed solutions and/or their practicality is demonstrated through thorough experimentation. |
author2 | Shah, Rahul |
author | Patil, Manish Madhukar |
title | Efficient Indexing for Structured and Unstructured Data |
publisher | LSU |
publishDate | 2014 |
url | http://etd.lsu.edu/docs/available/etd-08182014-125357/ |