Algorithms for aggregate information extraction from sequences
In this thesis, we propose efficient algorithms for aggregate information extraction from sequences and multidimensional arrays. The algorithms proposed are applicable in several important areas, including large databases and DNA sequence segmentation. We first study the problem of efficiently compu...
Main Author: | Bengtsson, Fredrik |
---|---|
Format: | Doctoral Thesis |
Language: | English |
Published: | Luleå tekniska universitet, Datavetenskap, 2007 |
Subjects: | Computer Sciences; Datavetenskap (datalogi) |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-16818 |
id |
ndltd-UPSALLA1-oai-DiVA.org-ltu-16818 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-ltu-16818; 2017-10-20T05:30:47Z; Algorithms for aggregate information extraction from sequences; eng; Bengtsson, Fredrik; Luleå tekniska universitet, Datavetenskap; Luleå; 2007; Computer Sciences; Datavetenskap (datalogi); Approved; 2007; 20070528 (ysko); Doctoral thesis, comprehensive summary; info:eu-repo/semantics/doctoralThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-16818; Local 02c53e60-0d09-11dc-8745-000ea68e967b; Doctoral thesis / Luleå University of Technology 1 jan 1997 → …, 1402-1544 ; 2007:25; application/pdf; info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Doctoral Thesis |
sources |
NDLTD |
topic |
Computer Sciences Datavetenskap (datalogi) |
description |
In this thesis, we propose efficient algorithms for aggregate information extraction from sequences and multidimensional arrays. The proposed algorithms are applicable in several important areas, including large databases and DNA sequence segmentation. We first study the problem of efficiently computing, for a given range, the range-sum in a multidimensional array, as well as computing the k maximum values, called the top-k values. We design two efficient data structures for these problems. For the range-sum problem, our structure supports fast updates while keeping the complexity of range-sum queries low. The proposed top-k structure answers queries in time linear in the sum of the sizes of a two-dimensional query region. We also study the k maximum sum subsequences problem and develop several efficient algorithms. In this problem, the k subsegments of consecutive elements with the largest sums are to be found. The segments may overlap, so the number of candidate segments is large. Moreover, we design an optimal algorithm for ranking the k maximum sum subsequences. Our solution does not require the value of k to be known a priori. Furthermore, an optimal linear-time algorithm is developed for the maximum cover problem of finding k subsequences of consecutive elements of maximum total element sum. === Approved; 2007; 20070528 (ysko) |
author |
Bengtsson, Fredrik |
title |
Algorithms for aggregate information extraction from sequences |
publisher |
Luleå tekniska universitet, Datavetenskap |
publishDate |
2007 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-16818 |