Algorithms for aggregate information extraction from sequences

In this thesis, we propose efficient algorithms for aggregate information extraction from sequences and multidimensional arrays. The algorithms proposed are applicable in several important areas, including large databases and DNA sequence segmentation. We first study the problem of efficiently compu...

Bibliographic Details
Main Author: Bengtsson, Fredrik
Format: Doctoral Thesis
Language: English
Published: Luleå tekniska universitet, Datavetenskap 2007
Subjects: Computer Sciences; Datavetenskap (datalogi)
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-16818
id ndltd-UPSALLA1-oai-DiVA.org-ltu-16818
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-ltu-16818
2017-10-20T05:30:47Z
Algorithms for aggregate information extraction from sequences
eng
Bengtsson, Fredrik
Luleå tekniska universitet, Datavetenskap
Luleå
2007
Computer Sciences
Datavetenskap (datalogi)
Approved; 2007; 20070528 (ysko)
Doctoral thesis, comprehensive summary
info:eu-repo/semantics/doctoralThesis
text
http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-16818
Local 02c53e60-0d09-11dc-8745-000ea68e967b
Doctoral thesis / Luleå University of Technology 1 jan 1997 → …, 1402-1544 ; 2007:25
application/pdf
info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Doctoral Thesis
sources NDLTD
topic Computer Sciences
Datavetenskap (datalogi)
description In this thesis, we propose efficient algorithms for aggregate information extraction from sequences and multidimensional arrays. The proposed algorithms are applicable in several important areas, including large databases and DNA sequence segmentation. We first study the problem of efficiently computing, for a given range, the range-sum in a multidimensional array, as well as the k maximum values, called the top-k values. We design two efficient data structures for these problems. For the range-sum problem, our structure supports fast updates while keeping the complexity of range-sum queries low. The proposed top-k structure provides fast query computation, in time linear in the sum of the sizes of a two-dimensional query region. We also study the k maximum sum subsequences problem and develop several efficient algorithms. In this problem, the k subsegments of consecutive elements with the largest sums are to be found. The segments may overlap, which allows for a large number of candidate segments. Moreover, we design an optimal algorithm for ranking the k maximum sum subsequences. Our solution does not require the value of k to be known a priori. Furthermore, an optimal linear-time algorithm is developed for the maximum cover problem of finding k subsequences of consecutive elements of maximum total element sum. === Approved; 2007; 20070528 (ysko)
author Bengtsson, Fredrik
title Algorithms for aggregate information extraction from sequences
publisher Luleå tekniska universitet, Datavetenskap
publishDate 2007
url http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-16818
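
The description states the one-dimensional range-sum and k maximum sum subsequences problems only in words. The following Python sketch illustrates the problem definitions and nothing more: it is a naive quadratic baseline, not one of the thesis's algorithms, and the function names (prefix_sums, range_sum, k_maximum_sums) are hypothetical names chosen for this example.

```python
import heapq
from itertools import accumulate


def prefix_sums(values):
    """Return p with p[i] = sum(values[:i]), so p[0] == 0."""
    return [0] + list(accumulate(values))


def range_sum(prefix, lo, hi):
    """Sum of values[lo:hi] (half-open range), read off a prefix-sum array."""
    return prefix[hi] - prefix[lo]


def k_maximum_sums(values, k):
    """Return the k largest segment sums as (sum, lo, hi) tuples.

    Each tuple describes the half-open segment values[lo:hi] of consecutive
    elements. Segments may overlap, so all n(n+1)/2 non-empty segments are
    candidates. This is a naive O(n^2 log k) baseline for illustration only.
    """
    prefix = prefix_sums(values)
    n = len(values)
    candidates = (
        (range_sum(prefix, lo, hi), lo, hi)
        for lo in range(n)
        for hi in range(lo + 1, n + 1)
    )
    # Keep only the k candidates with the largest sums.
    return heapq.nlargest(k, candidates, key=lambda c: c[0])
```

For the sequence [2, -3, 4, -1, 2] and k = 3, the three largest segment sums are 5, 4 and 4. The segment with sum 5 spans indices 2 to 4, and the two segments with sum 4 overlap it, which shows why allowing overlap enlarges the candidate set.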