On the study of external sorting and selection problems
Doctoral === National Tsing Hua University === Department of Computer Science === 89 === The problem of sorting and selecting data efficiently has been widely studied. Sorting extremely large data sets is becoming increasingly important for large corporations, banks, and government institutions, which rely on computers more and mor...
Main Authors: | Fang Cheng Lu 呂芳誠 |
---|---|
Other Authors: | Professor Chuan Yi Tang 唐傳義 |
Format: | Others |
Language: | zh-TW |
Published: | 2001 |
Online Access: | http://ndltd.ncl.edu.tw/handle/38789418437584207958 |
id |
ndltd-TW-089NTHU0392006 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-089NTHU0392006 2016-07-04T04:17:18Z http://ndltd.ncl.edu.tw/handle/38789418437584207958 On the study of external sorting and selection problems 外部排序與外部搜尋問題之研究 Fang Cheng Lu 呂芳誠 Doctoral National Tsing Hua University Department of Computer Science 89 The problem of sorting and selecting data efficiently has been widely studied. Sorting extremely large data sets is becoming increasingly important for large corporations, banks, and government institutions, which rely ever more heavily on computers. In most such cases, sorting and selection are carried out by external sorting and selection algorithms, in which the data file is too large to fit into main memory and must reside in secondary memory. We present an optimal external sorting algorithm for the two-level memory model. Unlike the traditional external merge sort, our method uses sampling information to reduce the number of disk I/Os in the external phase. The algorithm is simple and elegant, and it makes good use of the memory available in recent computing environments. Under a certain memory constraint, the algorithm runs with an optimal number of disk I/Os, and each record is read exactly twice and written exactly twice. This dissertation also presents an optimal sampling-based external selection algorithm that selects the k-th smallest item in a large data set under the two-level memory model. The sampling external selection algorithm is also applied to the worldwide selection problem in the Internet environment. The sampling information scheme yields a simple and elegant algorithm that reduces the number of disk I/Os. The best case and the worst case of the algorithm are discussed, and the algorithm is also efficient for multiple selections. Finally, we analyze the average case under an equal-probability assumption: a block is equally likely to overlap or not overlap with the other blocks. Professor Chuan Yi Tang 唐傳義 2001 degree thesis ; thesis 92 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Doctoral === National Tsing Hua University === Department of Computer Science === 89 === The problem of sorting and selecting data efficiently has been widely studied. Sorting extremely large data sets is becoming increasingly important for large corporations, banks, and government institutions, which rely ever more heavily on computers. In most such cases, sorting and selection are carried out by external sorting and selection algorithms, in which the data file is too large to fit into main memory and must reside in secondary memory.
We present an optimal external sorting algorithm for the two-level memory model. Unlike the traditional external merge sort, our method uses sampling information to reduce the number of disk I/Os in the external phase. The algorithm is simple and elegant, and it makes good use of the memory available in recent computing environments. Under a certain memory constraint, the algorithm runs with an optimal number of disk I/Os, and each record is read exactly twice and written exactly twice.
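The "read twice, written twice" behaviour can be illustrated with a generic sample-based two-pass sort. This is a minimal sketch under stated assumptions, not the dissertation's algorithm: the function name `sample_sort_two_pass`, the parameters `mem_capacity` and `sample_step`, and the use of Python lists to stand in for disk files are all hypothetical, and keys are assumed to be distinct. Pass 1 writes sorted runs and collects samples; pass 2 uses the samples to cut the key space into ranges small enough to load, sort, and write out one at a time.

```python
import bisect


def sample_sort_two_pass(data, mem_capacity, sample_step):
    """Sort `data` (a list standing in for a disk file) in two passes.

    Assumes distinct keys. Returns (sorted_output, io_counts, largest_range).
    """
    io = {"reads": 0, "writes": 0}

    # Pass 1: read memory-sized chunks, sort each in memory, write it back
    # as a run, and keep every `sample_step`-th key of the run as a sample.
    runs, samples = [], []
    for start in range(0, len(data), mem_capacity):
        chunk = sorted(data[start:start + mem_capacity])
        io["reads"] += len(chunk)
        runs.append(chunk)
        io["writes"] += len(chunk)
        samples.extend(chunk[sample_step - 1::sample_step])
    samples.sort()

    # Memory constraint (an assumption of this sketch): a range spanning
    # `group` consecutive samples holds at most
    # group * sample_step + len(runs) * (sample_step - 1) records.
    group = (mem_capacity - len(runs) * (sample_step - 1)) // sample_step
    if group < 1:
        raise ValueError("memory too small for this many runs and sample step")
    splitters = samples[group - 1::group]

    # Pass 2: for each key range, pull the matching slice of every run
    # (contiguous, because runs are sorted), sort it, and append to output.
    # A real implementation would store sample positions in pass 1 instead
    # of searching the runs with bisect.
    output, cursors, largest_range = [], [0] * len(runs), 0
    for bound in splitters + [None]:  # None marks the final open range
        loaded = []
        for i, run in enumerate(runs):
            if bound is None:
                end = len(run)
            else:
                end = bisect.bisect_right(run, bound, cursors[i])
            loaded.extend(run[cursors[i]:end])
            cursors[i] = end
        io["reads"] += len(loaded)
        loaded.sort()
        output.extend(loaded)
        io["writes"] += len(loaded)
        largest_range = max(largest_range, len(loaded))
    return output, io, largest_range
```

The returned `largest_range` lets a caller check that no range exceeded `mem_capacity`, which is the role the memory constraint plays in this sketch.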
This dissertation also presents an optimal sampling-based external selection algorithm that selects the k-th smallest item in a large data set under the two-level memory model. The sampling external selection algorithm is also applied to the worldwide selection problem in the Internet environment. The sampling information scheme yields a simple and elegant algorithm that reduces the number of disk I/Os. The best case and the worst case of the algorithm are discussed, and the algorithm is also efficient for multiple selections. Finally, we analyze the average case under an equal-probability assumption: a block is equally likely to overlap or not overlap with the other blocks.
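Sampling-based selection of the k-th smallest item can be sketched in the same spirit. The code below is an illustration under stated assumptions rather than the dissertation's method: the name `sample_select`, its parameters, and the list standing in for a disk file are hypothetical, and keys are assumed to be distinct. The first scan collects samples from sorted blocks; the rank of each sample is then known to within a small slack, which brackets the answer between two sample values; the second scan keeps only the records inside that bracket.

```python
def sample_select(data, k, mem_capacity, sample_step):
    """Return the k-th smallest key (1-based) of `data`, a list standing in
    for a disk file. Assumes distinct keys.

    Returns (answer, records_read, candidates_kept).
    """
    if not 1 <= k <= len(data):
        raise ValueError("k out of range")
    reads, blocks, samples = 0, 0, []

    # Scan 1: sort each memory-sized block and keep every
    # `sample_step`-th key of the sorted block as a sample.
    for start in range(0, len(data), mem_capacity):
        block = sorted(data[start:start + mem_capacity])
        reads += len(block)
        blocks += 1
        samples.extend(block[sample_step - 1::sample_step])
    samples.sort()

    # The sample at sorted index j has rank between (j + 1) * sample_step
    # and (j + 1) * sample_step + slack, so samples bracket the answer.
    slack = blocks * (sample_step - 1)
    low = high = None
    for j, value in enumerate(samples):
        min_rank = (j + 1) * sample_step
        if min_rank + slack < k:
            low = value   # certainly smaller than the answer
        elif min_rank >= k:
            high = value  # certainly at least the answer
            break

    # Scan 2: count records at or below `low` and keep those in (low, high].
    below, candidates = 0, []
    for start in range(0, len(data), mem_capacity):
        block = data[start:start + mem_capacity]
        reads += len(block)
        for key in block:
            if low is not None and key <= low:
                below += 1
            elif high is None or key <= high:
                candidates.append(key)
    candidates.sort()
    return candidates[k - below - 1], reads, len(candidates)
```

In this sketch the number of candidates kept in the second scan stays below `2 * (slack + sample_step)`, so only a small bracket of records has to be held in memory, and no record is ever written back to disk.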
|
author2 |
Professor Chuan Yi Tang |
author_facet |
Professor Chuan Yi Tang Fang Cheng Lu 呂芳誠 |
author |
Fang Cheng Lu 呂芳誠 |
spellingShingle |
Fang Cheng Lu 呂芳誠 On the study of external sorting and selection problems |
author_sort |
Fang Cheng Lu |
title |
On the study of external sorting and selection problems |
title_short |
On the study of external sorting and selection problems |
title_full |
On the study of external sorting and selection problems |
title_fullStr |
On the study of external sorting and selection problems |
title_full_unstemmed |
On the study of external sorting and selection problems |
title_sort |
on the study of external sorting and selection problems |
publishDate |
2001 |
url |
http://ndltd.ncl.edu.tw/handle/38789418437584207958 |
work_keys_str_mv |
AT fangchenglu onthestudyofexternalsortingandselectionproblems AT lǚfāngchéng onthestudyofexternalsortingandselectionproblems AT fangchenglu wàibùpáixùyǔwàibùsōuxúnwèntízhīyánjiū AT lǚfāngchéng wàibùpáixùyǔwàibùsōuxúnwèntízhīyánjiū |
_version_ |
1718334335293063168 |