On the study of external sorting and selection problems

PhD === 國立清華大學 (National Tsing Hua University) === 資訊工程學系 (Department of Computer Science) === 89 === How to sort and select data efficiently has been widely studied. Sorting extremely large data sets is becoming increasingly important for large corporations, banks, and government institutions, which rely on computers more and mor...

Bibliographic Details
Main Author: Fang Cheng Lu (呂芳誠)
Other Authors: Professor Chuan Yi Tang
Format: Others
Language: zh-TW
Published: 2001
Online Access: http://ndltd.ncl.edu.tw/handle/38789418437584207958
id ndltd-TW-089NTHU0392006
record_format oai_dc
spelling ndltd-TW-089NTHU0392006 2016-07-04T04:17:18Z http://ndltd.ncl.edu.tw/handle/38789418437584207958 On the study of external sorting and selection problems 外部排序與外部搜尋問題之研究 Fang Cheng Lu 呂芳誠 PhD 國立清華大學 (National Tsing Hua University) 資訊工程學系 (Department of Computer Science) 89 Professor Chuan Yi Tang 唐傳義 2001 學位論文 ; thesis 92 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description PhD === 國立清華大學 (National Tsing Hua University) === 資訊工程學系 (Department of Computer Science) === 89 === How to sort and select data efficiently has been widely studied. Sorting extremely large data sets is becoming increasingly important for large corporations, banks, and government institutions, which rely ever more heavily on computers. In most such cases, sorting and selection are carried out by external algorithms, because the data file is too large to fit into main memory and must reside in secondary memory. We present an optimal external sorting algorithm for the two-level memory model. Unlike the traditional external merge sort, our method uses sampling information to reduce the number of disk I/Os in the external phase. The algorithm is simple and elegant, and it makes good use of the memory available in current computing environments. Under a certain memory constraint, the algorithm runs with the optimal number of disk I/Os, and each record is read exactly twice and written exactly twice. This dissertation also presents an optimal sampling-based external selection algorithm that selects the k-th smallest item in a large data set under the two-level memory model. The same algorithm is applied to the worldwide selection problem in the Internet environment. Sampling information again yields a simple and elegant algorithm that reduces the number of disk I/Os. The best case and the worst case of the algorithm are discussed, and the algorithm is also efficient for multiple selections. Finally, we analyze the average case of the algorithm under an equal-probability assumption: a block is equally likely to overlap or not overlap the other blocks.
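The abstract does not spell out the algorithms, so the following is only a generic illustration of how sampling can guide external sorting and selection. It is not the dissertation's method. It is a minimal Python sketch of sample-based distribution sorting, assuming integer keys stored one per line and assuming every bucket fits in main memory. All function names and parameters (`choose_splitters`, `partition`, `external_sort`, `external_select`, `num_buckets`, `sample_size`) are hypothetical. The separate sampling pass costs one extra read of the input, so this sketch does not reproduce the two-read, two-write bound stated in the abstract.

```python
import bisect
import os
import random
import tempfile


def read_ints(path):
    """Stream integers from a text file, one per line."""
    with open(path) as f:
        for line in f:
            yield int(line)


def choose_splitters(path, num_buckets, sample_size, seed=0):
    """Reservoir-sample the input, then pick evenly spaced splitter keys."""
    rng = random.Random(seed)
    sample = []
    for i, x in enumerate(read_ints(path)):
        if i < sample_size:
            sample.append(x)
        else:
            j = rng.randint(0, i)  # inclusive bounds
            if j < sample_size:
                sample[j] = x
    if not sample:
        return []
    sample.sort()
    step = len(sample) / num_buckets
    return [sample[int(step * b)] for b in range(1, num_buckets)]


def partition(path, splitters, workdir):
    """Write each record to the bucket file whose key range contains it."""
    paths = [os.path.join(workdir, f"bucket{i}.txt")
             for i in range(len(splitters) + 1)]
    files = [open(p, "w") for p in paths]
    counts = [0] * len(paths)
    try:
        for x in read_ints(path):
            i = bisect.bisect_right(splitters, x)
            files[i].write(f"{x}\n")
            counts[i] += 1
    finally:
        for f in files:
            f.close()
    return paths, counts


def external_sort(in_path, out_path, num_buckets=4, sample_size=64):
    """Sort a file by bucketing on sampled splitters, then sorting buckets."""
    with tempfile.TemporaryDirectory() as workdir:
        splitters = choose_splitters(in_path, num_buckets, sample_size)
        paths, _ = partition(in_path, splitters, workdir)
        with open(out_path, "w") as out:
            for p in paths:  # buckets are already in key order
                for x in sorted(read_ints(p)):  # assumes bucket fits in memory
                    out.write(f"{x}\n")


def external_select(in_path, k, num_buckets=4, sample_size=64):
    """Return the k-th smallest key (1-based), sorting only one bucket."""
    with tempfile.TemporaryDirectory() as workdir:
        splitters = choose_splitters(in_path, num_buckets, sample_size)
        paths, counts = partition(in_path, splitters, workdir)
        for p, c in zip(paths, counts):
            if k <= c:
                return sorted(read_ints(p))[k - 1]
            k -= c
    raise IndexError("k is out of range")
```

In this sketch, selection reuses the bucket counts to locate the one bucket that contains rank k, so only that bucket is sorted in memory. That is one plausible reading of how sampling could reduce disk I/Os for selection, not a claim about the dissertation's design.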