Cyclical processor and computer architectures for highly parallel applications

During the last few decades, the search for powerful computing machines has been one of the several endless pursuits among the scientific community. In this thesis, several novel architectural ideas for the designs of high-performance computing machines are presented, and the practicality and useful...

Full description

Bibliographic Details
Main Author: Wong, Fut-Suan
Language:English
Published: University of British Columbia 2010
Online Access:http://hdl.handle.net/2429/25646
id ndltd-UBC-oai-circle.library.ubc.ca-2429-25646
record_format oai_dc
collection NDLTD
language English
sources NDLTD
description During the last few decades, the search for powerful computing machines has been one of the several endless pursuits among the scientific community. In this thesis, several novel architectural ideas for the designs of high-performance computing machines are presented, and the practicality and usefulness of cyclical architectures -- ones which have their hardware resources cyclically arranged -- in this respect are examined. These ideas are illustrated with the use of specific application examples including parallel sorting, packet-switched communications and the design methodology of a class of next-generation computers. In the first part of our studies, the structure and control algorithms of a single-chip, recirculating systolic sorter (RSS), are presented. The correctness of the algorithms is proved, and general operational constraints are derived. This parallel sorter is highly amenable to VLSI implementations because of the simple control structure and the regular, repetitive and near-neighbour type of interconnections required. The number of quadruple comparators needed to sort N items is N/4, and the average sorting time is found to be bounded by (logN)**2 and N. A hardware termination is incorporated into the control unit of the sorter, so that the sorting process can be terminated as soon as the input list is in the desired order. In the second part of our studies, a novel loop-structured switching network (LSSN) is presented. It is intended for packet communications in large-scale systems consisting of hundreds to thousands of interconnected devices. With L loops -- where L is a power of two, it can connect up to N=L(logL) pairs of transmitters and receivers, using only N/2 two-by-two switches; in terms of switch counts and the amounts of wiring, this network is very advantageous when the value of N is large. It can be extended incrementally, and is free of the store-and-forward type of deadlocks which prevail in other cyclical, packet-switched networks. Our simulation results show that its average throughput rate and delay are close to that of other designs despite its relatively low switch count. In the third part of our studies, a new design methodology for the next-generation computers is described. Our proposed system, the Event-Driven Computer (EDC) is primarily a data-driven system which has its computing resources arranged as a circular pipeline, and it is supplemented with control-driven activities. Such a combined approach is aimed at extracting the advantages of both the "pure" data-driven and control-driven computations while alleviating their shortcomings. Compared to other designs, an EDC has the merits of a simpler architecture, better resource utilization, array processing capabilities and a higher speed range. As is shown by our studies, the properties of the cyclical architectures depend greatly on how the information packets interact with each other; deadlocks, for instance, will occur on systems such as the Loop-Structured Switching Network because of the asynchronous, circular requests of network resources by the packets (we have, however, presented a deadlock avoidance scheme), but will not occur on synchronous systems such as the Recirculating Systolic Sorter. In general, the resource utilization of the cyclical architectures are higher than that of the acyclic ones — or equivalently, the cyclical architectures can handle larger amounts of information with relatively smaller areas -- they are therefore more suitable to the designs of very large-scale systems. === Applied Science, Faculty of === Electrical and Computer Engineering, Department of === Graduate
author Wong, Fut-Suan
spellingShingle Wong, Fut-Suan
Cyclical processor and computer architectures for highly parallel applications
author_facet Wong, Fut-Suan
author_sort Wong, Fut-Suan
title Cyclical processor and computer architectures for highly parallel applications
title_short Cyclical processor and computer architectures for highly parallel applications
title_full Cyclical processor and computer architectures for highly parallel applications
title_fullStr Cyclical processor and computer architectures for highly parallel applications
title_full_unstemmed Cyclical processor and computer architectures for highly parallel applications
title_sort cyclical processor and computer architectures for highly parallel applications
publisher University of British Columbia
publishDate 2010
url http://hdl.handle.net/2429/25646
work_keys_str_mv AT wongfutsuan cyclicalprocessorandcomputerarchitecturesforhighlyparallelapplications
_version_ 1718592865429356544
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-256462018-01-05T17:43:15Z Cyclical processor and computer architectures for highly parallel applications Wong, Fut-Suan During the last few decades, the search for powerful computing machines has been one of the several endless pursuits among the scientific community. In this thesis, several novel architectural ideas for the designs of high-performance computing machines are presented, and the practicality and usefulness of cyclical architectures -- ones which have their hardware resources cyclically arranged -- in this respect are examined. These ideas are illustrated with the use of specific application examples including parallel sorting, packet-switched communications and the design methodology of a class of next-generation computers. In the first part of our studies, the structure and control algorithms of a single-chip, recirculating systolic sorter (RSS), are presented. The correctness of the algorithms is proved, and general operational constraints are derived. This parallel sorter is highly amenable to VLSI implementations because of the simple control structure and the regular, repetitive and near-neighbour type of interconnections required. The number of quadruple comparators needed to sort N items is N/4, and the average sorting time is found to be bounded by (logN)**2 and N. A hardware termination is incorporated into the control unit of the sorter, so that the sorting process can be terminated as soon as the input list is in the desired order. In the second part of our studies, a novel loop-structured switching network (LSSN) is presented. It is intended for packet communications in large-scale systems consisting of hundreds to thousands of interconnected devices. With L loops -- where L is a power of two, it can connect up to N=L(logL) pairs of transmitters and receivers, using only N/2 two-by-two switches; in terms of switch counts and the amounts of wiring, this network is very advantageous when the value of N is large. It can be extended incrementally, and is free of the store-and-forward type of deadlocks which prevail in other cyclical, packet-switched networks. Our simulation results show that its average throughput rate and delay are close to that of other designs despite its relatively low switch count. In the third part of our studies, a new design methodology for the next-generation computers is described. Our proposed system, the Event-Driven Computer (EDC) is primarily a data-driven system which has its computing resources arranged as a circular pipeline, and it is supplemented with control-driven activities. Such a combined approach is aimed at extracting the advantages of both the "pure" data-driven and control-driven computations while alleviating their shortcomings. Compared to other designs, an EDC has the merits of a simpler architecture, better resource utilization, array processing capabilities and a higher speed range. As is shown by our studies, the properties of the cyclical architectures depend greatly on how the information packets interact with each other; deadlocks, for instance, will occur on systems such as the Loop-Structured Switching Network because of the asynchronous, circular requests of network resources by the packets (we have, however, presented a deadlock avoidance scheme), but will not occur on synchronous systems such as the Recirculating Systolic Sorter. In general, the resource utilization of the cyclical architectures are higher than that of the acyclic ones — or equivalently, the cyclical architectures can handle larger amounts of information with relatively smaller areas -- they are therefore more suitable to the designs of very large-scale systems. Applied Science, Faculty of Electrical and Computer Engineering, Department of Graduate 2010-06-13T16:39:49Z 2010-06-13T16:39:49Z 1984 Text Thesis/Dissertation http://hdl.handle.net/2429/25646 eng For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. University of British Columbia