Operating system and network support for high-performance computing
High-performance computing applications were once limited to isolated supercomputers. In the past few years, however, there has been an increasing need to share data between different machines. This, combined with new network technologies that provide higher bandwidths, has led high-performance co...
Main Author: | Guedes Neto, Dorgival Olavo |
---|---|
Other Authors: | Peterson, Larry |
Language: | en_US |
Published: | The University of Arizona, 1999 |
Subjects: | Computer Science. |
Online Access: | http://hdl.handle.net/10150/298757 |
id |
ndltd-arizona.edu-oai-arizona.openrepository.com-10150-298757 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-arizona.edu-oai-arizona.openrepository.com-10150-298757 2015-10-23T05:21:53Z Operating system and network support for high-performance computing Guedes Neto, Dorgival Olavo Peterson, Larry Peterson, Larry L. Hartman, John H. Schlichting, Richard D. Computer Science. High-performance computing applications were once limited to isolated supercomputers. In the past few years, however, there has been an increasing need to share data between different machines. This, combined with new network technologies that provide higher bandwidths, has led high-performance computing systems to adapt so that they can move data over the local network. There are some problems in doing this. Current high-performance systems often use centralized protocol servers, thereby creating bottlenecks for network connections. In addition, the lack of a more appropriate protocol leads to the use of TCP by applications using parallel connections. TCP is not well tuned to such applications. This dissertation presents a detailed analysis of the problems caused by centralized protocol servers and the use of TCP in high-performance computing environments. It shows why the network servers currently available in some supercomputers do not provide good performance. It also presents simulation results that illustrate how TCP connection performance can degrade rapidly when multiple cooperative connections are used. The main contributions of this work are the development of distributed protocol stacks and cooperative rate-based traffic shaping. Distributed stacks use a user-level protocol implementation to replicate the TCP/IP protocol stack in all the nodes of a multicomputer, removing the protocol server from the data path and avoiding the associated bottleneck. Cooperative rate shaping uses bandwidth estimates to pace data packets, avoiding most of the problems that cause performance degradation in parallel cooperative connections. It also provides a way for cooperating connections to share their bandwidth estimates, improving performance by making good use of their combined knowledge. 1999 text Dissertation-Reproduction (electronic) http://hdl.handle.net/10150/298757 9946820 .b3991558x en_US Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. The University of Arizona. |
collection |
NDLTD |
language |
en_US |
sources |
NDLTD |
topic |
Computer Science. |
description |
High-performance computing applications were once limited to isolated supercomputers. In the past few years, however, there has been an increasing need to share data between different machines. This, combined with new network technologies that provide higher bandwidths, has led high-performance computing systems to adapt so that they can move data over the local network. There are some problems in doing this. Current high-performance systems often use centralized protocol servers, thereby creating bottlenecks for network connections. In addition, the lack of a more appropriate protocol leads to the use of TCP by applications using parallel connections. TCP is not well tuned to such applications. This dissertation presents a detailed analysis of the problems caused by centralized protocol servers and the use of TCP in high-performance computing environments. It shows why the network servers currently available in some supercomputers do not provide good performance. It also presents simulation results that illustrate how TCP connection performance can degrade rapidly when multiple cooperative connections are used. The main contributions of this work are the development of distributed protocol stacks and cooperative rate-based traffic shaping. Distributed stacks use a user-level protocol implementation to replicate the TCP/IP protocol stack in all the nodes of a multicomputer, removing the protocol server from the data path and avoiding the associated bottleneck. Cooperative rate shaping uses bandwidth estimates to pace data packets, avoiding most of the problems that cause performance degradation in parallel cooperative connections. It also provides a way for cooperating connections to share their bandwidth estimates, improving performance by making good use of their combined knowledge. |
author2 |
Peterson, Larry |
author_facet |
Peterson, Larry Guedes Neto, Dorgival Olavo |
author |
Guedes Neto, Dorgival Olavo |
author_sort |
Guedes Neto, Dorgival Olavo |
title |
Operating system and network support for high-performance computing |
publisher |
The University of Arizona. |
publishDate |
1999 |
url |
http://hdl.handle.net/10150/298757 |
work_keys_str_mv |
AT guedesnetodorgivalolavo operatingsystemandnetworksupportforhighperformancecomputing |
_version_ |
1718105475676897280 |
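
The abstract describes cooperative rate shaping only at a high level: bandwidth estimates are used to pace data packets, and cooperating connections share those estimates. The record does not give the dissertation's actual algorithm, so the following is a minimal sketch under stated assumptions, not the author's method. The names `SharedEstimate`, `per_connection_rate`, `pacing_interval`, and `send_times` are hypothetical, and the combining rule (average the reported estimates, then split the result evenly among the connections) is an illustrative assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SharedEstimate:
    """Hypothetical pool of bandwidth estimates shared by cooperating connections."""

    # Maps a connection id to its most recent path estimate in bytes per second.
    reports: Dict[str, float] = field(default_factory=dict)

    def report(self, conn_id: str, bytes_per_sec: float) -> None:
        """Record the latest bandwidth estimate from one connection."""
        if bytes_per_sec <= 0:
            raise ValueError("bandwidth estimate must be positive")
        self.reports[conn_id] = bytes_per_sec

    def per_connection_rate(self) -> float:
        """Return the pacing rate for each connection.

        Assumption (not from the source): all connections cross the same
        bottleneck, so the mean of their estimates approximates the path
        bandwidth, which is then divided evenly among them.
        """
        if not self.reports:
            raise ValueError("no estimates reported yet")
        n = len(self.reports)
        path_estimate = sum(self.reports.values()) / n
        return path_estimate / n


def pacing_interval(packet_bytes: int, rate: float) -> float:
    """Seconds to wait between packets so the sending rate matches `rate`."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return packet_bytes / rate


def send_times(n_packets: int, packet_bytes: int, rate: float,
               start: float = 0.0) -> List[float]:
    """Departure times for evenly paced packets, instead of one burst."""
    gap = pacing_interval(packet_bytes, rate)
    return [start + i * gap for i in range(n_packets)]


if __name__ == "__main__":
    shared = SharedEstimate()
    shared.report("conn-a", 1_000_000.0)  # bytes per second
    shared.report("conn-b", 3_000_000.0)
    rate = shared.per_connection_rate()
    print(rate, send_times(3, 1000, rate))
```

Spacing packets by `packet_bytes / rate` is the basic idea behind rate-based pacing in general; how the dissertation derives and combines its estimates would need to be taken from the full text.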