Operating system and network support for high-performance computing

High-performance computing applications were once limited to isolated supercomputers. In the past few years, however, there has been an increasing need to share data between different machines. This, combined with new network technologies which provide higher bandwidths, has led high-performance co...

Bibliographic Details
Main Author: Guedes Neto, Dorgival Olavo
Other Authors: Peterson, Larry
Language: en_US
Published: The University of Arizona. 1999
Subjects: Computer Science.
Online Access: http://hdl.handle.net/10150/298757
id ndltd-arizona.edu-oai-arizona.openrepository.com-10150-298757
record_format oai_dc
spelling ndltd-arizona.edu-oai-arizona.openrepository.com-10150-298757 2015-10-23T05:21:53Z Operating system and network support for high-performance computing Guedes Neto, Dorgival Olavo Peterson, Larry Peterson, Larry L. Hartman, John H. Schlichting, Richard D. Computer Science. High-performance computing applications were once limited to isolated supercomputers. In the past few years, however, there has been an increasing need to share data between different machines. This, combined with new network technologies which provide higher bandwidths, has led high-performance computing systems to adapt so that they can move data over the local network. There are some problems in doing this. Current high-performance systems often use centralized protocol servers, thereby creating bottlenecks for network connections. In addition, the lack of a more appropriate protocol leads to the use of TCP by applications using parallel connections. TCP is not perfectly tuned to such applications. This dissertation presents a detailed analysis of the problems caused by centralized protocol servers and the use of TCP in high-performance computing environments. It shows why the network servers currently available in some supercomputers do not provide good performance. It also presents simulation results that illustrate how TCP connection performance can degrade rapidly when multiple cooperative connections are used. The main contributions in this work are the development of distributed protocol stacks and cooperative rate-based traffic shaping. Distributed stacks use a user-level protocol implementation to replicate the TCP/IP protocol stack in all the nodes of a multicomputer, removing the protocol server from the data path and avoiding the associated bottleneck. Cooperative rate shaping uses bandwidth estimates to pace data packets, avoiding most of the problems that cause performance degradation in parallel cooperative connections. It also provides a way for cooperating connections to share their bandwidth estimates, improving performance by making good use of their combined knowledge. 1999 text Dissertation-Reproduction (electronic) http://hdl.handle.net/10150/298757 9946820 .b3991558x en_US Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. The University of Arizona.
collection NDLTD
language en_US
sources NDLTD
topic Computer Science.
description High-performance computing applications were once limited to isolated supercomputers. In the past few years, however, there has been an increasing need to share data between different machines. This, combined with new network technologies which provide higher bandwidths, has led high-performance computing systems to adapt so that they can move data over the local network. There are some problems in doing this. Current high-performance systems often use centralized protocol servers, thereby creating bottlenecks for network connections. In addition, the lack of a more appropriate protocol leads to the use of TCP by applications using parallel connections. TCP is not perfectly tuned to such applications. This dissertation presents a detailed analysis of the problems caused by centralized protocol servers and the use of TCP in high-performance computing environments. It shows why the network servers currently available in some supercomputers do not provide good performance. It also presents simulation results that illustrate how TCP connection performance can degrade rapidly when multiple cooperative connections are used. The main contributions in this work are the development of distributed protocol stacks and cooperative rate-based traffic shaping. Distributed stacks use a user-level protocol implementation to replicate the TCP/IP protocol stack in all the nodes of a multicomputer, removing the protocol server from the data path and avoiding the associated bottleneck. Cooperative rate shaping uses bandwidth estimates to pace data packets, avoiding most of the problems that cause performance degradation in parallel cooperative connections. It also provides a way for cooperating connections to share their bandwidth estimates, improving performance by making good use of their combined knowledge.
author2 Peterson, Larry
author_facet Peterson, Larry
Guedes Neto, Dorgival Olavo
author Guedes Neto, Dorgival Olavo
author_sort Guedes Neto, Dorgival Olavo
title Operating system and network support for high-performance computing
publisher The University of Arizona.
publishDate 1999
url http://hdl.handle.net/10150/298757