HPC cloud system design and implementation
There is pronounced interest in cloud computing in the scientific community. However, current cloud computing offerings are rarely suitable for high-performance computing, in large part due to the overhead of the underlying virtualization components. The purpose of this paper is to propose a design...
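Main Authors: | A. O. Kudryavtsev, V. K. Koshelev, A. O. Izbyshev, I. A. Dudina, Sh. F. Kurmangaleev, A. I. Avetisyan, V. P. Ivannikov, V. E. Velikhov, E. A. Ryabinkin |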
---|---|
Format: | Article |
Language: | English |
Published: | Ivannikov Institute for System Programming of the Russian Academy of Sciences, 2018-10-01 |
Series: | Труды Института системного программирования РАН |
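Subjects: | cloud computing, virtualization, virtual machine monitors, high-performance computing, parallel computing |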
Online Access: | https://ispranproceedings.elpub.ru/jour/article/view/948 |
id |
doaj-6c8b34250a3f4ef48018ab545f830f7a |
---|---|
record_format |
Article |
spelling |
doaj-6c8b34250a3f4ef48018ab545f830f7a; 2020-11-25T02:54:57Z; eng; Ivannikov Institute for System Programming of the Russian Academy of Sciences; Труды Института системного программирования РАН; 2079-8156; 2220-6426; 2018-10-01; 240948; HPC cloud system design and implementation; A. O. Kudryavtsev [0], V. K. Koshelev [1], A. O. Izbyshev [2], I. A. Dudina [3], Sh. F. Kurmangaleev [4], A. I. Avetisyan [5], V. P. Ivannikov [6], V. E. Velikhov [7], E. A. Ryabinkin [8]; ИСП РАН (affiliations 0-8); There is pronounced interest in cloud computing in the scientific community. However, current cloud computing offerings are rarely suitable for high-performance computing, in large part due to the overhead of the underlying virtualization components. The purpose of this paper is to propose a design and implementation of a cloud system whose overhead is low enough for practical use across a wide range of scientific workloads. First, we describe requirements for the desired system and classify workloads to identify those that are practical to transfer to the cloud. Then, we review related work. Finally, we describe our cloud system, "Virtual Supercomputer", which is based on the OpenStack cloud infrastructure and the KVM/QEMU hypervisor. Most components of the original infrastructure were modified to satisfy the requirements. In particular, we tuned KVM/QEMU and the host operating system, introduced the concept of virtual machine groups, and implemented a topology-aware scheduler to reduce communication overhead between network nodes belonging to the same virtual machine group. We also implemented a proof-of-concept web service on top of our system that offers the OpenFOAM toolbox as software as a service. The main result of our work is that "Virtual Supercomputer" achieved an overhead of less than 10% on industry-standard benchmarks when using up to 1024 processor cores. We consider this overhead acceptable for practical use.; https://ispranproceedings.elpub.ru/jour/article/view/948; cloud computing; virtualization; virtual machine monitors; high-performance computing; parallel computing |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
A. O. Kudryavtsev; V. K. Koshelev; A. O. Izbyshev; I. A. Dudina; Sh. F. Kurmangaleev; A. I. Avetisyan; V. P. Ivannikov; V. E. Velikhov; E. A. Ryabinkin |
spellingShingle |
A. O. Kudryavtsev; V. K. Koshelev; A. O. Izbyshev; I. A. Dudina; Sh. F. Kurmangaleev; A. I. Avetisyan; V. P. Ivannikov; V. E. Velikhov; E. A. Ryabinkin; HPC cloud system design and implementation; Труды Института системного программирования РАН; cloud computing; virtualization; virtual machine monitors; high-performance computing; parallel computing |
author_facet |
A. O. Kudryavtsev; V. K. Koshelev; A. O. Izbyshev; I. A. Dudina; Sh. F. Kurmangaleev; A. I. Avetisyan; V. P. Ivannikov; V. E. Velikhov; E. A. Ryabinkin |
author_sort |
A. O. Kudryavtsev |
title |
HPC cloud system design and implementation |
title_short |
HPC cloud system design and implementation |
title_full |
HPC cloud system design and implementation |
title_fullStr |
HPC cloud system design and implementation |
title_full_unstemmed |
HPC cloud system design and implementation |
title_sort |
hpc cloud system design and implementation |
publisher |
Ivannikov Institute for System Programming of the Russian Academy of Sciences |
series |
Труды Института системного программирования РАН |
issn |
2079-8156 2220-6426 |
publishDate |
2018-10-01 |
description |
There is pronounced interest in cloud computing in the scientific community. However, current cloud computing offerings are rarely suitable for high-performance computing, in large part due to the overhead of the underlying virtualization components. The purpose of this paper is to propose a design and implementation of a cloud system whose overhead is low enough for practical use across a wide range of scientific workloads. First, we describe requirements for the desired system and classify workloads to identify those that are practical to transfer to the cloud. Then, we review related work. Finally, we describe our cloud system, "Virtual Supercomputer", which is based on the OpenStack cloud infrastructure and the KVM/QEMU hypervisor. Most components of the original infrastructure were modified to satisfy the requirements. In particular, we tuned KVM/QEMU and the host operating system, introduced the concept of virtual machine groups, and implemented a topology-aware scheduler to reduce communication overhead between network nodes belonging to the same virtual machine group. We also implemented a proof-of-concept web service on top of our system that offers the OpenFOAM toolbox as software as a service. The main result of our work is that "Virtual Supercomputer" achieved an overhead of less than 10% on industry-standard benchmarks when using up to 1024 processor cores. We consider this overhead acceptable for practical use. |
topic |
cloud computing; virtualization; virtual machine monitors; high-performance computing; parallel computing |
url |
https://ispranproceedings.elpub.ru/jour/article/view/948 |
work_keys_str_mv |
AT aokudryavtsev hpccloudsystemdesignandimplementation AT vkkoshelev hpccloudsystemdesignandimplementation AT aoizbyshev hpccloudsystemdesignandimplementation AT iadudina hpccloudsystemdesignandimplementation AT shfkurmangaleev hpccloudsystemdesignandimplementation AT aiavetisyan hpccloudsystemdesignandimplementation AT vpivannikov hpccloudsystemdesignandimplementation AT vevelikhov hpccloudsystemdesignandimplementation AT earyabinkin hpccloudsystemdesignandimplementation |
_version_ |
1724718790904643584 |