HPC cloud system design and implementation

There is pronounced interest in cloud computing in the scientific community. However, current cloud computing offerings are rarely suitable for high-performance computing, in large part because of the overhead of the underlying virtualization components. The purpose of this paper is to propose a design...

Bibliographic Details
Main Authors: A. O. Kudryavtsev, V. K. Koshelev, A. O. Izbyshev, I. A. Dudina, Sh. F. Kurmangaleev, A. I. Avetisyan, V. P. Ivannikov, V. E. Velikhov, E. A. Ryabinkin
Format: Article
Language: English
Published: Ivannikov Institute for System Programming of the Russian Academy of Sciences 2018-10-01
Series: Proceedings of the Institute for System Programming of the RAS (Труды Института системного программирования РАН)
Subjects:
Online Access: https://ispranproceedings.elpub.ru/jour/article/view/948
id doaj-6c8b34250a3f4ef48018ab545f830f7a
record_format Article
affiliation ISP RAS (ИСП РАН), listed for each of the nine authors
collection DOAJ
language English
format Article
sources DOAJ
author A. O. Kudryavtsev
V. K. Koshelev
A. O. Izbyshev
I. A. Dudina
Sh. F. Kurmangaleev
A. I. Avetisyan
V. P. Ivannikov
V. E. Velikhov
E. A. Ryabinkin
author_sort A. O. Kudryavtsev
title HPC cloud system design and implementation
publisher Ivannikov Institute for System Programming of the Russian Academy of Sciences
series Proceedings of the Institute for System Programming of the RAS (Труды Института системного программирования РАН)
issn 2079-8156
2220-6426
publishDate 2018-10-01
description There is pronounced interest in cloud computing in the scientific community. However, current cloud computing offerings are rarely suitable for high-performance computing, in large part because of the overhead of the underlying virtualization components. The purpose of this paper is to propose a design and implementation of a cloud system whose overhead is low enough for practical use across a wide range of scientific workloads. First, we describe requirements for the desired system and classify workloads to identify those that are practical to transfer to the cloud. Then, we review related work. Finally, we describe our cloud system, "Virtual Supercomputer", which is based on the OpenStack cloud infrastructure and the KVM/QEMU hypervisor. Most components of the original infrastructure were modified to satisfy the requirements. In particular, we tuned KVM/QEMU and the host operating system, introduced the concept of virtual machine groups, and implemented a topology-aware scheduler to reduce communication overhead between network nodes belonging to the same virtual machine group. We also implemented a proof-of-concept web service on top of our system that allows the OpenFOAM toolbox to be used in a software-as-a-service manner. The main result of our work is that "Virtual Supercomputer" achieved an overhead of less than 10% on industry-standard benchmarks when using up to 1024 processor cores. We deem this overhead acceptable for practical use.
topic cloud computing
virtualization
virtual machine monitors
high-performance computing
parallel computing
url https://ispranproceedings.elpub.ru/jour/article/view/948