Quasi-online accounting and monitoring system for distributed clouds
The HEP group at the University of Victoria operates a distributed cloud computing system for the ATLAS and Belle II experiments. The system uses private and commercial clouds in North America and Europe that run OpenStack, OpenNebula or commercial cloud software. It is critical that we record acco...
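Main Authors: | Seuster Rolf, Berghaus Frank, Casteels Kevin, Driemel Colson, Ebert Marcus, Leavett-Brown Colin, Paterson Michael, Sobie Randall |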
---|---|
Format: | Article |
Language: | English |
Published: | EDP Sciences, 2019-01-01 |
Series: | EPJ Web of Conferences |
Online Access: | https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_07035.pdf |
id |
doaj-bcc078bd295f49baae81a79a91ac2806 |
---|---|
record_format |
Article |
spelling |
doaj-bcc078bd295f49baae81a79a91ac2806; 2021-08-02T09:40:56Z; eng; EDP Sciences; EPJ Web of Conferences; 2100-014X; 2019-01-01; 214; 07035; 10.1051/epjconf/201921407035; epjconf_chep2018_07035; Quasi-online accounting and monitoring system for distributed clouds; Seuster Rolf; Berghaus Frank; Casteels Kevin; Driemel Colson; Ebert Marcus; Leavett-Brown Colin; Paterson Michael; Sobie Randall; The HEP group at the University of Victoria operates a distributed cloud computing system for the ATLAS and Belle II experiments. The system uses private and commercial clouds in North America and Europe that run OpenStack, OpenNebula or commercial cloud software. It is critical that we record accounting information to give credit to cloud owners and to verify our use of commercial resources. We want to record the number of CPU-hours used by each virtual machine (VM). We continuously collect the CPU usage, together with an estimate of the VM's HEPSpec06 units obtained during boot, and upload them to an Elasticsearch database. The information is processed and published as soon as it is available. The data is published in tables and plots in Kibana and, as a cross check, in ROOT. We have found the system to be useful beyond gathering accounting information: it can also be used for monitoring and diagnostic purposes. For example, we can use it to detect if the payload jobs are stuck in a waiting state for external information. We will report on the design and performance of the system, and show how it provides important accounting and monitoring information on a large distributed system.; https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_07035.pdf |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Seuster Rolf Berghaus Frank Casteels Kevin Driemel Colson Ebert Marcus Leavett-Brown Colin Paterson Michael Sobie Randall |
spellingShingle |
Seuster Rolf Berghaus Frank Casteels Kevin Driemel Colson Ebert Marcus Leavett-Brown Colin Paterson Michael Sobie Randall Quasi-online accounting and monitoring system for distributed clouds EPJ Web of Conferences |
author_facet |
Seuster Rolf Berghaus Frank Casteels Kevin Driemel Colson Ebert Marcus Leavett-Brown Colin Paterson Michael Sobie Randall |
author_sort |
Seuster Rolf |
title |
Quasi-online accounting and monitoring system for distributed clouds |
title_short |
Quasi-online accounting and monitoring system for distributed clouds |
title_full |
Quasi-online accounting and monitoring system for distributed clouds |
title_fullStr |
Quasi-online accounting and monitoring system for distributed clouds |
title_full_unstemmed |
Quasi-online accounting and monitoring system for distributed clouds |
title_sort |
quasi-online accounting and monitoring system for distributed clouds |
publisher |
EDP Sciences |
series |
EPJ Web of Conferences |
issn |
2100-014X |
publishDate |
2019-01-01 |
description |
The HEP group at the University of Victoria operates a distributed cloud computing system for the ATLAS and Belle II experiments. The system uses private and commercial clouds in North America and Europe that run OpenStack, OpenNebula or commercial cloud software. It is critical that we record accounting information to give credit to cloud owners and to verify our use of commercial resources. We want to record the number of CPU-hours used by each virtual machine (VM). We continuously collect the CPU usage, together with an estimate of the VM's HEPSpec06 units obtained during boot, and upload them to an Elasticsearch database. The information is processed and published as soon as it is available. The data is published in tables and plots in Kibana and, as a cross check, in ROOT. We have found the system to be useful beyond gathering accounting information: it can also be used for monitoring and diagnostic purposes. For example, we can use it to detect if the payload jobs are stuck in a waiting state for external information. We will report on the design and performance of the system, and show how it provides important accounting and monitoring information on a large distributed system. |
url |
https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_07035.pdf |
work_keys_str_mv |
AT seusterrolf quasionlineaccountingandmonitoringsystemfordistributedclouds AT berghausfrank quasionlineaccountingandmonitoringsystemfordistributedclouds AT casteelskevin quasionlineaccountingandmonitoringsystemfordistributedclouds AT driemelcolson quasionlineaccountingandmonitoringsystemfordistributedclouds AT ebertmarcus quasionlineaccountingandmonitoringsystemfordistributedclouds AT leavettbrowncolin quasionlineaccountingandmonitoringsystemfordistributedclouds AT patersonmichael quasionlineaccountingandmonitoringsystemfordistributedclouds AT sobierandall quasionlineaccountingandmonitoringsystemfordistributedclouds |
_version_ |
1721234704137453568 |
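
The CPU-hour accounting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout (`CpuSample`), the function names, and the choice to scale CPU-hours by a single HEPSpec06 estimate taken at boot are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    """One hypothetical monitoring record for a VM (assumed layout)."""
    timestamp_s: float  # wall-clock time of the sample, in seconds
    cpu_seconds: float  # cumulative CPU time used by the VM so far


def cpu_hours(samples):
    """CPU-hours consumed between the earliest and latest sample."""
    if len(samples) < 2:
        return 0.0
    ordered = sorted(samples, key=lambda s: s.timestamp_s)
    return (ordered[-1].cpu_seconds - ordered[0].cpu_seconds) / 3600.0


def hs06_hours(samples, hs06_estimate):
    """Scale CPU-hours by the HEPSpec06 estimate (assumed measured at boot)."""
    return cpu_hours(samples) * hs06_estimate


# Three samples over one hour of wall-clock time on a VM using two cores fully.
samples = [
    CpuSample(0.0, 0.0),
    CpuSample(1800.0, 3600.0),
    CpuSample(3600.0, 7200.0),
]
print(cpu_hours(samples))         # CPU-hours over the sampled window
print(hs06_hours(samples, 10.0))  # same usage weighted by a made-up HS06 value
```

Working from cumulative CPU time means a lost sample does not lose usage, since only the first and last records in the window matter.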