Quasi-online accounting and monitoring system for distributed clouds

The HEP group at the University of Victoria operates a distributed cloud computing system for the ATLAS and Belle II experiments. The system uses private and commercial clouds in North America and Europe that run OpenStack, OpenNebula or commercial cloud software. It is critical that we record acco...

Bibliographic Details
Main Authors: Seuster Rolf, Berghaus Frank, Casteels Kevin, Driemel Colson, Ebert Marcus, Leavett-Brown Colin, Paterson Michael, Sobie Randall
Format: Article
Language: English
Published: EDP Sciences 2019-01-01
Series: EPJ Web of Conferences
Online Access: https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_07035.pdf
id doaj-bcc078bd295f49baae81a79a91ac2806
record_format Article
spelling doaj-bcc078bd295f49baae81a79a91ac2806 | 2021-08-02T09:40:56Z | eng | EDP Sciences | EPJ Web of Conferences | 2100-014X | 2019-01-01 | 214 | 07035 | 10.1051/epjconf/201921407035 | epjconf_chep2018_07035 | Quasi-online accounting and monitoring system for distributed clouds | Seuster Rolf | Berghaus Frank | Casteels Kevin | Driemel Colson | Ebert Marcus | Leavett-Brown Colin | Paterson Michael | Sobie Randall | https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_07035.pdf
collection DOAJ
language English
format Article
sources DOAJ
author Seuster Rolf
Berghaus Frank
Casteels Kevin
Driemel Colson
Ebert Marcus
Leavett-Brown Colin
Paterson Michael
Sobie Randall
title Quasi-online accounting and monitoring system for distributed clouds
publisher EDP Sciences
series EPJ Web of Conferences
issn 2100-014X
publishDate 2019-01-01
description The HEP group at the University of Victoria operates a distributed cloud computing system for the ATLAS and Belle II experiments. The system uses private and commercial clouds in North America and Europe that run OpenStack, OpenNebula or commercial cloud software. It is critical that we record accounting information to give credit to cloud owners and to verify our use of commercial resources. We want to record the number of CPU-hours used by each virtual machine (VM). We continuously collect the CPU usage, together with an estimate of the HEPSpec06 units of the VM obtained during boot, and upload both into an Elasticsearch database. The information is processed and published as soon as it is available. The data is published in tables and plots in Kibana and, as a cross-check, in ROOT. We have found that the system is useful beyond gathering accounting information and can also be used for monitoring and diagnostic purposes. For example, we can use it to detect whether payload jobs are stuck in a waiting state for external information. We will report on the design and performance of the system, and show how it provides important accounting and monitoring information on a large distributed system.
url https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_07035.pdf
_version_ 1721234704137453568
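
The accounting idea in the description field (CPU-hours per VM, scaled by a HEPSpec06 estimate taken at boot, plus detection of VMs whose payload is stuck waiting) might be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the Sample record, its field names, the per-core interpretation of the HEPSpec06 value, and the stall threshold are all assumptions introduced here, and the upload to Elasticsearch is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One usage report from a VM (hypothetical schema, not from the paper)."""
    vm_id: str            # hypothetical VM identifier
    cloud: str            # hypothetical cloud name
    timestamp: float      # seconds since epoch when the sample was taken
    cpu_seconds: float    # cumulative CPU time used by the VM since boot
    hs06_per_core: float  # benchmark estimate recorded at boot (assumed per core)


def latest_per_vm(samples):
    """Return the most recent sample for each VM."""
    latest = {}
    for s in samples:
        if s.vm_id not in latest or s.timestamp > latest[s.vm_id].timestamp:
            latest[s.vm_id] = s
    return latest


def accounting_by_cloud(samples):
    """Sum CPU-hours and HS06-hours per cloud from each VM's latest sample."""
    totals = defaultdict(lambda: {"cpu_hours": 0.0, "hs06_hours": 0.0})
    for s in latest_per_vm(samples).values():
        cpu_hours = s.cpu_seconds / 3600.0
        totals[s.cloud]["cpu_hours"] += cpu_hours
        totals[s.cloud]["hs06_hours"] += cpu_hours * s.hs06_per_core
    return dict(totals)


def looks_stalled(samples, vm_id, min_cpu_seconds=1.0):
    """Flag a VM whose cumulative CPU time barely grew between its last two samples."""
    history = sorted(
        (s for s in samples if s.vm_id == vm_id), key=lambda s: s.timestamp
    )
    if len(history) < 2:
        return False  # not enough data to judge
    return history[-1].cpu_seconds - history[-2].cpu_seconds < min_cpu_seconds


if __name__ == "__main__":
    samples = [
        Sample("vm-1", "cloud-a", 0.0, 0.0, 10.0),
        Sample("vm-1", "cloud-a", 3600.0, 7200.0, 10.0),
        Sample("vm-2", "cloud-b", 0.0, 1800.0, 8.0),
        Sample("vm-2", "cloud-b", 3600.0, 1800.0, 8.0),
    ]
    print(accounting_by_cloud(samples))
    print(looks_stalled(samples, "vm-2"))
```

Keeping only cumulative counters per sample means a lost report does not lose accounting data, since the next report still carries the running total; whether the real system works this way is not stated in the record.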