Comparing a gang-like scheduler with the default Kubernetes scheduler in a multi-tenant serverless distributed deep learning training environment

Systems for running distributed deep learning training on the cloud have recently been developed. An important component of a distributed deep learning job handler is its resource allocation scheduler. This scheduler allocates computing resources to parts of a distributed training architecture. In t...

Bibliographic Details
Main Author: Lövenvald, Frans-Lukas
Format: Others
Language: English
Published: Umeå universitet, Institutionen för datavetenskap 2021
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-189688