Comparing a gang-like scheduler with the default Kubernetes scheduler in a multi-tenant serverless distributed deep learning training environment
Systems for running distributed deep learning training on the cloud have recently been developed. An important component of a distributed deep learning job handler is its resource allocation scheduler. This scheduler allocates computing resources to parts of a distributed training architecture. In t...
Main Author: | |
---|---|
Format: | Others |
Language: | English |
Published: | Umeå universitet, Institutionen för datavetenskap, 2021 |
Subjects: | |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-189688 |