Deep Reinforcement Learning for Adaptive Resource Allocation in Virtualized Network Functions

Network Function Virtualization (NFV) is the transition in the telecommunication industry from proprietary hardware functions to virtualized counterparts. These counterparts, known as Virtualized Network Functions (VNFs), are the main building blocks of NFV. The transit...

Bibliographic Details
Main Author:	Ignat, Simon
Format:	Others
Language:	English
Published:	KTH, Skolan för elektroteknik och datavetenskap (EECS) 2018
Subjects:	Computer and Information Sciences Data- och informationsvetenskap
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-236100

id	ndltd-UPSALLA1-oai-DiVA.org-kth-236100
record_format	oai_dc
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Computer and Information Sciences Data- och informationsvetenskap
description	Network Function Virtualization (NFV) is the transition in the telecommunication industry from proprietary hardware functions to virtualized counterparts. These counterparts, known as Virtualized Network Functions (VNFs), are the main building blocks of NFV. The transition started in 2012 and is still ongoing, with research and development moving at a high pace. Virtualization is believed to lower both capital and operating expenses as a result of easier deployments, cheaper systems, and networks that can operate more autonomously. This thesis examines whether the current state of NFV can lower operating expenses while maintaining high quality of service (QoS) by using current state-of-the-art machine learning algorithms. More specifically, the thesis analyzes the problem of adaptive autoscaling of the virtual machines (VMs) allocated by the VNFs with deep reinforcement learning (DRL). To analyze the task, the thesis implements a discrete-time model for VNFs with the purpose of capturing the fundamental characteristics of the scaling operation. It also examines the learning and robustness/generalization of six state-of-the-art DRL algorithms. The algorithms were chosen because they differ fundamentally in their properties, ranging from off-policy methods such as DQN to on-policy methods such as PPO Advantage Actor Critic. The policies are compared to a baseline P-controller to evaluate their performance relative to simpler methods. The results from the model show that DRL needs around 100,000 samples to converge, which in a real setting would represent around 70 days of learning. The thesis also shows that the final policy applied by the agent does not improve considerably on a simple control algorithm with respect to reward and performance when multiple experiments with varying loads and configurations are tested. 
Due to the lack of data and slow real-time systems, with robustness being an important consideration, the time to convergence required by a DRL agent is too long for an autoscaling solution to be deployed in the near future. Therefore, the author cannot recommend DRL for autoscaling in VNFs given the current state of the technology. Instead, the author recommends simpler methods, such as supervised machine learning or classical control theory. === Network Function Virtualization (NFV) is the transition from proprietary hardware functions to virtualized counterparts within the telecommunications industry. These virtualized counterparts are known as Virtualized Network Functions (VNFs) and can be seen as the building blocks of NFV. Ideas about virtualization began in 2012 and are still developing, with research and development progressing at a rapid pace. The hope is that virtualization will lower both capital and operating costs as a result of simpler installations, cheaper systems, and more autonomous solutions. This thesis investigates whether the current state of NFV can lower operating costs while keeping quality of service (QoS) high by using machine learning. More specifically, it investigates deep reinforcement learning (DRL) and the problem of adaptive autoscaling of the virtual machines used by the VNFs. To analyze the task, the thesis implements a discrete model of VNFs with the aim of capturing the fundamental characteristics of scaling operations. It also examines the learning and robustness of six DRL algorithms. The algorithms are examined because they differ fundamentally in their properties, from off-policy methods such as DQN to on-policy methods such as PPO Advantage Actor Critic. The algorithms are then compared with a P-controller to evaluate performance relative to simpler methods. The results of the study show that DRL needs about 100,000 interactions with the model to converge, which in a real environment would correspond to about 70 days of learning. The thesis also shows that the converged algorithms do not improve considerably on the simple P-controller when several experiments with varying loads and configurations are tested. Because of the lack of data and the slow real-time systems, where robustness is an important consideration, the time to convergence required by a DRL agent is seen as a major problem. The author therefore cannot recommend DRL for autoscaling in NFV given the current state of the technology. Instead, the author recommends simpler methods, such as supervised machine learning or classical control theory.
author	Ignat, Simon
title	Deep Reinforcement Learning for Adaptive Resource Allocation in Virtualized Network Functions
publisher	KTH, Skolan för elektroteknik och datavetenskap (EECS)
publishDate	2018
url	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-236100
spelling	ndltd-UPSALLA1-oai-DiVA.org-kth-236100 | 2018-10-17T06:03:15Z | Student thesis | info:eu-repo/semantics/bachelorThesis | text | TRITA-EECS-EX ; 2018:622 | application/pdf | info:eu-repo/semantics/openAccess

Similar Items