A study of the exploration/exploitation trade-off in reinforcement learning : Applied to autonomous driving

A global initiative was set in motion to reduce the number of traffic accidents. Autonomous driving is a field that contributes to this initiative. This report examines the exploration/exploitation trade-off in reinforcement learning applied to decision making in autonomous driving. The approach...

Bibliographic Details
Main Authors:	Louis, Ruwaid; Yu, David
Format:	Others
Language:	English
Published:	KTH, Skolan för elektroteknik och datavetenskap (EECS) 2019
Subjects:	Computer and Information Sciences; Data- och informationsvetenskap
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254938

id	ndltd-UPSALLA1-oai-DiVA.org-kth-254938
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-kth-254938 2019-07-30T04:29:11Z A study of the exploration/exploitation trade-off in reinforcement learning : Applied to autonomous driving eng En studie om utforskning/utnyttjande avvägningen inom förstärkande inlärning : Applicerat på autonoma fordon Louis, Ruwaid Yu, David KTH, Skolan för elektroteknik och datavetenskap (EECS) KTH, Skolan för elektroteknik och datavetenskap (EECS) 2019 Computer and Information Sciences Data- och informationsvetenskap A global initiative was set in motion to reduce the number of traffic accidents. Autonomous driving is a field that contributes to this initiative. This report examines the exploration/exploitation trade-off in reinforcement learning applied to decision making in autonomous driving. The approach consisted of modelling the problem as a Markov Decision Process, which was solved with Q-learning. Decision making used an exploration-greedy approach. The scenarios consisted of different kinds of intersections and were built using SUMO. The ego vehicle was controlled using TraCI. The goal was to discuss the trade-off from two perspectives, time and safety (measured in number of collisions, among other things), in the domain of autonomous driving. The results showed that exploration prompted the ego vehicle to pass the scenarios in less time. This led to more collisions and thus decreased safety. In contrast, exploitation preferred deceleration and stopping, which resulted in increased safety but also increased the passage time and traffic. The conclusion was to exploit previous experiences when applying reinforcement learning to decision making in autonomous driving, because safety is the highest priority for autonomous driving and the global initiative. A global initiative was launched to reduce the number of traffic accidents before the year 2030. Autonomous vehicles are a research area that contributes to the global initiative. This report examines the trade-off between exploration and exploitation in reinforcement learning for the decision-making process in autonomous vehicles. The approach consisted of modelling the problem as a Markov Decision Process, which was solved using Q-learning. The decision-making process adopted an exploitation approach. The scenarios consisted of different types of intersections and were programmed using SUMO. The autonomous vehicle was controlled using TraCI. The goal was to discuss the trade-off from two perspectives, time and safety (measured in, among other things, the number of collisions), within the research area of autonomous vehicles. The results showed that exploration prompted the autonomous vehicle to pass the scenarios in less time. This led to an increased number of collisions and thus reduced safety. On the other hand, increased exploitation favoured braking, which resulted in an increased number of successful passages. This leads to increased safety but also increases the passage time and the amount of traffic. The conclusion was that exploiting previous experience should be preferred when applying reinforcement learning to the decision-making process in autonomous vehicles. This conclusion was reached because safety has the highest priority within autonomous vehicles and the global initiative. Student thesis info:eu-repo/semantics/bachelorThesis text http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254938 TRITA-EECS-EX ; 2019:319 application/pdf info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Computer and Information Sciences; Data- och informationsvetenskap
spellingShingle	Computer and Information Sciences Data- och informationsvetenskap Louis, Ruwaid Yu, David A study of the exploration/exploitation trade-off in reinforcement learning : Applied to autonomous driving
description	A global initiative was set in motion to reduce the number of traffic accidents. Autonomous driving is a field that contributes to this initiative. This report examines the exploration/exploitation trade-off in reinforcement learning applied to decision making in autonomous driving. The approach consisted of modelling the problem as a Markov Decision Process, which was solved with Q-learning. Decision making used an exploration-greedy approach. The scenarios consisted of different kinds of intersections and were built using SUMO. The ego vehicle was controlled using TraCI. The goal was to discuss the trade-off from two perspectives, time and safety (measured in number of collisions, among other things), in the domain of autonomous driving. The results showed that exploration prompted the ego vehicle to pass the scenarios in less time. This led to more collisions and thus decreased safety. In contrast, exploitation preferred deceleration and stopping, which resulted in increased safety but also increased the passage time and traffic. The conclusion was to exploit previous experiences when applying reinforcement learning to decision making in autonomous driving, because safety is the highest priority for autonomous driving and the global initiative. === A global initiative was launched to reduce the number of traffic accidents before the year 2030. Autonomous vehicles are a research area that contributes to the global initiative. This report examines the trade-off between exploration and exploitation in reinforcement learning for the decision-making process in autonomous vehicles. The approach consisted of modelling the problem as a Markov Decision Process, which was solved using Q-learning. The decision-making process adopted an exploitation approach. The scenarios consisted of different types of intersections and were programmed using SUMO. The autonomous vehicle was controlled using TraCI. The goal was to discuss the trade-off from two perspectives, time and safety (measured in, among other things, the number of collisions), within the research area of autonomous vehicles. The results showed that exploration prompted the autonomous vehicle to pass the scenarios in less time. This led to an increased number of collisions and thus reduced safety. On the other hand, increased exploitation favoured braking, which resulted in an increased number of successful passages. This leads to increased safety but also increases the passage time and the amount of traffic. The conclusion was that exploiting previous experience should be preferred when applying reinforcement learning to the decision-making process in autonomous vehicles. This conclusion was reached because safety has the highest priority within autonomous vehicles and the global initiative.
author	Louis, Ruwaid; Yu, David
author_facet	Louis, Ruwaid; Yu, David
author_sort	Louis, Ruwaid
title	A study of the exploration/exploitation trade-off in reinforcement learning : Applied to autonomous driving
title_short	A study of the exploration/exploitation trade-off in reinforcement learning : Applied to autonomous driving
title_full	A study of the exploration/exploitation trade-off in reinforcement learning : Applied to autonomous driving
title_fullStr	A study of the exploration/exploitation trade-off in reinforcement learning : Applied to autonomous driving
title_full_unstemmed	A study of the exploration/exploitation trade-off in reinforcement learning : Applied to autonomous driving
title_sort	study of the exploration/exploitation trade-off in reinforcement learning : applied to autonomous driving
publisher	KTH, Skolan för elektroteknik och datavetenskap (EECS)
publishDate	2019
url	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254938
work_keys_str_mv	AT louisruwaid astudyoftheexplorationexploitationtradeoffinreinforcementlearningappliedtoautonomousdriving AT yudavid astudyoftheexplorationexploitationtradeoffinreinforcementlearningappliedtoautonomousdriving AT louisruwaid enstudieomutforskningutnyttjandeavvagningeninomforstarkandeinlarningappliceratpaautonomafordon AT yudavid enstudieomutforskningutnyttjandeavvagningeninomforstarkandeinlarningappliceratpaautonomafordon AT louisruwaid studyoftheexplorationexploitationtradeoffinreinforcementlearningappliedtoautonomousdriving AT yudavid studyoftheexplorationexploitationtradeoffinreinforcementlearningappliedtoautonomousdriving
_version_	1719231622337789952
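
The abstract in this record describes a Markov Decision Process solved with Q-learning, with action selection that balances exploration against exploitation. As a rough illustration only, the following sketch shows a generic tabular Q-learning update with an epsilon-greedy selection rule. Reading the abstract's action selection as epsilon-greedy is an assumption, and the function names, the table layout and the values of alpha, gamma and epsilon are hypothetical. It is not the thesis's implementation and does not use SUMO or TraCI.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Return a random action index with probability epsilon (exploration),
    otherwise the index of the highest Q-value in q_row (exploitation)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step in place and return the new value:
    Q(s, a) += alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a)).
    alpha and gamma are illustrative defaults, not values from the thesis."""
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy table: 2 states, 2 actions (e.g. "go" and "brake").
    q = [[0.0, 0.0], [1.0, 2.0]]
    action = epsilon_greedy(q[0], epsilon=0.2, rng=rng)
    print(action, q_update(q, 0, action, reward=1.0, next_state=1))
```

With epsilon set to 0 the rule always exploits the current estimates, and with epsilon set to 1 it always explores; the trade-off discussed in the abstract corresponds to choosing a value between these extremes.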

Similar Items