Summary: | Reinforcement learning (RL) faces a growing challenge in maintaining good performance on emerging large-scale real-world problems. Function approximation is a key technique for addressing the performance degradation that arises when RL algorithms are applied to problems with continuous or large environments. In such domains, the number of state-action values that must be stored and the time needed to fully explore the task environment can be prohibitively large,
significantly impeding the RL agent's ability to solve hard problems with high performance. Function approximation techniques address this challenge by representing the state-action values with a limited number of parametric components, which also provides generalization and shortens convergence time. However, on real-world problems with very complex environments, current function approximation algorithms cannot guarantee satisfactory performance. In this
dissertation, we develop new function approximation techniques and apply them to two difficult real-world problems: TCP congestion control and video streaming bitrate adaptation. We show that reinforcement learning that stores learned state-action values in an uncompressed table, or even in a parameterized table built with existing function approximation techniques, can perform poorly in these continuous and large-scale domains. To solve the performance degradation
issues, we study the architecture of Sparse Distributed Memories (SDMs, also called Kanerva coding) and extend it by designing new function approximators with significantly improved performance in terms of effectiveness, efficiency and adaptability. We describe three novel online function approximators, each of which has its own strengths and suitable applications. We evaluate their performance on classic testbeds: the Mountain Car and the Hunter-Prey problems. We then show that they
are able to solve the TCP congestion control and video streaming bitrate adaptation problems with significant performance improvements compared to state-of-the-art techniques.
|