Summary: Reinforcement learning is a robust artificial intelligence approach for agents that must act in an environment, making their own decisions about how to behave. Typically an agent is deployed alone with no prior knowledge but, given sufficient time, a suitable state representation and an informative reward function, it is guaranteed to learn how to maximise its long-term reward. Incorporating domain knowledge, typically held by the system designer, can reduce the number of suboptimal behaviours the agent tries and therefore speed up learning. Potential-based reward shaping is a method of providing this knowledge to an agent through additional rewards. Furthermore, if the agent is alone in the environment, it is guaranteed to learn the same behaviour with or without potential-based reward shaping.

Meanwhile, there has been growing interest in deploying not just one agent but many into the same environment. Such applications can combine the benefits of multi-agent systems and reinforcement learning, but their practical use is often limited by the non-stationarity of the environment, the exponential growth of the state representation with every agent added, and partial observability.

This thesis documents work combining knowledge-based reinforcement learning and multi-agent reinforcement learning so that the latter can be achieved more quickly and therefore feasibly applied to complex problem domains. Experience gained from many empirical studies is gathered to support novel theoretical contributions proving that the pre-existing guarantees of potential-based reward shaping do not hold when it is applied in multi-agent problem domains. Instead, multi-agent potential-based reward shaping may cause agents to learn a different behaviour, but this behaviour is guaranteed to be drawn from the same set of behaviours the agents could have learned without the additional rewards. Therefore, knowledge-based multi-agent reinforcement learning can both reduce the time a group of agents needs to learn a suitable behaviour and increase their final performance.
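For reference, a sketch of the standard single-agent formulation of potential-based reward shaping mentioned above (stated here as background, not quoted from the thesis): given a designer-supplied potential function Φ over states and the discount factor γ of the underlying Markov decision process, the agent's reward is augmented as

    r'(s, a, s') = r(s, a, s') + γΦ(s') − Φ(s)

and, in the single-agent case, learning with r' is guaranteed to yield the same optimal behaviour as learning with the original reward r.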