Stick-breaking policy learning in Dec-POMDPs
Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from the optimal value. This paper repres...
Main Authors: | , , , , |
---|---|
Other Authors: | , |
Format: | Article |
Language: | English |
Published: | International Joint Conferences on Artificial Intelligence, Inc., 2016-10-21T19:07:30Z. |