Learning to Plan with Logical Automata

This paper introduces the Logic-based Value Iteration Network (LVIN) framework, which combines imitation learning and logical automata to enable agents to learn complex behaviors from demonstrations. We address two problems with learning from expert knowledge: (1) how to generalize learned policies for a task to larger classes of tasks, and (2) how to account for erroneous demonstrations. Our LVIN model solves finite gridworld environments by instantiating a recurrent, convolutional neural network as a value iteration procedure over a learned Markov Decision Process (MDP) that factors into two MDPs: a small finite state automaton (FSA) corresponding to logical rules, and a larger MDP corresponding to motions in the environment. The parameters of LVIN (value function, reward map, FSA transitions, large MDP transitions) are approximately learned from expert trajectories. Since the model represents the learned rules as an FSA, the model is interpretable; since the FSA is integrated into planning, the behavior of the agent can be manipulated by modifying the FSA transitions. We demonstrate these abilities in several domains of interest, including a lunchbox-packing manipulation task and a driving domain.

Bibliographic Details
Main Authors: Araki, Brandon (Author), Vodrahalli, Kiran (Author), Leech, Thomas (Author), Vasile, Cristian-Ioan (Author), Donahue, Mark D. (Author), Rus, Daniela L. (Author)
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor), Lincoln Laboratory (Contributor)
Format: Article
Language:English
Published: Robotics: Science and Systems Foundation, 2019.
Online Access: Get fulltext
Full Record

Abstract: This paper introduces the Logic-based Value Iteration Network (LVIN) framework, which combines imitation learning and logical automata to enable agents to learn complex behaviors from demonstrations. We address two problems with learning from expert knowledge: (1) how to generalize learned policies for a task to larger classes of tasks, and (2) how to account for erroneous demonstrations. Our LVIN model solves finite gridworld environments by instantiating a recurrent, convolutional neural network as a value iteration procedure over a learned Markov Decision Process (MDP) that factors into two MDPs: a small finite state automaton (FSA) corresponding to logical rules, and a larger MDP corresponding to motions in the environment. The parameters of LVIN (value function, reward map, FSA transitions, large MDP transitions) are approximately learned from expert trajectories. Since the model represents the learned rules as an FSA, the model is interpretable; since the FSA is integrated into planning, the behavior of the agent can be manipulated by modifying the FSA transitions. We demonstrate these abilities in several domains of interest, including a lunchbox-packing manipulation task and a driving domain.
Funding: National Science Foundation (Grant 1723943); United States. Office of Naval Research (Grant N000141812830); Air Force Office of Scientific Research (Contract FA8702-15-D-0001)
Online Access: https://hdl.handle.net/1721.1/123310
Published in: Robotics: Science and Systems 2019
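The abstract describes value iteration over an MDP that factors into a small FSA over logical rules and a larger motion MDP; the LVIN network unrolls a learned, convolutional version of that procedure. A minimal sketch of classical value iteration over such a product MDP, assuming a toy 1-D gridworld, a two-state FSA, and illustrative names and numbers not taken from the paper:

```python
import numpy as np

def product_value_iteration(P, fsa_step, reward, gamma=0.95, iters=100):
    """Value iteration over the product of an FSA and a motion MDP.

    P: (A, S, S) motion transition probabilities over grid states.
    fsa_step: (fsa_state, grid_state) -> next fsa_state (logical rules).
    reward: (F, S) reward over product states.
    Returns V: (F, S) value function over the product MDP.
    """
    A, S, _ = P.shape
    F = reward.shape[0]
    V = np.zeros((F, S))
    for _ in range(iters):
        Q = np.zeros((F, A, S))
        for f in range(F):
            # Value of landing in grid cell s2 while in FSA state f:
            # the FSA advances based on the cell the agent reaches.
            v_next = np.array([V[fsa_step(f, s2), s2] for s2 in range(S)])
            Q[f] = reward[f] + gamma * (P @ v_next)  # shape (A, S)
        V = Q.max(axis=1)  # greedy backup over actions
    return V

# Toy 1-D world: 3 cells, actions left/right; the FSA accepts on reaching cell 2.
S, A = 3, 2
P = np.zeros((A, S, S))
for s in range(S):
    P[0, s, max(s - 1, 0)] = 1.0          # action 0: move left
    P[1, s, min(s + 1, S - 1)] = 1.0      # action 1: move right
fsa_step = lambda f, s: 1 if (f == 0 and s == 2) else f
reward = np.array([[0.0, 0.0, 0.0],       # FSA state 0: task not yet done
                   [1.0, 1.0, 1.0]])      # FSA state 1: accepting, rewarded
V = product_value_iteration(P, fsa_step, reward)
```

Editing `fsa_step` changes which grid events advance the automaton, which mirrors the abstract's point that the agent's behavior can be manipulated by modifying the FSA transitions.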