Energy-Based Continuous Inverse Optimal Control

The problem of continuous inverse optimal control (over a finite time horizon) is to learn the unknown cost function over the sequence of continuous control variables from expert demonstrations. In this article, we study this fundamental problem in the framework of the energy-based model (EBM), where the...

Bibliographic Details
Main Authors:	Baker, C. (Author), Wu, Y.N. (Author), Xie, J. (Author), Xu, Y. (Author), Zhao, T. (Author), Zhao, Y. (Author)
Format:	Article
Language:	English
Published:	Institute of Electrical and Electronics Engineers Inc. 2022
Subjects:	Approximation algorithms; Autonomous vehicles; Cooperative learning; Cost benefit analysis; Cost function; Cost functions; Cost-function; Costs; Energy-based model; Energy-based models; energy-based models (EBMs); Generator; Generators; Heuristic algorithms; Heuristics algorithm; Inverse optimal control; inverse optimal control (IOC); Inverse problems; Inverse-optimal control; Langevin dynamics; Maximum likelihood estimation; Maximum-likelihood estimation; Optimal control; Optimal control systems; Optimal controls; Probability density function; Trajectories; Trajectory
Online Access:	View Fulltext in Publisher


LEADER	03458nam a2200601Ia 4500
001	10.1109-TNNLS.2022.3168795
008	220630s2022 CNT 000 0 und d
020			\|a 2162237X (ISSN)
245	1	0	\|a Energy-Based Continuous Inverse Optimal Control
260		0	\|b Institute of Electrical and Electronics Engineers Inc. \|c 2022
520	3		\|a The problem of continuous inverse optimal control (over a finite time horizon) is to learn the unknown cost function over the sequence of continuous control variables from expert demonstrations. In this article, we study this fundamental problem in the framework of the energy-based model (EBM), where the observed expert trajectories are assumed to be random samples from a probability density function defined as the exponential of the negative cost function up to a normalizing constant. The parameters of the cost function are learned by maximum likelihood via an "analysis by synthesis" scheme, which iterates: 1) synthesis step: sample the synthesized trajectories from the current probability density using Langevin dynamics via backpropagation through time and 2) analysis step: update the model parameters based on the statistical difference between the synthesized trajectories and the observed trajectories. Because an efficient optimization algorithm is usually available for an optimal control problem, we also consider a convenient approximation of the above learning method, where we replace the sampling in the synthesis step by optimization. Moreover, to make the sampling or optimization more efficient, we propose to train the EBM simultaneously with a top-down trajectory generator via cooperative learning, where the trajectory generator is used to quickly initialize the synthesis step of the EBM. We demonstrate the proposed methods on autonomous driving tasks and show that they can learn suitable cost functions for optimal control.
650	0	4	\|a Approximation algorithms
650	0	4	\|a Autonomous vehicles
650	0	4	\|a Autonomous Vehicles
650	0	4	\|a Cooperative learning
650	0	4	\|a Cost benefit analysis
650	0	4	\|a Cost function
650	0	4	\|a Cost functions
650	0	4	\|a Cost-function
650	0	4	\|a Costs
650	0	4	\|a Energy-based model
650	0	4	\|a Energy-based models
650	0	4	\|a energy-based models (EBMs)
650	0	4	\|a Generator
650	0	4	\|a Generators
650	0	4	\|a Heuristic algorithms
650	0	4	\|a Heuristics algorithm
650	0	4	\|a Inverse optimal control
650	0	4	\|a inverse optimal control (IOC)
650	0	4	\|a Inverse problems
650	0	4	\|a Inverse-optimal control
650	0	4	\|a Langevin dynamic.
650	0	4	\|a Langevin dynamics
650	0	4	\|a Langevin dynamics.
650	0	4	\|a Maximum likelihood estimation
650	0	4	\|a Maximum-likelihood estimation
650	0	4	\|a Optimal control
650	0	4	\|a Optimal control systems
650	0	4	\|a Optimal controls
650	0	4	\|a Probability density function
650	0	4	\|a Trajectories
650	0	4	\|a Trajectory
700	1	0	\|a Baker, C. \|e author
700	1	0	\|a Wu, Y.N. \|e author
700	1	0	\|a Xie, J. \|e author
700	1	0	\|a Xu, Y. \|e author
700	1	0	\|a Zhao, T. \|e author
700	1	0	\|a Zhao, Y. \|e author
773			\|t IEEE Transactions on Neural Networks and Learning Systems
856			\|z View Fulltext in Publisher \|u https://doi.org/10.1109/TNNLS.2022.3168795
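
Illustrative note on the abstract: the "analysis by synthesis" loop described in field 520 can be sketched on a toy problem. The sketch below is not the authors' implementation. It assumes a hypothetical cost that is linear in two hand-picked features of a control sequence (control effort and change between consecutive controls), uses an analytic gradient in place of backpropagation through time, and omits the vehicle dynamics and the trajectory generator entirely. All function names and hyperparameters are made up for illustration.

```python
import numpy as np

def features(u):
    """Hypothetical features of control sequences u with shape (n, T)."""
    effort = np.sum(u ** 2, axis=1)                    # total control effort
    change = np.sum(np.diff(u, axis=1) ** 2, axis=1)   # roughness of the controls
    return np.stack([effort, change], axis=1)          # shape (n, 2)

def cost(u, theta):
    """Toy cost C_theta(u) = theta . features(u); density is exp(-cost) / Z."""
    return features(u) @ theta

def cost_grad(u, theta):
    """Analytic gradient of the toy cost with respect to the controls u."""
    d = np.diff(u, axis=1)
    g = 2.0 * theta[0] * u
    g[:, 1:] += 2.0 * theta[1] * d
    g[:, :-1] -= 2.0 * theta[1] * d
    return g

def langevin_sample(u0, theta, n_steps, step, rng):
    """Synthesis step: noisy gradient descent on the cost (Langevin dynamics)."""
    u = u0.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(u.shape)
        u = u - 0.5 * step ** 2 * cost_grad(u, theta) + step * noise
    return u

def update_theta(theta, observed, synthesized, lr):
    """Analysis step: the log-likelihood gradient for this toy model is
    E_model[features] - E_data[features], estimated from the two sample sets."""
    grad = features(synthesized).mean(axis=0) - features(observed).mean(axis=0)
    # Clip to keep the toy density normalizable (an assumption of this sketch).
    return np.maximum(theta + lr * grad, 1e-3)

def fit(observed, n_iters=50, lr=0.05, seed=0):
    """Alternate synthesis and analysis steps, starting from theta = (1, 1)."""
    rng = np.random.default_rng(seed)
    theta = np.ones(2)
    for _ in range(n_iters):
        u0 = rng.standard_normal(observed.shape)  # noise initialization
        synthesized = langevin_sample(u0, theta, n_steps=30, step=0.1, rng=rng)
        theta = update_theta(theta, observed, synthesized, lr)
    return theta
```

Calling fit(observed) on an array of demonstrated control sequences with shape (n, T) returns the two learned feature weights. Replacing the noise initialization in fit with the output of a learned generator would correspond loosely to the cooperative-learning variant the abstract mentions, but that part is not sketched here.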
