Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control

Quentin Le Lidec; Wilson Jallet; Ivan Laptev; Cordelia Schmid; Justin Carpentier

Conference paper, 2023

Quentin Le Lidec: Models of visual object recognition and scene understanding (HAL PersonId: 1083016)

Wilson Jallet: Models of visual object recognition and scene understanding; Équipe Mouvement des Systèmes Anthropomorphes (HAL PersonId: 749065; IdHAL: wilson-jallet; ORCID: 0000-0001-8222-2739)

Ivan Laptev: Models of visual object recognition and scene understanding

Cordelia Schmid: Models of visual object recognition and scene understanding

Justin Carpentier: Models of visual object recognition and scene understanding (HAL PersonId: 3401; IdHAL: justin-carpentier; ORCID: 0000-0001-6585-2894; IdRef: 233948015)

Abstract

Reinforcement learning (RL) and trajectory optimization (TO) have strong, complementary advantages. On the one hand, RL approaches can learn global control policies directly from data, but they generally require large numbers of samples to converge towards feasible policies. On the other hand, TO methods can exploit gradient information extracted from simulators to converge quickly towards a locally optimal control trajectory, which is only valid in the vicinity of the solution. Over the past decade, several approaches have sought to combine the two classes of methods in order to get the best of both worlds. Building on this line of research, we propose several improvements to these approaches in order to learn global control policies more quickly, notably by leveraging the sensitivity information provided by TO methods through Sobolev learning, and by using augmented Lagrangian techniques to enforce the consensus between TO and policy learning. We evaluate the benefits of these improvements on various classical robotics tasks through comparison with existing approaches from the literature.
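The two ingredients named in the abstract can be illustrated on a deliberately tiny problem. The Python/NumPy sketch below is not the authors' algorithm: it assumes a scalar state, a one-step cost c(x, u) = 0.5*q*(a*x + b*u)^2 + 0.5*r(x)*u^2 with a state-dependent weight r(x) = 1 + x^2 (so the optimal control is nonlinear in x), closed-form solves in place of a real TO solver, and policies that are linear in hand-picked features. The function names `sobolev_fit` and `consensus_admm` and all parameters are hypothetical. `sobolev_fit` regresses a policy on both the TO controls and their sensitivities du/dx; `consensus_admm` enforces the consensus constraints u_i = theta * x_i with an augmented Lagrangian, alternating a TO step, a policy step and a multiplier update (the standard two-block ADMM scheme).

```python
import numpy as np


def sobolev_fit(features, dfeatures, xs, us, dus, beta=1.0):
    """Fit w in pi(x) = w . phi(x) by least squares on both the TO controls
    `us` (values) and their sensitivities `dus` = du/dx (derivatives).
    `beta` weights the derivative (Sobolev) residuals. Hypothetical helper."""
    us = np.asarray(us, dtype=float)
    dus = np.asarray(dus, dtype=float)
    Phi = np.array([features(x) for x in xs])     # one value row per sample
    dPhi = np.array([dfeatures(x) for x in xs])   # one derivative row per sample
    A = np.vstack([Phi, np.sqrt(beta) * dPhi])
    y = np.concatenate([us, np.sqrt(beta) * dus])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def consensus_admm(xs, q=1.0, a=1.0, b=1.0, rho=1.0, iters=500):
    """Augmented Lagrangian (ADMM) consensus between per-sample TO controls u_i
    and a linear policy u = theta * x, for the toy one-step cost
    c(x, u) = 0.5*q*(a*x + b*u)**2 + 0.5*r(x)*u**2 with r(x) = 1 + x**2.
    Hypothetical helper, not the paper's formulation."""
    xs = np.asarray(xs, dtype=float)
    r = 1.0 + xs**2
    lam = np.zeros_like(xs)  # one multiplier per constraint u_i = theta * x_i
    theta = 0.0
    u = np.zeros_like(xs)
    for _ in range(iters):
        # TO step (closed form):
        # argmin_u c(x, u) + lam*(u - theta*x) + rho/2*(u - theta*x)**2
        u = (-q * a * b * xs - lam + rho * theta * xs) / (q * b**2 + r + rho)
        # Policy step: least-squares fit of theta*x to u + lam/rho
        theta = np.dot(xs, u + lam / rho) / np.dot(xs, xs)
        # Multiplier update on the consensus residual u - theta*x
        lam = lam + rho * (u - theta * xs)
    residual = np.max(np.abs(u - theta * xs))
    return theta, residual


# Sobolev fit of pi(x) = w0*x + w1*x**3 from a single TO sample at x = 1,
# where u = 1.5 and du/dx = 0.5 (values of u(x) = 2x - 0.5x^3 and its slope).
def cubic_features(x):
    return np.array([x, x**3])


def cubic_dfeatures(x):
    return np.array([1.0, 3.0 * x**2])


w = sobolev_fit(cubic_features, cubic_dfeatures, [1.0], [1.5], [0.5])
print("Sobolev weights:", w)

# Consensus on five sampled states, compared with the best linear policy
# for q = a = b = 1, obtained by minimizing sum_i c(x_i, theta*x_i) directly.
xs = np.linspace(-1.0, 1.0, 5)
theta, residual = consensus_admm(xs)
theta_star = -np.sum(xs**2) / np.sum((2.0 + xs**2) * xs**2)
print("theta:", theta, "theta*:", theta_star, "residual:", residual)
```

Because the linear policy cannot represent the nonlinear optimal control, the consensus does not reproduce the unconstrained TO solutions; it converges to the best linear policy over the sampled states, while the multipliers absorb the disagreement. In the Sobolev example, one sample supplies one value and one derivative, enough to determine both weights, whereas a value-only fit with the same single sample would be underdetermined.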

Domains

Robotics [cs.RO]; Machine Learning [cs.LG]

Main file

lelidec2022enforcing.pdf (925.21 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-03780392

Submitted on: Thursday, February 16, 2023, 3:50:46 PM

Last modification on: Friday, April 19, 2024, 4:18:58 PM

Dates and versions

hal-03780392, version 1 (19-09-2022)

hal-03780392, version 2 (20-01-2023)

hal-03780392, version 3 (16-02-2023)

Identifiers

HAL Id: hal-03780392, version 3

Cite

Quentin Le Lidec, Wilson Jallet, Ivan Laptev, Cordelia Schmid, Justin Carpentier. Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control. ICRA 2023 - IEEE International Conference on Robotics and Automation, May 2023, London, United Kingdom. ⟨hal-03780392v3⟩

Collections

ENS-PARIS UNIV-TLSE2 CNRS INRIA INSA-TOULOUSE LAAS LAAS-GEPETTO UT1-CAPITOLE LAAS-ROBOTIQUE INRIA2 GENCI PSL INSA-GROUPE ANR PRAIRIE-IA TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP AGIMUS