Meta-optimization in safe reinforcement learning: Enhancing safety at training and deployment with fewer hyperparameters

dc.contributor.author: Honari, Homayoun
dc.contributor.supervisor: Najjaran, Homayoun
dc.date.accessioned: 2024-09-25T18:36:31Z
dc.date.available: 2024-09-25T18:36:31Z
dc.date.issued: 2024
dc.degree.department: Department of Mechanical Engineering
dc.degree.level: Master of Applied Science (MASc)
dc.description.abstract: Reinforcement learning (RL) is a trial-and-error framework that enables intelligent systems to learn optimal behaviour from feedback provided by the environment. In recent years, RL has been successfully applied to the control of various embodied systems. However, real-world training and deployment of RL methods require attention to limitations imposed by the robot and its surroundings. To address these limitations, safe RL algorithms define safety constraints based on the physics of the system and modify the training regime of RL methods so that the constraints are satisfied during both training and inference. While safe RL offers a promising path toward real-world deployability, challenges such as sample efficiency and hyperparameter tuning hinder its applicability in real-world scenarios. To address these challenges, this thesis proposes two approaches. First, a meta-gradient-based training pipeline called Meta Soft Actor-Critic Lagrangian (Meta SAC-Lag) is proposed, which optimizes the aforementioned safety-related hyperparameters within the conventional Lagrangian framework. The proposed method is evaluated in several safety-critical simulated environments. In addition, a real-world task is designed, and the algorithm is successfully deployed on a Kinova Gen3 robotic arm, showcasing its real-world deployability with minimal hyperparameter tuning. Furthermore, a multi-objective policy optimization framework is proposed that specifies the trade-off between optimality and safety directly and optimizes both objectives simultaneously. Its competitive performance against state-of-the-art safe RL methods, achieved with fewer hyperparameters, showcases its potential as a powerful alternative framework for safe RL.
dc.description.scholarlevel: Graduate
dc.identifier.uri: https://hdl.handle.net/1828/20452
dc.language: English
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.title: Meta-optimization in safe reinforcement learning: Enhancing safety at training and deployment with fewer hyperparameters
dc.type: Thesis
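
For context, the "conventional Lagrangian framework" referred to in the abstract casts safe RL as a constrained optimization problem: the policy maximizes expected return while a Lagrange multiplier penalizes expected constraint cost in excess of a budget, and the multiplier itself is updated by dual ascent. The following is a minimal Python sketch of that standard multiplier update only, not of the thesis's Meta SAC-Lag; the cost budget, multiplier learning rate, and episode costs are hypothetical placeholders. The multiplier learning rate is exactly the kind of safety-related hyperparameter that a meta-gradient approach such as Meta SAC-Lag would adapt automatically rather than hand-tune.

def lagrangian_dual_step(lmbda, episode_cost, cost_limit, lr_lambda):
    """Dual ascent on the Lagrange multiplier: increase lambda when the
    observed constraint cost exceeds the budget, decrease it otherwise,
    and project the result back onto the non-negative reals."""
    return max(0.0, lmbda + lr_lambda * (episode_cost - cost_limit))

# Illustrative run: the multiplier grows while the policy violates the budget
# and shrinks once the constraint is satisfied. All values are hypothetical.
lmbda = 0.0
cost_limit = 25.0   # hypothetical per-episode cost budget
lr_lambda = 0.05    # hypothetical multiplier learning rate (a hyperparameter
                    # that a meta-gradient method would tune instead)
for episode_cost in [40.0, 38.0, 30.0, 24.0, 20.0]:
    lmbda = lagrangian_dual_step(lmbda, episode_cost, cost_limit, lr_lambda)
    print(f"cost={episode_cost:5.1f}  lambda={lmbda:.3f}")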

Files

Original bundle
Name: Honari_Homayoun_MASc_2024.pdf
Size: 4.53 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission