Integration of model predictive control and reinforcement learning for dynamic systems with application to robot manipulators

Date

2024

Authors

Hu, Pengcheng

Abstract

The last decade has witnessed great progress in the development of reinforcement learning (RL) across many applications, such as games and autonomous driving. RL is effective in solving control problems for complex systems whose dynamics are intractable to model accurately. In an RL algorithm, the agent learns the policy that maximizes the reward based on measurement samples from interactions with the environment. Obtaining the optimal policy, however, requires collecting a sufficiently large number of samples, which is challenging in real-world applications such as robotics and manufacturing. To tackle this problem, model predictive control-based RL (MPC-based RL) is proposed to improve sample efficiency. In an MPC-based RL algorithm, a model is learned from the collected samples, the learned model and MPC are used to predict trajectories over a specified prediction horizon, and an action is obtained through the RL algorithm by maximizing the cumulative reward. This thesis is devoted to the investigation of MPC-based RL design and its application to robot manipulators.

In Chapter 2, an MPC-based deep RL framework for constrained linear systems with bounded disturbances is proposed. In the proposed framework, a rigid tube-based MPC (RTMPC) method is employed to predict a trajectory by solving the corresponding optimization problem. The predicted trajectory is then stored in a replay buffer in the form of data pairs, and the soft actor-critic (SAC) algorithm is applied to modify the loss function and update the policy online based on the predicted data pairs. Numerical simulations validate the effectiveness of the proposed method, and comparison results demonstrate its advantages: it requires fewer real samples and provides better control performance with computational complexity comparable to that of RTMPC.

In Chapter 3, we investigate the application of these methods to robot manipulators. Firstly, we apply an MPC-based RL algorithm, a nonlinear MPC (NMPC) method, and two model-free RL algorithms to the regulation problem for a 2-degree-of-freedom manipulator and compare their training and control performance. Secondly, a training and control performance evaluation of the model-free RL algorithm and the MPC-based RL algorithm is provided; the MPC-based RL algorithm shows better training performance in terms of sample efficiency and total return but poorer control performance. Thirdly, simulation studies compare the training performance of the MPC-based RL algorithm with that of two model-free RL algorithms; for the twelve-dimensional system, the MPC-based RL algorithm presents poorer training performance than the model-free RL algorithms.

In Chapter 4, conclusions and future work are summarized.
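The receding-horizon loop outlined above can be illustrated with a short sketch. The example below is a minimal, hypothetical rendering of the idea and not the thesis implementation: RTMPC is replaced by a simple random-shooting predictive controller, the SAC update by a least-squares policy fit, and the plant by a toy two-state linear system with a small bounded disturbance; all names, dimensions, and numbers are assumptions introduced for illustration.

```python
import numpy as np

# Toy stand-ins for the quantities described in the abstract (all assumed):
# learned linear model x' = A x + B u, quadratic stage cost, short horizon.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R = np.eye(2), 0.01 * np.eye(1)
HORIZON, N_CANDIDATES = 10, 256

def predict_trajectory(x0):
    """Random-shooting predictive control: sample action sequences, roll them
    out with the learned model, and keep the lowest-cost predicted trajectory."""
    best_cost, best_traj = np.inf, None
    for _ in range(N_CANDIDATES):
        u_seq = rng.uniform(-1.0, 1.0, size=(HORIZON, 1))
        x, cost, traj = x0.copy(), 0.0, []
        for u in u_seq:
            cost += x @ Q @ x + u @ R @ u
            traj.append((x.copy(), u.copy()))
            x = A @ x + (B @ u).ravel()
        if cost < best_cost:
            best_cost, best_traj = cost, traj
    return best_traj

# Replay buffer filled with predicted (state, action) pairs, mirroring the
# data-pair storage described for Chapter 2.
buffer = []
x = np.array([1.0, 0.0])
for step in range(20):
    traj = predict_trajectory(x)
    buffer.extend(traj)                       # store the whole predicted trajectory
    u0 = traj[0][1]                           # apply only the first action (receding horizon)
    x = A @ x + (B @ u0).ravel() + 0.01 * rng.standard_normal(2)  # bounded disturbance

# Stand-in for the SAC policy update: fit a linear feedback law u = K x to the
# buffered pairs by least squares (the thesis instead updates a SAC actor-critic).
X = np.array([s for s, _ in buffer])
U = np.array([a for _, a in buffer])
K, *_ = np.linalg.lstsq(X, U, rcond=None)
print("fitted feedback gain:", K.ravel())
```

Only the first action of each predicted trajectory is applied to the (simulated) plant, while the full prediction is added to the buffer; this is the sense in which the predicted data pairs, rather than real interaction samples alone, drive the policy update.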

Keywords

Model predictive control, Reinforcement learning, Robot manipulator, Model predictive control-based reinforcement learning
