Abstract

Info

Title:: Deep deterministic policy gradient based cooperative platoon longitudinal control strategy

Author(s):: MIN Haigen1; 2; YANG Yiming1; WANG Wuqi1; FANG Yukun1; SONG Xiaopeng3; (1. School of Information & Engineering, Changan University, Xian 710064, Shaanxi, China;2. Joint Laboratory for Internet of Vehicles, Ministry of EducationChina MobileCommunications Corporation, Changan University, Xian 710064, Shaanxi, China;3. Zhejiang Transportation Planning and Design Institute Co., Ltd, Hangzhou 310017, Zhejiang, China)

Keywords:: traffic engineering; deep reinforcement learning; platoon longitudinal control; deep deterministic policy gradient; platoon string stability

Abstract:: To achieve continuous, accurate platoon control and string stability during platoon traveling, a deep reinforcement learning (DRL)-based platoon longitudinal control strategy at moderate speed was proposed. The strategy accounts for three key factors, namely spacing, vehicle speed, and acceleration, and considers vehicle dynamics and comfort in the learning process. First, the platoon control process was modeled and the reinforcement learning algorithm was described. Second, a DRL-based method that determines the optimal strategy for platoon longitudinal control was proposed. In particular, a multi-objective reward function was designed that integrates the rewards corresponding to the distance error, the speed error, and the acceleration constraints. Third, the deep deterministic policy gradient (DDPG) algorithm was adopted to solve the platoon longitudinal control problem. The algorithm combines actor-critic (AC) and deep Q-network (DQN) methods to handle platoon control in a continuous state space and a continuous action space. The results show that the proposed reinforcement learning-based platoon control method has the same control accuracy as the distributed model predictive control algorithm, and can achieve string stability of a platoon under the leader-follower communication topology. 3 tabs, 11 figs, 19 refs.
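As an illustration of the multi-objective reward described in the abstract, the following minimal sketch combines a distance-error term, a speed-error term, and an acceleration-constraint term into one scalar. The abstract does not give the actual functional form or weights, so the names `RewardWeights` and `platoon_reward`, the absolute-value penalties, the weight values, and the 3.0 m/s^2 comfort bound are all assumptions made for illustration, not the authors' formulation.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    # Hypothetical weights; the abstract does not state any values.
    w_gap: float = 1.0      # weight on the spacing (distance) error
    w_speed: float = 0.5    # weight on the speed error
    w_accel: float = 0.1    # weight on the acceleration-constraint violation
    accel_limit: float = 3.0  # assumed comfort bound in m/s^2


def platoon_reward(gap_error: float, speed_error: float, accel: float,
                   weights: RewardWeights = RewardWeights()) -> float:
    """Combine spacing, speed, and acceleration terms into one scalar reward.

    All three terms are penalties (non-positive), so the best achievable
    reward under this assumed form is zero.
    """
    gap_term = -weights.w_gap * abs(gap_error)
    speed_term = -weights.w_speed * abs(speed_error)
    # Penalise only the part of the acceleration beyond the comfort bound.
    excess = max(0.0, abs(accel) - weights.accel_limit)
    accel_term = -weights.w_accel * excess
    return gap_term + speed_term + accel_term
```

In a DDPG training loop, a follower vehicle's environment step would return a scalar of this kind after each control action; shaping the acceleration term as a bound violation rather than a plain magnitude penalty is one possible way to express a constraint, chosen here only for simplicity.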