[1]闵海根,杨一鸣,王武祺,等.基于深度确定性策略梯度的队列纵向协同控制策略[J].长安大学学报(自然科学版),2021,41(4):90-100.
 MIN Hai gen,YANG Yi ming,WANG Wu qi,et al.Deep deterministic policy gradient based cooperativeplatoon longitudinal control strategy[J].Journal of Chang’an University (Natural Science Edition),2021,41(4):90-100.

Deep deterministic policy gradient based cooperative platoon longitudinal control strategy

Journal of Chang’an University (Natural Science Edition) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
41
Issue:
2021, No. 4
Pages:
90-100
Section:
Traffic Engineering
Publication date:
2021-07-15

Article Info

Title:
Deep deterministic policy gradient based cooperativeplatoon longitudinal control strategy
Author(s):
MIN Haigen (1, 2), YANG Yiming (1), WANG Wuqi (1), FANG Yukun (1), SONG Xiaopeng (3)
(1. School of Information Engineering, Chang’an University, Xi’an 710064, Shaanxi, China; 2. Joint Laboratory for Internet of Vehicles, Ministry of Education-China Mobile Communications Corporation, Chang’an University, Xi’an 710064, Shaanxi, China; 3. Zhejiang Transportation Planning and Design Institute Co., Ltd., Hangzhou 310017, Zhejiang, China)
Keywords:
traffic engineering; deep reinforcement learning; platoon longitudinal control; deep deterministic policy gradient; platoon string stability
Document code:
A
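Abstract (Chinese version):
To address the problem of continuous, precise vehicle control in platoon control and the problem of platoon longitudinal stability while driving, a platoon longitudinal control strategy based on deep reinforcement learning (DRL) for moderate-speed environments is proposed. The strategy fully considers three key factors that affect platoon safety, namely inter-vehicle distance, vehicle speed, and vehicle acceleration, and treats vehicle dynamics and comfort as constraints in the policy-learning process. First, a reinforcement-learning-based platoon longitudinal control model is established. Second, a DRL process is proposed to iterate the platoon longitudinal control policy, with the ultimate goal of obtaining the optimal control policy for the vehicles; a multi-objective reward function is also designed that combines the rewards corresponding to the distance error, the speed error, and the acceleration constraint. Finally, the deep deterministic policy gradient (DDPG) algorithm is used to solve the platoon longitudinal control problem. The algorithm combines the advantages of the actor-critic (AC) network with those of the deep Q-network (DQN), so it effectively handles platoon control over continuous state and action spaces; a DDPG-based platoon control model is designed and trained for longitudinal control to verify the effectiveness of the strategy. The results show that the proposed reinforcement-learning-based platoon control method achieves control accuracy comparable to that of the distributed model predictive control algorithm and achieves string stability of the platoon under the predecessor-leader following communication topology.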
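The abstract names the ingredients of the multi-objective reward (distance error, speed error, acceleration constraint) but does not give its formula. The sketch below shows one common way such terms are combined: a weighted sum of negative absolute errors plus a penalty on the part of the acceleration that leaves a comfort band. It is an illustration under stated assumptions, not the authors' reward: the weights `w_d`, `w_v`, `w_a`, the comfort bound `a_max`, the names `RewardWeights` and `platoon_reward`, and the definition of distance error as actual gap minus desired gap are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights and bound; the paper's actual values are not given here.
    w_d: float = 1.0    # weight on spacing (distance) error, per metre
    w_v: float = 0.5    # weight on speed error, per m/s
    w_a: float = 2.0    # weight on acceleration-constraint violation
    a_max: float = 2.0  # hypothetical comfort bound on |acceleration|, m/s^2


def platoon_reward(distance_error: float, speed_error: float,
                   acceleration: float, w: RewardWeights = RewardWeights()) -> float:
    """Illustrative multi-objective reward for one follower at one time step.

    distance_error: actual gap minus desired gap (hypothetical definition), m
    speed_error:    follower speed minus reference speed, m/s
    acceleration:   follower acceleration, m/s^2
    Every term is non-positive, so the best achievable reward is 0.0.
    """
    r_distance = -w.w_d * abs(distance_error)
    r_speed = -w.w_v * abs(speed_error)
    # Penalise only the part of |a| that exceeds the comfort bound.
    excess = max(0.0, abs(acceleration) - w.a_max)
    r_accel = -w.w_a * excess
    return r_distance + r_speed + r_accel
```

For example, `platoon_reward(1.0, -2.0, 3.0)` returns `-4.0` (spacing term -1.0, speed term -1.0, acceleration term -2.0). Penalising only the excess acceleration leaves the agent free to accelerate within the comfort band while still discouraging harsh manoeuvres.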
Abstract:
To solve the problems of continuous, accurate platoon control and string stability during platoon travel, a deep reinforcement learning (DRL)-based platoon longitudinal control strategy for moderate speeds was proposed. Three key factors, namely spacing, vehicle speed, and acceleration, were fully considered by the proposed strategy, which also accounts for vehicle dynamics and comfort in the learning process. First, the platoon control process was modeled and the reinforcement learning algorithm was described. Second, a DRL-based method that determines the optimal strategy for platoon longitudinal control was proposed. In particular, a multi-objective reward function was designed that integrates the rewards corresponding to the distance error, the speed error, and the acceleration constraints. Third, the deep deterministic policy gradient (DDPG) algorithm was adopted to solve the platoon longitudinal control problem. The algorithm combines actor-critic (AC) and deep Q-network (DQN) techniques to effectively solve platoon control in continuous state and action spaces. The results show that the proposed reinforcement-learning-based platoon control method achieves control accuracy comparable to that of the distributed model predictive control algorithm, and can achieve string stability of the platoon under the predecessor-leader following communication topology. 3 tabs, 11 figs, 19 refs.
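The abstract says DDPG combines actor-critic with DQN. The pieces DDPG inherits from DQN are experience replay, slowly tracking target networks (soft or Polyak updates with rate `tau`), and a bootstrapped critic target y = r + γ·Q'(s', μ'(s')), in which DQN's max over discrete actions is replaced by the deterministic target actor μ'. The sketch below shows only those pieces, assuming tiny linear functions in place of the actor and critic neural networks and a hypothetical state of [spacing error, speed error, acceleration]. All names and parameters are illustrative, and the actor and critic gradient steps are omitted, so this is not the authors' training code.

```python
import math
import random
from collections import deque
from typing import List, Sequence, Tuple

# (state, action, reward, next_state, done); state is a hypothetical
# [spacing error, speed error, acceleration] triple.
Transition = Tuple[List[float], float, float, List[float], bool]


class ReplayBuffer:
    """Fixed-capacity experience replay (an idea DDPG inherits from DQN)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: deque = deque(maxlen=capacity)  # oldest transitions drop out
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next, done) -> None:
        self.buffer.append((list(s), float(a), float(r), list(s_next), bool(done)))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def soft_update(target: Sequence[float], source: Sequence[float], tau: float) -> List[float]:
    """Polyak averaging of target parameters: theta' <- tau*theta + (1 - tau)*theta'."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]


def target_actor(s: Sequence[float], w: Sequence[float]) -> float:
    """Hypothetical linear target actor mu'(s) = tanh(w . s), bounded in (-1, 1)."""
    return math.tanh(sum(wi * si for wi, si in zip(w, s)))


def target_critic(s: Sequence[float], a: float, c: Sequence[float]) -> float:
    """Hypothetical linear target critic Q'(s, a) = c . [s, a]."""
    features = list(s) + [a]
    return sum(ci * fi for ci, fi in zip(c, features))


def td_target(r: float, s_next: Sequence[float], done: bool,
              w_actor: Sequence[float], c_critic: Sequence[float],
              gamma: float = 0.99) -> float:
    """Critic regression target y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    if done:
        return r  # no bootstrapping past a terminal state
    a_next = target_actor(s_next, w_actor)  # deterministic action, no max over actions
    return r + gamma * target_critic(s_next, a_next, c_critic)
```

In a full DDPG loop, the critic would be regressed toward `td_target` on sampled mini-batches, the actor would follow the gradient of Q with respect to its action, and `soft_update` with a small `tau` would then be applied to both target parameter sets after each step.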

Last Update: 2021-08-12