National Key Research and Development Program of China (2016YFD0700602); National Natural Science Foundation of China (61603377)
The driving decisions of human drivers exhibit social intelligence for handling complex conditions, beyond mere driving correctness. However, existing autonomous driving strategies focus mainly on the correctness of the perception-control mapping, which deviates from the driving logic that human drivers follow. To solve this problem, this paper proposes a human-like autonomous driving strategy within an end-to-end control framework based on the deep deterministic policy gradient (DDPG) algorithm. By applying rule constraints to the continuous behavior of the agent, an end-to-end control strategy is established that outputs continuous, ordered driving behavior consistent with human driving logic. To enhance the driving safety of the end-to-end decision-making scheme, posterior feedback on the policy output is used to reduce the output rate of dangerous behaviors. To handle the sparse catastrophic events that arise during training, a continuous reward function that better matches the optimization expectation of the control strategy is proposed, improving the stability of training. Experimental results in different simulation environments show that the proposed human-like driving strategy outperforms the traditional DDPG algorithm: the improved reward shaping approximates the optimization expectation of the objective function 85.57% more closely when evaluating sparse catastrophic events, improves training efficiency by 21%, raises the task success rate by 19%, and increases task execution efficiency by 15.45%, while significantly reducing collision accidents.
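To illustrate the sparse-versus-continuous reward distinction the abstract refers to, the sketch below contrasts a sparse shaping scheme (a large penalty fired only at the catastrophic event itself) with a dense, continuously varying reward. The paper does not give its exact reward form here; the dense variant below follows the common speed/lane-offset/heading formulation used in DDPG driving benchmarks, and the signal names (`speed`, `lane_offset`, `heading_error`) are illustrative assumptions, not the authors' notation.

```python
import math

COLLISION_PENALTY = -200.0  # illustrative magnitude, not from the paper


def sparse_reward(collided: bool, progress: float) -> float:
    """Sparse shaping: the agent only learns from the rare
    catastrophic event itself; all other states look alike."""
    return COLLISION_PENALTY if collided else progress


def continuous_reward(speed: float, lane_offset: float,
                      heading_error: float, collided: bool) -> float:
    """Dense shaping: reward degrades smoothly as the vehicle drifts
    toward a dangerous state, giving an informative gradient signal
    well before a collision occurs."""
    if collided:
        return COLLISION_PENALTY
    # Reward forward progress along the lane direction; penalize
    # lateral deviation scaled by speed so fast off-center driving
    # is discouraged more strongly than slow off-center driving.
    return speed * math.cos(heading_error) - abs(lane_offset) * speed
```

Under this dense formulation, a vehicle centered in its lane and aligned with it earns its full speed as reward, while drift reduces the reward continuously rather than only at the moment of failure.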
LV Di, XU Kun, LI Huiyun, PAN Zhongming. Human-Like Driving Strategy Based on Deep Reinforcement Learning for Autonomous Vehicles [J]. Journal of Integration Technology, 2020, 9(5): 34-47.