Abstract: To address the low efficiency and slow convergence caused by sparse rewards in deep-reinforcement-learning path planning for mobile robots, a gradient reward policy is proposed. Region segmentation divides the environment into a buffer zone, an exploration zone, an adjacent zone, and a target zone; the dynamically changing reward gradually narrows the robot's exploration scope while also providing positive rewards in safe areas. The robot's current position coordinates are fed into a neural network that estimates the Q values of the four actions; exploration is then maximized through a truncated dynamic greedy strategy, and finally prioritized experience replay based on the mean squared error is used to sample transitions and update the network by gradient descent. Experimental results show that exploration efficiency improves by nearly 40% in a small-scale environment and the success rate exceeds 80% in a large-scale environment; robustness is enhanced while exploration efficiency is improved.
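The zone-based gradient reward described above can be sketched as follows. This is a minimal illustration only: the zone radii, reward magnitudes, and the function name `gradient_reward` are assumptions, since the abstract does not give the paper's actual thresholds or values.

```python
import math

# Hypothetical zone radii (distance to goal, in grid cells) and reward
# magnitudes; the paper's actual parameters are not stated in the abstract.
TARGET_R, ADJACENT_R, EXPLORE_R = 1.0, 3.0, 8.0

def gradient_reward(pos, goal):
    """Zone-based gradient reward: the reward increases as the robot moves
    through buffer -> exploration -> adjacent -> target zones, so the agent
    receives positive feedback in safe areas instead of a sparse goal reward."""
    d = math.dist(pos, goal)
    if d <= TARGET_R:       # target zone: goal reached
        return 10.0
    if d <= ADJACENT_R:     # adjacent zone: strong positive shaping
        return 1.0
    if d <= EXPLORE_R:      # exploration zone: mild positive shaping
        return 0.1
    return -0.5             # buffer zone: discourage straying far from the goal
```

Because the reward is graded rather than given only at the goal, the gradient-descent update receives a learning signal on almost every step, which is the mechanism the abstract credits for the faster convergence.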