Abstract: To address the low efficiency and slow convergence caused by sparse rewards in deep-reinforcement-learning path planning for mobile robots, a gradient reward policy is proposed. Region segmentation divides the environment into a buffer zone, an exploration zone, an adjacent zone, and a target zone; the dynamically changing reward gradually narrows the robot's exploration scope while also providing positive rewards in safe areas. The robot's current position coordinates are fed into a neural network that estimates the Q values of the four actions; exploration is then maximized through a truncated dynamic greedy strategy, and finally prioritized experience replay based on the mean squared error is used to sample transitions and update the network by gradient descent. Experimental results show that exploration efficiency improves by nearly 40% in a small-scale environment and the success rate exceeds 80% in a large-scale environment; robustness is enhanced while exploration efficiency is improved.
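The zone-based gradient reward described above can be sketched as follows. This is a minimal illustration only: the zone radii, reward magnitudes, and the function name `gradient_reward` are assumptions, since the abstract does not give the paper's actual thresholds or values.

```python
import math

# Hypothetical zone radii (distance to goal, in grid cells) and reward
# magnitudes; the paper's actual parameters are not stated in the abstract.
TARGET_R, ADJACENT_R, EXPLORE_R = 1.0, 3.0, 8.0

def gradient_reward(pos, goal):
    """Zone-based gradient reward: the reward increases as the robot moves
    through buffer -> exploration -> adjacent -> target zones, so the agent
    receives positive feedback in safe areas instead of a sparse goal reward."""
    d = math.dist(pos, goal)
    if d <= TARGET_R:       # target zone: goal reached
        return 10.0
    if d <= ADJACENT_R:     # adjacent zone: strong positive shaping
        return 1.0
    if d <= EXPLORE_R:      # exploration zone: mild positive shaping
        return 0.1
    return -0.5             # buffer zone: discourage straying far from the goal
```

Because the reward is graded rather than given only at the goal, the gradient-descent update receives a learning signal on almost every step, which is the mechanism the abstract credits for the faster convergence.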