Abstract: Because a robot arm is highly flexible, it can imitate a human arm to pack fragile bottled food and correct trajectory-planning errors in real time, improving stability and accuracy. Based on the basic architecture of the packing manipulator, a collision-free trajectory planning model for the manipulator is proposed using a deep Q-RBF reinforcement learning network. A resource-allocating adaptive method adjusts the hidden-layer units of the RBF network according to the samples to be modeled, which improves the network's learning rate and online learning ability. The optimal action set is obtained by combining the network with an adaptive Q-learning algorithm, and a learning-rate adjustment method is used to learn the network parameters. Simulation and experimental results show that, compared with two other methods, the proposed method has strong obstacle avoidance ability and the manipulator follows the predetermined trajectory more closely; the collision avoidance process changes moderately, converges quickly, and gradually stabilizes.
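The idea of growing the RBF hidden layer from the samples and combining it with Q-learning can be illustrated with a minimal sketch. It is not the network described in this work: the class name `RanRbfQ`, the two novelty thresholds, the width rule, and the fixed learning rate `lr` are all assumptions chosen for illustration, loosely following the classic resource-allocating network criterion (add a unit when a sample is both far from every existing center and poorly predicted; otherwise adjust the output weights).

```python
import math


class RanRbfQ:
    """Gaussian RBF Q-function approximator that grows hidden units on demand.

    Illustrative only: thresholds and the width rule are assumed values.
    """

    def __init__(self, n_actions, err_thresh=0.1, dist_thresh=0.5,
                 overlap=0.8, default_width=1.0):
        self.n_actions = n_actions
        self.err_thresh = err_thresh      # assumed novelty threshold on the error
        self.dist_thresh = dist_thresh    # assumed novelty threshold on distance
        self.overlap = overlap            # scales the width of a newly added unit
        self.default_width = default_width
        self.centers = []                 # one center per hidden unit
        self.widths = []                  # one Gaussian width per hidden unit
        self.weights = []                 # weights[j][a]: hidden unit j -> action a

    def _activations(self, state):
        return [math.exp(-math.dist(state, c) ** 2 / (2.0 * w * w))
                for c, w in zip(self.centers, self.widths)]

    def q_values(self, state):
        phi = self._activations(state)
        return [sum(p * wj[a] for p, wj in zip(phi, self.weights))
                for a in range(self.n_actions)]

    def update(self, state, action, target, lr=0.1):
        error = target - self.q_values(state)[action]
        nearest = min((math.dist(state, c) for c in self.centers),
                      default=math.inf)
        if abs(error) > self.err_thresh and nearest > self.dist_thresh:
            # Novel sample: allocate a new hidden unit centered on it.
            width = (self.default_width if math.isinf(nearest)
                     else self.overlap * nearest)
            new_weights = [0.0] * self.n_actions
            new_weights[action] = error
            self.centers.append(tuple(state))
            self.widths.append(width)
            self.weights.append(new_weights)
        else:
            # Familiar sample: gradient step on the output weights only.
            phi = self._activations(state)
            for p, wj in zip(phi, self.weights):
                wj[action] += lr * error * p
        return error


def q_learning_step(net, s, a, reward, s_next, gamma=0.9, lr=0.1):
    """One Q-learning update with the standard one-step bootstrapped target."""
    target = reward + gamma * max(net.q_values(s_next))
    return net.update(s, a, target, lr)
```

In this sketch the state would be a low-dimensional description of the manipulator (for example joint angles) and the reward would penalize collisions; both are left to the caller. An adaptive learning rate, as mentioned in the abstract, would replace the fixed `lr` argument.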