Difference between DQN and Policy Gradient

May 31, 2021

DQN: we feed the state as input to the network, and it returns the Q-values of all possible actions in that state. We then select the action with the maximum Q-value.
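To make the DQN selection step concrete, here is a minimal sketch in plain Python. The function `q_network` is a hypothetical stand-in for a trained network (a single linear layer with made-up weights), not a real DQN; only the greedy "pick the largest Q-value" step reflects the description above.

```python
def q_network(state, weights):
    """Hypothetical stand-in for a DQN: returns one Q-value per action.

    Each row of `weights` belongs to one action; the Q-value is the dot
    product of that row with the state vector.
    """
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def select_action_dqn(state, weights):
    """Greedy selection: return the index of the action with the largest Q-value."""
    q_values = q_network(state, weights)
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Illustrative values only: 3 actions, 2 state features.
weights = [[0.1, 0.2], [0.5, -0.3], [0.0, 0.4]]
state = [1.0, 2.0]

q_values = q_network(state, weights)        # approximately [0.5, -0.1, 0.8]
action = select_action_dqn(state, weights)  # index of the largest Q-value
```

For a fixed state and fixed weights, this selection is deterministic: the same state always yields the same action.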

Policy gradient: we feed the state as input to the network, and it returns a probability distribution over the action space. Our stochastic policy then samples an action from that distribution.
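The policy gradient selection step can be sketched in the same way. Here `policy_network` is again a hypothetical stand-in (a linear layer with made-up weights followed by a softmax), and the softmax output layer is an assumption, since the text only says the network returns a probability distribution. The point is that the action is sampled from the distribution rather than taken as an argmax.

```python
import math
import random


def policy_network(state, weights):
    """Hypothetical stand-in for a policy network: returns action probabilities.

    A linear layer produces one logit per action, and a softmax turns the
    logits into a probability distribution.
    """
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action_pg(state, weights, rng=random):
    """Stochastic selection: sample an action index according to its probability."""
    probs = policy_network(state, weights)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Illustrative values only: 3 actions, 2 state features.
weights = [[0.1, 0.2], [0.5, -0.3], [0.0, 0.4]]
state = [1.0, 2.0]

probs = policy_network(state, weights)     # non-negative, sums to 1
action = select_action_pg(state, weights)  # may differ from call to call
```

Unlike the greedy Q-value selection, repeated calls with the same state can return different actions, with more probable actions chosen more often.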

Written by Yanwei Liu
