Difference between DQN and Policy Gradient

Yanwei Liu
May 31, 2021


DQN: we feed the state as input to the network, and it returns the Q-values of all possible actions in that state. We then select the action with the maximum Q-value (during training, usually with some exploration such as epsilon-greedy).
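
To make the DQN action selection concrete, here is a minimal sketch in plain Python. The function `q_network` and its fixed weights are hypothetical stand-ins for a trained network, and `select_action_dqn` with its `epsilon` parameter is an illustrative name, not taken from any library.

```python
import random

def q_network(state):
    # Hypothetical stand-in for a trained Q-network:
    # a fixed linear map from 2 state features to 3 Q-values (one per action).
    weights = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.8]]
    return [sum(w * s for w, s in zip(row, state)) for row in weights]

def select_action_dqn(state, epsilon=0.0):
    q_values = q_network(state)
    # With probability epsilon, explore by picking a uniformly random action.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    # Otherwise act greedily: pick the action with the maximum Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

state = [1.0, 2.0]
action = select_action_dqn(state)  # greedy choice, since epsilon defaults to 0
```

The key point is that the network outputs values, and the choice of action is a deterministic argmax over them unless exploration is added on top.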

Policy gradient: we feed the state as input to the network, and it returns a probability distribution over the action space. Our stochastic policy then samples an action from the distribution returned by the network.
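
The policy-gradient action selection can be sketched the same way. Here `policy_network` and its fixed weights are hypothetical stand-ins for a trained policy network, and `select_action_pg` is an illustrative name; the softmax output layer is an assumption that is common for discrete action spaces.

```python
import math
import random

def policy_network(state):
    # Hypothetical stand-in for a trained policy network:
    # a fixed linear map to 3 logits, followed by a softmax.
    weights = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.8]]
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_action_pg(state, rng=random):
    probs = policy_network(state)
    # Sample an action index according to the returned probabilities.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

state = [1.0, 2.0]
probs = policy_network(state)
action = select_action_pg(state)
```

Unlike the greedy selection over Q-values, repeated calls with the same state can return different actions, because the action is sampled rather than chosen by argmax.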
