DQN, as such, is no longer the state of the art. I would not say which method is, because the field changes very quickly and performance also depends heavily on the context.

As for the discount factor, it is used mainly for training stability, among other reasons. So if your problem does not converge, a discounted reward is the better alternative. Also, we are interested in finding the optimal actions, not in estimating the exact value of the rewards. Moreover, information becomes noisier as time goes on, so an undiscounted reward is not necessarily better. The more interesting question may be: will a discounted reward lead to the same optimal policy, but with greater stability?
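
To make the stability point concrete, here is a minimal Python sketch (not from the original answer; the function name `discounted_return` and the constant-reward example are illustrative, and gamma = 0.99 is just a common choice). It shows that with gamma < 1 the return is bounded by r_max / (1 − gamma) no matter how long the episode runs, whereas the undiscounted sum grows without limit:

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted return G = sum_t gamma^t * r[t], computed backwards.

    With gamma < 1 the sum is bounded by r_max / (1 - gamma) regardless
    of episode length, which is one source of training stability:
    target values stay in a fixed range, and noisy far-future rewards
    are down-weighted by gamma^t.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# A constant reward of 1.0 over a 1000-step episode:
rewards = [1.0] * 1000
print(discounted_return(rewards))  # ~99.995, capped near 1 / (1 - 0.99) = 100
print(sum(rewards))                # 1000.0, grows without bound with episode length
```

Note that discounting does not change which action is best when the far future contributes little to the comparison between actions; it mostly shrinks and denoises the targets the network has to fit.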
