1 min read · Jun 15, 2020

Let's not get hung up on the terms, since every method has many variants in how it is implemented. The key question is how the reward is estimated. Fitting a function to predict it is one option. Another is the Monte Carlo method, which runs the game until the end to find the total reward. There are also methods in between the two. The Monte Carlo estimate is unbiased but usually has high variance, and the other methods have their own issues: an estimate that leans on a fitted function, for instance, typically has lower variance but picks up bias from the function's errors. In the end, we pick the one that strikes a reasonable balance for the problem domain.
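To make the trade-off concrete, here is a minimal Python sketch of the two ends of that spectrum and the methods in between. The function names (`monte_carlo_returns`, `n_step_targets`), the discount factor `gamma`, and the `values` list standing in for a fitted function are all assumptions made for illustration, not part of any particular library.

```python
from typing import List, Sequence


def monte_carlo_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Discounted total reward from each step to the end of the episode.

    Uses only observed rewards, so there is no bias from a value estimate,
    but each return depends on every later (random) reward: high variance.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def n_step_targets(
    rewards: Sequence[float],
    values: Sequence[float],
    n: int,
    gamma: float = 0.99,
) -> List[float]:
    """Sum n observed rewards, then bootstrap from a value estimate.

    `values` is a hypothetical stand-in for a fitted function: it must have
    len(rewards) + 1 entries, with values[-1] the estimate for the terminal
    state (normally 0.0). n = 1 leans most on the fitted values; a large n
    approaches the Monte Carlo return.
    """
    horizon = len(rewards)
    targets = []
    for t in range(horizon):
        end = min(t + n, horizon)
        target = 0.0
        for k in range(t, end):
            target += gamma ** (k - t) * rewards[k]
        # Whatever lies beyond the n observed steps comes from the estimate.
        target += gamma ** (end - t) * values[end]
        targets.append(target)
    return targets


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    values = [0.0, 0.0, 0.0, 0.0]  # an untrained estimate, assumed for the demo
    print(monte_carlo_returns(rewards, gamma=0.5))
    print(n_step_targets(rewards, values, n=1, gamma=0.5))
    print(n_step_targets(rewards, values, n=3, gamma=0.5))
```

With `n` at least as long as the episode and a terminal value of zero, the n-step target coincides with the Monte Carlo return; shrinking `n` trades variance for reliance on the fitted values.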
