Ensure proper epsilon decay by verifying the division by 1.1, the initialization, the data types, and the episode-end triggers. Adjust the decay rate if necessary.
The intention is to refine the state-action values of an epsilon-greedy policy toward the optimal policy (it won't become optimal, because it's a soft policy). The requirement is to use a soft policy that approximates the optimal greedy policy over its state-action values. The epsilon-greedy policy satisfies that requirement, even with a constant epsilon.
Although in a real-world scenario an epsilon value with decay would normally be better (especially in stationary environments, like the blackjack environment used in the exercise), there's no need to use decay in this exercise. In fact, I think it's better not to include decay here: the book (Chapter 5) specifies just an epsilon-greedy policy without decay, so omitting it conforms more closely with the book and keeps the focus on the control algorithm itself rather than on the possible exploration policies (decay schedules for ε, Upper Confidence Bound (UCB), Boltzmann/softmax exploration, etc.), even if those would be a better fit and converge faster to the optimal policy.
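For reference, a constant-epsilon epsilon-greedy action selection can be sketched like this (a minimal illustration, not the exercise's actual code; the `Q` dictionary layout and action names are assumptions):

```python
import random

def epsilon_greedy_action(Q, state, actions, epsilon):
    """With probability epsilon pick a random action (exploration),
    otherwise pick the action with the highest Q-value (exploitation)."""
    if random.random() < epsilon:
        return random.choice(actions)
    # Greedy choice; ties broken by the first max-value action.
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

# Example with a toy Q-table for a single blackjack-like state.
Q = {(0, "hit"): 1.0, (0, "stick"): 0.5}
action = epsilon_greedy_action(Q, 0, ["hit", "stick"], epsilon=0.1)
```

With `epsilon=0.1` the policy stays soft forever, which is exactly the property the control algorithm in Chapter 5 requires.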
At the end of each episode, the update should be epsilon = epsilon / 1.1.