Monte Carlo Gridworld Tables
This view exposes the learned action values, the epsilon-soft policy probabilities, the visit counts, and the tabular transition model used by the Monte Carlo case study. The data is loaded from monte_carlo_policies.json.
Learned Q Table
Each row is a state, each column is an action, and the max Q column shows the largest action value in that row, which the game view uses as the learned score for the state.
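As a minimal sketch of how the max Q column can be derived, assuming (hypothetically) the Q table is stored as a mapping from state to per-action values; the actual schema of monte_carlo_policies.json may differ:

```python
# Hypothetical Q-table shape: {state: {action: value}}.
Q = {
    "s0": {"up": 0.1, "down": 0.4, "left": -0.2, "right": 0.3},
    "s1": {"up": 0.0, "down": 0.2, "left": 0.5, "right": 0.1},
}

# Per-state max Q: the learned state score shown in the table.
max_q = {s: max(vals.values()) for s, vals in Q.items()}

# The greedy action per state (argmax over the row).
greedy = {s: max(vals, key=vals.get) for s, vals in Q.items()}
```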
Policy and Visit Counts
Policy probabilities come from the final epsilon-soft policy. Visit counts show how many first-visit returns contributed to each state-action estimate.
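A sketch of the two quantities described above, under standard definitions: an epsilon-soft policy gives the greedy action probability 1 - epsilon + epsilon/|A| and every other action epsilon/|A|, and first-visit visit counts drive an incremental average of returns. Function names and the epsilon value are illustrative, not taken from the case study:

```python
def epsilon_soft_probs(q_row, epsilon=0.1):
    """Action probabilities for one state under an epsilon-soft policy."""
    n = len(q_row)
    greedy = max(range(n), key=lambda a: q_row[a])
    probs = [epsilon / n] * n          # every action keeps epsilon / |A|
    probs[greedy] += 1.0 - epsilon     # greedy action gets the rest
    return probs

def first_visit_update(q_row, counts, action, G):
    """Fold one first-visit return G into the running average for (s, a)."""
    counts[action] += 1
    q_row[action] += (G - q_row[action]) / counts[action]
```

With epsilon = 0.1 and four actions, the greedy action ends up with probability 0.925 and the other three with 0.025 each, which is the kind of distribution shown in the policy table.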
Transition Model
This table shows the environment dynamics P(s', r, done | s, a) used by the stochastic gridworld: for each state-action pair, the probability of reaching each next state s' with reward r and terminal flag done.
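A minimal sketch of sampling from such a tabular model, assuming (hypothetically) each (s, a) entry is a list of (probability, next_state, reward, done) tuples that sum to one; the stored representation in monte_carlo_policies.json may be organized differently:

```python
import random

# Hypothetical tabular dynamics: P[(s, a)] -> [(prob, s', r, done), ...].
P = {
    (0, "right"): [(0.8, 1, 0.0, False), (0.2, 0, 0.0, False)],
    (1, "right"): [(1.0, 2, 1.0, True)],
}

def sample_step(P, s, a, rng=random.random):
    """Draw (s', r, done) from P(s', r, done | s, a) by inverse CDF sampling."""
    u = rng()
    cum = 0.0
    for prob, s2, r, done in P[(s, a)]:
        cum += prob
        if u < cum:
            return s2, r, done
    return s2, r, done  # guard against floating-point round-off
```

Because entry (1, "right") is deterministic, sampling it always yields (2, 1.0, True); stochastic entries like (0, "right") are where the gridworld's randomness comes from.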