Monte Carlo Gridworld Tables

This view exposes the learned action-values, epsilon-soft policy probabilities, visit counts, and the tabular transition model used by the Monte Carlo case study. The data comes from monte_carlo_policies.json.
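The exact schema of monte_carlo_policies.json is not specified here, so the sketch below uses an assumed layout (top-level keys `q_table`, `policy`, `visit_counts`, `transition_model`, with states encoded as strings) and builds a tiny stand-in file so it is self-contained:

```python
import json
import os
import tempfile

# Assumed schema for monte_carlo_policies.json; the key names and the
# "row,col" state encoding are illustrative, not confirmed by the source.
example = {
    "q_table": {"0,0": {"up": 0.10, "right": 0.35}},
    "policy": {"0,0": {"up": 0.10, "right": 0.90}},
    "visit_counts": {"0,0": {"up": 12, "right": 40}},
    "transition_model": {},
}

path = os.path.join(tempfile.mkdtemp(), "monte_carlo_policies.json")
with open(path, "w") as f:
    json.dump(example, f)

# Loading mirrors what the view would do with the real file.
with open(path) as f:
    data = json.load(f)
```

Each of the four tables in this view corresponds to one of these top-level entries.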

Learned Q Table

Each row is a state, each column is an action, and max Q — the largest action-value in the row, max_a Q(s, a) — is the learned state score used by the game view.
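A minimal sketch of the row-wise reduction, using a hypothetical in-memory Q table (the states and action names are illustrative, not taken from the data file):

```python
# Hypothetical Q table: state -> {action: estimated return}.
Q = {
    (0, 0): {"up": 0.10, "down": -0.20, "left": 0.00, "right": 0.35},
    (0, 1): {"up": 0.50, "down": 0.40, "left": 0.45, "right": 0.30},
}

def state_score(q_row):
    """max_a Q(s, a): the per-state score shown by the game view."""
    return max(q_row.values())

def greedy_action(q_row):
    """Action attaining the max; ties break by insertion order."""
    return max(q_row, key=q_row.get)

scores = {s: state_score(row) for s, row in Q.items()}
```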

Policy and Visit Counts

Policy probabilities come from the final epsilon-soft policy. Visit counts show how many first-visit returns contributed to each state-action estimate.
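The standard epsilon-soft construction assigns the greedy action probability 1 − ε + ε/|A| and every other action ε/|A|. A sketch of that computation for one state's Q row (the ε value and action names are assumptions):

```python
def epsilon_soft_probs(q_row, epsilon=0.1):
    """Epsilon-soft distribution over actions for a single state.

    The greedy action gets 1 - epsilon + epsilon/|A|; every other
    action gets epsilon/|A|.  epsilon=0.1 is an illustrative value,
    not the one used in the case study.
    """
    n = len(q_row)
    best = max(q_row, key=q_row.get)
    return {a: 1 - epsilon + epsilon / n if a == best else epsilon / n
            for a in q_row}

probs = epsilon_soft_probs(
    {"up": 0.50, "down": 0.40, "left": 0.45, "right": 0.30}
)
```

Every action keeps probability at least ε/|A|, which is what guarantees the continued exploration that first-visit Monte Carlo control relies on.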

Transition Model

This table lists the environment dynamics P(s', r, done | s, a) used by the stochastic gridworld.
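Sampling from such a tabular model is an inverse-CDF walk over the outcome list. A self-contained sketch, with a hypothetical two-outcome entry standing in for the real table:

```python
import random

# Hypothetical tabular model: (state, action) -> list of
# (probability, next_state, reward, done) outcomes.  The entries are
# illustrative; the real table comes from the data file.
MODEL = {
    ((0, 0), "right"): [
        (0.8, (0, 1),  0.0, False),
        (0.2, (0, 0), -0.1, False),
    ],
}

def sample_transition(model, state, action, rng=random):
    """Draw (s', r, done) from P(s', r, done | s, a)."""
    u = rng.random()
    acc = 0.0
    for p, s2, r, done in model[(state, action)]:
        acc += p
        if u < acc:
            return s2, r, done
    return s2, r, done  # guard against floating-point shortfall
```

The probabilities in each outcome list must sum to 1 for the model to define a valid distribution.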