Deep Reinforcement Learning Notes
[2026-02-08 08:25:38 AM] RL is a separate branch of machine learning, distinct from both supervised and unsupervised learning.
Supervised learning models f(y|x), unsupervised learning models f(x), and RL learns f(a|s), i.e. which action a to take in state s. In RL the value of a given state depends on the value of the next state reached from it.
- The goal is clear.
- A policy is present that makes the move that maximizes the chance of reaching the goal.
- Learning happens while interacting with the environment or the opponent.
V(S_t) <- V(S_t) + alpha * [V(S_{t+1}) - V(S_t)]
This is typically a greedy move, as the focus is to maximize the value.
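
The update rule and the greedy move can be sketched in Python as below. This is only a minimal illustration, not a full agent: the table `values`, the helper names `td_update` and `greedy_move`, the step size `alpha=0.1`, and the default value `0.5` for unseen states are all assumptions made for the example and are not part of the notes.

```python
from typing import Dict, Hashable, Iterable


def td_update(values: Dict[Hashable, float], state: Hashable,
              next_state: Hashable, alpha: float = 0.1,
              default: float = 0.5) -> float:
    """Apply V(S_t) <- V(S_t) + alpha * [V(S_{t+1}) - V(S_t)] in place."""
    # Unseen states start at an assumed default value.
    v_s = values.get(state, default)
    v_next = values.get(next_state, default)
    # Move the current estimate a fraction alpha toward the next state's value.
    values[state] = v_s + alpha * (v_next - v_s)
    return values[state]


def greedy_move(values: Dict[Hashable, float],
                candidate_states: Iterable[Hashable],
                default: float = 0.5) -> Hashable:
    """Pick the reachable next state with the highest estimated value."""
    return max(candidate_states, key=lambda s: values.get(s, default))


# Hypothetical three-state example: "win" is worth 1.0, "lose" is worth 0.0.
values = {"start": 0.5, "win": 1.0, "lose": 0.0}
chosen = greedy_move(values, ["win", "lose"])  # greedy choice is "win"
td_update(values, "start", chosen)             # "start" moves toward V("win")
print(chosen, values["start"])
```

With these numbers the estimate for "start" moves from 0.5 toward 1.0 by a fraction alpha of the gap. A purely greedy agent never tries the lower-valued moves, so in practice the greedy choice is often mixed with occasional exploratory moves.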