RL Lecture Notes: Sequential Decision Making
04 Apr 2019
Reinforcement Learning - Lecture 1 - Emma Brunskill
Learning to Make Good Sequential Decisions
Key aspects of reinforcement learning:
- Optimization: Finding policies that maximize expected reward
- Exploration: Balancing trying new actions to gather information (exploration) against acting on what is already known (exploitation)
- Generalization: Transferring knowledge to new situations
- Delayed Consequences: Decisions have long-term ramifications
Challenges
When planning: Decisions involve reasoning about not just immediate benefit but also longer-term ramifications.
When learning: Temporal credit assignment is hard: which earlier actions caused later high or low rewards?
Policy
A policy is a mapping from past experience to action.
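As a minimal sketch of this definition, a policy can be written as a function that takes the history of experience and returns an action. The `Step` layout, the two actions `"left"` and `"right"`, and the function name `repeat_if_rewarded_policy` are illustrative assumptions, not something defined in the lecture.

```python
from typing import List, Tuple

# One step of past experience: (observation, action, reward).
# This tuple layout is a hypothetical choice for illustration.
Step = Tuple[int, str, float]
History = List[Step]


def repeat_if_rewarded_policy(history: History) -> str:
    """Toy policy: repeat the last action if it was rewarded, otherwise switch."""
    if not history:
        return "left"  # arbitrary default when there is no experience yet
    _, last_action, last_reward = history[-1]
    if last_reward > 0:
        return last_action
    return "right" if last_action == "left" else "left"
```

This toy policy only looks at the most recent step, but the signature allows a policy to depend on the entire history.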
Comparison with Other Paradigms
Supervised Learning: Typically involves making one decision rather than a sequence of decisions.
Imitation Learning: Learns from the experience of others and assumes the input consists of demonstrations of good policies. Combining imitation learning with RL seems promising.
Sequential Decision Making Under Uncertainty
Goal: Maximize the total expected future reward (the world is stochastic so the agent maximizes rewards in expectation).
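Because rewards are random, "total expected future reward" is an average over many possible outcomes. The sketch below estimates it by simulation for a made-up world with two actions; the payoffs, the probability 0.4, the horizon, and all function names are assumptions chosen only for illustration.

```python
import random


def rollout(policy, horizon, rng):
    """Run one episode in a toy stochastic world and return its total reward."""
    total = 0.0
    for _ in range(horizon):
        action = policy()
        # Hypothetical rewards: "safe" always pays 0.5;
        # "risky" pays 2.0 with probability 0.4 and 0.0 otherwise.
        if action == "safe":
            reward = 0.5
        else:
            reward = 2.0 if rng.random() < 0.4 else 0.0
        total += reward
    return total


def estimate_expected_return(policy, horizon=10, episodes=5000, seed=0):
    """Monte Carlo estimate: average the total reward over many episodes."""
    rng = random.Random(seed)
    returns = [rollout(policy, horizon, rng) for _ in range(episodes)]
    return sum(returns) / len(returns)


safe_value = estimate_expected_return(lambda: "safe")
risky_value = estimate_expected_return(lambda: "risky")
```

In this toy setup the risky action has an expected reward of 0.8 per step versus 0.5 for the safe one, so an agent maximizing expected total reward would prefer it even though any single episode may turn out worse.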
The Markov Assumption
State \(s_t\) is Markov if and only if:
$$p(s_{t+1} | s_t, a_t) = p(s_{t+1} | h_{t}, a_{t})$$
Here \(h_t\) denotes the history up to time \(t\). The current state is a sufficient statistic of history.
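To make the condition concrete, here is a small sketch in which the next-state distribution is looked up from the current state and action alone, so any two histories ending in the same state yield the same prediction. The two-state weather model, its probabilities, and the single `"wait"` action are hypothetical.

```python
# Hypothetical transition table: (current state, action) -> next-state distribution.
TRANSITIONS = {
    ("sunny", "wait"): {"sunny": 0.8, "rainy": 0.2},
    ("rainy", "wait"): {"sunny": 0.4, "rainy": 0.6},
}


def next_state_dist(history, action):
    """Return p(s_{t+1} | h_t, a_t); only the last state in h_t is consulted."""
    current_state = history[-1]
    return TRANSITIONS[(current_state, action)]


# Two different histories that end in the same state.
h_short = ["sunny"]
h_long = ["rainy", "rainy", "sunny"]

# Under the Markov assumption, both give the same next-state distribution.
same_prediction = next_state_dist(h_short, "wait") == next_state_dist(h_long, "wait")
```

If the dynamics depended on more than the last state (for example, on how many rainy days preceded it), this lookup would no longer be valid and the state would not be Markov.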
Problem Variants
- Finite horizon vs. infinite horizon
- Stationary vs. non-stationary