RL Lecture Notes: Sequential Decision Making

04 Apr 2019

Reinforcement Learning - Lecture 1 - Emma Brunskill

Key aspects of reinforcement learning:

When planning: Decisions involve reasoning about not just immediate benefit but also longer-term ramifications.

When learning: Temporal credit assignment is hard: which earlier actions caused the later high or low rewards?

A policy is a mapping from past experience to action.

Supervised Learning: Typically involves making a single decision rather than a sequence of decisions.

Imitation Learning: Learns from the experience of others and assumes access to demonstrations of good policies. Combining imitation learning with RL seems promising.

Goal: Maximize the expected total future reward (because the world is stochastic, the agent maximizes reward in expectation).
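To make "in expectation" concrete, here is a minimal sketch that estimates the expected total reward of a policy by averaging over many simulated episodes. The toy environment (`REWARD_PROB`), the horizon, and the two policies are hypothetical and not from the lecture. A policy is written as a function from the history of past experience to an action.

```python
import random

# Hypothetical toy environment: each action pays reward 1 with a fixed probability.
REWARD_PROB = {"a": 0.8, "b": 0.3}


def run_episode(policy, horizon, rng):
    """Run one episode and return the total reward collected."""
    history = []  # (action, reward) pairs observed so far
    total = 0.0
    for _ in range(horizon):
        action = policy(history)  # the policy maps past experience to an action
        reward = 1.0 if rng.random() < REWARD_PROB[action] else 0.0
        history.append((action, reward))
        total += reward
    return total


def estimate_expected_return(policy, horizon=5, episodes=20000, seed=0):
    """Monte Carlo estimate of the expected total reward under a policy."""
    rng = random.Random(seed)
    returns = [run_episode(policy, horizon, rng) for _ in range(episodes)]
    return sum(returns) / episodes


def always_a(history):
    return "a"


def always_b(history):
    return "b"
```

In this toy setting the exact expected total reward of `always_a` over 5 steps is 5 × 0.8 = 4.0, so the estimate should land close to that value. A single episode can still return anything from 0 to 5, which is why the objective is stated in expectation.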

State $s_t$ is Markov if and only if:

$$p(s_{t+1} | s_t, a_t) = p(s_{t+1} | h_{t}, a_{t})$$

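Here $h_t$ denotes the history of past actions, observations, and rewards up to time $t$. In other words, the current state is a sufficient statistic of the history: given the present state, the future is independent of the past.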
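The Markov condition can be checked empirically on a small example. The sketch below assumes a hypothetical two-state, two-action system (`P_NEXT_IS_1` is made up for illustration) and compares the estimate of $p(s_{t+1} = 1 | s_t, a_t)$ with the estimate that also conditions on the previous state $s_{t-1}$. Conditioning on one extra step is only a partial stand-in for the full history $h_t$, so this illustrates the property rather than proving it.

```python
import random
from collections import Counter

# Hypothetical dynamics: P_NEXT_IS_1[(s, a)] is the probability that s_{t+1} = 1.
P_NEXT_IS_1 = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.9}


def simulate(steps, seed=0):
    """Roll out the chain with uniformly random actions; return (states, actions)."""
    rng = random.Random(seed)
    states, actions = [0], []
    for _ in range(steps):
        a = rng.randrange(2)
        s_next = 1 if rng.random() < P_NEXT_IS_1[(states[-1], a)] else 0
        actions.append(a)
        states.append(s_next)
    return states, actions


def conditional_estimates(states, actions):
    """Estimate p(s_{t+1}=1 | s_t, a_t), with and without s_{t-1} in the condition."""
    short_n, short_hits = Counter(), Counter()
    long_n, long_hits = Counter(), Counter()
    for t in range(1, len(actions)):
        short_key = (states[t], actions[t])
        long_key = (states[t - 1], states[t], actions[t])
        short_n[short_key] += 1
        long_n[long_key] += 1
        if states[t + 1] == 1:
            short_hits[short_key] += 1
            long_hits[long_key] += 1
    short = {k: short_hits[k] / n for k, n in short_n.items()}
    long = {k: long_hits[k] / n for k, n in long_n.items()}
    return short, long


states, actions = simulate(200_000)
short, long = conditional_estimates(states, actions)

# Largest difference between conditioning on (s_{t-1}, s_t, a_t) and on (s_t, a_t).
max_gap = max(abs(long[k] - short[k[1:]]) for k in long)
```

Because the simulated dynamics depend only on the current state and action, `max_gap` should be small, reflecting sampling noise only. If the next state also depended on $s_{t-1}$, the two estimates would differ systematically and $s_t$ alone would not be a Markov state.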

Rishabh Ranawat