Actor-Critic Algorithms: Implementation Notes

01 Jan 2019

Implementing and understanding actor-critic algorithms

Overview

Actor-critic methods combine the benefits of policy gradient methods (the actor) with value function approximation (the critic). The actor learns a policy, while the critic evaluates states to reduce variance in policy gradient estimates.

The Algorithm

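Following Sergey Levine's lecture, the batch actor-critic algorithm works as follows:

1. Sample \(\{s_i, a_i\}\) by running the policy \(\pi_\theta(a|s)\).
2. Fit \(V_\phi(s)\) to the sampled reward sums.
3. Evaluate \(A(s_i, a_i) = r(s_i, a_i) + \gamma V_\phi(s_i') - V_\phi(s_i)\).
4. Compute \(\nabla_\theta J(\theta) \approx \sum_i \nabla_\theta \log \pi_\theta(a_i|s_i) A(s_i, a_i)\).
5. Update \(\theta \leftarrow \theta + \alpha \nabla_\theta J(\theta)\).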
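As a rough illustration, a batch actor-critic loop can be sketched in plain Python on a toy problem. Everything here is an assumption made for the example rather than the contents of actorCritic.py: the two-state environment `toy_step`, the tabular actor and critic (standing in for the networks), the learning rates, and the use of bootstrapped targets \(r + \gamma V(s')\) to fit the critic.

```python
import math
import random

GAMMA = 0.9  # discount factor (assumed value for this sketch)

def softmax(logits):
    """Turn a list of logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_step(state, action):
    """Hypothetical 2-state MDP: the action picks the next state,
    and choosing action 1 pays reward 1 (action 0 pays nothing)."""
    return action, (1.0 if action == 1 else 0.0)

def batch_actor_critic(iterations=300, batch_size=64,
                       actor_lr=0.5, critic_lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: one logit per (state, action)
    V = [0.0, 0.0]                    # critic: one value per state
    for _ in range(iterations):
        # 1. Sample a batch of transitions by running the current policy.
        batch, state = [], 0
        for _ in range(batch_size):
            probs = softmax(theta[state])
            action = 0 if rng.random() < probs[0] else 1
            next_state, reward = toy_step(state, action)
            batch.append((state, action, reward, next_state))
            state = next_state
        # 2. Fit the critic by regressing V(s) toward r + gamma * V(s').
        for s, a, r, s2 in batch:
            V[s] += critic_lr * (r + GAMMA * V[s2] - V[s])
        # 3-4. Evaluate advantages and accumulate the policy gradient.
        grad = [[0.0, 0.0], [0.0, 0.0]]
        for s, a, r, s2 in batch:
            adv = r + GAMMA * V[s2] - V[s]
            probs = softmax(theta[s])
            for k in range(2):
                indicator = 1.0 if k == a else 0.0
                # d/d theta[s][k] of log softmax(theta[s])[a]
                grad[s][k] += (indicator - probs[k]) * adv / len(batch)
        # 5. Gradient ascent step on the actor parameters.
        for s in range(2):
            for k in range(2):
                theta[s][k] += actor_lr * grad[s][k]
    return theta, V
```

Calling `theta, V = batch_actor_critic()` and then `softmax(theta[1])` should show the policy leaning toward action 1, the only rewarded action in this toy setup.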

Key Components

Actor: The policy network \(\pi_\theta(a|s)\) that maps states to action distributions.

Critic: The value network \(V_\phi(s)\) that estimates the expected return from state \(s\) under the current policy.
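One common way to train such a critic is to regress it onto Monte Carlo reward-to-go targets. The sketch below assumes a tabular critic in place of a network, and the helper names `discounted_returns` and `fit_tabular_critic` are made up for illustration.

```python
def discounted_returns(rewards, gamma):
    """Reward-to-go G_t = r_t + gamma * G_{t+1} for one trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def fit_tabular_critic(states, targets, num_states, lr=0.1, epochs=50):
    """Move V[s] toward its regression targets with repeated
    small steps on the squared error (y - V[s])**2."""
    V = [0.0] * num_states
    for _ in range(epochs):
        for s, y in zip(states, targets):
            V[s] += lr * (y - V[s])
    return V
```

For example, `discounted_returns([1.0, 1.0, 1.0], 0.5)` gives `[1.75, 1.5, 1.0]`, and those values can be passed as `targets` for the states visited along the same trajectory.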

Advantage: The advantage function \(A(s,a) = Q(s,a) - V(s)\) measures how much better taking action \(a\) in state \(s\) is than the policy's average action in that state.
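In practice \(Q(s,a)\) is rarely learned separately. A common estimate replaces it with a one-step bootstrap, giving \(A(s,a) \approx r + \gamma V(s') - V(s)\). A minimal sketch follows; the function name, the argument layout, and the `dones` flag for terminal transitions are assumptions for illustration.

```python
def td_advantages(rewards, values, next_values, dones, gamma=0.99):
    """One-step advantage estimates r + gamma * V(s') - V(s).

    The bootstrap term is dropped on terminal transitions,
    where there is no next state to evaluate.
    """
    advantages = []
    for r, v, next_v, done in zip(rewards, values, next_values, dones):
        bootstrap = 0.0 if done else gamma * next_v
        advantages.append(r + bootstrap - v)
    return advantages
```

With reward 1.0, \(V(s) = 0.5\), \(V(s') = 2.0\), and \(\gamma = 0.5\), the estimate is 1.0 + 1.0 - 0.5 = 1.5 for a non-terminal step and 1.0 - 0.5 = 0.5 for a terminal one.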

Why Actor-Critic?

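Lower variance: Using a learned baseline (the critic) reduces the variance of the policy gradient estimate compared to REINFORCE.
Online learning: The actor and critic can be updated after each step, not just at the end of an episode.
Continuous actions: The actor can output the parameters of a continuous distribution (for example, a Gaussian), so no maximization over actions is required.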
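To make the continuous-action case concrete, here is a sketch of a one-dimensional Gaussian actor with a fixed standard deviation. The function names and the choice to learn only the mean are assumptions for illustration; a real actor would usually compute the mean (and often the standard deviation) from the state with a network.

```python
import math

def gaussian_log_prob(action, mean, std):
    """log N(action | mean, std**2) for a scalar action."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (action - mean) ** 2 / (2.0 * var)

def score_wrt_mean(action, mean, std):
    """Gradient of the log-probability with respect to the mean."""
    return (action - mean) / (std * std)

def actor_step(mean, action, advantage, std=1.0, lr=0.1):
    """One policy-gradient step on the mean: actions with positive
    advantage pull the mean toward them, negative ones push it away."""
    return mean + lr * advantage * score_wrt_mean(action, mean, std)
```

Because the update only needs the log-probability gradient of the sampled action, the same actor-critic recipe applies without enumerating actions.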

Implementation: actorCritic.py

Rishabh Ranawat
