Solving comma.ai's Camera Calibration Challenge
08 Jan 2025

Achieving 7.77% error using optical flow and median aggregation
I worked on comma.ai's camera calibration challenge, which asks you to predict camera pitch and yaw angles from dashcam video. The goal is to determine how the camera is misaligned relative to the vehicle's direction of travel.
The Key Insight
The critical realization was that camera calibration is constant per video. Unlike per-frame predictions that need smoothing, I could aggregate optical flow data across all frames and use the median as a robust calibration estimate.
The Algorithm
The solution builds on the Focus of Expansion (FOE): when a camera translates forward, optical flow radiates outward from a single image point, the FOE. If the camera is misaligned relative to the direction of travel, the FOE is shifted away from the image center, and that shift encodes exactly the pitch and yaw we want to estimate.
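The geometry behind this, in the idealized pinhole model with pure forward translation: a scene point projecting to pixel \(\mathbf{p}\) produces flow along the ray from the FOE \(\mathbf{f}\),

\[
\mathbf{u}(\mathbf{p}) \;\propto\; \mathbf{p} - \mathbf{f},
\]

with a depth-dependent magnitude but a depth-independent direction. Each flow vector therefore constrains \(\mathbf{f}\) to the line through \(\mathbf{p}\) along \(\mathbf{u}\), and intersecting those lines recovers the FOE. (This model ignores camera rotation; frames dominated by rotation, such as turns, are what the RANSAC and median steps are there to reject.)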
Per-frame pipeline:
- Shi-Tomasi corner detection (up to 3,000 features per frame)
- Lucas-Kanade pyramidal optical flow tracking
- Forward-backward validation to filter bad tracks
- RANSAC-based FOE estimation (1,000 iterations)
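The RANSAC step above can be sketched in pure NumPy. This is a sketch under my own assumptions, not the author's exact code: it takes feature positions and flow vectors as given (in the real pipeline they would come from OpenCV's `goodFeaturesToTrack` and `calcOpticalFlowPyrLK`), and the function name and defaults are mine.

```python
import numpy as np

def estimate_foe_ransac(pts, flows, iters=1000, thresh=2.0, seed=0):
    """RANSAC estimate of the Focus of Expansion from sparse flow.

    Each flow vector defines a line through its start point; under pure
    forward motion every such line passes through the FOE. We sample pairs
    of lines, intersect them, and keep the candidate FOE with the most
    inliers (tracks whose flow line passes within `thresh` pixels of it).
    """
    rng = np.random.default_rng(seed)
    d = flows / np.linalg.norm(flows, axis=1, keepdims=True)  # unit flow dirs
    n = np.stack([-d[:, 1], d[:, 0]], axis=1)                 # unit normals
    c = np.sum(n * pts, axis=1)                               # line offsets n·p
    best_count, best_mask = 0, None
    for _ in range(iters):
        i, j = rng.choice(len(pts), size=2, replace=False)
        A = np.stack([n[i], n[j]])
        if abs(np.linalg.det(A)) < 1e-6:
            continue                      # near-parallel flow lines: skip
        foe = np.linalg.solve(A, np.array([c[i], c[j]]))
        resid = np.abs(n @ foe - c)       # point-to-line distances
        mask = resid < thresh
        if mask.sum() > best_count:
            best_count, best_mask = mask.sum(), mask
    # Refit on all inliers with linear least squares.
    foe, *_ = np.linalg.lstsq(n[best_mask], c[best_mask], rcond=None)
    return foe
```

The two-line intersection and the final refit are the same linear system (each line contributes one constraint \(\mathbf{n}_i \cdot \mathbf{f} = \mathbf{n}_i \cdot \mathbf{p}_i\)), solved exactly for a sampled pair and in the least-squares sense over all inliers.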
Video-level aggregation:
- Collect FOE estimates from all frames
- Apply median pooling to handle outliers from turns, stops, and tracking failures
- Convert pixel offset to pitch/yaw angles
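The aggregation step above amounts to a component-wise median followed by a pinhole back-projection. A minimal sketch, assuming a focal length of 910 px and the frame center as principal point (the real values come from the challenge's camera model), and assuming a sign convention of FOE-right-of-center → positive yaw, FOE-above-center → positive pitch:

```python
import numpy as np

# Assumed intrinsics; substitute the challenge's actual camera model.
FOCAL = 910.0          # focal length in pixels (assumption)
CX, CY = 582.0, 437.0  # principal point for a 1164x874 frame (assumption)

def calibration_from_foes(foes):
    """Median-pool per-frame FOE estimates, then convert to pitch/yaw.

    The component-wise median discards outlier frames (turns, stops,
    tracking failures) without any explicit per-frame smoothing.
    """
    u, v = np.median(np.asarray(foes, dtype=float), axis=0)
    yaw = np.arctan((u - CX) / FOCAL)    # horizontal offset -> yaw
    pitch = np.arctan((CY - v) / FOCAL)  # vertical offset -> pitch
    return pitch, yaw
```

Because the median is taken per component, a handful of frames with wildly wrong FOEs (e.g. during a turn) leave the estimate untouched as long as most frames are well-behaved.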
Implementation Details
- ROI masking: exclude sky (top 40%) and hood (bottom 10%)
- Flow magnitude filtering: keep only flow vectors with magnitude \(\geq 7.0\) pixels, discarding short, noisy tracks from slow or stationary frames
- RANSAC with 1,000 iterations for robust FOE estimation
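The ROI mask and magnitude filter are a few lines of NumPy. A sketch with my own helper names, using the fractions stated above:

```python
import numpy as np

def roi_mask(h, w, sky_frac=0.40, hood_frac=0.10):
    """Boolean mask keeping only rows between the sky and hood bands."""
    mask = np.zeros((h, w), dtype=bool)
    mask[int(h * sky_frac): int(h * (1 - hood_frac)), :] = True
    return mask

def keep_reliable(pts, flows, mask, min_mag=7.0):
    """Drop tracks outside the ROI or with flow magnitude below min_mag.

    pts are (x, y) pixel positions; the mask is indexed as [row, col].
    """
    r = np.round(pts).astype(int)
    inside = mask[np.clip(r[:, 1], 0, mask.shape[0] - 1),
                  np.clip(r[:, 0], 0, mask.shape[1] - 1)]
    strong = np.linalg.norm(flows, axis=1) >= min_mag
    keep = inside & strong
    return pts[keep], flows[keep]
```

Filtering before FOE estimation both speeds up RANSAC and removes the two dominant failure modes: sky/hood features that barely move, and short flows whose direction is mostly noise.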
Results
- Overall MSE: 0.000118
- Error score: 7.77% (target was <25%)
- Improvement over baseline: 92.23% error reduction