Solving comma.ai's Camera Calibration Challenge
08 Jan 2025

Achieving 7.77% error using optical flow and median aggregation
I worked on comma.ai's camera calibration challenge, which asks you to predict camera pitch and yaw angles from dashcam video. The goal is to determine how the camera is misaligned relative to the vehicle's direction of travel.
The Key Insight
The critical realization was that camera calibration is constant per video. Unlike per-frame predictions that need smoothing, I could aggregate optical flow data across all frames and use the median as a robust calibration estimate.
The Algorithm
The solution builds on the Focus of Expansion (FOE): when a camera translates forward, optical flow radiates outward from a single image point, the FOE. If the camera is misaligned relative to the direction of travel, the FOE is shifted away from the image center, and that shift encodes exactly the pitch and yaw we want to estimate.
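The geometry behind this, in the idealized pinhole model with pure forward translation: a scene point projecting to pixel \(\mathbf{p}\) produces flow along the ray from the FOE \(\mathbf{f}\),

\[
\mathbf{u}(\mathbf{p}) \;\propto\; \mathbf{p} - \mathbf{f},
\]

with a depth-dependent magnitude but a depth-independent direction. Each flow vector therefore constrains \(\mathbf{f}\) to the line through \(\mathbf{p}\) along \(\mathbf{u}\), and intersecting those lines recovers the FOE. (This model ignores camera rotation; frames dominated by rotation, such as turns, are what the RANSAC and median steps are there to reject.)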
Per-frame pipeline:
- Shi-Tomasi corner detection (up to 3,000 features per frame)
- Lucas-Kanade pyramidal optical flow tracking
- Forward-backward validation to filter bad tracks
- RANSAC-based FOE estimation (1,000 iterations)
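The RANSAC step above can be sketched in pure NumPy. This is a sketch under my own assumptions, not the author's exact code: it takes feature positions and flow vectors as given (in the real pipeline they would come from OpenCV's `goodFeaturesToTrack` and `calcOpticalFlowPyrLK`), and the function name and defaults are mine.

```python
import numpy as np

def estimate_foe_ransac(pts, flows, iters=1000, thresh=2.0, seed=0):
    """RANSAC estimate of the Focus of Expansion from sparse flow.

    Each flow vector defines a line through its start point; under pure
    forward motion every such line passes through the FOE. We sample pairs
    of lines, intersect them, and keep the candidate FOE with the most
    inliers (tracks whose flow line passes within `thresh` pixels of it).
    """
    rng = np.random.default_rng(seed)
    d = flows / np.linalg.norm(flows, axis=1, keepdims=True)  # unit flow dirs
    n = np.stack([-d[:, 1], d[:, 0]], axis=1)                 # unit normals
    c = np.sum(n * pts, axis=1)                               # line offsets n·p
    best_count, best_mask = 0, None
    for _ in range(iters):
        i, j = rng.choice(len(pts), size=2, replace=False)
        A = np.stack([n[i], n[j]])
        if abs(np.linalg.det(A)) < 1e-6:
            continue                      # near-parallel flow lines: skip
        foe = np.linalg.solve(A, np.array([c[i], c[j]]))
        resid = np.abs(n @ foe - c)       # point-to-line distances
        mask = resid < thresh
        if mask.sum() > best_count:
            best_count, best_mask = mask.sum(), mask
    # Refit on all inliers with linear least squares.
    foe, *_ = np.linalg.lstsq(n[best_mask], c[best_mask], rcond=None)
    return foe
```

The two-line intersection and the final refit are the same linear system (each line contributes one constraint \(\mathbf{n}_i \cdot \mathbf{f} = \mathbf{n}_i \cdot \mathbf{p}_i\)), solved exactly for a sampled pair and in the least-squares sense over all inliers.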
Video-level aggregation:
- Collect FOE estimates from all frames
- Apply median pooling to handle outliers from turns, stops, and tracking failures
- Convert pixel offset to pitch/yaw angles
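The aggregation step above amounts to a component-wise median followed by a pinhole back-projection. A minimal sketch, assuming a focal length of 910 px and the frame center as principal point (the real values come from the challenge's camera model), and assuming a sign convention of FOE-right-of-center → positive yaw, FOE-above-center → positive pitch:

```python
import numpy as np

# Assumed intrinsics; substitute the challenge's actual camera model.
FOCAL = 910.0          # focal length in pixels (assumption)
CX, CY = 582.0, 437.0  # principal point for a 1164x874 frame (assumption)

def calibration_from_foes(foes):
    """Median-pool per-frame FOE estimates, then convert to pitch/yaw.

    The component-wise median discards outlier frames (turns, stops,
    tracking failures) without any explicit per-frame smoothing.
    """
    u, v = np.median(np.asarray(foes, dtype=float), axis=0)
    yaw = np.arctan((u - CX) / FOCAL)    # horizontal offset -> yaw
    pitch = np.arctan((CY - v) / FOCAL)  # vertical offset -> pitch
    return pitch, yaw
```

Because the median is taken per component, a handful of frames with wildly wrong FOEs (e.g. during a turn) leave the estimate untouched as long as most frames are well-behaved.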
Implementation Details
- ROI masking: exclude sky (top 40%) and hood (bottom 10%)
- Flow magnitude filtering: keep only flow vectors with magnitude \(\geq 7.0\) pixels, discarding short, noisy tracks from slow or stationary frames
- RANSAC with 1,000 iterations for robust FOE estimation
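The ROI mask and magnitude filter are a few lines of NumPy. A sketch with my own helper names, using the fractions stated above:

```python
import numpy as np

def roi_mask(h, w, sky_frac=0.40, hood_frac=0.10):
    """Boolean mask keeping only rows between the sky and hood bands."""
    mask = np.zeros((h, w), dtype=bool)
    mask[int(h * sky_frac): int(h * (1 - hood_frac)), :] = True
    return mask

def keep_reliable(pts, flows, mask, min_mag=7.0):
    """Drop tracks outside the ROI or with flow magnitude below min_mag.

    pts are (x, y) pixel positions; the mask is indexed as [row, col].
    """
    r = np.round(pts).astype(int)
    inside = mask[np.clip(r[:, 1], 0, mask.shape[0] - 1),
                  np.clip(r[:, 0], 0, mask.shape[1] - 1)]
    strong = np.linalg.norm(flows, axis=1) >= min_mag
    keep = inside & strong
    return pts[keep], flows[keep]
```

Filtering before FOE estimation both speeds up RANSAC and removes the two dominant failure modes: sky/hood features that barely move, and short flows whose direction is mostly noise.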
Results
- Overall MSE: 0.000118
- Error score: 7.77% (target was <25%)
- Improvement over baseline: 92.23% error reduction