Runway Zero

RL-trained LLMs recover airport operations from crises that rule-based systems cannot solve end to end.

Airlines still depend on highly trained operations-control teams when disruption cascades across aircraft, crews, passengers, runways, airline economics, and fairness. Runway Zero turns that unsolved end-to-end recovery problem into an OpenEnv environment where LLM agents learn from verifiable rewards.
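
To make that concrete, here is a minimal sketch of the training loop under stated assumptions: a Gymnasium-style reset/step interface and a reward computed directly from simulator state so it can be verified exactly. RunwayZeroEnv, the "reroute" action, and the reward weights are hypothetical stand-ins, not the project's actual API.

```python
# Minimal sketch of the recovery loop, assuming a Gymnasium-style
# reset/step interface. RunwayZeroEnv, the "reroute" action, and the
# reward weights are illustrative stand-ins, not the project's API.
from dataclasses import dataclass


@dataclass
class CrisisState:
    delayed_flights: int = 20
    cancelled_flights: int = 5
    stranded_passengers: int = 300


class RunwayZeroEnv:
    """Toy airport-recovery environment with a verifiable reward."""

    def reset(self) -> CrisisState:
        self.state = CrisisState()
        return self.state

    def step(self, action: str) -> tuple[CrisisState, float, bool]:
        # A real environment would parse a structured LLM action
        # (reroute, crew swap, cancel, hold); one action type stands
        # in for the whole space here.
        if action == "reroute" and self.state.delayed_flights > 0:
            self.state.delayed_flights -= 1
            self.state.stranded_passengers -= 15
        done = self.state.delayed_flights == 0
        return self.state, self._verifiable_reward(), done

    def _verifiable_reward(self) -> float:
        # Every term is read off simulator ground truth, so the reward
        # can be checked exactly: no learned judge in the loop.
        s = self.state
        return -(1.0 * s.delayed_flights
                 + 5.0 * s.cancelled_flights
                 + 0.01 * s.stranded_passengers)


env = RunwayZeroEnv()
state, done = env.reset(), False
while not done:
    state, reward, done = env.step("reroute")  # a policy would choose here
print(f"final reward: {reward:.2f}")  # -25.00: only cancellations remain
```

The design point is the verifiable reward: every term comes from ground truth the simulator already knows, so the training signal needs no model-based grader.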

393 RL / 1,827 base: simulated cancellations across all models and levels
90%: delay reduction after RL
4 × 4: models by crisis levels
7B RL > 120B w/o RL: RL-trained Qwen2.5 beats every base model

Hackathon Story

Most benchmarks reward planning. Runway Zero rewards recovery.

Static schedules are easy. The real skill is recovering when fog hits Delhi, Mumbai loses a runway, Bengaluru slots become political, crew legality collapses, passengers are stranded, and every airline asks Tower Central to favor them. Existing tools support pieces of this work; the complete end-to-end recovery problem remains human-led.

Level 1

Operations Recovery

Fog, runway debris, and aircraft faults test whether the model can make safe dispatch decisions (see the safety-gate sketch below).

84 RL / 38 base: recovery score comparison
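
The safe-dispatch behavior Level 1 grades can be pictured as a stack of hard gates that must all pass before a flight moves. Below is a hedged sketch of one such gate; the field names and the visibility minima (roughly CAT I vs CAT III runway visual range) are simplified illustrations, not the environment's actual rules.

```python
# Hedged sketch of a hard dispatch-safety gate. Field names and the
# visibility minima (roughly CAT I vs CAT III RVR) are simplified
# illustrations, not the environment's actual rules.
from dataclasses import dataclass


@dataclass
class DispatchContext:
    visibility_m: int           # runway visual range in metres
    runway_clear: bool          # no debris or closure notices
    aircraft_serviceable: bool  # no open faults grounding the tail


def safe_to_dispatch(ctx: DispatchContext, cat3_capable: bool = False) -> bool:
    """Return True only when every hard safety gate passes."""
    minima = 75 if cat3_capable else 550  # metres RVR, illustrative
    return (ctx.visibility_m >= minima
            and ctx.runway_clear
            and ctx.aircraft_serviceable)


# Dense Delhi fog: a CAT I crew must hold, a CAT III one may still go.
fog = DispatchContext(visibility_m=150, runway_clear=True,
                      aircraft_serviceable=True)
print(safe_to_dispatch(fog))                     # False
print(safe_to_dispatch(fog, cat3_capable=True))  # True
```
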
Level 2

Passenger-Aware Recovery

Connections, stranded passengers, emergency arrivals, and gate failures turn delay into human cost (see the cost sketch below).

88 RL / 29 base: recovery score comparison
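
One plausible way to turn Level 2's human cost into a number is a passenger-weighted delay cost, where the same minute of delay hurts more when it breaks connections or strands people overnight. The function and weights below are invented for illustration, not the environment's actual scoring.

```python
# Hedged sketch of a passenger-weighted delay cost. The weights are
# invented for illustration, not the environment's scoring.
def passenger_cost(delay_min: int, pax: int,
                   missed_connection: bool = False,
                   stranded_overnight: bool = False) -> float:
    weight = 1.0
    if missed_connection:
        weight += 2.0  # downstream itineraries break
    if stranded_overnight:
        weight += 5.0  # hotels, rebooking, duty-of-care obligations
    return delay_min * pax * weight


# A 40-minute delay on a 180-seat flight that kills its connections:
print(passenger_cost(40, 180, missed_connection=True))  # 21600.0
```
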
Level 3

Economic Multi-Agent Control

IndiGo, Air India, Akasa Air, and SpiceJet negotiate slots while Tower Central preserves fairness (see the allocator sketch below).

90 RL / 21 base: recovery score comparison
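
Tower Central's fairness role can be illustrated with a round-robin, max-min style allocator: scarce slots are granted one at a time, cycling over airlines that still have demand, so no single carrier monopolizes the runway. This is one plausible arbitration rule, not the environment's actual logic; allocate_slots and its tie-breaking are assumptions.

```python
# Hedged sketch of fairness-aware slot allocation: round-robin
# max-min over requesting airlines. One plausible rule, not the
# environment's actual arbitration logic.
from collections import deque


def allocate_slots(requests: dict[str, int], available: int) -> dict[str, int]:
    """Grant slots one at a time, cycling airlines with remaining demand."""
    granted = {airline: 0 for airline in requests}
    queue = deque(sorted(a for a in requests if requests[a] > 0))
    while available > 0 and queue:
        airline = queue.popleft()
        granted[airline] += 1
        available -= 1
        if granted[airline] < requests[airline]:
            queue.append(airline)  # rejoin only while demand remains
    return granted


demand = {"IndiGo": 6, "Air India": 4, "Akasa Air": 2, "SpiceJet": 3}
print(allocate_slots(demand, available=8))
# {'IndiGo': 2, 'Air India': 2, 'Akasa Air': 2, 'SpiceJet': 2}
```

A rule this simple is easy to audit, which matters when every airline is lobbying the allocator to favor it.
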
Level 4

IndiGo Crisis Replay

A December 2025-style crew availability crisis shows how RL recovery could reduce mass cancellations (see the legality sketch below).

82 RL / 12 base: recovery score comparison
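
Underneath Level 4's crisis is a legality constraint: a recovery plan only counts if every reassigned crew stays within duty and rest limits. The sketch below uses round illustrative numbers (a 13-hour duty cap, 10-hour minimum rest), not any regulator's actual flight-duty-period table.

```python
# Hedged sketch of a crew-legality gate: a recovery leg is usable
# only if projected duty stays under a cap and prior rest meets a
# minimum. Limits are round illustrative numbers, not any
# regulator's actual flight-duty-period table.
def crew_legal(duty_hours_so_far: float, added_block_hours: float,
               rest_hours_before_duty: float,
               max_duty: float = 13.0, min_rest: float = 10.0) -> bool:
    return (rest_hours_before_duty >= min_rest
            and duty_hours_so_far + added_block_hours <= max_duty)


# A crew 11 hours into duty cannot legally absorb a 3-hour recovery leg:
print(crew_legal(11.0, 3.0, rest_hours_before_duty=12.0))  # False
```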