GPT-5.5 Saturates Our Offensive Cybersecurity Time Horizons
A short follow-up to our March 2026 offensive cyber time-horizon study. Run under the same methodology, GPT-5.5 solves almost the entire dataset; given a 50M-token budget, it saturates the dataset, and its time horizon is no longer measurable.
Offensive Cybersecurity Time Horizons
Measuring the rate at which AI offensive cybersecurity capability is increasing, using item response theory (IRT) across seven benchmarks with professional human baselines. Full experimental design, results, and sensitivity analysis.
Measuring Attacker and Monitor Capability, Task Bias, and Attack Selection
Tournament-style ratings, fit to Control Arena evaluation logs, used to estimate attacker capability, monitor effectiveness, task bias, and attack selection across model generations.