Offensive Cybersecurity Time Horizons
Measuring how quickly AI offensive cybersecurity capability is increasing, using item response theory (IRT) across seven benchmarks with professional human baselines. Includes the full experimental design, results, and sensitivity analysis.
Measuring Attacker and Monitor Capability, Task Bias, and Attack Selection
Tournament-style ratings fit to Control Arena evaluation logs to estimate attacker capability, monitor effectiveness, task bias, and attack selection across model generations.