Our Mission
Grow Australia’s capacity for technical AI safety research. Provide talented researchers with the resources, direction, and institutional support to do work that matters.
What We’re Working On
Offensive Cyber Time Horizons
Measuring the rate at which AI offensive cybersecurity capability is increasing, using item response theory (IRT) across seven benchmarks, with professional human baselines.
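As an illustrative sketch only (not this project's actual pipeline), a two-parameter logistic (2PL) IRT model relates a solver's latent ability to per-item difficulty; the helper names and toy numbers below are hypothetical:

```python
import math

def p_solve(ability: float, difficulty: float, discrimination: float = 1.0) -> float:
    # 2PL item response model: P(solve) rises with ability, falls with item difficulty
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

def estimate_ability(results: list[int], difficulties: list[float]) -> float:
    # Maximum-likelihood ability estimate over a coarse grid (illustrative only)
    grid = [g / 10.0 for g in range(-50, 51)]
    def log_lik(theta: float) -> float:
        ll = 0.0
        for solved, d in zip(results, difficulties):
            p = p_solve(theta, d)
            ll += math.log(p if solved else 1.0 - p)
        return ll
    return max(grid, key=log_lik)

# A hypothetical model that solves the easy items but fails the hard ones
results = [1, 1, 1, 0, 0]
difficulties = [-2.0, -1.0, 0.0, 1.5, 2.5]
theta = estimate_ability(results, difficulties)
```

With item difficulties calibrated against human solve times, tracking the fitted ability over successive model generations gives a rate-of-improvement curve of the kind the project measures.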
Covert Capability Evaluation
Developing structured methods to understand how effectively AI models can pursue covert objectives while evading detection. Drawing on AI control research and human-grounded performance comparison.
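One simple way to quantify covert pursuit of an objective (a hypothetical metric for illustration, not the project's published methodology) is the rate at which a model completes a hidden side objective without being flagged by a monitor:

```python
from typing import NamedTuple

class Trial(NamedTuple):
    main_task_ok: bool   # did the model complete its overt task?
    side_task_ok: bool   # did it also achieve the covert objective?
    flagged: bool        # did a monitor flag the transcript as suspicious?

def covert_success_rate(trials: list[Trial]) -> float:
    # Fraction of trials where the covert objective succeeded undetected
    wins = sum(1 for t in trials
               if t.main_task_ok and t.side_task_ok and not t.flagged)
    return wins / len(trials)

trials = [
    Trial(True, True, False),    # covert success
    Trial(True, True, True),     # caught by the monitor
    Trial(True, False, False),   # overt task only
    Trial(False, False, False),  # failed outright
]
rate = covert_success_rate(trials)  # 1 of 4 trials -> 0.25
```

Comparing this rate for AI agents against human red-teamers on the same tasks is one way to ground the evaluation in human performance, as the control-style setup above suggests.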
Interpretability via Activation Oracles
Exploring cycle consistency as an unsupervised training method to make the internal representations of language models legible, removing the need for labelled activation datasets.
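A minimal toy sketch of the cycle-consistency idea (scalar maps stand in for learned networks; every name and number here is illustrative, not the project's actual setup): train a map `f` from activation space into a legible space and an inverse map `g` back, so that the round trip `g(f(x))` reconstructs `x` using only unlabelled activations.

```python
import random

random.seed(0)

# f(x) = a * x maps an "activation" into a legible space; g(y) = b * y maps back.
# The cycle loss (g(f(x)) - x)^2 requires no labelled activation data at all.
a, b = 0.3, 0.2
lr = 0.1
activations = [random.uniform(-1.0, 1.0) for _ in range(200)]  # unlabelled

for _ in range(1000):
    x = random.choice(activations)
    err = b * a * x - x          # cycle residual: g(f(x)) - x
    grad_a = 2 * err * b * x     # d/da of err^2
    grad_b = 2 * err * a * x     # d/db of err^2
    a -= lr * grad_a
    b -= lr * grad_b

round_trip_gain = a * b          # should approach 1: g . f ~ identity
```

The training signal comes entirely from the reconstruction residual, which is the sense in which cycle consistency removes the need for labelled activation datasets.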