Whilst safeguards applied by responsible model developers can limit misuse of AI, these protections can often be bypassed, and in open-weight models they can be removed entirely or are absent from the start, since publicly available model weights can be freely modified. Reporting from multiple frontier labs over the last few years has shown that attackers are using frontier AI models to aid their operations. For example:
In its recent research into measuring AI agents’ progress on multi-step cyber attack scenarios, AISI evaluated the cyber capabilities of 7 frontier AI models released before March 2026. Importantly, these capabilities are inherently dual-use: the same skills an attacker could exploit – such as identifying vulnerabilities and developing exploits – can also be used by defenders for security testing and hardening.
The models were given specific tasks in 2 simulated environments (an enterprise network and an industrial control system) and left to operate autonomously.
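AISI has not published its harness, but the general shape of such an evaluation is straightforward to sketch: an agent acts autonomously against a simulated environment, and progress is credited one ordered step at a time. The Python sketch below is a minimal, hypothetical illustration only; the `StubAgent`, the 32-step chain, and the action budget are assumptions for demonstration, not AISI’s actual setup.

```python
import random
from dataclasses import dataclass

# A 32-step attack chain, mirroring the enterprise scenario's step count.
# Step names here are placeholders; real steps would be concrete objectives
# (initial access, privilege escalation, lateral movement, and so on).
STEPS = [f"step_{i:02d}" for i in range(32)]

@dataclass
class RunResult:
    completed: int  # ordered steps achieved before the run ended
    total: int

class StubAgent:
    """Stand-in for a model under test: each attempt at the current step
    succeeds with a probability that decays as the chain gets deeper."""
    def __init__(self, base_skill: float, decay: float, rng: random.Random):
        self.base_skill, self.decay, self.rng = base_skill, decay, rng

    def attempt(self, step_index: int) -> bool:
        return self.rng.random() < self.base_skill * self.decay ** step_index

def evaluate(agent: StubAgent, action_budget: int = 100) -> RunResult:
    """Let the agent act autonomously within a fixed action budget,
    crediting each step only when it is achieved in order."""
    completed = 0
    for _ in range(action_budget):
        if completed == len(STEPS):
            break  # full end-to-end completion (not yet observed in practice)
        if agent.attempt(completed):
            completed += 1
    return RunResult(completed, len(STEPS))

if __name__ == "__main__":
    rng = random.Random(0)
    results = [evaluate(StubAgent(0.9, 0.9, rng)) for _ in range(50)]
    mean = sum(r.completed for r in results) / len(results)
    print(f"mean steps completed over 50 runs: {mean:.1f} / 32")
```

The point of the sketch is the scoring model: a figure like “averaged 15.6 steps” is a mean of `completed` over many such runs, and a “best run” is the maximum.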
On the 32-step enterprise network attack, estimated to take a human cyber security expert approximately 14 hours to complete end-to-end, the best-performing model (Claude Opus 4.6, released February 2026):
- Averaged 15.6 steps with extended processing time, which corresponds to roughly 6 of the 14 hours a human expert would need (see the quick arithmetic check after this list).
- Averaged 9.8 steps without extended processing time – up from fewer than 2 steps 18 months earlier.
- Completed 22 of the 32 steps in its single best run.
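The steps-to-hours figure in the first bullet rests on a mapping from steps completed to the 14-hour human estimate. A quick back-of-envelope check, assuming the hours are spread evenly across the 32 steps (an assumption made here; real steps will vary in length), lands in the same ballpark:

```python
# Even-split assumption: each of the 32 steps costs 14/32 human-expert hours.
TOTAL_STEPS = 32
HUMAN_HOURS = 14.0

def steps_to_hours(steps_completed: float) -> float:
    return HUMAN_HOURS * steps_completed / TOTAL_STEPS

print(f"{steps_to_hours(15.6):.1f} h")  # 6.8 h -> the report's "roughly 6 of the 14 hours"
print(f"{steps_to_hours(9.8):.1f} h")   # 4.3 h for the runs without extended processing
```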
As yet, no AI system has completed the full scenario end-to-end.
On the more complex industrial control system attack scenario, AI performance was significantly more limited. But even here there were early signs of progress: the most recent models were the first to make any consistent headway, and in some cases found attack approaches the scenario designers hadn’t anticipated.