AI ‘neuron freezing’ offers safety breakthrough - UK Times

Artificial intelligence researchers have developed a novel technique to make ChatGPT and other popular chatbots safer.

The method, referred to as “neuron freezing”, prevents users from bypassing the built-in safety filters of the large language models (LLMs) underpinning these AI tools.

Currently, these LLMs treat safety as a binary checkpoint at the start of generating an answer: if a query appears safe, the AI will proceed, but if it seems dangerous then it will refuse.

Users have been able to find ways of getting round these checks by framing harmful prompts in a different context. One study last year, for example, found that AI safety measures could be bypassed by rephrasing a nefarious prompt as a poem.

Closing off these workarounds requires retraining or individual patches, but the new research offers a way to hard-code ethical boundaries into LLMs to prevent misuse.

The breakthrough, made by a team at North Carolina State University, involves identifying specific safety-critical “neurons” within the neural network and freezing them in order to retain the safety characteristics – no matter how the task is defined by a user.

“Our goal with this work was to provide a better understanding of existing safety alignment issues and outline a new direction for how to implement a non-superficial safety alignment for LLMs,” said Jianwei Li, a PhD student at NC State University who led the research.

“We found that ‘freezing’ these specific neurons during the fine-tuning process allows the model to retain the safety characteristics of the original model while adapting to new tasks in a specific domain.”
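The general idea of freezing selected neurons during fine-tuning can be illustrated with a minimal sketch. This is not the NC State team's code: the function name `fine_tune_step`, the toy three-neuron layer and the choice of which neuron counts as safety-critical are all assumptions made for illustration, and the article does not describe how the researchers identify those neurons.

```python
# Illustrative sketch only, not the researchers' implementation.
# "Freezing" here means skipping the weight update for selected neurons
# (rows of a weight matrix) during a fine-tuning step.

def fine_tune_step(weights, grads, frozen_neurons, lr=0.1):
    """Apply one gradient step, leaving rows in frozen_neurons unchanged."""
    updated = []
    for i, (row, grad_row) in enumerate(zip(weights, grads)):
        if i in frozen_neurons:
            # Frozen neuron: copy its weights through untouched.
            updated.append(list(row))
        else:
            # Trainable neuron: ordinary gradient-descent update.
            updated.append([w - lr * g for w, g in zip(row, grad_row)])
    return updated


# Hypothetical layer with three neurons; neuron 1 is assumed safety-critical.
weights = [[0.5, -0.2], [0.1, 0.9], [-0.3, 0.4]]
grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
frozen = {1}

new_weights = fine_tune_step(weights, grads, frozen)
```

In this toy example the frozen neuron keeps its original weights while the other two adapt to the new task, which mirrors the behaviour Li describes.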

Jung-Eun Kim, an assistant professor of computer science at North Carolina State University, added: “The big picture here is that we have developed a hypothesis that serves as a conceptual framework for understanding the challenges associated with safety alignment in LLMs, used that framework to identify a technique that helps us address one of those challenges, and then demonstrated that the technique works.”

The researchers hope their work will serve as a foundation for developing new techniques that allow AI models to continuously re-evaluate whether their reasoning is safe or unsafe while generating responses.

The breakthrough was detailed in a paper, titled ‘Superficial safety alignment hypothesis’, which is due to be presented next month at the Fourteenth International Conference on Learning Representations (ICLR 2026) in Brazil.
