[CS.AI] Can Editing a Single Neuron Fix Repetition Loops in LLMs?

This paper investigates repetition loops in Gemma 4 instruction-tuned models on long factual enumeration prompts, such as listing every episode of a TV series, the 88 IAU constellations, or the 151 original Pokémon. On these prompts the models fall into repetition at rates as high as 95%, producing either tight verbatim loops or lists that decay into a single answer. To localize the cause, we use per-layer ablation and per-neuron attribution, then confirm the strongest candidates with full-generation sweeps. The loops trace back to a small set of MLP neurons or, in the 26B-A4B Mixture-of-Experts model, a few routed experts. We suppress these components with static weight edits, which can be as small as a single sign-inverted neuron. The size of an effective edit grows with model scale, yet in all cases the loop patterns can be addressed within normal generation budgets while general-purpose benchmark scores are maintained. These edits do not resolve every failure, however: we also examine longer thinking budgets, where the two larger models enter a 'doom looping' phase, repeatedly self-correcting over a fact they cannot recall and exhausting the budget without producing a final answer. We show that the same edits reduce this residual failure but do not eliminate it, and argue that it is fundamentally a knowledge-precision problem rather than a removable circuit: weight surgery can delete a loop but cannot supply a missing fact. Our results are a feasibility demonstration that a concrete generation pathology can be localized to a few parameters and edited out, and they delineate where this approach stops.
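To make the idea of a static weight edit concrete, here is a minimal sketch on a toy gated MLP written in plain Python. It is not the paper's code: the layer layout (gate, up, and down projections with a GELU gate), the toy weights, and the helper names `mlp_forward` and `scale_neuron` are all assumptions made for illustration, as is the choice to apply the edit to the neuron's down-projection column. Scaling that column by 0 ablates the neuron, and scaling it by -1 inverts its sign.

```python
import math


def gelu(v):
    # Tanh approximation of GELU.
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def mlp_forward(x, w_gate, w_up, w_down):
    # w_gate, w_up: one row per hidden neuron; w_down: one row per output dimension.
    hidden = [gelu(dot(g, x)) * dot(u, x) for g, u in zip(w_gate, w_up)]
    return [dot(row, hidden) for row in w_down]


def scale_neuron(w_down, neuron, factor):
    # Return a copy of w_down with the chosen neuron's column multiplied by factor.
    # factor = 0.0 ablates the neuron; factor = -1.0 inverts its sign.
    return [[w * factor if j == neuron else w for j, w in enumerate(row)]
            for row in w_down]


# Hypothetical toy weights: model width 2, three hidden neurons.
x = [1.0, -0.5]
w_gate = [[0.2, 0.1], [1.5, -0.3], [-0.4, 0.8]]
w_up = [[0.5, 0.5], [1.0, 0.5], [0.3, -0.7]]
w_down = [[0.1, 2.0, -0.2], [0.4, -1.0, 0.6]]

original = mlp_forward(x, w_gate, w_up, w_down)
ablated = mlp_forward(x, w_gate, w_up, scale_neuron(w_down, 1, 0.0))
inverted = mlp_forward(x, w_gate, w_up, scale_neuron(w_down, 1, -1.0))

print("original:", original)
print("ablated: ", ablated)
print("inverted:", inverted)
```

Because the layer output is a sum of per-neuron contributions, ablation removes the chosen neuron's contribution once and sign inversion removes it twice, so the inverted output pushes away from whatever direction that neuron was writing. The edit is applied once to the weights and needs no change to the inference code, which is what makes it static.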

Blogger's Review: This paper shows that targeted, neuron-level weight edits can fix a specific LLM generation pathology, and it is equally clear about the limits: when a model simply does not know a fact precisely enough, no weight adjustment of this kind can supply it. Future research should explore ways to improve the factual knowledge of these models.
