On Reversibility as Language Model Behavioral Property in Parametric Knowledge Editing
Emanuele Caddeo, Manuela Sanguinetti, Maurizio Atzori

Large Language Models (LLMs) store substantial factual knowledge within their parameters, motivating growing interest in parametric knowledge editing as an alternative or complement to external knowledge bases. While prior work has extensively evaluated the effectiveness of editing methods, the stability and reversibility of these modifications remain underexplored. In this work, we introduce an operational definition of reversibility as a property of the model and investigate it empirically by comparing two prominent editing methods, ROME and MEMIT, on two models of different scales commonly used in knowledge editing evaluations (GPT-2 XL and GPT-J 6B), using a benchmark derived from CounterFact. We analyze each model in three stages: the original state, the edited state, and the reverted state. Our results show that both editing and revert operations preserve overall model stability in most cases, as indicated by minimal perplexity variation. Editing is consistently successful and frequently improves performance relative to the original model, suggesting that knowledge edits can reinforce the internal representation of specific facts. Rare collapse events are observed with ROME, indicating occasional global perturbations of the model, whereas no such behavior emerges in the experiments with MEMIT. Overall, our findings suggest that parametric knowledge editing can be both stable and reversible, while also revealing important differences in robustness between editing methods.
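
To make the three-stage comparison concrete, the following is a minimal sketch of how perplexity variation across the original, edited, and reverted states could be checked. It is an illustration under stated assumptions, not the authors' evaluation code: the function names (`perplexity`, `relative_change`, `compare_stages`), the stage keys, and the 5% tolerance are all hypothetical, and the per-token log-probabilities are assumed to have been computed beforehand by scoring the same held-out text with each model state.

```python
import math


def perplexity(token_log_probs):
    """Perplexity as exp of the mean negative log-likelihood per token.

    token_log_probs: natural-log probabilities the model assigned to each
    token of a held-out text (assumed to be precomputed elsewhere).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_change(reference, value):
    """Absolute relative difference of value with respect to reference."""
    return abs(value - reference) / reference


def compare_stages(log_probs_by_stage, tolerance=0.05):
    """Compare edited and reverted perplexity against the original model.

    log_probs_by_stage: dict with keys 'original', 'edited', 'reverted'
    (hypothetical naming), each mapping to a list of token log-probs.
    tolerance: hypothetical threshold on relative perplexity change; the
    value 0.05 is an assumption chosen only for this sketch.
    """
    ppl = {stage: perplexity(lp) for stage, lp in log_probs_by_stage.items()}
    baseline = ppl["original"]
    report = {}
    for stage in ("edited", "reverted"):
        change = relative_change(baseline, ppl[stage])
        report[stage] = {
            "perplexity": ppl[stage],
            "relative_change": change,
            # A large jump would correspond to a collapse-like event.
            "stable": change <= tolerance,
        }
    return ppl, report
```

Calling `compare_stages` with the log-probabilities from the three model states returns the per-stage perplexities and a small report flagging any stage whose perplexity drifts beyond the chosen tolerance. A relative threshold is used here only because it is scale-independent; the work itself may quantify stability differently.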