Model merging has become a practical post-training strategy for constructing a single multi-task large language model (LLM) by integrating multiple task-specific models. However, most existing approaches rely on post-hoc merging, where task-specific models are aggregated only once after training. This one-shot aggregation often suffers from task interference, leading to information erasure between individual tasks.
In this work, we demonstrate that replacing post-hoc merging with an iterative many-shot merging protocol improves multi-task performance. Building on this insight, we propose METIS (Mitigating Erasure from Task Interference for Stable many-shot merging), a loss-aware many-shot merging method that addresses the information erasure seen in post-hoc merging through task-wise loss-gap weighting and consensus-based masking. Notably, METIS significantly improves performance on the worst-performing task, which indicates that it mitigates information erasure.
(Project page: METIS)
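To make the two ingredients named in the abstract more concrete, here is a minimal sketch of one merging round. The abstract gives no formulas, so everything below is an illustrative assumption rather than the authors' method: the softmax over loss gaps, the use of sign agreement as the "consensus" criterion, and the names `loss_gap_weights`, `consensus_mask`, `merge_round`, `temperature`, and `min_agree` are all hypothetical.

```python
import math


def loss_gap_weights(merged_losses, expert_losses, temperature=1.0):
    """Assumed weighting: softmax over per-task loss gaps.

    A task whose loss under the merged model is far above its own expert's
    loss (a large gap) receives a larger merging weight.
    """
    gaps = [max(m - e, 0.0) for m, e in zip(merged_losses, expert_losses)]
    peak = max(gaps)  # subtract the max for numerical stability
    exps = [math.exp((g - peak) / temperature) for g in gaps]
    total = sum(exps)
    return [x / total for x in exps]


def consensus_mask(task_vectors, min_agree=0.75):
    """Assumed masking: keep a coordinate only if enough tasks agree on its sign.

    Returns 1.0 where the majority sign covers at least `min_agree` of the
    non-zero task updates at that coordinate, and 0.0 otherwise.
    """
    mask = []
    for coords in zip(*task_vectors):
        pos = sum(1 for v in coords if v > 0)
        neg = sum(1 for v in coords if v < 0)
        nonzero = pos + neg
        agree = max(pos, neg) / nonzero if nonzero else 0.0
        mask.append(1.0 if agree >= min_agree else 0.0)
    return mask


def merge_round(base, task_vectors, merged_losses, expert_losses):
    """One merging round: weighted sum of task vectors, masked, added to base."""
    weights = loss_gap_weights(merged_losses, expert_losses)
    mask = consensus_mask(task_vectors)
    merged = []
    for j, b in enumerate(base):
        update = sum(w * tv[j] for w, tv in zip(weights, task_vectors))
        merged.append(b + mask[j] * update)
    return merged, weights, mask


# Toy example: three tasks, three parameters (made-up numbers).
base = [0.0, 0.0, 0.0]
task_vectors = [
    [1.0, 0.5, -0.2],
    [1.0, -0.5, -0.4],
    [2.0, 0.5, 0.0],
]
merged_losses = [1.0, 1.0, 3.0]  # the third task is hurt most by merging
expert_losses = [1.0, 1.0, 1.0]

merged, weights, mask = merge_round(base, task_vectors, merged_losses, expert_losses)
print(mask)  # [1.0, 0.0, 1.0]: the second coordinate has conflicting signs
```

In a many-shot setting, a round like this would presumably be repeated during training, with each merged result serving as the starting point for the next round; the schedule and the exact weighting and masking rules are defined in the paper, not here.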
Blogger's Review: This paper replaces one-shot post-hoc merging with an iterative many-shot merging strategy to reduce the information erasure caused by task interference. METIS looks both novel and practical for multi-task model development, and it merits further exploration and application.