NeFut

[CS.AI] Revolutionary Mixed-Precision Quantization Framework: MODE Enhances Performance of Multimodal Large Models

Published: 2026-06-18 22:00
Last updated: 2026-06-20 13:47
#AI #Machine Learning #optimization

Abstract

Mixture-of-Experts Multimodal Large Language Models (MoE-MLLMs) offer remarkable performance but incur prohibitive GPU memory costs, making compression essential. Among post-training quantization (PTQ) methods, expert-level mixed-precision quantization has proven effective for MoE-LLMs, yet suffers notable degradation on MoE-MLLMs due to two overlooked biases in expert importance estimation:

  1. Cross-modal Bias: Because vision tokens far outnumber text tokens, they dominate expert selection frequency, masking experts critical to the text modality;
  2. Intra-vision Bias: The large proportion of redundant vision tokens further skews frequency statistics, obscuring experts critical for informative visual content.
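
To make the cross-modal bias concrete, here is a minimal sketch that compares pooled expert selection frequency with per-modality frequency. The routing trace, the top-1 routing assumption, and the helper name `selection_frequency` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter


def selection_frequency(expert_ids, num_experts):
    """Return, for each expert, the fraction of tokens routed to it."""
    counts = Counter(expert_ids)
    total = len(expert_ids)
    return [counts.get(e, 0) / total if total else 0.0 for e in range(num_experts)]


# Hypothetical top-1 routing trace: one (modality, chosen expert) pair per token.
# 90 vision tokens mostly select expert 0; 10 text tokens mostly select expert 2.
trace = (
    [("vision", 0)] * 80
    + [("vision", 1)] * 10
    + [("text", 2)] * 9
    + [("text", 1)] * 1
)
num_experts = 3

# Pooled statistics mix both modalities, so vision tokens dominate.
pooled = selection_frequency([e for _, e in trace], num_experts)

# Decomposed statistics are normalized within each modality.
vision = selection_frequency([e for m, e in trace if m == "vision"], num_experts)
text = selection_frequency([e for m, e in trace if m == "text"], num_experts)

print("pooled:", pooled)
print("vision:", vision)
print("text:  ", text)
```

In this toy trace, expert 2 has the lowest pooled frequency even though it handles most text tokens, which is the kind of masking the first bias describes. Filtering redundant vision tokens before counting, as the second bias calls for, would need a redundancy criterion that the abstract does not specify.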

To bridge these gaps, we propose MODE, a modality-decomposed expert-level mixed-precision quantization framework for MoE-MLLMs that decomposes expert selection frequency by modality, filters redundant vision tokens to obtain denoised visual frequency, and further evaluates quantization sensitivity per modality as a complementary signal to frequency-based estimation. These signals are integrated into an Integer Linear Programming formulation to assign per-expert bit-widths under a given budget. Extensive experiments show that MODE is particularly well-suited for MoE-MLLMs, limiting average performance loss to within 2.9% at W3A16, with larger gains at the extreme 2-bit setting.
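
The bit-width assignment step can be illustrated with a small sketch. It assumes each expert already has a single combined importance score and that each bit-width has a fixed error proxy; both are hypothetical stand-ins, and the objective is not the paper's exact formulation. To stay dependency-free, the sketch solves the "one bit-width per expert under a total budget" problem exactly with dynamic programming rather than an ILP solver.

```python
def allocate_bits(importance, bit_options, error, total_bit_budget):
    """Choose one bit-width per expert.

    Minimizes sum(importance[i] * error[bits[i]]) subject to
    sum(bits) <= total_bit_budget. All inputs are illustrative assumptions.
    """
    # best maps "total bits used so far" -> (cost, assignment list)
    best = {0: (0.0, [])}
    for imp in importance:
        nxt = {}
        for used, (cost, assign) in best.items():
            for b in bit_options:
                total = used + b
                if total > total_bit_budget:
                    continue  # this choice would exceed the budget
                new_cost = cost + imp * error[b]
                if total not in nxt or new_cost < nxt[total][0]:
                    nxt[total] = (new_cost, assign + [b])
        best = nxt
    if not best:
        raise ValueError("budget too small for the lowest bit-width")
    # Return the cheapest assignment over all feasible totals.
    return min(best.values(), key=lambda v: v[0])[1]


# Hypothetical scores for four experts and an error proxy per bit-width.
importance = [0.9, 0.1, 0.5, 0.05]
error = {2: 1.0, 3: 0.3, 4: 0.1}

# A budget of 12 bits across 4 experts corresponds to a 3-bit average.
bits = allocate_bits(importance, (2, 3, 4), error, total_bit_budget=12)
print(bits)
```

With these made-up numbers, the more important experts receive 4 bits and the less important ones drop to 2 bits while the average stays at 3 bits. A real implementation would derive the importance scores from the per-modality frequency and sensitivity signals described above.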

Blogger's Review: MODE targets the accuracy degradation that expert-level mixed-precision quantization shows on MoE-MLLMs by estimating expert importance separately for each modality instead of from pooled routing statistics. The reported result, an average performance loss within 2.9% at W3A16, suggests the approach is promising for compressing large multimodal MoE models.

Original Source: https://arxiv.org/abs/2606.17118
