[CS.AI] Squeeze-Release: Exact Structural Minimization for Pruning

Unstructured pruning produces sparse weight tensors, but standard implementations leave tensor shapes unchanged, so the deployed model is no smaller. We present minimization, an exact structural rewrite that converts a masked network into a smaller dense network with the same forward function up to floating-point rounding. The Squeeze-Release cycle alternates pruning and minimization with an intermediate release step, which re-enables the exact-zero positions remaining inside the compacted tensors by filling them with small calibrated noise, turning otherwise wasted capacity back into trainable parameters. Successive cycles exploit this capacity to uncover structural redundancy that a single pass cannot reach.
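To make the idea concrete, here is a minimal NumPy sketch of a function-preserving rewrite on a two-layer ReLU MLP. It is an illustration under stated assumptions, not the paper's algorithm: the helper names (`minimize_mlp`, `release`), the rule for detecting removable hidden units, and the noise scale `rel_scale` are all hypothetical, and the abstract does not say how the release noise is actually calibrated.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, W1, b1, W2, b2):
    # y = W2 * relu(W1 * x + b1) + b2, with x stored row-wise
    return relu(x @ W1.T + b1) @ W2.T + b2

def minimize_mlp(W1, b1, W2, b2):
    """Drop hidden units that cannot affect the output (hypothetical helper)."""
    no_input = ~W1.any(axis=1)   # all incoming weights are exactly zero
    no_output = ~W2.any(axis=0)  # all outgoing weights are exactly zero
    # A unit with no inputs emits the constant relu(b1[j]);
    # fold that constant into the next layer's bias before removing it.
    b2_new = b2 + W2[:, no_input] @ relu(b1[no_input])
    keep = ~(no_input | no_output)
    return W1[keep], b1[keep], W2[:, keep], b2_new

def release(W, rng, rel_scale=1e-3):
    """Replace exact zeros with small noise (assumed scale, not the paper's)."""
    zero = (W == 0)
    sigma = rel_scale * np.abs(W[~zero]).mean()
    out = W.copy()
    out[zero] = rng.normal(0.0, sigma, size=int(zero.sum()))
    return out

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 5)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

# Simulate a pruning mask: two hidden units lose all inputs, one loses all
# outputs, and one unit keeps only part of its inputs (unstructured zeros).
W1[[1, 4], :] = 0.0
W2[:, 6] = 0.0
W1[0, :3] = 0.0

W1s, b1s, W2s, b2s = minimize_mlp(W1, b1, W2, b2)

x = rng.normal(size=(4, 5))
y_full = forward(x, W1, b1, W2, b2)
y_small = forward(x, W1s, b1s, W2s, b2s)
max_diff = float(np.abs(y_full - y_small).max())

# The compacted tensor still holds exact zeros; release makes them trainable.
W1r = release(W1s, rng)
```

In this sketch the hidden width shrinks from 8 to 5 while `y_full` and `y_small` agree up to rounding. The three zeros left in the first row of `W1s` are the kind of positions a release step would hand back to training; unlike minimization, releasing them perturbs the function slightly, by an amount controlled here by the assumed `rel_scale`.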

We also introduce CompensatedLayerNorm, a function-preserving replacement for LayerNorm that extends minimization to channel reduction across LayerNorm-equipped residual streams. Squeeze-Release makes the deployable network 39x smaller than the unpruned model on a fully connected network and 14.8x smaller on a modern CNN (ConvNeXt-Tiny), while maintaining comparable accuracy. Moreover, we prove that the rewrite can be extended to transformer architectures.

Blogger's Review: The Squeeze-Release approach compresses neural networks through exact structural rewriting and iterative pruning, reporting substantial size reductions while keeping accuracy comparable. This matters for deploying large-scale models, particularly in resource-constrained environments. Its extensibility to transformer architectures also opens new avenues for future research.
