Abstract
The abundance of pre-trained diffusion models provides an opportunity for composition. However, combining several models risks one model dominating the others or the models disagreeing with each other. Here, we propose Divide-and-Denoise, a method for coordinating multiple pre-trained diffusion models during sampling.
Method
Much like a manager of a specialized workforce, our method creates a fair yet efficient division of labor across models. Central to the method is the allocation, which defines each model's responsibility for every region of the noisy sample.
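One way to picture an allocation is as a soft weight map, as in the minimal sketch below. It assumes that regions are individual pixels of the (latent) sample and that each pixel's responsibilities are non-negative weights summing to one across models. The function name `composite_noise` and the direct blending of noise predictions are hypothetical illustrations: the method itself guides each model toward its region rather than simply mixing outputs.

```python
import numpy as np


def composite_noise(allocation: np.ndarray, eps_preds: np.ndarray) -> np.ndarray:
    """Blend per-model noise predictions according to a soft allocation (hypothetical helper).

    allocation: (n_models, H, W), non-negative, summing to 1 over models at each pixel.
    eps_preds:  (n_models, C, H, W), each model's noise prediction for the same sample.
    Returns a (C, H, W) composite noise prediction.
    """
    if np.any(allocation < 0) or not np.allclose(allocation.sum(axis=0), 1.0):
        raise ValueError("allocation must be non-negative and sum to 1 over models")
    # Broadcast the per-pixel weights across channels, then sum over models.
    return (allocation[:, None, :, :] * eps_preds).sum(axis=0)


# Toy example: model 0 owns the left half of a 4x4 sample, model 1 the right half.
allocation = np.zeros((2, 4, 4))
allocation[0, :, :2] = 1.0
allocation[1, :, 2:] = 1.0
eps_preds = np.stack([np.full((3, 4, 4), -1.0), np.full((3, 4, 4), 2.0)])
eps = composite_noise(allocation, eps_preds)
```

With a hard allocation like this one, each pixel takes its noise prediction from exactly one model; fractional weights would let models share contested pixels.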
At every timestep, we denoise by:
- Updating the allocation by solving a fair division game, where we divide the sample into regions that maximize total utility under fairness constraints;
- Aligning the models with this allocation, guiding each model to denoise within its assigned region.
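The allocation update in the first step can be sketched as a fractional fair division of regions among models. This sketch swaps in a standard fair-division rule rather than reproducing the authors' game. Each model receives an equal budget and bids on regions, and proportional response dynamics move the bids toward the competitive equilibrium from equal incomes. That equilibrium is Pareto efficient, envy-free, and proportional. The per-region utility matrix `util` (for example, how well a region matches a model's condition) and the name `fair_division` are hypothetical, and the utilities are assumed to be strictly positive.

```python
import numpy as np


def fair_division(util: np.ndarray, iters: int = 100) -> np.ndarray:
    """Split regions among models via proportional response dynamics (illustrative stand-in).

    util: (n_models, n_regions) array of strictly positive utilities
          (hypothetical: how much each region is worth to each model).
    Returns an (n_models, n_regions) fractional allocation whose columns sum to 1.
    """
    n_models, n_regions = util.shape
    # Equal budgets are the fairness ingredient (competitive equilibrium from equal incomes).
    budgets = np.full((n_models, 1), 1.0 / n_models)
    # Start by spreading each model's budget evenly over all regions.
    bids = np.repeat(budgets / n_regions, n_regions, axis=1)
    for _ in range(iters):
        prices = bids.sum(axis=0, keepdims=True)            # price of a region = total bids on it
        alloc = bids / prices                                # share of a region = share of its bids
        gained = (alloc * util).sum(axis=1, keepdims=True)  # utility each model currently receives
        # Re-bid each budget in proportion to the utility each region contributed.
        bids = budgets * (alloc * util) / gained
    return bids / bids.sum(axis=0, keepdims=True)


# Toy example: model 0 prefers region 0, model 1 prefers region 1, region 2 is contested.
util = np.array([[10.0, 1.0, 5.0],
                 [1.0, 10.0, 5.0]])
alloc = fair_division(util)
print(np.round(alloc, 2))
```

In the toy example, region 0 goes almost entirely to model 0, region 1 to model 1, and the contested region 2 is split evenly. Equal budgets guarantee each model at least the utility of an equal 1/n share of every region, so no model is crowded out. Unequal budgets would tilt the division toward the better-funded models.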
This leads to a new composite denoising process that evolves in tandem with the division process. We evaluate Divide-and-Denoise on conditional image generation. Across several quality metrics and the GenEval benchmark, our method outperforms baselines and resolves common failures such as missing objects and mismatched attributes. Experiments show that Divide-and-Denoise draws on each model's expertise without neglecting any of the others.
Blogger's Review: Divide-and-Denoise coordinates multiple diffusion models through a game-theoretic approach, balancing fairness and efficiency when dividing a complex generation task among models. Beyond offering a new theoretical framing for model composition, it delivers measurable gains in conditional image generation, which makes it worth exploring in other domains.