[DeepMind] T5Gemma: Redefining the Encoder-Decoder Model Era

In the rapidly evolving landscape of large language models (LLMs), the spotlight has largely focused on the decoder-only architecture. However, the classic encoder-decoder architecture, such as T5 (The Text-to-Text Transfer Transformer), remains a popular choice for many real-world applications. Encoder-decoder models excel at summarization, translation, QA, and more due to their high inference efficiency and richer encoder representation. Today, we introduce T5Gemma, a new collection of encoder-decoder LLMs developed by adapting pretrained decoder-only models into the encoder-decoder architecture.

In T5Gemma, we explore whether we can build top-tier encoder-decoder models based on pretrained decoder-only models using a technique called model adaptation. The core idea is to initialize the parameters of an encoder-decoder model using the weights of a pretrained decoder-only model, and then adapt them further through UL2 or PrefixLM-based pre-training. This adaptation method is flexible and allows model sizes to be mixed: for example, a 9B encoder can be paired with a 2B decoder, as in the T5Gemma 9B-2B model.
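The adaptation idea can be illustrated with a small, framework-free sketch. It is not the T5Gemma implementation: the checkpoints are toy dictionaries, the parameter names (`self_attn`, `cross_attn`, `mlp`) are hypothetical, and copying self-attention weights into the new cross-attention parameters is an assumption made here for illustration, since the post does not say how cross-attention is initialized. The second function shows the PrefixLM-style data split only, where the prefix is the encoder input and the remainder is the decoder target.

```python
import copy


def init_encoder_decoder(encoder_src, decoder_src):
    """Build encoder-decoder parameters from decoder-only checkpoints.

    Both arguments are toy checkpoints: dicts mapping hypothetical
    parameter names to lists of floats. Passing the same checkpoint twice
    gives a balanced model; passing two different ones gives an
    unbalanced pairing.
    """
    # Encoder and decoder both start from copies of pretrained weights.
    # (Attention masking differs between them, but that is outside
    # this sketch.)
    encoder = copy.deepcopy(encoder_src)
    decoder = copy.deepcopy(decoder_src)

    # Assumption, not stated in the post: seed each new cross-attention
    # parameter from the matching self-attention parameter.
    for name, weight in decoder_src.items():
        if ".self_attn." in name:
            cross_name = name.replace(".self_attn.", ".cross_attn.")
            decoder[cross_name] = copy.deepcopy(weight)

    return {"encoder": encoder, "decoder": decoder}


def prefix_lm_split(tokens, prefix_len):
    """Split a token sequence into (encoder input, decoder target)."""
    if not 0 < prefix_len < len(tokens):
        raise ValueError("prefix_len must leave both parts non-empty")
    return tokens[:prefix_len], tokens[prefix_len:]


# Toy usage with a one-layer "checkpoint".
toy_checkpoint = {
    "layer0.self_attn.q": [0.1, 0.2],
    "layer0.mlp.w": [0.3],
}
model = init_encoder_decoder(toy_checkpoint, toy_checkpoint)
print(sorted(model["decoder"]))

encoder_input, decoder_target = prefix_lm_split(["a", "b", "c", "d"], 2)
print(encoder_input, decoder_target)
```

In a real setting the copied weights would only be a starting point; the further UL2 or PrefixLM pre-training described above is what adapts them to the new architecture.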

Our experiments show that T5Gemma models achieve comparable or better performance than their decoder-only counterparts, dominating the quality-inference efficiency frontier across several benchmarks. For example, T5Gemma 9B-9B achieves higher accuracy than Gemma 2 9B with similar latency. Even more impressively, T5Gemma 9B-2B delivers a significant accuracy boost over the 2B-2B model while maintaining nearly identical latency.

After pre-training, T5Gemma exhibits promising capabilities, scoring significantly higher on complex tasks that require reasoning. For instance, T5Gemma 9B-9B scores over 9 points higher on GSM8K (math word problems) and 4 points higher on DROP (reading comprehension) than the original Gemma 2 9B model. This indicates that the adapted encoder-decoder architecture can create a more capable foundational model.

We are excited to release a suite of T5Gemma checkpoints, including various sizes and training objectives. We hope these checkpoints will provide valuable resources for research and development.

Blogger's Review: The launch of T5Gemma is a notable step forward for encoder-decoder LLMs. By adapting pretrained decoder-only models, it carries over their strengths while adding the efficiency and flexibility of the encoder-decoder design. It is a development worth watching and exploring for both researchers and practitioners.