We are excited to announce new multimodal models in the MedGemma collection, our most capable open models for health AI development. As healthcare increasingly adopts AI to enhance workflow management, patient communication, and diagnostic support, it is crucial that these AI systems are not only high-performing but also efficient and privacy-preserving. With this in mind, we built and released the Health AI Developer Foundations (HAI-DEF), a collection of lightweight open models designed to give developers robust starting points for their health research and application development.
Because HAI-DEF models are open, developers retain full control over privacy, infrastructure, and modifications. We expanded the HAI-DEF collection in May with MedGemma, a set of generative models based on Gemma 3 aimed at accelerating healthcare and life sciences AI development. Today, we proudly announce two new models in this collection: MedGemma 27B Multimodal and MedSigLIP.
The MedGemma 27B Multimodal model complements the previously released 4B Multimodal and 27B text-only models, adding support for complex multimodal and longitudinal electronic health record interpretation. MedSigLIP is a lightweight image and text encoder for classification, search, and related tasks, based on the same image encoder powering the 4B and 27B MedGemma models.
MedGemma and MedSigLIP serve as strong starting points for medical research and product development. MedGemma is useful for medical text or imaging tasks that require generating free text, such as report generation or visual question answering, while MedSigLIP is recommended for imaging tasks that involve structured outputs like classification or retrieval. All models can run on a single GPU, and MedGemma 4B and MedSigLIP can even be adapted for mobile hardware.
MedGemma 4B scores 64.4% on MedQA, which places it among the best very small (<8B) open models. In an unblinded study, 81% of MedGemma 4B–generated chest X-ray reports were judged by a US board-certified radiologist to be accurate enough to result in similar patient management compared to original radiologist reports. The MedGemma 27B models perform exceptionally well on various benchmarks, including retrieval and interpretation of electronic health record data.
MedSigLIP, a lightweight image encoder with only 400M parameters, employs the Sigmoid loss for Language Image Pre-training (SigLIP) architecture. It has been adapted from SigLIP through tuning with diverse medical imaging data, allowing the model to learn nuanced features specific to these modalities. MedSigLIP bridges the gap between medical images and texts by encoding them into a common embedding space, achieving competitive performance across medical imaging domains.
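To make the idea of a common embedding space concrete, here is a minimal sketch of zero-shot classification by cosine similarity. It does not call MedSigLIP: the 4-dimensional vectors are made-up stand-ins for the embeddings an image and text encoder would produce, and the helper names `l2_normalize` and `zero_shot_classify` are hypothetical, introduced only for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def zero_shot_classify(image_emb, label_embs, labels):
    """Return the label whose text embedding is closest to the image embedding.

    image_emb:  shape (d,)   embedding of one image
    label_embs: shape (n, d) embeddings of n candidate label prompts
    """
    image_emb = l2_normalize(np.asarray(image_emb, dtype=float))
    label_embs = l2_normalize(np.asarray(label_embs, dtype=float))
    sims = label_embs @ image_emb  # cosine similarity per label, shape (n,)
    best = int(np.argmax(sims))
    return labels[best], sims


# Toy, hand-written embeddings (assumption: a real encoder would supply these).
labels = ["normal chest X-ray", "pleural effusion", "pneumothorax"]
label_embs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
image_emb = np.array([0.1, 0.9, 0.2, 0.0])

predicted, sims = zero_shot_classify(image_emb, label_embs, labels)
print(predicted)
print(np.round(sims, 3))
```

The same similarity scores can serve retrieval: instead of taking the single best label for one image, sort a collection of image embeddings by their similarity to one text query.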
The open nature of the MedGemma collection allows models to be downloaded, built upon, and fine-tuned to support developers' specific needs. This open approach offers distinct advantages in the medical field, such as flexibility, privacy, customization for optimal performance, and reproducibility. We look forward to learning how developers leverage MedGemma and MedSigLIP to create next-generation health AI tools.
To help developers get started, we’ve provided detailed notebooks on GitHub for MedGemma and MedSigLIP that demonstrate how to create instances for inference and fine-tuning. When ready to scale, MedGemma and MedSigLIP can be seamlessly deployed in Vertex AI as dedicated endpoints, with examples available on GitHub for running inference. Full details of the models can be found in the MedGemma technical report.
Note that these models are intended as a starting point for efficient development of downstream healthcare applications involving medical text and images. Developers should validate and adapt the models to their specific use cases to ensure the outputs meet their requirements.