Today we present two recent research papers in which we explore the possibilities of Gemini in the healthcare space and introduce Med-Gemini, a new family of next-generation models fine-tuned for the medical domain. This family of models builds upon Google’s Gemini models by fine-tuning on de-identified medical data while inheriting Gemini’s native reasoning, multimodal, and long-context abilities.
For AI models to perform well on diverse medical tasks and to meaningfully assist in clinician, researcher and patient workflows (like generating radiology reports or summarizing health information), they often require advanced reasoning and the ability to utilize specialized, up-to-date medical knowledge. In addition, strong performance requires models to move beyond short passages of text to understand complex multimodal data, including images, videos, and the extensive length and breadth of electronic health records (EHRs).
Gemini models represent a leap forward in multimodal and long-context reasoning, capabilities with substantial potential in medicine. Med-Gemini builds on our initial research into medically tuned large language models with Med-PaLM.
The first paper, “Capabilities of Gemini Models in Medicine”, describes a broad exploration of Gemini’s capabilities across a wide range of text, image, video, and EHR tasks. We benchmark the new Med-Gemini models on 14 tasks spanning text, multimodal and long-context applications, and demonstrate strong results.
On the popular MedQA US Medical Licensing Exam (USMLE)-style question benchmark, Med-Gemini achieves a state-of-the-art accuracy of 91.1%, surpassing our prior best Med-PaLM 2 result by 4.6%. On multimodal benchmarks such as NEJM Image Challenges and multimodal USMLE-style questions, Med-Gemini also achieves a new state of the art, surpassing GPT-4V by a wide margin.
| Benchmark Category | Med-Gemini Result | Comparative Improvement |
|---|---|---|
| MedQA (USMLE-style) | 91.1% Accuracy | +4.6% over Med-PaLM 2 |
| NEJM Image Challenges | New State-of-the-Art | Surpasses GPT-4V |
| Diagnostic Challenges | State-of-the-Art | Complex NEJM conferences |
In the second paper, “Advancing Multimodal Medical Capabilities of Gemini”, we offer a deeper dive into Med-Gemini’s multimodal capabilities through application to radiology, pathology, dermatology, ophthalmology, and genomics in healthcare. For the first time, we demonstrate how large multimodal models can interpret complex 3D scans, answer clinical questions, and generate state-of-the-art radiology reports.
Additionally, we demonstrate a novel mechanism to encode genomic information for risk prediction using large language models across a wealth of disease areas with strong results.
We further enhance the models along three axes. Key features include:

- Clinical reasoning, strengthened through self-training and web search integration
- Multimodal performance, improved via fine-tuning and customized encoders
- Long-context capabilities, better utilized through chain-of-reasoning prompting
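The web search integration above can be thought of as an uncertainty-guided loop: the model answers from its own knowledge first, and retrieval is invoked only when its confidence is low. The sketch below illustrates this pattern with hypothetical stand-ins; `model_answer`, `web_search`, and the confidence threshold are illustrative assumptions, not the actual Med-Gemini pipeline or API.

```python
def model_answer(question: str, context: str = "") -> tuple[str, float]:
    """Stand-in for a call to a medically tuned LLM (hypothetical API).

    Returns an answer plus a self-reported confidence score. In this toy
    stub, supplying retrieved context raises the confidence.
    """
    if context:
        return "answer grounded in retrieved evidence", 0.9
    return "initial answer from model knowledge", 0.6


def web_search(query: str) -> str:
    """Stand-in for a retrieval step over up-to-date medical sources."""
    return f"retrieved snippets for: {query}"


def answer_with_search(question: str, threshold: float = 0.8) -> str:
    """Uncertainty-guided search: only retrieve when the model's first
    answer falls below a confidence threshold."""
    answer, confidence = model_answer(question)
    if confidence < threshold:
        context = web_search(question)
        answer, confidence = model_answer(question, context=context)
    return answer
```

The key design choice in this pattern is that retrieval is conditional rather than unconditional, which keeps latency low on questions the model already answers confidently while grounding uncertain answers in fresher evidence.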