Welcome to ViTAL

AI-powered Sentence Completion Tool for Chest X-Ray Reporting

Real Time Autocomplete

Powered by an advanced vision-language model that provides autocomplete suggestions for chest X-ray report writing

Saliency Map Visualization

Highlight areas in the Chest X-Ray image corresponding to the provided autocomplete suggestions

Save Reporting Time

Cut reporting time so radiologists can spend more of it on key diagnoses and patient care

Chest X-Ray Expert

Model trained on >81,000 Chest X-Ray image and report pairs

Our Mission

Provide radiologists with a supplementary tool that jumpstarts report generation by annotating chest X-ray images and highlighting key observations of the radiology study, using the advanced Vision Text Informed Autocomplete Language (ViTAL) model.

Transform how radiologists work

Too many reports to generate, not enough radiologists

More than 70,000,000 chest X-rays are performed each year in the U.S.

This volume leads to long turnaround times

The U.S. national average cost of a chest X-ray is $420

Long turnaround times contribute to higher costs

An AI system that provides X-ray-informed sentence completion suggestions as radiologists type their reports

Real time autocompletion

ViTAL provides autocompletion based on the patient's chest X-Ray images

Takes the patient's medical history into account

ViTAL takes current and historical chest X-ray images and the doctor's annotations as input

Our team fine-tuned a pretrained Vision Transformer model and a Large Language Model to annotate chest X-ray images

Swin Vision Transformer pretrained on the ImageNet dataset + GPT-2 architecture

In this architecture, the Swin ViT model produces embeddings from the chest X-ray images. These image embeddings, together with text embeddings of the radiologist's input, are fed to a Large Language Model (GPT-2), which generates the annotation for the chest X-ray images.
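
As a rough illustration of this data flow, the sketch below projects image patch embeddings to the text embedding width and prepends them to the text token embeddings, giving one sequence for the language model. This page does not say how ViTAL combines the two streams, so the prefix concatenation, the linear projection, and every name and size used here (`project`, `build_decoder_input`, width-4 image vectors, width-3 text vectors) are assumptions made for illustration. Cross-attention between the decoder and the image embeddings is another common design.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def project(vec: Vector, weights: Matrix) -> Vector:
    """Apply a linear map; `weights` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def build_decoder_input(image_embeddings: Matrix,
                        text_embeddings: Matrix,
                        projection: Matrix) -> Matrix:
    """Prepend projected image embeddings to the text embeddings.

    Assumption: the language model reads one sequence in which the image
    patches come first, followed by the tokens the radiologist has typed.
    """
    projected = [project(v, projection) for v in image_embeddings]
    return projected + text_embeddings


# Toy sizes (hypothetical): 2 image patches of width 4, 3 text tokens of width 3.
image_embeddings = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
text_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
# Projection from width 4 to width 3 (here it simply keeps the first three components).
projection = [[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]]

sequence = build_decoder_input(image_embeddings, text_embeddings, projection)
print(len(sequence), "positions of width", len(sequence[0]))
```

In a real system the projection would be learned and the vectors would come from the Swin encoder and the GPT-2 token embedding table rather than from hand-written lists.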

ViTAL is pretrained on the MS COCO dataset, then fine-tuned on MIMIC-IV chest X-ray images

The ViTAL model is first pretrained on the large MS COCO image-captioning dataset so that the ViT and the LLM speak the same language. It is then fine-tuned on MIMIC-IV chest X-ray images and ground-truth radiologist annotations to produce high-quality reports from chest X-ray images.
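
The two-stage recipe can be pictured as an ordered schedule in which the weights produced by the first stage become the starting point for the second. The sketch below is only a schematic: the `Stage` record, `run_schedule`, and the stand-in trainer `fake_train` are hypothetical names, and no hyperparameters are shown because this page does not state any.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str     # label for the stage
    dataset: str  # dataset name as given in the text above
    goal: str     # what the stage is meant to achieve


# Order matters: general image captioning first, then chest X-ray reports.
SCHEDULE: List[Stage] = [
    Stage("pretrain", "MS COCO", "align image and text representations"),
    Stage("fine-tune", "MIMIC-IV chest X-rays", "produce radiology report text"),
]


def run_schedule(schedule: List[Stage],
                 train_stage: Callable[[Stage, list], list],
                 weights: list) -> list:
    """Run the stages in order, passing each stage's result to the next."""
    for stage in schedule:
        weights = train_stage(stage, weights)
    return weights


def fake_train(stage: Stage, weights: list) -> list:
    """Stand-in trainer (hypothetical): records which stages touched the weights."""
    return weights + [stage.name]


history = run_schedule(SCHEDULE, fake_train, [])
print(history)
```

A real `train_stage` would load the named dataset and update model parameters; the point here is only that fine-tuning starts from the pretrained weights rather than from scratch.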

ViTAL In Action

Watch a 5-minute demo of how to use ViTAL to help compose a chest X-ray report

Try it out yourself


The Vision Text Informed Autocomplete Language (ViTAL) model links a pretrained Swin Vision Transformer with the GPT-2 LLM,
forming a vision-plus-language-to-language model that generates a descriptive narrative of X-ray findings from the given chest X-ray images.
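
To make the idea of sentence completion concrete, the sketch below extends the words a radiologist has typed one token at a time, always picking the highest-scoring next token, until the sentence ends. Greedy selection, the `complete_sentence` function, and the hand-written `TOY_TABLE` scorer are assumptions for illustration; this page does not describe ViTAL's decoding strategy, and a real scorer would condition on the X-ray image as well as the text.

```python
from typing import Callable, Dict, List

# Hypothetical next-token scorer: maps the tokens so far to {token: score}.
Scorer = Callable[[List[str]], Dict[str, float]]


def complete_sentence(prefix: List[str], score_next: Scorer,
                      end_token: str = ".",
                      max_new_tokens: int = 20) -> List[str]:
    """Greedily extend `prefix` until `end_token` or the length limit."""
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        scores = score_next(tokens)
        if not scores:
            break  # the scorer has nothing to suggest
        best = max(scores, key=scores.get)
        tokens.append(best)
        if best == end_token:
            break
    return tokens[len(prefix):]  # only the newly suggested tokens


# Toy scorer standing in for the model: looks only at the last token.
TOY_TABLE = {
    "no": {"acute": 0.7, "pleural": 0.3},
    "acute": {"cardiopulmonary": 0.9, "fracture": 0.1},
    "cardiopulmonary": {"process": 0.8, "disease": 0.2},
    "process": {".": 1.0},
}


def toy_scorer(tokens: List[str]) -> Dict[str, float]:
    return TOY_TABLE.get(tokens[-1], {}) if tokens else {}


suggestion = complete_sentence(["no"], toy_scorer)
print(" ".join(suggestion))
```

Sampling or beam search could replace the greedy `max` step without changing the surrounding loop.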

Learn More about ViTAL

Our Team

We are researchers from the UC Berkeley Master of Information and Data Science (MIDS) program who aim to leverage advanced AI models to solve real-world problems