Medical AI on Apple Silicon – MedGemma 1.5 4B via MLX

medgemma

Python 3.10+ | Apple Silicon | License: MIT

[!WARNING] Medical Disclaimer — This tool is for informational and educational purposes only. It is NOT a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider for medical decisions. Never disregard professional medical advice because of something generated by this tool.

What is MedGemma?

MedGemma is a command-line tool and Python library that runs Google's MedGemma 1.5 4B medical AI model locally on your Mac. It uses Apple's MLX framework to run entirely on your Apple Silicon GPU — no cloud API, no data leaves your machine. Ask medical questions, analyze medical images, and get evidence-based responses, all from your terminal.

Requirements

  • Apple Silicon Mac (M1, M2, M3, or M4)
  • Python 3.10 or newer
  • ~4 GB disk space for the quantized model weights
  • macOS (the only supported platform)
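
The platform and Python requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the function name `check_requirements` is illustrative and not part of medgemma.

```python
import platform
import sys


def check_requirements() -> list[str]:
    """Return a list of unmet requirements (empty if the machine qualifies)."""
    problems = []
    # macOS identifies itself as "Darwin"; Apple Silicon reports "arm64".
    if platform.system() != "Darwin":
        problems.append("macOS is required")
    if platform.machine() != "arm64":
        problems.append("an Apple Silicon (arm64) CPU is required")
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is required")
    return problems


for problem in check_requirements():
    print(f"Unmet requirement: {problem}")
```

Note that a Python interpreter running under Rosetta reports `x86_64` even on an Apple Silicon Mac, so a failed architecture check may point to the interpreter rather than the hardware.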

Quick Start

1. Install

pip install medgemma

Or with uv:

uv pip install medgemma

2. Hugging Face authentication

The model weights are hosted on Hugging Face under google/medgemma-4b-it. Before downloading, you need to:

  1. Create a Hugging Face account (free)
  2. Visit the model page and accept Google's license agreement
  3. Log in locally:
pip install huggingface-hub
huggingface-cli login

You only need to do this once.

3. Download the model

medgemma setup

This downloads the MedGemma 4B model from Hugging Face, converts it to 4-bit quantized MLX format, and caches it at ~/.medgemma/model. You only need to do this once.

4. Ask a question

medgemma ask "What are the common symptoms of type 2 diabetes?"

Example output:

The common symptoms of type 2 diabetes include:

*   **Increased thirst (polydipsia):** You may feel thirsty more often than usual.
*   **Frequent urination (polyuria):** You may need to urinate more often,
    especially at night.
*   **Increased hunger (polyphagia):** You may feel hungry even after eating.
*   **Unexplained weight loss:** You may lose weight without trying.
*   **Fatigue:** You may feel tired and lacking energy.
*   **Blurred vision:** High blood sugar can affect the lenses of your eyes.
*   **Slow-healing sores or frequent infections:** High blood sugar can impair
    your body's ability to heal.
*   **Numbness or tingling in hands or feet:** This can be a sign of nerve
    damage (neuropathy).
*   **Areas of darkened skin:** Particularly in the armpits and neck
    (acanthosis nigricans).

It is important to note that many people with type 2 diabetes may not experience
any symptoms in the early stages. Regular check-ups and blood sugar screenings
are recommended, especially if you have risk factors.

**Disclaimer:** I am an AI assistant and cannot provide medical advice. Please
consult a healthcare professional for diagnosis and treatment.

Image Analysis

Analyze medical images by passing --image:

medgemma ask "Describe this chest X-ray" --image chest_xray.png

Example output:

The chest X-ray shows the following findings:

*   **Heart size:** The heart appears to be within normal limits in size.
*   **Lungs:** The lung fields appear clear, without any obvious consolidation,
    effusion, or pneumothorax.
*   **Mediastinum:** The mediastinal contours appear normal.
*   **Bones:** No acute bony abnormalities are identified.

**Overall impression:** The chest X-ray appears unremarkable, with no acute
cardiopulmonary abnormality identified.

**Disclaimer:** I am an AI and this is not a radiological report. Please
consult a qualified radiologist for proper interpretation.

CLI Reference

medgemma ask

Send a prompt (and optional image) to the model.

medgemma ask PROMPT [OPTIONS]

Option               Description
--image PATH         Path to an image file to analyze
--max-tokens INT     Maximum tokens to generate (default: 512)
--temperature FLOAT  Sampling temperature (default: 0.1)
--model-path PATH    Path to a local MLX model directory
--json               Output the full response as JSON with stats
--no-stream          Disable streaming and print the response all at once
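
If you want to drive the CLI from a script, the options above map naturally onto an argument list. This is a sketch only: `build_ask_command` and `run_ask` are hypothetical helper names, and `run_ask` assumes the `medgemma` executable is on your PATH.

```python
import subprocess


def build_ask_command(prompt, image=None, max_tokens=None, temperature=None,
                      model_path=None, json_output=False, stream=True):
    """Assemble the argument list for `medgemma ask` from the documented options."""
    cmd = ["medgemma", "ask", prompt]
    if image is not None:
        cmd += ["--image", str(image)]
    if max_tokens is not None:
        cmd += ["--max-tokens", str(max_tokens)]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if model_path is not None:
        cmd += ["--model-path", str(model_path)]
    if json_output:
        cmd.append("--json")
    if not stream:
        cmd.append("--no-stream")
    return cmd


def run_ask(prompt, **options):
    """Run the CLI and return its standard output as text."""
    result = subprocess.run(build_ask_command(prompt, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains quotes or other special characters.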

medgemma setup

Download and prepare the model.

medgemma setup [OPTIONS]

Option             Description
--local-path PATH  Use an already-converted local model instead of downloading
--force            Re-download and overwrite the existing cached model

medgemma info

Show model status and cache location.

medgemma info

Example output:

Cache directory: /Users/you/.medgemma/model
Model in cache:  yes
Model loaded:    no
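
A script can make a rough version of the same cache check without invoking the CLI. The sketch below assumes the default cache location shown above and treats any non-empty directory as a cached model. How `medgemma info` actually decides this is not documented here, so treat the result as an approximation; `model_in_cache` is an illustrative name.

```python
from pathlib import Path

# Default cache location as reported by `medgemma info`.
DEFAULT_CACHE_DIR = Path.home() / ".medgemma" / "model"


def model_in_cache(cache_dir: Path = DEFAULT_CACHE_DIR) -> bool:
    """Return True if the cache directory exists and contains at least one entry."""
    return cache_dir.is_dir() and any(cache_dir.iterdir())


print("Model in cache:", "yes" if model_in_cache() else "no")
```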

medgemma --version

Print the installed version.

Python API

Basic usage

from medgemma import MedGemma

mg = MedGemma()
response = mg.ask("What are symptoms of diabetes?")
print(response.text)

Image analysis

response = mg.ask("Describe this X-ray", image="chest_xray.png")
print(response.text)
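
To run the same prompt over several images, one option is a small loop around `ask`. In this sketch the `ask` callable is passed in (for example `mg.ask`), so the helper makes no assumptions about medgemma beyond the `image=` keyword and the `.text` field shown above; `describe_images` is a hypothetical name.

```python
from pathlib import Path


def describe_images(ask, image_paths, prompt="Describe this image"):
    """Call ask(prompt, image=path) for each existing file; map path to text."""
    results = {}
    for path in map(Path, image_paths):
        if not path.is_file():
            # Skip missing files rather than passing a bad path to the model.
            print(f"Skipping missing file: {path}")
            continue
        results[str(path)] = ask(prompt, image=str(path)).text
    return results
```

Usage would look like `describe_images(mg.ask, ["chest_xray.png"], prompt="Describe this X-ray")`.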

Streaming

for chunk in mg.stream("Explain hypertension"):
    print(chunk, end="", flush=True)
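
If you also need the complete text after streaming, you can accumulate the chunks as they are printed. This sketch assumes each chunk is a plain string, as the `print` call above suggests; `stream_and_collect` is an illustrative helper, not part of the library.

```python
def stream_and_collect(chunks):
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(parts)
```

For example, `full_text = stream_and_collect(mg.stream("Explain hypertension"))`.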

Response object

MedGemma.ask() returns a Response dataclass with these fields:

Field              Type   Description
text               str    The generated response text
prompt_tokens      int    Number of tokens in the prompt
completion_tokens  int    Number of tokens generated
tokens_per_second  float  Generation speed
elapsed_seconds    float  Total generation time

response = mg.ask("What is aspirin used for?")
print(response.text)
print(f"{response.completion_tokens} tokens in {response.elapsed_seconds:.1f}s")
print(f"Speed: {response.tokens_per_second:.1f} tok/s")
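
Because the response is a dataclass, the standard `dataclasses.asdict` function can turn it into a dictionary for logging or serialization. The sketch below uses a stand-in class, `ResponseSketch`, that mirrors the documented fields so the example runs without the model; the values are placeholders, not real output.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ResponseSketch:
    """Hypothetical stand-in mirroring the documented Response fields."""
    text: str
    prompt_tokens: int
    completion_tokens: int
    tokens_per_second: float
    elapsed_seconds: float


def to_json(response) -> str:
    """Serialize any dataclass instance (such as a Response) to a JSON string."""
    return json.dumps(asdict(response), indent=2)


# Placeholder values for illustration only.
example = ResponseSketch("(generated text)", 12, 248, 32.5, 7.6)
print(to_json(example))
```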

Custom model path and parameters

mg = MedGemma(
    model_path="/path/to/local/mlx-model",
    max_tokens=1024,
    temperature=0.3,
)

Release model from memory

mg.unload()
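
To make sure `unload()` runs even when an error occurs, you could wrap the model in a context manager. This is a sketch built on the standard library's `contextlib`; the `loaded` helper is an assumption and not something the package is documented to provide.

```python
from contextlib import contextmanager


@contextmanager
def loaded(model):
    """Yield the model and call its unload() method on exit, even after an error."""
    try:
        yield model
    finally:
        model.unload()
```

Usage would look like `with loaded(MedGemma()) as mg: print(mg.ask("...").text)`.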

JSON Output

Use --json to get structured output with generation stats:

medgemma ask "What is hypertension?" --json

Example output:

{
  "text": "Hypertension, also known as high blood pressure, is a chronic medical condition...",
  "completion_tokens": 248,
  "tokens_per_second": 32.5,
  "elapsed_seconds": 7.6
}
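
The JSON output can be consumed with any JSON parser. As a small illustration, the following parses the sample above with Python's `json` module; in practice `raw` would hold the captured standard output of the command.

```python
import json

# Sample copied from the example output above; its text field is truncated there.
raw = """{
  "text": "Hypertension, also known as high blood pressure, is a chronic medical condition...",
  "completion_tokens": 248,
  "tokens_per_second": 32.5,
  "elapsed_seconds": 7.6
}"""

stats = json.loads(raw)
print(f"{stats['completion_tokens']} tokens in {stats['elapsed_seconds']}s")
```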

How It Works

  1. Model download — medgemma setup downloads Google's MedGemma 1.5 4B-IT from Hugging Face.
  2. Quantization — The model is converted to 4-bit quantized MLX format, reducing size from ~8 GB to ~4 GB while preserving quality.
  3. Local inference — All inference runs on your Apple Silicon GPU via the MLX framework. No data is sent to any server.
  4. Lazy loading — The model loads into memory only on the first ask() or stream() call, and stays loaded for subsequent requests.
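
The lazy-loading behavior in step 4 follows a common pattern, sketched below. This is an illustration of the general technique, not medgemma's actual implementation; `LazyModel` and the `loader` callable are hypothetical.

```python
class LazyModel:
    """Illustrative wrapper that defers an expensive load until first use."""

    def __init__(self, loader):
        self._loader = loader  # callable that performs the expensive load
        self._model = None     # nothing is loaded at construction time

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def ask(self, prompt):
        if self._model is None:
            self._model = self._loader()  # only the first call pays the load cost
        return self._model(prompt)

    def unload(self):
        self._model = None  # drop the reference so the memory can be reclaimed
```

Constructing the wrapper is cheap, the first `ask()` triggers the load, and later calls reuse the same loaded model until `unload()` is called.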

Troubleshooting

"Not running on Apple Silicon"

MedGemma requires an Apple Silicon Mac (M1/M2/M3/M4) and cannot run on Intel Macs or other platforms, because it relies on the MLX framework's Apple Silicon GPU backend.

Model download fails

  • Make sure you've accepted the license at google/medgemma-4b-it and logged in with huggingface-cli login
  • Check your internet connection
  • Ensure you have ~4 GB of free disk space
  • Try again with medgemma setup --force
  • If you're behind a firewall, download the model manually and use medgemma setup --local-path /path/to/model
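
The disk-space item in the list above can be checked from Python with `shutil.disk_usage`. This sketch uses the ~4 GB figure from the requirements as the threshold; peak usage during download and conversion is not documented here and may be higher. `has_free_space` is an illustrative name.

```python
import shutil
from pathlib import Path

REQUIRED_BYTES = 4 * 1000**3  # the ~4 GB figure from the requirements


def has_free_space(path=Path.home(), required=REQUIRED_BYTES) -> bool:
    """Return True if the filesystem containing `path` has enough free bytes."""
    return shutil.disk_usage(path).free >= required


if not has_free_space():
    print("Less than ~4 GB free on the volume that holds your home directory.")
```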

Out of memory

The 4-bit quantized model needs approximately 4 GB of unified memory. If you're running low:

  • Close other memory-intensive applications
  • Use --max-tokens with a lower value to limit output length
  • Call mg.unload() in Python when you're done to free memory

Model loads slowly on first run

The first ask call loads the model into GPU memory, which can take several seconds. Subsequent calls reuse the loaded model and are much faster.


[!WARNING] Medical Disclaimer — This tool is for informational and educational purposes only. It does not provide medical advice, diagnosis, or treatment. The outputs are generated by an AI model and may be inaccurate or incomplete. Always seek the advice of a qualified healthcare provider with any questions regarding a medical condition.
