Medical AI on Apple Silicon – MedGemma 1.5 4B via MLX
[!WARNING] Medical Disclaimer — This tool is for informational and educational purposes only. It is NOT a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider for medical decisions. Never disregard professional medical advice because of something generated by this tool.
What is MedGemma?
MedGemma is a command-line tool and Python library that runs Google's MedGemma 1.5 4B medical AI model locally on your Mac. It uses Apple's MLX framework to run entirely on your Apple Silicon GPU — no cloud API, no data leaves your machine. Ask medical questions, analyze medical images, and get evidence-based responses, all from your terminal.
Requirements
- Apple Silicon Mac (M1, M2, M3, or M4)
- Python 3.10 or newer
- ~4 GB disk space for the quantized model weights
- macOS (the only supported platform)
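The requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the function name `check_requirements` and the thresholds are illustrative and not part of the medgemma package. It assumes the home directory is where the model cache will live.

```python
import platform
import shutil
import sys
from pathlib import Path

MIN_PYTHON = (3, 10)   # "Python 3.10 or newer"
MIN_FREE_GB = 4.0      # "~4 GB disk space", an approximate figure


def check_requirements(target_dir=None):
    """Return one pass/fail flag per requirement (hypothetical helper)."""
    target_dir = Path.home() if target_dir is None else Path(target_dir)
    free_gb = shutil.disk_usage(target_dir).free / 1e9
    return {
        # A native Python on Apple Silicon reports Darwin + arm64.
        "apple_silicon": platform.system() == "Darwin"
        and platform.machine() == "arm64",
        "python_version": sys.version_info >= MIN_PYTHON,
        "disk_space": free_gb >= MIN_FREE_GB,
    }


for name, ok in check_requirements().items():
    print(f"{name}: {'ok' if ok else 'not met'}")
```

A Python interpreter running under Rosetta may report a different architecture, so a failed `apple_silicon` check on an M-series Mac is worth double-checking.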
Quick Start
1. Install
pip install medgemma
Or with uv:
uv pip install medgemma
2. Hugging Face authentication
The model weights are hosted on Hugging Face under google/medgemma-4b-it. Before downloading, you need to:
- Create a Hugging Face account (free)
- Visit the model page and accept Google's license agreement
- Log in locally:
pip install huggingface-hub
huggingface-cli login
You only need to do this once.
3. Download the model
medgemma setup
This downloads the MedGemma 4B model from Hugging Face, converts it to 4-bit quantized MLX format, and caches it at ~/.medgemma/model. You only need to do this once.
4. Ask a question
medgemma ask "What are the common symptoms of type 2 diabetes?"
Example output:
The common symptoms of type 2 diabetes include:
* **Increased thirst (polydipsia):** You may feel thirsty more often than usual.
* **Frequent urination (polyuria):** You may need to urinate more often,
especially at night.
* **Increased hunger (polyphagia):** You may feel hungry even after eating.
* **Unexplained weight loss:** You may lose weight without trying.
* **Fatigue:** You may feel tired and lacking energy.
* **Blurred vision:** High blood sugar can affect the lenses of your eyes.
* **Slow-healing sores or frequent infections:** High blood sugar can impair
your body's ability to heal.
* **Numbness or tingling in hands or feet:** This can be a sign of nerve
damage (neuropathy).
* **Areas of darkened skin:** Particularly in the armpits and neck
(acanthosis nigricans).
It is important to note that many people with type 2 diabetes may not experience
any symptoms in the early stages. Regular check-ups and blood sugar screenings
are recommended, especially if you have risk factors.
**Disclaimer:** I am an AI assistant and cannot provide medical advice. Please
consult a healthcare professional for diagnosis and treatment.
Image Analysis
Analyze medical images by passing --image:
medgemma ask "Describe this chest X-ray" --image chest_xray.png
Example output:
The chest X-ray shows the following findings:
* **Heart size:** The heart appears to be within normal limits in size.
* **Lungs:** The lung fields appear clear, without any obvious consolidation,
effusion, or pneumothorax.
* **Mediastinum:** The mediastinal contours appear normal.
* **Bones:** No acute bony abnormalities are identified.
**Overall impression:** The chest X-ray appears unremarkable, with no acute
cardiopulmonary abnormality identified.
**Disclaimer:** I am an AI and this is not a radiological report. Please
consult a qualified radiologist for proper interpretation.
CLI Reference
medgemma ask
Send a prompt (and optional image) to the model.
medgemma ask PROMPT [OPTIONS]
| Option | Description |
|---|---|
| --image PATH | Path to an image file to analyze |
| --max-tokens INT | Maximum tokens to generate (default: 512) |
| --temperature FLOAT | Sampling temperature (default: 0.1) |
| --model-path PATH | Path to a local MLX model directory |
| --json | Output full response as JSON with stats |
| --no-stream | Disable streaming, print all at once |
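When calling the CLI from a script, it can help to assemble the argument list in one place. The sketch below maps the documented options onto an argv list; `build_ask_command` is a hypothetical helper, not something the package provides, and it only constructs the command without running it.

```python
import shlex


def build_ask_command(prompt, image=None, max_tokens=None, temperature=None,
                      model_path=None, json_output=False, stream=True):
    """Assemble an argv list for `medgemma ask` from the documented options."""
    cmd = ["medgemma", "ask", prompt]
    if image is not None:
        cmd += ["--image", str(image)]
    if max_tokens is not None:
        cmd += ["--max-tokens", str(max_tokens)]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if model_path is not None:
        cmd += ["--model-path", str(model_path)]
    if json_output:
        cmd.append("--json")
    if not stream:
        cmd.append("--no-stream")
    return cmd


cmd = build_ask_command("Describe this chest X-ray", image="chest_xray.png",
                        max_tokens=256, stream=False)
print(shlex.join(cmd))  # shell-quoted form, convenient for logging
```

The resulting list can be passed to `subprocess.run(cmd)` on a machine where the CLI is installed.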
medgemma setup
Download and prepare the model.
medgemma setup [OPTIONS]
| Option | Description |
|---|---|
| --local-path PATH | Use an already-converted local model instead of downloading |
| --force | Re-download and overwrite existing cached model |
medgemma info
Show model status and cache location.
medgemma info
Example output:
Cache directory: /Users/you/.medgemma/model
Model in cache: yes
Model loaded: no
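A script that needs to know whether setup has already run could read this output. The sketch below parses the sample text shown above; it assumes the `key: value` layout stays as printed here, which the documentation does not guarantee, and `parse_info` is a hypothetical helper.

```python
SAMPLE_INFO = """\
Cache directory: /Users/you/.medgemma/model
Model in cache: yes
Model loaded: no
"""


def parse_info(output):
    """Split each 'key: value' line of `medgemma info` output into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


info = parse_info(SAMPLE_INFO)
model_ready = info.get("Model in cache") == "yes"
print(info["Cache directory"], model_ready)
```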
medgemma --version
Print the installed version.
Python API
Basic usage
from medgemma import MedGemma
mg = MedGemma()
response = mg.ask("What are symptoms of diabetes?")
print(response.text)
Image analysis
response = mg.ask("Describe this X-ray", image="chest_xray.png")
print(response.text)
Streaming
for chunk in mg.stream("Explain hypertension"):
    print(chunk, end="", flush=True)
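To keep the full text as well as showing it live, the chunks can be collected while they are printed. This sketch assumes `mg.stream()` yields plain strings, as the loop above suggests; `fake_stream` is a stand-in generator so the example runs without the model, and both function names are illustrative.

```python
from typing import Iterable, Iterator


def fake_stream(text: str, size: int = 8) -> Iterator[str]:
    """Stand-in for mg.stream(): yields the text in small pieces."""
    for start in range(0, len(text), size):
        yield text[start:start + size]


def print_and_collect(chunks: Iterable[str]) -> str:
    """Echo chunks as they arrive and return the joined text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = print_and_collect(
    fake_stream("Hypertension is persistently elevated blood pressure.")
)
```

With the real library, `print_and_collect(mg.stream("Explain hypertension"))` would be the equivalent call under the same assumption.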
Response object
MedGemma.ask() returns a Response dataclass with these fields:
| Field | Type | Description |
|---|---|---|
| text | str | The generated response text |
| prompt_tokens | int | Number of tokens in the prompt |
| completion_tokens | int | Number of tokens generated |
| tokens_per_second | float | Generation speed |
| elapsed_seconds | float | Total generation time |
response = mg.ask("What is aspirin used for?")
print(response.text)
print(f"{response.completion_tokens} tokens in {response.elapsed_seconds:.1f}s")
print(f"Speed: {response.tokens_per_second:.1f} tok/s")
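For type hints or tests that should not load the model, the documented fields can be mirrored in a local dataclass. The class below is reconstructed from the field table only and is not the library's source; the name `ResponseSketch` and the `prompt_tokens` value are illustrative.

```python
from dataclasses import asdict, dataclass


@dataclass
class ResponseSketch:
    """Mirror of the documented Response fields (not the library's class)."""
    text: str
    prompt_tokens: int
    completion_tokens: int
    tokens_per_second: float
    elapsed_seconds: float


def summarize(r: ResponseSketch) -> str:
    """Format the generation stats on one line."""
    return (f"{r.completion_tokens} tokens in {r.elapsed_seconds:.1f}s "
            f"({r.tokens_per_second:.1f} tok/s)")


example = ResponseSketch(
    text="...",
    prompt_tokens=12,  # made-up value for illustration
    completion_tokens=248,
    tokens_per_second=32.5,
    elapsed_seconds=7.6,
)
print(summarize(example))
print(asdict(example))  # plain dict, e.g. for logging
```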
Custom model path and parameters
mg = MedGemma(
    model_path="/path/to/local/mlx-model",
    max_tokens=1024,
    temperature=0.3,
)
Release model from memory
mg.unload()
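One way to make sure `unload()` always runs, even when a call raises, is a small context manager. This is a sketch of a wrapper you could write yourself; `unloading` is not part of the package, and `FakeModel` is a stand-in that only imitates the `unload()` method so the example runs anywhere.

```python
from contextlib import contextmanager


class FakeModel:
    """Stand-in exposing unload(), for illustration only."""

    def __init__(self):
        self.loaded = True

    def unload(self):
        self.loaded = False


@contextmanager
def unloading(model):
    """Yield the model and call unload() afterwards, even if the body raises."""
    try:
        yield model
    finally:
        model.unload()


fake = FakeModel()
with unloading(fake) as mg:
    pass  # with the real library: response = mg.ask("...")
print(fake.loaded)
```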
JSON Output
Use --json to get structured output with generation stats:
medgemma ask "What is hypertension?" --json
{
  "text": "Hypertension, also known as high blood pressure, is a chronic medical condition...",
  "completion_tokens": 248,
  "tokens_per_second": 32.5,
  "elapsed_seconds": 7.6
}
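The JSON form is convenient to consume from another program. The sketch below decodes the sample shown above and includes a wrapper that would invoke the CLI; it assumes stdout contains only the JSON document, and `ask_json` and `describe` are hypothetical helper names. The wrapper is defined but not called, since it needs the CLI and model installed.

```python
import json
import subprocess

SAMPLE = """{
  "text": "Hypertension, also known as high blood pressure, is a chronic medical condition...",
  "completion_tokens": 248,
  "tokens_per_second": 32.5,
  "elapsed_seconds": 7.6
}"""


def ask_json(prompt: str) -> dict:
    """Run `medgemma ask PROMPT --json` and decode stdout (requires the CLI)."""
    result = subprocess.run(
        ["medgemma", "ask", prompt, "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def describe(payload: dict) -> str:
    """Summarize the generation stats from a decoded response."""
    return (f"{payload['completion_tokens']} tokens at "
            f"{payload['tokens_per_second']} tok/s")


payload = json.loads(SAMPLE)
print(describe(payload))
```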
How It Works
- Model download: medgemma setup downloads Google's MedGemma 1.5 4B-IT from Hugging Face.
- Quantization: The model is converted to 4-bit quantized MLX format, reducing size from ~8 GB to ~4 GB while preserving quality.
- Local inference: All inference runs on your Apple Silicon GPU via the MLX framework. No data is sent to any server.
- Lazy loading: The model loads into memory only on the first ask() or stream() call, and stays loaded for subsequent requests.
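The lazy-loading behavior described in the last item follows a common pattern, sketched below in generic form. This is not the package's implementation: `LazyModel` and `toy_loader` are illustrative names, and the "model" is a simple echo function so the example runs anywhere.

```python
class LazyModel:
    """Load an expensive resource on first use and keep it until unload()."""

    def __init__(self, loader):
        self._loader = loader   # callable that builds the model
        self._model = None      # nothing is loaded at construction time
        self.load_count = 0

    @property
    def loaded(self):
        return self._model is not None

    def _ensure_loaded(self):
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model

    def ask(self, prompt):
        return self._ensure_loaded()(prompt)

    def unload(self):
        self._model = None      # drop the reference so memory can be freed


def toy_loader():
    """Stand-in for loading weights; returns an echo function."""
    return lambda prompt: f"echo: {prompt}"


lazy = LazyModel(toy_loader)
print(lazy.loaded)                    # False: nothing loaded yet
lazy.ask("first")
lazy.ask("second")
print(lazy.loaded, lazy.load_count)   # True 1: loaded once, then reused
```

The trade-off is that the first request pays the loading cost, while constructing the object stays cheap.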
Troubleshooting
"Not running on Apple Silicon"
MedGemma requires an Apple Silicon Mac (M1/M2/M3/M4). It cannot run on Intel Macs or other platforms. The MLX framework only supports Apple's ARM-based chips.
Model download fails
- Make sure you've accepted the license at google/medgemma-4b-it and logged in with huggingface-cli login
- Check your internet connection
- Ensure you have ~4 GB of free disk space
- Try again with medgemma setup --force
- If you're behind a firewall, obtain the model manually and use medgemma setup --local-path /path/to/model (the directory must contain a model already converted to MLX format)
Out of memory
The 4-bit quantized model needs approximately 4 GB of unified memory. If you're running low:
- Close other memory-intensive applications
- Use --max-tokens with a lower value to limit output length
- Call mg.unload() in Python when you're done to free memory
Model loads slowly on first run
The first ask call loads the model into GPU memory, which can take several seconds. Subsequent calls reuse the loaded model and are much faster.
[!WARNING] Medical Disclaimer — This tool is for informational and educational purposes only. It does not provide medical advice, diagnosis, or treatment. The outputs are generated by an AI model and may be inaccurate or incomplete. Always seek the advice of a qualified healthcare provider with any questions regarding a medical condition.