A CLI tool to verify and fix AI/ML environment compatibility (Driver <-> CUDA <-> Wheels).
🩺 Env-Doctor
The missing link between your GPU and Python AI libraries
"Why does my PyTorch crash with CUDA errors when I just installed it?"
Because your driver supports CUDA 11.8, but `pip install torch` gave you CUDA 12.4 wheels.
Env-Doctor diagnoses and fixes the #1 frustration in GPU computing: mismatched CUDA versions between your NVIDIA driver, system toolkit, cuDNN, and Python libraries.
It takes 5 seconds to find out if your environment is broken — and exactly how to fix it.
🚀 Features
| Feature | What It Does |
|---|---|
| ⚡ One-Command Diagnosis | Instantly check compatibility between GPU Driver → CUDA Toolkit → cuDNN → PyTorch/TensorFlow/JAX |
| 🔧 Deep CUDA Analysis | cuda-info reveals multiple installations, PATH issues, and environment misconfigurations |
| 🧠 cuDNN Detection | cudnn-info finds cuDNN libraries, validates symlinks, checks version compatibility |
| 🐳 Container Validation | dockerfile & docker-compose commands catch GPU config errors with DB-driven recommendations before you build/deploy |
| 🤖 AI Model Compatibility | Check if your GPU can run any model (LLMs, Diffusion, Audio) before downloading |
| 🐧 WSL2 GPU Support | Detects WSL1/WSL2 environments, validates GPU forwarding, catches common driver conflicts |
| 🛠️ Compilation Guard | Warns if system nvcc doesn't match PyTorch's CUDA — preventing flash-attention build failures |
| 💊 Safe Install Commands | Prescribes the exact pip install command that works with YOUR driver |
| 🦜 Migration Helper | Scans code for deprecated imports (LangChain, Pydantic) and suggests fixes |
📦 Installation
⏳ Coming Soon:
`pip install env-doctor` will be available shortly!
For now, install from source:
git clone https://github.com/mitulgarg/env-doctor.git
cd env-doctor
pip install -e .
🛠️ Usage
1️⃣ Diagnose Your Environment
Check your current system health, driver info, and installed library conflicts.
env-doctor check
What it checks:
- Environment: Native Linux vs WSL1 vs WSL2, with GPU forwarding validation for WSL2
- GPU Driver: Is it too old for your installed PyTorch?
- System CUDA: Is it missing or mismatched?
- Library Conflicts: Do you have a "Frankenstein" environment (e.g., Torch 2.1 built for CUDA 12.1 while your driver only supports CUDA 11.8)?
- WSL2 GPU Setup: Validates CUDA libraries, checks for driver conflicts, tests nvidia-smi functionality
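At its core, the "is the driver too old?" question compares the newest CUDA version the driver supports with the CUDA version a wheel was built for. The following is a minimal sketch of that comparison. It is not Env-Doctor's implementation, and `MIN_DRIVER_FOR_CUDA`, `max_cuda_for_driver`, and `wheel_fits_driver` are hypothetical names. The minimum Linux driver versions are the GA-release values from NVIDIA's CUDA Toolkit release notes. For simplicity, the sketch ignores CUDA minor-version compatibility, which can let some 12.x builds run on older 12.x-era drivers.

```python
from __future__ import annotations

# Minimum Linux driver for each CUDA GA release (NVIDIA CUDA Toolkit release notes)
MIN_DRIVER_FOR_CUDA = {
    "11.8": "520.61.05",
    "12.0": "525.60.13",
    "12.1": "530.30.02",
    "12.2": "535.54.03",
    "12.3": "545.23.06",
    "12.4": "550.54.14",
}


def as_tuple(version: str) -> tuple[int, ...]:
    """Turn '535.146.02' into (535, 146, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def max_cuda_for_driver(driver: str) -> str | None:
    """Return the newest CUDA version in the table that this driver supports."""
    supported = [cuda for cuda, min_driver in MIN_DRIVER_FOR_CUDA.items()
                 if as_tuple(driver) >= as_tuple(min_driver)]
    return max(supported, key=as_tuple, default=None)


def wheel_fits_driver(driver: str, wheel_cuda: str) -> bool:
    """True if the wheel's CUDA version is no newer than the driver's maximum."""
    max_cuda = max_cuda_for_driver(driver)
    return max_cuda is not None and as_tuple(wheel_cuda) <= as_tuple(max_cuda)


print(max_cuda_for_driver("535.146.02"))       # 12.2
print(wheel_fits_driver("520.61.05", "12.4"))  # False: the "Frankenstein" case
```

With the driver from the debug example (535.146.02), this lookup yields 12.2. That matches the `max_cuda_version` the tool reports.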
2️⃣ Get the Safe Install Command
Don't guess which index-url to use. Let the doctor prescribe it.
env-doctor install torch
Output Example:
⬇️ Run this command to install the SAFE version:
---------------------------------------------------
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
---------------------------------------------------
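One way to think about the prescription is this: pick the newest PyTorch wheel variant whose CUDA version does not exceed what the driver supports, and fall back to the CPU index otherwise. Below is a minimal sketch under that assumption. The tags in `AVAILABLE_TAGS` are illustrative, since the real set depends on the PyTorch release and its install matrix. `pick_index_url` is a hypothetical helper, not Env-Doctor's API. The URL pattern matches the example output above.

```python
from __future__ import annotations

# Illustrative wheel variants; each PyTorch release publishes its own set
AVAILABLE_TAGS = ("cu118", "cu121", "cu124")


def tag_to_version(tag: str) -> tuple[int, int]:
    """'cu121' -> (12, 1): the last digit is the minor version."""
    digits = tag[2:]
    return int(digits[:-1]), int(digits[-1])


def pick_index_url(driver_max_cuda: str, tags: tuple[str, ...] = AVAILABLE_TAGS) -> str:
    """Index URL for the newest tag the driver can run, else the CPU-only index."""
    major, minor = (int(x) for x in driver_max_cuda.split(".")[:2])
    usable = [t for t in tags if tag_to_version(t) <= (major, minor)]
    best = max(usable, key=tag_to_version, default="cpu")
    return f"https://download.pytorch.org/whl/{best}"


print(f"pip install torch torchvision --index-url {pick_index_url('11.8')}")
# pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```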
3️⃣ Scan for Code Issues
Scan your project for deprecated imports (like old LangChain definitions).
env-doctor scan
4️⃣ Deep CUDA Toolkit Analysis
Get comprehensive details about your CUDA installations, environment variables, and configuration issues.
env-doctor cuda-info
What it shows:
- Multiple CUDA installations and their paths
- `CUDA_HOME`, `PATH`, and `LD_LIBRARY_PATH` configuration
- Runtime library (libcudart) status
- Driver compatibility with installed toolkit
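The following is a rough sketch of how multiple toolkits and environment variables could be gathered on Linux. It assumes the conventional `/usr/local/cuda-X.Y` layout. The function names are hypothetical, and this is not necessarily how `cuda-info` works internally.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path


def find_cuda_installs(root: str = "/usr/local") -> list[Path]:
    """Versioned toolkit directories such as /usr/local/cuda-11.8 and /usr/local/cuda-12.1."""
    return sorted(p for p in Path(root).glob("cuda-*") if p.is_dir())


def cuda_env_report(root: str = "/usr/local") -> dict[str, object]:
    """Collect the facts a multiple-installation check would compare."""
    installs = find_cuda_installs(root)
    return {
        "installs": [str(p) for p in installs],
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
        "nvcc_on_path": shutil.which("nvcc"),  # the nvcc that PATH resolves first, if any
        "multiple_installs": len(installs) > 1,
    }


if __name__ == "__main__":
    for key, value in cuda_env_report().items():
        print(f"{key}: {value}")
```

A mismatch between `CUDA_HOME` and the directory that holds the `nvcc` found on `PATH` is one of the configuration problems worth surfacing.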
5️⃣ cuDNN Library Analysis
Detect cuDNN installations and validate they're properly configured.
env-doctor cudnn-info
What it shows:
- cuDNN version and library locations
- Multiple installation detection
- Symlink validation (Linux)
- PATH configuration (Windows)
- CUDA compatibility status
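A common way to read the installed cuDNN version is to parse the version macros in its headers. Since cuDNN 8, these live in `cudnn_version.h`; older releases define them in `cudnn.h`. The following is a minimal sketch of that approach with hypothetical function names. It is not the tool's `CudnnDetector`.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches lines like "#define CUDNN_MAJOR 8"
_DEFINE = re.compile(r"#define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)")


def parse_cudnn_version(header_text: str) -> str | None:
    """Extract 'MAJOR.MINOR.PATCHLEVEL' from the text of a cuDNN header."""
    parts = dict(_DEFINE.findall(header_text))
    if not all(key in parts for key in ("MAJOR", "MINOR", "PATCHLEVEL")):
        return None
    return f"{parts['MAJOR']}.{parts['MINOR']}.{parts['PATCHLEVEL']}"


def cudnn_version_from(include_dir: str) -> str | None:
    """Try cudnn_version.h (cuDNN 8+) first, then cudnn.h (cuDNN 7 and older)."""
    for name in ("cudnn_version.h", "cudnn.h"):
        header = Path(include_dir) / name
        if header.is_file():
            version = parse_cudnn_version(header.read_text(errors="ignore"))
            if version:
                return version
    return None


sample = "#define CUDNN_MAJOR 8\n#define CUDNN_MINOR 9\n#define CUDNN_PATCHLEVEL 2\n"
print(parse_cudnn_version(sample))  # 8.9.2
```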
6️⃣ Validate Dockerfiles
Check your Dockerfile for GPU/CUDA configuration issues before building.
env-doctor dockerfile
What it validates:
- Base Images: Detects CPU-only images and provides DB-driven GPU base image + install command recommendations
- PyTorch Installs: Ensures `pip install torch` has the correct `--index-url`, using verified install commands
- Library Version Compatibility: Validates pinned versions against DB-verified combinations for your CUDA version
- Multi-Library Support: Checks that multiple GPU libraries (torch, tensorflow, jax) are compatible with the same CUDA version
- Runtime vs Devel Images: Detects compilation requirements (flash-attn, xformers) and enforces `-devel` base images
- Deprecated Packages: Flags deprecated packages like `tensorflow-gpu` and suggests modern alternatives
- Driver Installation: Flags forbidden NVIDIA driver installs (must be on host, not container)
- CUDA Toolkit: Warns about unnecessary toolkit installs that bloat images
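The following sketch shows how two of these checks, the CPU-only base image and the missing `--index-url`, can be expressed as line-based rules. The rules and the `lint_dockerfile` name are hypothetical simplifications. The real command draws on its compatibility database rather than a hard-coded hint list.

```python
from __future__ import annotations

import re

# Hypothetical hints: nvidia/cuda:*, pytorch/pytorch:*-cuda*, tensorflow/tensorflow:*-gpu
GPU_IMAGE_HINTS = ("cuda", "-gpu")
TORCH_INSTALL = re.compile(r"\bpip3?\s+install\b.*\btorch\b")


def lint_dockerfile(text: str) -> list[tuple[int, str]]:
    """Return (line_number, message) pairs for two common GPU mistakes.

    Ignores line continuations and `FROM --platform=...` for brevity.
    """
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.upper().startswith("FROM "):
            image = stripped.split()[1].lower()
            if not any(hint in image for hint in GPU_IMAGE_HINTS):
                issues.append((lineno, f"CPU-only base image detected: {image}"))
        elif TORCH_INSTALL.search(stripped) and "--index-url" not in stripped:
            issues.append((lineno, "PyTorch installation missing --index-url flag"))
    return issues


dockerfile = "FROM python:3.10\nWORKDIR /app\nRUN pip install torch torchvision\n"
for lineno, message in lint_dockerfile(dockerfile):
    print(f"Line {lineno}: {message}")
# Line 1: CPU-only base image detected: python:3.10
# Line 3: PyTorch installation missing --index-url flag
```

The `\btorch\b` word boundary keeps `torchvision`-only installs from being flagged.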
Example Output:
🐳 DOCKERFILE VALIDATION: Dockerfile
❌ ERRORS (2):
------------------------------------------------------------
Line 1:
Issue: CPU-only base image detected: python:3.10
Fix: Use a GPU-enabled base image
Suggested fix:
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04
# Or: FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
# Or: FROM tensorflow/tensorflow:latest-gpu
Line 8:
Issue: PyTorch installation missing --index-url flag
Fix: Add --index-url to install the correct CUDA version
SUMMARY:
❌ Errors: 2
⚠️ Warnings: 1
ℹ️ Info: 0
7️⃣ Validate Docker Compose Files
Check your docker-compose.yml for proper GPU device configuration.
env-doctor docker-compose
What it validates:
- GPU Device Config: Ensures `deploy.resources.reservations.devices` is set correctly
- Driver Setting: Validates that `driver: nvidia` is specified
- Capabilities: Checks for `capabilities: [gpu]`
- Deprecated Syntax: Flags the old `runtime: nvidia` approach
- Multi-Service Conflicts: Warns about GPU resource sharing between services
- Host Requirements: Checks for nvidia-container-toolkit
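The per-service checks above can be sketched against an already-parsed Compose file. To stay within the standard library, the sketch uses a plain dict. In practice, the dict would come from a YAML parser such as PyYAML's `yaml.safe_load`. `check_service_gpu` and the `my-inference` image are hypothetical, and the messages only mimic the example output that follows.

```python
from __future__ import annotations


def _dig(mapping, *keys):
    """Follow nested keys, treating missing or empty levels as absent."""
    for key in keys:
        mapping = (mapping or {}).get(key)
    return mapping


def check_service_gpu(name: str, service: dict) -> list[str]:
    """Flag the GPU-config problems listed above for one Compose service."""
    problems = []
    if service.get("runtime") == "nvidia":
        problems.append(f"{name}: deprecated 'runtime: nvidia' syntax")
    devices = _dig(service, "deploy", "resources", "reservations", "devices") or []
    if not devices:
        problems.append(f"{name}: missing deploy.resources.reservations.devices")
    for device in devices:
        if device.get("driver") != "nvidia":
            problems.append(f"{name}: device missing 'driver: nvidia'")
        if "gpu" not in device.get("capabilities", []):
            problems.append(f"{name}: device missing 'capabilities: [gpu]'")
    return problems


# A parsed docker-compose.yml
compose = {
    "services": {
        "ml-training": {"image": "pytorch/pytorch"},
        "inference": {
            "image": "my-inference",
            "deploy": {"resources": {"reservations": {"devices": [
                {"driver": "nvidia", "count": "all", "capabilities": ["gpu"]},
            ]}}},
        },
    }
}
for service_name, service in compose["services"].items():
    for problem in check_service_gpu(service_name, service):
        print(problem)
# ml-training: missing deploy.resources.reservations.devices
```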
Example Output:
🐳 DOCKER COMPOSE VALIDATION: docker-compose.yml
❌ ERRORS (1):
------------------------------------------------------------
Service 'ml-training':
Issue: Missing GPU device configuration
Fix: Add GPU device configuration under deploy.resources.reservations.devices
Suggested fix:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
⚠️ WARNINGS (1):
------------------------------------------------------------
Service 'legacy-app':
Issue: Deprecated 'runtime: nvidia' syntax
Fix: Use the new 'deploy.resources.reservations.devices' syntax instead
8️⃣ Check AI Model Compatibility
Before downloading a 40GB+ model, find out if it will run on your GPU!
env-doctor model llama-3-8b
What it checks:
- Model Parameters: LLMs, Diffusion models, Audio models, and more
- VRAM Requirements: Calculates VRAM needed for each precision (fp32, fp16, int8, int4)
- GPU Availability: Detects your GPU and available VRAM
- Compatibility Analysis: Shows which precisions fit on your hardware
- Smart Recommendations: Suggests smaller variants or multi-GPU setup if needed
List all available models:
env-doctor model --list
Check specific precision:
env-doctor model stable-diffusion-xl --precision int4
Example Output:
🤖 Checking: LLAMA-3-8B
Parameters: 8.0B
HuggingFace: meta-llama/Meta-Llama-3-8B
🖥️ Your Hardware:
RTX 3090 (24GB VRAM)
💾 VRAM Requirements & Compatibility
✅ FP16: 19.2GB (measured) - 4.8GB free
✅ INT4: 4.8GB (estimated) - 19.2GB free
✅ This model WILL FIT on your GPU!
💡 Recommendations:
1. Use fp16 for best quality on your GPU
Supported Models:
- LLMs: Llama-3, Mistral, Mixtral, Qwen (8B-405B parameters)
- Diffusion: Stable Diffusion 1.5/XL, Flux, Stable Diffusion 3
- Audio: Whisper (tiny to large-v3)
- Language: BERT, T5 (for embeddings and text encoding)
Key Features:
- Measured VRAM for popular models (most accurate)
- Formula-based estimation for new models
- Multi-GPU support (total VRAM calculation)
- Alias support (e.g., "sdxl" → "stable-diffusion-xl")
- Family variants (e.g., suggest llama-3-8b when 70b won't fit)
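For the formula-based path, a simple estimate is parameter count × bytes per parameter × an overhead factor. The example output above (19.2 GB at FP16 and 4.8 GB at INT4 for 8B parameters) happens to be consistent with a flat 1.2× factor, so this sketch assumes that value. The tool's actual formula, and its measured figures, may differ. The function names are hypothetical.

```python
from __future__ import annotations

# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}
# Assumed flat 20% on top of the weights (activations, KV cache, CUDA context)
OVERHEAD = 1.2


def estimate_vram_gb(params_billion: float, precision: str) -> float:
    """Weights size times the overhead factor, in GB (1 GB = 1e9 bytes)."""
    return round(params_billion * BYTES_PER_PARAM[precision] * OVERHEAD, 1)


def fits(params_billion: float, precision: str, gpu_vram_gb: list[float]) -> tuple[bool, float]:
    """Compare against the total VRAM of all GPUs; return (fits, free_gb)."""
    needed = estimate_vram_gb(params_billion, precision)
    free = round(sum(gpu_vram_gb) - needed, 1)
    return free >= 0, free


for precision in ("fp16", "int4"):
    ok, free = fits(8.0, precision, [24.0])  # one 24 GB card, as in the RTX 3090 example
    print(f"{precision.upper()}: {estimate_vram_gb(8.0, precision)}GB needed, {free}GB free, fits={ok}")
# FP16: 19.2GB needed, 4.8GB free, fits=True
# INT4: 4.8GB needed, 19.2GB free, fits=True
```

Summing VRAM across cards mirrors the multi-GPU "total VRAM calculation". However, this assumes the inference runtime can actually shard the model across those GPUs.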
9️⃣ Debug Mode (Troubleshooting)
Get detailed information from all detectors for troubleshooting and development.
env-doctor debug
What debug mode shows:
- All Detector Results: Raw output from every registered detector
- Detection Metadata: Internal detection methods, paths, and detailed status
- Registry Information: List of all available detectors
- Error Details: Full exception traces and diagnostic information
Example Output:
🔍 DEBUG MODE - Detailed Detector Information
============================================================
Registered Detectors: cuda_toolkit, nvidia_driver, python_library, wsl2
--- WSL2 ---
Status: Status.SUCCESS
Component: wsl2
Version: wsl2
Metadata: {'environment': 'WSL2', 'gpu_forwarding': 'enabled'}
--- NVIDIA DRIVER ---
Status: Status.SUCCESS
Component: nvidia_driver
Version: 535.146.02
Path: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
Metadata: {'max_cuda_version': '12.2', 'detection_method': 'nvml'}
Use debug mode when:
- Environment diagnosis shows unexpected results
- You want to understand what each detector found
- Contributing to the project or reporting issues
- Validating detector behavior in different environments
🐧 WSL2 GPU Support
Env-Doctor provides comprehensive WSL2 environment detection and GPU forwarding validation:
Environment Detection:
- Native Linux: Standard Linux environment detection
- WSL1: Detects WSL1 and warns that CUDA is not supported at all
- WSL2: Full GPU forwarding validation and troubleshooting
WSL2 GPU Validation:
- ✅ Driver Conflicts: Detects problematic internal NVIDIA drivers that break GPU forwarding
- ✅ CUDA Libraries: Validates presence of `/usr/lib/wsl/lib/libcuda.so`
- ✅ nvidia-smi: Tests functionality and provides specific error guidance
- ✅ Recommendations: Provides actionable steps to fix GPU forwarding issues
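A common heuristic for telling native Linux, WSL1, and WSL2 apart is to inspect the kernel release string. WSL kernels mention "microsoft", and only WSL2 kernels carry the "microsoft-standard" or "WSL2" suffix. The following is a minimal sketch of that heuristic combined with two of the checks above. The function names are hypothetical, and `WSL2Detector` may use different signals.

```python
from __future__ import annotations

import platform
import shutil
from pathlib import Path


def classify_kernel(release: str) -> str:
    """Classify a kernel release string (as from `uname -r`) as native, wsl1, or wsl2."""
    lowered = release.lower()
    if "microsoft" not in lowered:
        return "native"
    # WSL2 kernels look like "5.15.153.1-microsoft-standard-WSL2" (older: "...-microsoft-standard");
    # WSL1 reports a Windows build string such as "4.4.0-19041-Microsoft".
    if "wsl2" in lowered or "microsoft-standard" in lowered:
        return "wsl2"
    return "wsl1"


def wsl_gpu_report() -> dict[str, object]:
    """Combine the environment type with two of the WSL2 GPU checks listed above."""
    return {
        "environment": classify_kernel(platform.release()),
        "libcuda_present": Path("/usr/lib/wsl/lib/libcuda.so").exists(),
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }
```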
Common WSL2 Issues Detected:
❌ NVIDIA driver installed inside WSL. This breaks GPU forwarding.
→ Run: sudo apt remove --purge nvidia-*
❌ Missing /usr/lib/wsl/lib/libcuda.so
→ Reinstall NVIDIA driver on Windows host
❌ nvidia-smi command failed
→ Install NVIDIA driver on Windows (version 470.76 or newer)
🤖 JSON Output & CI/CD Integration
NEW: All core commands now support machine-readable JSON output for automation and CI/CD pipelines!
JSON Output
Add --json flag to get structured, parseable output:
# Get JSON output
env-doctor check --json
# Example output
{
"status": "warning",
"timestamp": "2026-01-15T10:30:00Z",
"summary": {
"driver": "found",
"cuda": "found",
"issues_count": 2
},
"checks": {
"driver": {
"component": "nvidia_driver",
"status": "success",
"detected": true,
"version": "536.40",
"metadata": {"max_cuda_version": "12.2"}
},
"cuda": {...},
"libraries": {...}
}
}
CI/CD Mode
Use --ci flag for CI/CD pipelines (implies --json with proper exit codes):
env-doctor check --ci
# Exit codes:
# 0 = All checks passed
# 1 = Warnings or non-critical issues
# 2 = Critical errors detected
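If warnings should not fail a pipeline, a small wrapper can translate the exit codes listed above. Below is a minimal sketch that assumes only the documented 0/1/2 convention. `ci_gate`, `run_check`, and the `fail_on_warnings` switch are hypothetical.

```python
from __future__ import annotations

import subprocess

# Exit codes documented for `env-doctor check --ci`
EXIT_MEANING = {0: "all checks passed", 1: "warnings or non-critical issues", 2: "critical errors"}


def ci_gate(returncode: int, fail_on_warnings: bool = False) -> int:
    """Decide the exit code for this CI step from env-doctor's exit code."""
    if returncode == 1 and not fail_on_warnings:
        return 0  # tolerate warnings; they still appear in the log
    return returncode


def run_check(fail_on_warnings: bool = False) -> int:
    """Run `env-doctor check --ci`, log what its exit code means, and gate on it."""
    result = subprocess.run(["env-doctor", "check", "--ci"], capture_output=True, text=True)
    print(result.stdout)
    print(f"env-doctor: {EXIT_MEANING.get(result.returncode, 'unexpected exit code')}")
    return ci_gate(result.returncode, fail_on_warnings)


# In a CI script: raise SystemExit(run_check())
```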
GitHub Actions Integration
name: Validate Environment
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install env-doctor
      - run: env-doctor check --ci
See full example at examples/github-actions/validate-env.yml
Commands Supporting JSON
- `env-doctor check --json` or `--ci`
- `env-doctor cuda-info --json`
- `env-doctor cudnn-info --json`
- `env-doctor scan --json`
Use Cases
Parse results in scripts:
# Extract CUDA version
CUDA_VERSION=$(env-doctor check --json | jq -r '.checks.cuda.version')
# Conditional installation
if env-doctor check --json | jq -e '.checks.driver.detected'; then
pip install torch # GPU version
else
pip install torch --index-url https://download.pytorch.org/whl/cpu
fi
Store results for monitoring:
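import json
import subprocess

# Run the check and parse its JSON report
result = subprocess.run(["env-doctor", "check", "--json"], capture_output=True, text=True)
data = json.loads(result.stdout)
# Keep the fields you want to track, then store them in a database, send them to a monitoring system, etc.
record = {"timestamp": data["timestamp"], "status": data["status"], "issues_count": data["summary"]["issues_count"]}
# e.g. db.insert(record), where db is your own storage client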
📋 Quick Command Reference
env-doctor check # Diagnose your environment
env-doctor check --json # Get JSON output
env-doctor check --ci # CI/CD mode with exit codes
env-doctor cuda-info # Detailed CUDA toolkit analysis
env-doctor cudnn-info # Detailed cuDNN library analysis
env-doctor dockerfile # Validate Dockerfile for GPU issues
env-doctor docker-compose # Validate docker-compose.yml for GPU issues
env-doctor model llama-3-8b # Check if model fits on your GPU
env-doctor model --list # List all available models
env-doctor install torch # Get safe install command for PyTorch
env-doctor scan # Scan project for deprecated AI library imports
env-doctor debug # Show detailed detector information
🧩 Architecture
- The Brain: `compatibility.json` maps drivers to max supported CUDA versions and verified wheel URLs.
- The Detectors: Modular detection system with specialized detectors for:
  - `WSL2Detector`: Environment detection and GPU forwarding validation
  - `NvidiaDriverDetector`: GPU driver version and capability detection
  - `CudaToolkitDetector`: System CUDA installation detection
  - `CudnnDetector`: cuDNN library detection and configuration validation
  - `PythonLibraryDetector`: Python AI library version and CUDA compatibility
- The Registry: `DetectorRegistry` provides a plugin system for easy detector discovery and execution.
- The CLI: `cli.py` orchestrates all detectors and presents unified diagnostics.
- The Updater: `db.py` fetches the latest rules from GitHub so you don't need to update the package daily.
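`DetectorRegistry` is named in the architecture, but its API isn't documented here. The following sketch shows the general plugin-registry pattern: detectors register under a name, and the CLI runs them all. The decorator-based interface is hypothetical, and `DetectionResult` is loosely shaped after the fields in the debug output (component, status, version, metadata).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DetectionResult:
    component: str
    status: str  # e.g. "success", "warning", "error"
    version: str | None = None
    metadata: dict = field(default_factory=dict)


class DetectorRegistry:
    """Minimal plugin registry: detectors register by name and run together."""

    def __init__(self) -> None:
        self._detectors: dict[str, Callable[[], DetectionResult]] = {}

    def register(self, name: str):
        """Decorator that adds a detector function under `name`."""
        def wrap(func: Callable[[], DetectionResult]) -> Callable[[], DetectionResult]:
            self._detectors[name] = func
            return func
        return wrap

    def names(self) -> list[str]:
        return sorted(self._detectors)

    def run_all(self) -> dict[str, DetectionResult]:
        """Run every detector; a crashing detector becomes an error result."""
        results = {}
        for name in self.names():
            try:
                results[name] = self._detectors[name]()
            except Exception as exc:  # keep one bad detector from hiding the rest
                results[name] = DetectionResult(name, "error", metadata={"error": repr(exc)})
        return results


registry = DetectorRegistry()


@registry.register("python_library")
def detect_torch() -> DetectionResult:
    import torch  # raises ImportError if PyTorch is absent; run_all records that as an error
    return DetectionResult("python_library", "success", torch.__version__,
                           {"cuda": torch.version.cuda})


print(registry.names())  # ['python_library']
```

Catching exceptions per detector matches the debug-mode goal of showing every detector's result, including failures.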
🔄 Automated Database Updates
Env-Doctor keeps its compatibility database up to date through an automated scraping and validation system that is designed to stay easy to maintain while preserving human oversight.
How It Works
- Automated Scraping (`tools/scraper.py`)
  - A GitHub Actions workflow runs periodically to scrape official PyTorch/TensorFlow/JAX documentation
  - Extracts the latest CUDA compatibility mappings and verified wheel URLs
  - Updates `compatibility.json` with new versions and URLs
- Validation Layer (`tools/validator.py`; beta, not yet implemented)
  - Automatically validates the scraped data on cloud GPUs before committing
  - Ensures version strings are parseable and URLs are well-formed
  - Catches malformed entries that could break the tool
- Human Verification via PR Merge
  - Automated updates create pull requests (not auto-merged)
  - Maintainers review changes before merging to ensure quality
  - Community members can flag issues or suggest corrections
  - Provides transparency and accountability for database changes
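The structural checks described for the validation layer (parseable version strings and well-formed URLs) can be sketched as follows. This README doesn't show `compatibility.json`'s real schema. The `{cuda_version: {"index_url": ...}}` shape and the `validate_entries` name are therefore hypothetical stand-ins.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

VERSION_RE = re.compile(r"\d+(\.\d+)*")  # e.g. "11.8", "12.1"


def validate_entries(entries: dict[str, dict[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means the entries look well-formed."""
    errors = []
    for cuda_version, entry in entries.items():
        if not VERSION_RE.fullmatch(cuda_version):
            errors.append(f"unparseable version: {cuda_version!r}")
        url = urlparse(entry.get("index_url", ""))
        if url.scheme != "https" or not url.netloc:
            errors.append(f"{cuda_version}: malformed index_url {entry.get('index_url')!r}")
    return errors


sample = {
    "11.8": {"index_url": "https://download.pytorch.org/whl/cu118"},
    "12.x": {"index_url": "download.pytorch.org/whl/cu121"},
}
for error in validate_entries(sample):
    print(error)
# unparseable version: '12.x'
# 12.x: malformed index_url 'download.pytorch.org/whl/cu121'
```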
Community-Driven Contributions Preferred
While automation handles routine updates, community contributions are highly valued for:
- Edge Case Detection: Real-world users catching compatibility issues the scraper misses
- Platform-Specific Issues: WSL2, conda environments, or unusual driver configurations
- New Library Support: Adding new AI frameworks or tools
- Verification: Testing that recommended install commands actually work
This hybrid approach combines automation for maintainability with community oversight for accuracy.
Running the Tools Locally
# Scrape latest compatibility data
python tools/scraper.py
# Validate the database structure
python tools/validator.py
# Both tools are also run automatically by GitHub Actions (.github/workflows/update_db.yml)
🤝 Contributing
We love contributions! Please read our CONTRIBUTING.md for details on how to submit pull requests and our development setup.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file env_doctor-0.1.0.tar.gz.
File metadata
- Download URL: env_doctor-0.1.0.tar.gz
- Upload date:
- Size: 66.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d28385908993ec6be138edbe90a4a49c4ecfa59886578ef6740c66fc7fb64c10 |
| MD5 | 1e01cffb770ca3934b9926495a0442cf |
| BLAKE2b-256 | 6f00c9e59d8a23370bbcf006a1d3c02f183dc9b4f5dedf6745525285e5f45cdd |
File details
Details for the file env_doctor-0.1.0-py3-none-any.whl.
File metadata
- Download URL: env_doctor-0.1.0-py3-none-any.whl
- Upload date:
- Size: 67.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 57c4c5f8cff01f01fae784d8a15ec77dc9134dbf572a563a53abca47048d0b9b |
| MD5 | 58b47a5b7974651b3580517c8509b6fd |
| BLAKE2b-256 | 89b25d16f0ffccfc841dfd1fe590a7994c7b294a801f88663e3522eea9f17ad1 |