A CLI tool to verify and fix AI/ML environment compatibility (Driver <-> CUDA <-> Wheels).

🩺 Env-Doctor

The missing link between your GPU and Python AI libraries

"Why does my PyTorch crash with CUDA errors when I just installed it?"

Because your driver supports CUDA 11.8, but pip install torch gave you CUDA 12.4 wheels.


Env-Doctor diagnoses and fixes the #1 frustration in GPU computing: mismatched CUDA versions between your NVIDIA driver, system toolkit, cuDNN, and Python libraries.

It takes 5 seconds to find out if your environment is broken — and exactly how to fix it.

🚀 Features

  • ⚡ One-Command Diagnosis: Instantly check compatibility between GPU Driver → CUDA Toolkit → cuDNN → PyTorch/TensorFlow/JAX
  • 🔧 Deep CUDA Analysis: cuda-info reveals multiple installations, PATH issues, and environment misconfigurations
  • 🧠 cuDNN Detection: cudnn-info finds cuDNN libraries, validates symlinks, and checks version compatibility
  • 🐳 Container Validation: dockerfile and docker-compose commands catch GPU config errors, with DB-driven recommendations, before you build or deploy
  • 🤖 AI Model Compatibility: Check if your GPU can run any model (LLMs, Diffusion, Audio) before downloading
  • 🐧 WSL2 GPU Support: Detects WSL1/WSL2 environments, validates GPU forwarding, and catches common driver conflicts
  • 🛠️ Compilation Guard: Warns if system nvcc doesn't match PyTorch's CUDA, preventing flash-attention build failures
  • 💊 Safe Install Commands: Prescribes the exact pip install command that works with YOUR driver
  • 🦜 Migration Helper: Scans code for deprecated imports (LangChain, Pydantic) and suggests fixes

📦 Installation

Coming Soon: pip install env-doctor will be available shortly!

For now, install from source:

git clone https://github.com/mitulgarg/env-doctor.git
cd env-doctor
pip install -e .

🛠️ Usage

1️⃣ Diagnose Your Environment

Check your current system health, driver info, and installed library conflicts.

env-doctor check

What it checks:

  • Environment: Native Linux vs WSL1 vs WSL2, with GPU forwarding validation for WSL2
  • GPU Driver: Is it too old for your installed PyTorch?
  • System CUDA: Is it missing or mismatched?
  • Library Conflicts: Do you have a "Frankenstein" environment (e.g., Torch 2.1 with CUDA 12.1 vs Driver supporting only 11.8)?
  • WSL2 GPU Setup: Validates CUDA libraries, checks for driver conflicts, tests nvidia-smi functionality
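
The driver-versus-library check above comes down to a version comparison. The following is a minimal sketch of that idea, not env-doctor's actual code: the function names are hypothetical, versions are assumed to be "major.minor" strings, and CUDA minor-version compatibility is deliberately ignored.

```python
def parse_cuda_version(text: str) -> tuple[int, int]:
    """Turn a CUDA version string such as '12.1' into (12, 1)."""
    parts = text.strip().split(".")
    minor = int(parts[1]) if len(parts) > 1 else 0
    return int(parts[0]), minor


def wheel_runs_on_driver(wheel_cuda: str, driver_max_cuda: str) -> bool:
    """Simplified rule: the driver's max CUDA version must be >= the wheel's."""
    return parse_cuda_version(wheel_cuda) <= parse_cuda_version(driver_max_cuda)


# The "Frankenstein" case from the list above: CUDA 12.1 wheels, driver capped at 11.8.
print(wheel_runs_on_driver("12.1", "11.8"))
# A CUDA 11.8 wheel on a driver that supports up to 12.2.
print(wheel_runs_on_driver("11.8", "12.2"))
```

Comparing tuples rather than raw strings avoids the classic mistake of treating "12.10" as older than "12.2".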

2️⃣ Get the Safe Install Command

Don't guess which --index-url to use. Let the doctor prescribe it.

env-doctor install torch

Output Example:

⬇️ Run this command to install the SAFE version:
---------------------------------------------------
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
---------------------------------------------------
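
One way to picture how such a command could be chosen: pick the newest wheel tag that does not exceed the driver's maximum CUDA version. This is a sketch under stated assumptions. The tag table below is hard-coded for illustration (the set of published tags changes with each PyTorch release), and `pick_index_url` is a hypothetical helper, not part of env-doctor.

```python
from typing import Optional

# Assumed set of CUDA wheel tags, for illustration only.
AVAILABLE_TAGS = {"cu118": (11, 8), "cu121": (12, 1), "cu124": (12, 4)}
BASE_URL = "https://download.pytorch.org/whl"


def pick_index_url(driver_max_cuda: tuple[int, int]) -> Optional[str]:
    """Return the index URL for the newest tag the driver can run, or None."""
    usable = {tag: ver for tag, ver in AVAILABLE_TAGS.items() if ver <= driver_max_cuda}
    if not usable:
        return None
    best_tag = max(usable, key=usable.get)
    return f"{BASE_URL}/{best_tag}"


print(pick_index_url((11, 8)))
print(pick_index_url((12, 2)))
```

Returning None when nothing fits lets the caller fall back to the CPU index instead of guessing.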

3️⃣ Scan for Code Issues

Scan your project for deprecated imports (such as outdated LangChain or Pydantic import paths).

env-doctor scan
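
To give a feel for what an import scan involves, here is a small sketch built on Python's standard `ast` module. It is not the tool's implementation: the rule table holds just two well-known moves (`langchain.llms` to `langchain_community.llms`, and `BaseSettings` from `pydantic` to `pydantic_settings`), and the function name is hypothetical.

```python
import ast

# Illustrative rules only; (module, name) -> suggested new module.
# A name of None means every import from that module is flagged.
DEPRECATED_IMPORTS = {
    ("langchain.llms", None): "langchain_community.llms",
    ("pydantic", "BaseSettings"): "pydantic_settings",
}


def find_deprecated_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, suggestion) for each deprecated `from X import Y`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ImportFrom) or node.module is None:
            continue
        for alias in node.names:
            for (module, name), replacement in DEPRECATED_IMPORTS.items():
                if node.module == module and name in (None, alias.name):
                    findings.append((node.lineno, f"import {alias.name} from {replacement}"))
    return findings


sample = "from langchain.llms import OpenAI\nfrom pydantic import BaseSettings, BaseModel\n"
for lineno, hint in find_deprecated_imports(sample):
    print(f"line {lineno}: {hint}")
```

Parsing the syntax tree, rather than grepping, avoids false hits inside comments and string literals.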

4️⃣ Deep CUDA Toolkit Analysis

Get comprehensive details about your CUDA installations, environment variables, and configuration issues.

env-doctor cuda-info

What it shows:

  • Multiple CUDA installations and their paths
  • CUDA_HOME, PATH, and LD_LIBRARY_PATH configuration
  • Runtime library (libcudart) status
  • Driver compatibility with installed toolkit

5️⃣ cuDNN Library Analysis

Detect cuDNN installations and validate they're properly configured.

env-doctor cudnn-info

What it shows:

  • cuDNN version and library locations
  • Multiple installation detection
  • Symlink validation (Linux)
  • PATH configuration (Windows)
  • CUDA compatibility status
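
Symlink validation on Linux can be sketched with the standard library alone. This is only an illustration of the idea: `check_symlink` is a hypothetical helper, and the library path in the usage line is an assumed location that varies between distributions.

```python
import os


def check_symlink(path: str) -> str:
    """Classify a path as 'ok', 'broken-symlink', 'regular-file', or 'missing'."""
    if os.path.islink(path):
        # os.path.exists follows the link, so False here means the target is gone.
        return "ok" if os.path.exists(path) else "broken-symlink"
    return "regular-file" if os.path.exists(path) else "missing"


# Assumed location; adjust for your distribution.
print(check_symlink("/usr/lib/x86_64-linux-gnu/libcudnn.so"))
```

A broken link is worth separating from a missing file because the fix differs: one needs the link re-created, the other a reinstall.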

6️⃣ Validate Dockerfiles

Check your Dockerfile for GPU/CUDA configuration issues before building.

env-doctor dockerfile

What it validates:

  • Base Images: Detects CPU-only images and provides DB-driven GPU base image + install command recommendations
  • PyTorch Installs: Ensures pip install torch has the correct --index-url using verified install commands
  • Library Version Compatibility: Validates pinned versions against DB-verified combinations for your CUDA version
  • Multi-Library Support: Checks that multiple GPU libraries (torch, tensorflow, jax) are compatible with the same CUDA version
  • Runtime vs Devel Images: Detects compilation requirements (flash-attn, xformers) and enforces -devel base images
  • Deprecated Packages: Flags deprecated packages like tensorflow-gpu and suggests modern alternatives
  • Driver Installation: Flags forbidden NVIDIA driver installs (must be on host, not container)
  • CUDA Toolkit: Warns about unnecessary toolkit installs that bloat images

Example Output:

🐳  DOCKERFILE VALIDATION: Dockerfile

❌  ERRORS (2):
------------------------------------------------------------

Line 1:
  Issue: CPU-only base image detected: python:3.10
  Fix:   Use a GPU-enabled base image

  Suggested fix:
    FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04
    # Or: FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
    # Or: FROM tensorflow/tensorflow:latest-gpu

Line 8:
  Issue: PyTorch installation missing --index-url flag
  Fix:   Add --index-url to install the correct CUDA version

SUMMARY:
  ❌  Errors:   2
  ⚠️  Warnings: 1
  ℹ️  Info:     0
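
Two of the checks above (CPU-only base image, missing --index-url) can be approximated with regular expressions. The following is a rough sketch, not the validator itself: the list of base images treated as CPU-only is an assumption, `lint_dockerfile` is a hypothetical name, and backslash-continued RUN lines are not handled.

```python
import re

# Assumption: these general-purpose images ship without CUDA libraries.
CPU_ONLY_BASE = re.compile(r"^FROM\s+(python|ubuntu|debian)\b", re.IGNORECASE)
TORCH_INSTALL = re.compile(r"pip3?\s+install\b.*\btorch\b")


def lint_dockerfile(text: str) -> list[tuple[int, str]]:
    """Return (line number, issue) pairs for two of the checks described above."""
    issues = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if CPU_ONLY_BASE.match(stripped):
            issues.append((number, f"CPU-only base image detected: {stripped.split()[1]}"))
        if TORCH_INSTALL.search(stripped) and "--index-url" not in stripped:
            issues.append((number, "PyTorch installation missing --index-url flag"))
    return issues


dockerfile = "FROM python:3.10\nWORKDIR /app\nRUN pip install torch torchvision\n"
for number, issue in lint_dockerfile(dockerfile):
    print(f"Line {number}: {issue}")
```

Reporting line numbers alongside each issue mirrors the example output and makes the findings easy to act on.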

7️⃣ Validate Docker Compose Files

Check your docker-compose.yml for proper GPU device configuration.

env-doctor docker-compose

What it validates:

  • GPU Device Config: Ensures deploy.resources.reservations.devices is set correctly
  • Driver Setting: Validates driver: nvidia is specified
  • Capabilities: Checks for capabilities: [gpu]
  • Deprecated Syntax: Flags old runtime: nvidia approach
  • Multi-Service Conflicts: Warns about GPU resource sharing between services
  • Host Requirements: Checks for nvidia-container-toolkit

Example Output:

🐳  DOCKER COMPOSE VALIDATION: docker-compose.yml

❌  ERRORS (1):
------------------------------------------------------------

Service 'ml-training':
  Issue: Missing GPU device configuration
  Fix:   Add GPU device configuration under deploy.resources.reservations.devices

  Suggested fix:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

⚠️   WARNINGS (1):
------------------------------------------------------------

Service 'legacy-app':
  Issue: Deprecated 'runtime: nvidia' syntax
  Fix:   Use the new 'deploy.resources.reservations.devices' syntax instead
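
The device-configuration and deprecated-syntax checks can be sketched against an already-parsed Compose file. This example assumes the YAML has been loaded into plain dictionaries by a YAML parser of your choice, and that every service passed in is meant to use a GPU. `lint_service` is a hypothetical helper, not env-doctor's code.

```python
def lint_service(name: str, service: dict) -> list[tuple[str, str]]:
    """Return (severity, message) pairs for one already-parsed Compose service."""
    devices = (
        service.get("deploy", {})
        .get("resources", {})
        .get("reservations", {})
        .get("devices", [])
    )
    has_gpu_device = any(
        device.get("driver") == "nvidia" and "gpu" in device.get("capabilities", [])
        for device in devices
    )
    findings = []
    if service.get("runtime") == "nvidia":
        findings.append(("warning", f"Service '{name}': deprecated 'runtime: nvidia' syntax"))
    elif not has_gpu_device:
        findings.append(("error", f"Service '{name}': missing GPU device configuration"))
    return findings


services = {
    "ml-training": {"image": "trainer"},
    "legacy-app": {"image": "legacy", "runtime": "nvidia"},
}
for name, service in services.items():
    for severity, message in lint_service(name, service):
        print(f"{severity.upper()}: {message}")
```

Chaining `.get` calls with empty defaults keeps the lookup safe when any level of the `deploy` tree is absent.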

8️⃣ Check AI Model Compatibility

Before downloading a 40GB+ model, find out if it will run on your GPU!

env-doctor model llama-3-8b

What it checks:

  • Model Parameters: LLMs, Diffusion models, Audio models, and more
  • VRAM Requirements: Calculates VRAM needed for each precision (fp32, fp16, int8, int4)
  • GPU Availability: Detects your GPU and available VRAM
  • Compatibility Analysis: Shows which precisions fit on your hardware
  • Smart Recommendations: Suggests smaller variants or multi-GPU setup if needed

List all available models:

env-doctor model --list

Check specific precision:

env-doctor model stable-diffusion-xl --precision int4

Example Output:

🤖  Checking: LLAMA-3-8B
    Parameters: 8.0B
    HuggingFace: meta-llama/Meta-Llama-3-8B

🖥️   Your Hardware:
    RTX 3090 (24GB VRAM)

💾  VRAM Requirements & Compatibility

  ✅  FP16: 19.2GB (measured) - 4.8GB free
  ✅  INT4:  4.8GB (estimated) - 19.2GB free

✅  This model WILL FIT on your GPU!

💡  Recommendations:
1. Use fp16 for best quality on your GPU

Supported Models:

  • LLMs: Llama-3, Mistral, Mixtral, Qwen (8B-405B parameters)
  • Diffusion: Stable Diffusion 1.5/XL, Flux, Stable Diffusion 3
  • Audio: Whisper (tiny to large-v3)
  • Language: BERT, T5 (for embeddings and text encoding)

Key Features:

  • Measured VRAM for popular models (most accurate)
  • Formula-based estimation for new models
  • Multi-GPU support (total VRAM calculation)
  • Alias support (e.g., "sdxl" → "stable-diffusion-xl")
  • Family variants (e.g., suggest llama-3-8b when 70b won't fit)
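
Formula-based estimation can be illustrated with a simple rule: parameters times bytes per parameter, plus an overhead allowance. The sketch below is a guess at the general shape, not the tool's formula. The 1.2 overhead factor is an assumption chosen because it reproduces the 19.2GB and 4.8GB figures in the example output, which may be a coincidence since the FP16 figure there is labelled as measured.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}
OVERHEAD = 1.2  # assumed 20% allowance for activations and runtime buffers


def estimate_vram_gb(params_billions: float, precision: str) -> float:
    """Estimate VRAM in GB: billions of parameters x bytes per parameter x overhead."""
    return params_billions * BYTES_PER_PARAM[precision] * OVERHEAD


def fits(params_billions: float, precision: str, gpu_vram_gb: float) -> bool:
    """True when the estimate does not exceed the GPU's VRAM."""
    return estimate_vram_gb(params_billions, precision) <= gpu_vram_gb


# An 8.0B-parameter model on a 24GB card, as in the example output.
for precision in BYTES_PER_PARAM:
    needed = estimate_vram_gb(8.0, precision)
    verdict = "fits" if fits(8.0, precision, 24.0) else "does not fit"
    print(f"{precision}: {needed:.1f}GB ({verdict})")
```

Weights dominate the estimate, so context length and batch size can push real usage above these numbers.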

9️⃣ Debug Mode (Troubleshooting)

Get detailed information from all detectors for troubleshooting and development.

env-doctor debug

What debug mode shows:

  • All Detector Results: Raw output from every registered detector
  • Detection Metadata: Internal detection methods, paths, and detailed status
  • Registry Information: List of all available detectors
  • Error Details: Full exception traces and diagnostic information

Example Output:

🔍 DEBUG MODE - Detailed Detector Information
============================================================
Registered Detectors: cuda_toolkit, nvidia_driver, python_library, wsl2

--- WSL2 ---
Status: Status.SUCCESS
Component: wsl2
Version: wsl2
Metadata: {'environment': 'WSL2', 'gpu_forwarding': 'enabled'}

--- NVIDIA DRIVER ---
Status: Status.SUCCESS
Component: nvidia_driver
Version: 535.146.02
Path: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
Metadata: {'max_cuda_version': '12.2', 'detection_method': 'nvml'}

Use debug mode when:

  • Environment diagnosis shows unexpected results
  • You want to understand what each detector found
  • Contributing to the project or reporting issues
  • Validating detector behavior in different environments

🐧 WSL2 GPU Support

Env-Doctor provides comprehensive WSL2 environment detection and GPU forwarding validation:

Environment Detection:

  • Native Linux: Standard Linux environment detection
  • WSL1: Detects WSL1 and warns that CUDA is not supported at all
  • WSL2: Full GPU forwarding validation and troubleshooting

WSL2 GPU Validation:

  • Driver Conflicts: Detects NVIDIA drivers installed inside the WSL distribution, which break GPU forwarding
  • CUDA Libraries: Validates presence of /usr/lib/wsl/lib/libcuda.so
  • nvidia-smi: Tests functionality and provides specific error guidance
  • Recommendations: Provides actionable steps to fix GPU forwarding issues

Common WSL2 Issues Detected:

❌ NVIDIA driver installed inside WSL. This breaks GPU forwarding.
    Run: sudo apt remove --purge nvidia-*

❌ Missing /usr/lib/wsl/lib/libcuda.so
    Reinstall NVIDIA driver on Windows host

❌ nvidia-smi command failed
    Install NVIDIA driver on Windows (version 470.76 or newer)
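
Environment detection of this kind is commonly done by inspecting /proc/version. The sketch below shows one such heuristic and is not necessarily how env-doctor does it: the substring rules are assumptions based on typical kernel version strings, and both function names are hypothetical.

```python
import os

WSL_LIBCUDA = "/usr/lib/wsl/lib/libcuda.so"


def classify_environment(proc_version: str) -> str:
    """Guess 'wsl2', 'wsl1', or 'native' from the text of /proc/version."""
    text = proc_version.lower()
    if "microsoft" not in text:
        return "native"
    # Heuristic: WSL2 kernels usually carry 'wsl2' or 'microsoft-standard'.
    if "wsl2" in text or "microsoft-standard" in text:
        return "wsl2"
    return "wsl1"


def wsl2_libcuda_present() -> bool:
    """Check for the CUDA library that the Windows host driver exposes to WSL2."""
    return os.path.exists(WSL_LIBCUDA)


print(classify_environment("Linux version 5.15.90.1-microsoft-standard-WSL2"))
print(classify_environment("Linux version 4.4.0-19041-Microsoft"))
```

On a real system you would pass in the contents of /proc/version and, for WSL2, follow up with `wsl2_libcuda_present()`.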

🤖 JSON Output & CI/CD Integration

NEW: All core commands now support machine-readable JSON output for automation and CI/CD pipelines!

JSON Output

Add the --json flag to get structured, parseable output:

# Get JSON output
env-doctor check --json

# Example output
{
  "status": "warning",
  "timestamp": "2026-01-15T10:30:00Z",
  "summary": {
    "driver": "found",
    "cuda": "found",
    "issues_count": 2
  },
  "checks": {
    "driver": {
      "component": "nvidia_driver",
      "status": "success",
      "detected": true,
      "version": "536.40",
      "metadata": {"max_cuda_version": "12.2"}
    },
    "cuda": {...},
    "libraries": {...}
  }
}

CI/CD Mode

Use the --ci flag in CI/CD pipelines (implies --json and sets the exit codes listed below):

env-doctor check --ci

# Exit codes:
# 0 = All checks passed
# 1 = Warnings or non-critical issues
# 2 = Critical errors detected
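
A pipeline script can act on these exit codes directly. The wrapper below is a sketch: the code-to-meaning table comes from the list above, while `run_ci_check` and its `fail_on_warnings` policy switch are hypothetical additions for illustration.

```python
import subprocess

# Exit codes as documented for `env-doctor check --ci`.
EXIT_MEANINGS = {0: "passed", 1: "warnings", 2: "critical"}


def describe_exit_code(code: int) -> str:
    """Map an exit code to its documented meaning, or 'unknown'."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_ci_check(fail_on_warnings: bool = False) -> bool:
    """Run the check and decide whether the pipeline step should pass."""
    code = subprocess.run(["env-doctor", "check", "--ci"]).returncode
    outcome = describe_exit_code(code)
    print(f"env-doctor exited with {code} ({outcome})")
    return outcome == "passed" or (outcome == "warnings" and not fail_on_warnings)
```

Treating unknown codes as failures is the conservative choice for a gate in front of GPU jobs.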

GitHub Actions Integration

name: Validate Environment
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install env-doctor
      - run: env-doctor check --ci

See full example at examples/github-actions/validate-env.yml

Commands Supporting JSON

  • env-doctor check --json or --ci
  • env-doctor cuda-info --json
  • env-doctor cudnn-info --json
  • env-doctor scan --json

Use Cases

Parse results in scripts:

# Extract CUDA version
CUDA_VERSION=$(env-doctor check --json | jq -r '.checks.cuda.version')

# Conditional installation
if env-doctor check --json | jq -e '.checks.driver.detected'; then
  pip install torch  # GPU version
else
  pip install torch --index-url https://download.pytorch.org/whl/cpu
fi

Store results for monitoring:

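import json
import subprocess

# Run the check and parse its JSON report from stdout
result = subprocess.run(["env-doctor", "check", "--json"], capture_output=True, text=True)
data = json.loads(result.stdout)

# Keep only the fields you want to track
record = {"timestamp": data["timestamp"], "status": data["status"]}

# Store in database, send to monitoring system, etc.
# Appending to a local JSON Lines file is a stand-in here; the file name is arbitrary.
with open("env_doctor_history.jsonl", "a") as history:
    history.write(json.dumps(record) + "\n")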

📋 Quick Command Reference

env-doctor check              # Diagnose your environment
env-doctor check --json       # Get JSON output
env-doctor check --ci         # CI/CD mode with exit codes
env-doctor cuda-info          # Detailed CUDA toolkit analysis
env-doctor cudnn-info         # Detailed cuDNN library analysis
env-doctor dockerfile         # Validate Dockerfile for GPU issues
env-doctor docker-compose     # Validate docker-compose.yml for GPU issues
env-doctor model llama-3-8b   # Check if model fits on your GPU
env-doctor model --list       # List all available models
env-doctor install torch      # Get safe install command for PyTorch
env-doctor scan               # Scan project for deprecated AI library imports
env-doctor debug              # Show detailed detector information

🧩 Architecture

  • The Brain: compatibility.json maps drivers to max supported CUDA versions and verified wheel URLs.
  • The Detectors: Modular detection system with specialized detectors for:
    • WSL2Detector: Environment detection and GPU forwarding validation
    • NvidiaDriverDetector: GPU driver version and capability detection
    • CudaToolkitDetector: System CUDA installation detection
    • CudnnDetector: cuDNN library detection and configuration validation
    • PythonLibraryDetector: Python AI library version and CUDA compatibility
  • The Registry: DetectorRegistry provides a plugin system for easy detector discovery and execution.
  • The CLI: cli.py orchestrates all detectors and presents unified diagnostics.
  • The Updater: db.py fetches the latest rules from GitHub so you don't need to update the package daily.
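
To make the registry idea concrete, here is a minimal sketch of a decorator-based plugin registry. It borrows the name DetectorRegistry from the description above, but everything else (the result fields, the `register` decorator, `run_all`, the hard-coded detector) is a hypothetical illustration rather than the project's real interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DetectionResult:
    component: str
    status: str
    version: Optional[str] = None
    metadata: dict = field(default_factory=dict)


class DetectorRegistry:
    """Maps detector names to callables so the CLI can discover and run them."""

    _detectors: dict[str, Callable[[], DetectionResult]] = {}

    @classmethod
    def register(cls, name: str):
        def decorator(func: Callable[[], DetectionResult]):
            cls._detectors[name] = func
            return func
        return decorator

    @classmethod
    def run_all(cls) -> dict[str, DetectionResult]:
        return {name: detect() for name, detect in cls._detectors.items()}


@DetectorRegistry.register("nvidia_driver")
def detect_driver() -> DetectionResult:
    # Hard-coded stand-in; a real detector would query NVML or nvidia-smi.
    return DetectionResult("nvidia_driver", "success", "535.146.02",
                           {"max_cuda_version": "12.2"})


for name, result in DetectorRegistry.run_all().items():
    print(name, result.status, result.version)
```

With this pattern, adding a detector means writing one decorated function, and the CLI loop does not change.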

🔄 Automated Database Updates

Env-Doctor keeps its compatibility database current through automated scraping and validation. The pipeline is built to stay easy to maintain, and every change still passes human review before it is merged.

How It Works

  1. Automated Scraping (tools/scraper.py)

    • GitHub Actions workflow runs periodically to scrape official PyTorch/TensorFlow/JAX documentation
    • Extracts latest CUDA compatibility mappings and verified wheel URLs
    • Updates compatibility.json with new versions and URLs
  2. Validation Layer (tools/validator.py) [beta, not yet implemented]

    • Automatically validates the scraped data structure with cloud GPUs before committing
    • Ensures version strings are parseable and URLs are well-formed
    • Catches malformed entries that could break the tool
  3. Human Verification via PR Merge

    • Automated updates create pull requests (not auto-merged)
    • Maintainers review changes before merging to ensure quality
    • Community members can flag issues or suggest corrections
    • Provides transparency and accountability for database changes
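
The structural checks described for the validation layer (parseable version strings, well-formed URLs) could look roughly like this. The entry layout is invented for the example: the keys `cuda_version` and `index_url` are assumptions, not the actual schema of compatibility.json, and `validate_entry` is a hypothetical function.

```python
import re
from urllib.parse import urlparse

# Accepts versions such as '12.1' or '12.1.0'.
VERSION_RE = re.compile(r"^\d+(\.\d+){1,2}$")


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one (hypothetical) database entry."""
    errors = []
    version = str(entry.get("cuda_version", ""))
    if not VERSION_RE.match(version):
        errors.append(f"unparseable version: {version!r}")
    raw_url = str(entry.get("index_url", ""))
    url = urlparse(raw_url)
    if url.scheme != "https" or not url.netloc:
        errors.append(f"malformed URL: {raw_url!r}")
    return errors


entry = {"cuda_version": "11.8", "index_url": "https://download.pytorch.org/whl/cu118"}
print(validate_entry(entry))
```

Collecting every problem, instead of stopping at the first, gives reviewers a complete picture in a single pull request check.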

Community-Driven Contributions Preferred

While automation handles routine updates, community contributions are highly valued for:

  • Edge Case Detection: Real-world users catching compatibility issues the scraper misses
  • Platform-Specific Issues: WSL2, conda environments, or unusual driver configurations
  • New Library Support: Adding new AI frameworks or tools
  • Verification: Testing that recommended install commands actually work

This hybrid approach combines automation for maintainability with community oversight for accuracy.

Running the Tools Locally

# Scrape latest compatibility data
python tools/scraper.py

# Validate the database structure
python tools/validator.py

# Both tools are also run automatically by GitHub Actions (.github/workflows/update_db.yml)

🤝 Contributing

We love contributions! Please read our CONTRIBUTING.md for details on how to submit pull requests and our development setup.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
