MLflow Dependency Analyzer

Smart dependency analysis and minimal requirements generation for MLflow models.

Automatically detect and generate minimal code_paths and requirements lists for your MLflow models using safe AST-based analysis. Ensure portable and reproducible model deployments without dependency bloat.

🚀 Features

🔍 Unified Analysis: Complete dependency analysis combining requirements and code paths
🧠 Smart Detection: Uses Python's importlib and inspect for accurate module resolution
🔒 Safe Analysis: AST-based import discovery; import statements are read from source files without executing them
📦 MLflow Integration: Built-in support for MLflow's production utilities
🎯 Minimal Dependencies: Intelligent pruning eliminates unnecessary packages
🔄 Recursive Discovery: Follows deep dependency chains automatically
🛡️ Robust Error Handling: Graceful handling of circular dependencies and import errors
⚡ Production Ready: Comprehensive test coverage with real-world scenarios

📦 Installation

pip install mlflow-dep-analyzer

🎯 Quick Start

Simple Model Analysis

from mlflow_dep_analyzer import analyze_model_dependencies

# Analyze a single model file
result = analyze_model_dependencies("model.py")

print("📦 External packages needed:")
print(result["requirements"])

print("📂 Local files needed:")
print(result["code_paths"])

MLflow Integration

import mlflow
import mlflow.sklearn
from mlflow_dep_analyzer import analyze_model_dependencies
from sklearn.ensemble import RandomForestClassifier

# Train your model
model = RandomForestClassifier()
# ... training code ...

# Analyze dependencies
deps = analyze_model_dependencies("model.py")

# Log with minimal dependencies
with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        "classifier",
        code_paths=deps["code_paths"],
        pip_requirements=deps["requirements"]
    )
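
The examples in this README show bare package names such as "pandas" in the requirements list. If you want the logged environment to match the training environment exactly, one option is to pin each name to the locally installed version before passing the list to pip_requirements. The sketch below is not part of mlflow-dep-analyzer: pin_installed_versions is a hypothetical helper built on the standard library's importlib.metadata, and it assumes the list contains bare distribution names.

```python
from importlib import metadata


def pin_installed_versions(requirements):
    """Append ==<installed version> to each name; leave unknown names untouched.

    Hypothetical helper, not part of mlflow-dep-analyzer.
    """
    pinned = []
    for name in requirements:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed here (or not a bare name): keep the entry as given.
            pinned.append(name)
    return pinned
```

With this helper, the logging call above could pass pip_requirements=pin_installed_versions(deps["requirements"]) instead of the raw list.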

📚 API Reference

The MLflow Dependency Analyzer provides a simple, unified interface for dependency analysis:

Main Interface

from mlflow_dep_analyzer import analyze_model_dependencies

# Analyze a single model file
result = analyze_model_dependencies("model.py")

# Analyze with explicit repo root
result = analyze_model_dependencies("model.py", repo_root="/path/to/project")

# Result structure (example values)
{
    "requirements": ["pandas", "scikit-learn"],  # External packages to install
    "code_paths": ["model.py", "utils.py"],      # Local files to include
    "analysis": {
        "total_modules": 15,
        "external_packages": 2,
        "local_files": 2,
        "stdlib_modules": 11
    }
}
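
As a rough illustration of consuming this structure outside of MLflow, the sketch below writes the requirements to a requirements.txt file and formats the counters into a one-line summary. It uses a hand-written dictionary with the example values shown above rather than a real analysis, and write_requirements and summarize are hypothetical helper names, not library functions.

```python
import tempfile
from pathlib import Path

# Hand-written sample with the same shape as the result structure above.
sample_result = {
    "requirements": ["pandas", "scikit-learn"],
    "code_paths": ["model.py", "utils.py"],
    "analysis": {
        "total_modules": 15,
        "external_packages": 2,
        "local_files": 2,
        "stdlib_modules": 11,
    },
}


def write_requirements(result, path):
    """Write one requirement per line (sorted, de-duplicated) and return the text."""
    text = "\n".join(sorted(set(result["requirements"]))) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text


def summarize(result):
    """Build a one-line summary from the "analysis" counters."""
    stats = result["analysis"]
    return (
        f"{stats['total_modules']} modules: "
        f"{stats['external_packages']} external, "
        f"{stats['local_files']} local, "
        f"{stats['stdlib_modules']} stdlib"
    )


with tempfile.TemporaryDirectory() as tmp:
    requirements_text = write_requirements(sample_result, Path(tmp) / "requirements.txt")
summary = summarize(sample_result)
```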

Class-Based Interface

For advanced use cases or multiple analyses:

from mlflow_dep_analyzer import UnifiedDependencyAnalyzer

# Create analyzer instance
analyzer = UnifiedDependencyAnalyzer(repo_root=".")

# Analyze multiple entry points
result = analyzer.analyze_dependencies(["model.py", "train.py", "utils.py"])

Convenience Functions

from mlflow_dep_analyzer import get_model_requirements, get_model_code_paths

# Get just the requirements list
packages = get_model_requirements("model.py")
# Returns: ["pandas", "scikit-learn", "numpy"]

# Get just the code paths list
files = get_model_code_paths("model.py")
# Returns: ["model.py", "utils.py", "preprocessing.py"]

🏗️ Architecture

The library uses a single, unified analyzer that provides complete dependency analysis:

┌─────────────────────────────┐
│   UnifiedDependencyAnalyzer │
│    (Complete Analysis)      │
└─────────────────────────────┘
                │
                ├─── AST parsing (safe import discovery)
                ├─── importlib.import_module() (dynamic imports)
                ├─── inspect.getsourcefile() (accurate file paths)
                ├─── Smart classification:
                │    ├─── Standard library → ignored
                │    ├─── External packages → requirements
                │    └─── Local files → code_paths + recursive analysis
                └─── MLflow-compatible output
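
To make the classification branch of the diagram concrete, here is a minimal sketch of how a module name could be sorted into the three buckets using importlib.import_module() and inspect.getsourcefile(). It is an assumption about the general approach, not the library's actual code: classify_module is a hypothetical name, and the site-packages check and the "unresolved" bucket are simplifications added for the example.

```python
import importlib
import inspect
import sys
import sysconfig
from pathlib import Path

# Names of standard-library modules (Python 3.10+; empty set on older versions).
STDLIB_NAMES = getattr(sys, "stdlib_module_names", frozenset())
# Directory holding the pure-Python standard library, used as a fallback check.
STDLIB_DIR = Path(sysconfig.get_paths()["stdlib"]).resolve()


def classify_module(name, repo_root):
    """Return "stdlib", "local", "external", or "unresolved" for a module name."""
    if name.split(".")[0] in STDLIB_NAMES:
        return "stdlib"
    try:
        # Importing runs the module's top-level code, so only use this on trusted code.
        module = importlib.import_module(name)
        source = inspect.getsourcefile(module)
    except (ImportError, TypeError):
        # TypeError: inspect found no file (for example a namespace package).
        return "unresolved"
    if source is None:
        return "external"  # assumption: no Python source means a compiled extension
    path = Path(source).resolve()
    if "site-packages" in path.parts or "dist-packages" in path.parts:
        return "external"  # also covers a virtualenv that lives inside the repo
    if STDLIB_DIR in path.parents:
        return "stdlib"
    if Path(repo_root).resolve() in path.parents:
        return "local"
    return "external"
```

The order of the checks matters: site-packages usually sits inside the standard-library directory, so it has to be tested before the stdlib fallback.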

🔍 How It Works

1. AST Parsing: Safely extracts import statements from source files without executing them
2. Module Resolution: Uses importlib.import_module() + inspect.getsourcefile() to locate each imported module. Importing a module runs its top-level code, so this step, unlike AST parsing, is not purely static
3. Smart Classification: Automatically categorizes modules:
   - 📦 External packages → Added to requirements
   - 🐍 Standard library → Ignored (built into Python)
   - 📁 Local files → Added to code_paths and analyzed recursively
4. Dependency Discovery: Recursively follows imports to build a complete dependency graph
5. Path Optimization: Generates minimal file lists and package requirements
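
The AST parsing and recursive discovery steps can be sketched with the standard library alone. The version below is deliberately simpler than what is described above: it resolves local modules by looking for matching files under the repo root instead of calling importlib.import_module(), it skips relative imports, and all three function names are hypothetical. Treat it as an illustration of the idea rather than the library's implementation.

```python
import ast
from pathlib import Path


def extract_imports(source):
    """Collect absolute module names imported by the given source text."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module)
    return names


def find_local_file(module, repo_root):
    """Map a dotted module name to a file under repo_root, or None if absent."""
    base = Path(repo_root).joinpath(*module.split("."))
    for candidate in (base.with_suffix(".py"), base / "__init__.py"):
        if candidate.is_file():
            return candidate
    return None


def walk_local_imports(entry, repo_root):
    """Return (local files reached from entry, top-level names of other modules)."""
    local_files, other_modules = set(), set()
    pending = [Path(entry)]
    while pending:
        path = pending.pop()
        if path in local_files:
            continue  # already visited; this also stops circular imports
        local_files.add(path)
        for module in extract_imports(path.read_text(encoding="utf-8")):
            target = find_local_file(module, repo_root)
            if target is not None:
                pending.append(target)
            else:
                other_modules.add(module.split(".")[0])
    return local_files, other_modules
```

The second return value still mixes standard-library and third-party names; separating them is the job of the classification step.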

🌟 Advanced Usage

Complex Project Structure

from mlflow_dep_analyzer import UnifiedDependencyAnalyzer

# Analyze a complex project with src/ structure
analyzer = UnifiedDependencyAnalyzer(repo_root="/path/to/project")
result = analyzer.analyze_dependencies([
    "src/models/classifier.py",
    "src/models/preprocessor.py",
    "src/utils/data_loader.py"
])

print(f"Found {result['analysis']['total_modules']} total modules")
print(f"External packages: {result['analysis']['external_packages']}")
print(f"Local files: {result['analysis']['local_files']}")
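
For larger projects, typing every entry point by hand gets tedious. One possible approach, sketched below under the assumption that analyze_dependencies accepts paths relative to repo_root as in the example above, is to collect the list with pathlib. collect_entry_points is a hypothetical helper, and skipping __init__.py files is a choice made for this example only.

```python
from pathlib import Path


def collect_entry_points(repo_root, subdir="src"):
    """Return sorted POSIX-style paths, relative to repo_root, of .py files under subdir."""
    root = Path(repo_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in (root / subdir).rglob("*.py")
        if path.name != "__init__.py"
    )
```

The returned list could then be passed to analyzer.analyze_dependencies(...) in place of the hand-written one.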

Advanced Analysis

from mlflow_dep_analyzer import UnifiedDependencyAnalyzer

# Get detailed analysis results
analyzer = UnifiedDependencyAnalyzer(repo_root=".")
result = analyzer.analyze_dependencies(["model.py"])

# Access detailed metrics
print(f"Total modules found: {result['analysis']['total_modules']}")
print(f"External packages: {result['analysis']['external_packages']}")
print(f"Local files: {result['analysis']['local_files']}")
print(f"Standard library modules: {result['analysis']['stdlib_modules']}")

Error Handling

from mlflow_dep_analyzer import analyze_model_dependencies

try:
    result = analyze_model_dependencies("model.py")
except FileNotFoundError:
    print("Model file not found")
except ImportError as e:
    print(f"Import resolution failed: {e}")

🧪 Examples

See the examples/ directory for complete working examples:

Basic Usage: Complete MLflow integration demo
MLflow Integration: Real-world MLflow projects
Complex Projects: Multi-file analysis with auto-logging

🛠️ Development

Setup

This project uses uv for dependency management:

git clone https://github.com/andrewgross/mlflow-dep-analyzer
cd mlflow-dep-analyzer
uv sync

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src/mlflow_dep_analyzer --cov-report=html

# Run specific test categories
uv run pytest tests/test_unified_analyzer.py -v

Code Quality

# Linting and formatting
uv run ruff check
uv run ruff format

# Type checking
uv run mypy src/

# Pre-commit hooks
uv run pre-commit run --all-files

Requirements

Python: 3.8+ (developed with 3.11.11 for Databricks Runtime 15.4 LTS compatibility)
Core dependencies: MLflow 2.0+
Development: pytest, ruff, mypy, pre-commit

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Quick Contribution Guide

1. Fork the repository
2. Create a feature branch: git checkout -b feature-name
3. Make your changes with tests
4. Run the test suite: uv run pytest
5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built on MLflow's production-tested dependency resolution utilities
Inspired by the need for reliable, minimal MLflow model deployments
Thanks to the Python AST and importlib developers for robust introspection tools

📈 Roadmap

Configuration file support
Plugin system for custom analyzers
Integration with other ML frameworks
Dependency vulnerability scanning
Performance optimizations with caching


Made with ❤️ for the MLflow community

Release history

0.9.0: Aug 19, 2025
0.8.0: Jul 16, 2025
0.7.0: Jul 15, 2025
0.6.0: Jul 13, 2025
0.4.0: Jul 13, 2025 (this version)
0.3.0: Jul 13, 2025
0.2.0: Jul 12, 2025
0.1.0: Jul 11, 2025
