MLflow Dependency Analyzer

Smart dependency analysis and minimal requirements generation for MLflow models.

Automatically detect and generate minimal code_paths and requirements lists for your MLflow models using safe AST-based analysis. Ensure portable and reproducible model deployments without dependency bloat.

🚀 Features

  • 🔍 Unified Analysis: Complete dependency analysis combining requirements and code paths
  • 🧠 Smart Detection: Uses Python's importlib and inspect for accurate module resolution
  • 🔒 Safe Analysis: AST-based import discovery - no code execution required
  • 📦 MLflow Integration: Built-in support for MLflow's production utilities
  • 🎯 Minimal Dependencies: Intelligent pruning eliminates unnecessary packages
  • 🔄 Recursive Discovery: Follows deep dependency chains automatically
  • 🛡️ Robust Error Handling: Graceful handling of circular dependencies and import errors
  • ⚡ Production Ready: Comprehensive test coverage with real-world scenarios

📦 Installation

pip install mlflow-dep-analyzer

🎯 Quick Start

Simple Model Analysis

from mlflow_dep_analyzer import analyze_model_dependencies

# Analyze a single model file
result = analyze_model_dependencies("model.py")

print("📦 External packages needed:")
print(result["requirements"])

print("📂 Local files needed:")
print(result["code_paths"])

MLflow Integration

import mlflow
import mlflow.sklearn
from mlflow_dep_analyzer import analyze_model_dependencies
from sklearn.ensemble import RandomForestClassifier

# Train your model
model = RandomForestClassifier()
# ... training code ...

# Analyze dependencies
deps = analyze_model_dependencies("model.py")

# Log with minimal dependencies
with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        "classifier",
        code_paths=deps["code_paths"],
        pip_requirements=deps["requirements"]
    )
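
The examples in this README show bare package names in the requirements list. For a reproducible deployment you may want to pin each name to the version installed in the training environment before passing the list to pip_requirements. The following is a minimal sketch using only the standard library's importlib.metadata; pin_requirements is a hypothetical helper, not part of mlflow-dep-analyzer, and it assumes every entry is a plain distribution name with no version specifier.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Iterable, List


def pin_requirements(
    names: Iterable[str],
    lookup: Callable[[str], str] = version,
) -> List[str]:
    """Return 'name==installed_version' for each name (hypothetical helper)."""
    pinned = []
    for name in names:
        try:
            pinned.append(f"{name}=={lookup(name)}")
        except PackageNotFoundError:
            # Not installed in this environment: keep the bare name.
            pinned.append(name)
    return pinned
```

With this in place, pip_requirements=pin_requirements(deps["requirements"]) would replace the unpinned list in the log_model call above.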

📚 API Reference

The MLflow Dependency Analyzer provides a simple, unified interface for dependency analysis:

Main Interface

from mlflow_dep_analyzer import analyze_model_dependencies

# Analyze a single model file
result = analyze_model_dependencies("model.py")

# Analyze with explicit repo root
result = analyze_model_dependencies("model.py", repo_root="/path/to/project")

# Result structure
{
    "requirements": ["pandas", "scikit-learn"],  # External packages to install
    "code_paths": ["model.py", "utils.py"],      # Local files to include
    "analysis": {
        "total_modules": 15,
        "external_packages": 2,
        "local_files": 2,
        "stdlib_modules": 11
    }
}
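
One way to consume this structure is sketched below. It assumes the keys shown in the example result are always present; describe_result is a hypothetical helper written for illustration, not a function exported by the package.

```python
from typing import Any, Dict


def describe_result(result: Dict[str, Any]) -> str:
    """Build a short, human-readable summary from an analysis result dict."""
    stats = result["analysis"]
    lines = [
        f"{stats['total_modules']} modules analyzed "
        f"({stats['stdlib_modules']} from the standard library)",
        # Fall back to "(none)" when a list is empty.
        "requirements: " + (", ".join(result["requirements"]) or "(none)"),
        "code_paths: " + (", ".join(result["code_paths"]) or "(none)"),
    ]
    return "\n".join(lines)
```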

Class-Based Interface

For advanced use cases or multiple analyses:

from mlflow_dep_analyzer import UnifiedDependencyAnalyzer

# Create analyzer instance
analyzer = UnifiedDependencyAnalyzer(repo_root=".")

# Analyze multiple entry points
result = analyzer.analyze_dependencies(["model.py", "train.py", "utils.py"])

Convenience Functions

from mlflow_dep_analyzer import get_model_requirements, get_model_code_paths

# Get just the requirements list
packages = get_model_requirements("model.py")
# Returns: ["pandas", "scikit-learn", "numpy"]

# Get just the code paths list
files = get_model_code_paths("model.py")
# Returns: ["model.py", "utils.py", "preprocessing.py"]

๐Ÿ—๏ธ Architecture

The library uses a single, unified analyzer that provides complete dependency analysis:

┌─────────────────────────────┐
│   UnifiedDependencyAnalyzer │
│    (Complete Analysis)      │
└─────────────────────────────┘
                │
                ├─── AST parsing (safe import discovery)
                ├─── importlib.import_module() (dynamic imports)
                ├─── inspect.getsourcefile() (accurate file paths)
                ├─── Smart classification:
                │    ├─── Standard library → ignored
                │    ├─── External packages → requirements
                │    └─── Local files → code_paths + recursive analysis
                └─── MLflow-compatible output
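
The classification branch of the diagram can be illustrated with a small standard-library sketch. This is not the package's implementation: the diagram names importlib.import_module() and inspect.getsourcefile(), while this sketch swaps in importlib.util.find_spec() so that nothing is imported. classify_module and its return labels are hypothetical names chosen for the example.

```python
import importlib.util
import sys
from pathlib import Path


def classify_module(name: str, repo_root: str) -> str:
    """Classify a module name as 'stdlib', 'local', 'external', or 'unknown'."""
    top_level = name.split(".")[0]

    # sys.stdlib_module_names exists on Python 3.10+. On older versions this
    # falls back to the compiled-in builtin names, which is incomplete.
    stdlib_names = getattr(
        sys, "stdlib_module_names", frozenset(sys.builtin_module_names)
    )
    if top_level in stdlib_names:
        return "stdlib"

    # A module file or package directory directly under repo_root is local.
    root = Path(repo_root).resolve()
    if (root / f"{top_level}.py").is_file() or (
        root / top_level / "__init__.py"
    ).is_file():
        return "local"

    # find_spec locates a top-level module without importing it.
    try:
        spec = importlib.util.find_spec(top_level)
    except (ImportError, ValueError):
        spec = None
    return "external" if spec is not None else "unknown"
```

Checking the standard library first keeps a local file that shadows a stdlib name from being shipped by accident; a real analyzer may well order these checks differently.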

๐Ÿ” How It Works

  1. AST Parsing: Safely extracts import statements without executing code
  2. Module Resolution: Uses importlib.import_module() + inspect.getsourcefile()
  3. Smart Classification: Automatically categorizes modules:
    • 📦 External packages → Added to requirements
    • 🐍 Standard library → Ignored (built into Python)
    • 📁 Local files → Added to code_paths and analyzed recursively
  4. Dependency Discovery: Recursively follows imports to build complete dependency graph
  5. Path Optimization: Generates minimal file lists and package requirements
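
Steps 1 and 4 above can be sketched with the standard library alone. The code below is an illustration under simplifying assumptions, not the package's own code: it skips relative imports, resolves modules only against repo_root, and uses hypothetical function names (extract_imports, discover_local_files).

```python
import ast
from pathlib import Path
from typing import Set


def extract_imports(source: str) -> Set[str]:
    """Return absolute module names imported by `source`, without executing it."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level > 0 marks a relative import; this sketch skips those.
            modules.add(node.module)
    return modules


def discover_local_files(entry: str, repo_root: str) -> Set[str]:
    """Follow imports from `entry` and collect local .py files under repo_root."""
    root = Path(repo_root).resolve()
    seen: Set[Path] = set()
    stack = [(root / entry).resolve()]
    while stack:
        path = stack.pop()
        if path in seen:
            continue  # already visited; this also stops circular imports
        seen.add(path)
        for module in extract_imports(path.read_text(encoding="utf-8")):
            parts = module.split(".")
            # Try "pkg/mod.py" first, then "pkg/mod/__init__.py".
            for candidate in (
                root.joinpath(*parts).with_suffix(".py"),
                root.joinpath(*parts, "__init__.py"),
            ):
                if candidate.is_file():
                    stack.append(candidate)
                    break
    return {p.relative_to(root).as_posix() for p in seen}
```

The visited set is what makes circular imports harmless here: a file that has already been parsed is never pushed through the loop a second time.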

🌟 Advanced Usage

Complex Project Structure

from mlflow_dep_analyzer import UnifiedDependencyAnalyzer

# Analyze a complex project with src/ structure
analyzer = UnifiedDependencyAnalyzer(repo_root="/path/to/project")
result = analyzer.analyze_dependencies([
    "src/models/classifier.py",
    "src/models/preprocessor.py",
    "src/utils/data_loader.py"
])

print(f"Found {result['analysis']['total_modules']} total modules")
print(f"External packages: {result['analysis']['external_packages']}")
print(f"Local files: {result['analysis']['local_files']}")

Advanced Analysis

from mlflow_dep_analyzer import UnifiedDependencyAnalyzer

# Get detailed analysis results
analyzer = UnifiedDependencyAnalyzer(repo_root=".")
result = analyzer.analyze_dependencies(["model.py"])

# Access detailed metrics
print(f"Total modules found: {result['analysis']['total_modules']}")
print(f"External packages: {result['analysis']['external_packages']}")
print(f"Local files: {result['analysis']['local_files']}")
print(f"Standard library modules: {result['analysis']['stdlib_modules']}")

Error Handling

from mlflow_dep_analyzer import analyze_model_dependencies

try:
    result = analyze_model_dependencies("model.py")
except FileNotFoundError:
    print("Model file not found")
except ImportError as e:
    print(f"Import resolution failed: {e}")
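
When several entry points are analyzed one by one, it can be more convenient to collect failures than to stop at the first one. The sketch below takes the analysis function as a parameter so that it stays self-contained; analyze_many is a hypothetical helper, and it assumes the two exception types shown above are the ones worth catching.

```python
from typing import Callable, Dict, Iterable, Tuple


def analyze_many(
    paths: Iterable[str],
    analyze: Callable[[str], dict],
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Run `analyze` on each path and return (results, errors) keyed by path."""
    results: Dict[str, dict] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            results[path] = analyze(path)
        except (FileNotFoundError, ImportError) as exc:
            # Record the failure and continue with the remaining paths.
            errors[path] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

In practice the second argument would be analyze_model_dependencies, for example analyze_many(["model.py", "train.py"], analyze_model_dependencies).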

🧪 Examples

See the examples/ directory for complete working examples.

๐Ÿ› ๏ธ Development

Setup

This project uses uv for dependency management:

git clone https://github.com/andrewgross/mlflow-dep-analyzer
cd mlflow-dep-analyzer
uv sync

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src/mlflow_dep_analyzer --cov-report=html

# Run specific test categories
uv run pytest tests/test_unified_analyzer.py -v

Code Quality

# Linting and formatting
uv run ruff check
uv run ruff format

# Type checking
uv run mypy src/

# Pre-commit hooks
uv run pre-commit run --all-files

Requirements

  • Python: 3.8+ (developed with 3.11.11 for Databricks Runtime 15.4 LTS compatibility)
  • Core dependencies: MLflow 2.0+
  • Development: pytest, ruff, mypy, pre-commit

๐Ÿค Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Quick Contribution Guide

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes with tests
  4. Run the test suite: uv run pytest
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Built on MLflow's production-tested dependency resolution utilities
  • Inspired by the need for reliable, minimal MLflow model deployments
  • Thanks to the Python AST and importlib developers for robust introspection tools

📈 Roadmap

  • Configuration file support
  • Plugin system for custom analyzers
  • Integration with other ML frameworks
  • Dependency vulnerability scanning
  • Performance optimizations with caching

Documentation • Issues • Contributing

Made with ❤️ for the MLflow community
