
MLflow Dependency Analyzer

Smart dependency analysis and minimal requirements generation for MLflow models.

Automatically detect and generate minimal code_paths and requirements lists for your MLflow models using safe AST-based analysis. Ensure portable and reproducible model deployments without dependency bloat.

🚀 Features

  • 🔍 Unified Analysis: Complete dependency analysis combining requirements and code paths
  • 🧠 Smart Detection: Uses Python's importlib and inspect for accurate module resolution
  • 🔒 Safe Analysis: AST-based import discovery - no code execution required
  • 📦 MLflow Integration: Built-in support for MLflow's production utilities
  • 🎯 Minimal Dependencies: Intelligent pruning eliminates unnecessary packages
  • 🔄 Recursive Discovery: Follows deep dependency chains automatically
  • 🛡️ Robust Error Handling: Graceful handling of circular dependencies and import errors
  • ⚡ Production Ready: Comprehensive test coverage with real-world scenarios

📦 Installation

pip install mlflow-dep-analyzer

🎯 Quick Start

Simple Model Analysis

from mlflow_dep_analyzer import analyze_model_dependencies

# Analyze a single model file
result = analyze_model_dependencies("model.py")

print("📦 External packages needed:")
print(result["requirements"])

print("📂 Local files needed:")
print(result["code_paths"])

MLflow Integration

import mlflow
import mlflow.sklearn
from mlflow_dep_analyzer import analyze_model_dependencies
from sklearn.ensemble import RandomForestClassifier

# Train your model
model = RandomForestClassifier()
# ... training code ...

# Analyze dependencies
deps = analyze_model_dependencies("model.py")

# Log with minimal dependencies
with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        "classifier",
        code_paths=deps["code_paths"],
        pip_requirements=deps["requirements"]
    )

📚 API Reference

The MLflow Dependency Analyzer provides a simple, unified interface for dependency analysis:

Main Interface

from mlflow_dep_analyzer import analyze_model_dependencies

# Analyze a single model file
result = analyze_model_dependencies("model.py")

# Analyze with explicit repo root
result = analyze_model_dependencies("model.py", repo_root="/path/to/project")

# Result structure
{
    "requirements": ["pandas", "scikit-learn"],  # External packages to install
    "code_paths": ["model.py", "utils.py"],      # Local files to include
    "analysis": {
        "total_modules": 15,
        "external_packages": 2,
        "local_files": 2,
        "stdlib_modules": 11
    }
}
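
As a minimal sketch of consuming this structure, the helper below writes the "requirements" list to a pip-style requirements file. The name write_requirements is hypothetical (it is not part of the package), and example_result is a hand-written stand-in for a real return value, copied from the structure above.

```python
from pathlib import Path


def write_requirements(result, target):
    """Write result["requirements"] to target, one package per line.

    Hypothetical helper: duplicates are dropped and names are sorted so the
    file is stable between runs.
    """
    packages = sorted(set(result["requirements"]))
    Path(target).write_text(
        "".join(f"{name}\n" for name in packages), encoding="utf-8"
    )
    return packages


# Stand-in for the dictionary returned by analyze_model_dependencies().
example_result = {
    "requirements": ["pandas", "scikit-learn"],
    "code_paths": ["model.py", "utils.py"],
    "analysis": {
        "total_modules": 15,
        "external_packages": 2,
        "local_files": 2,
        "stdlib_modules": 11,
    },
}
```

Calling write_requirements(example_result, "requirements.txt") would produce a two-line file that pip can read with pip install -r.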

Class-Based Interface

For advanced use cases or multiple analyses:

from mlflow_dep_analyzer import UnifiedDependencyAnalyzer

# Create analyzer instance
analyzer = UnifiedDependencyAnalyzer(repo_root=".")

# Analyze multiple entry points
result = analyzer.analyze_dependencies(["model.py", "train.py", "utils.py"])

Convenience Functions

from mlflow_dep_analyzer import get_model_requirements, get_model_code_paths

# Get just the requirements list
packages = get_model_requirements("model.py")
# Returns: ["pandas", "scikit-learn", "numpy"]

# Get just the code paths list
files = get_model_code_paths("model.py")
# Returns: ["model.py", "utils.py", "preprocessing.py"]
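
The example return values above are bare package names. If pinned versions are wanted for reproducibility, one possible approach is to look up the installed version of each name with the standard library's importlib.metadata. This is a sketch under the assumption that the names are unpinned; pin_requirements is a hypothetical helper, and whether the package already offers pinning is not stated here.

```python
from importlib import metadata


def pin_requirements(packages):
    """Return the names with ==<installed version> appended where known.

    Hypothetical helper: names that are not installed in the current
    environment are passed through unchanged.
    """
    pinned = []
    for name in packages:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            pinned.append(name)  # not installed here, leave unpinned
    return pinned
```

The result reflects whatever is installed in the environment that runs the helper, so it is best run in the same environment used for training.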

🏗️ Architecture

The library uses a single, unified analyzer that provides complete dependency analysis:

┌─────────────────────────────┐
│   UnifiedDependencyAnalyzer │
│    (Complete Analysis)      │
└─────────────────────────────┘
                │
                ├─── AST parsing (safe import discovery)
                ├─── importlib.import_module() (dynamic imports)
                ├─── inspect.getsourcefile() (accurate file paths)
                ├─── Smart classification:
                │    ├─── Standard library → ignored
                │    ├─── External packages → requirements
                │    └─── Local files → code_paths + recursive analysis
                └─── MLflow-compatible output
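
The two resolution calls named in the diagram are both part of the standard library. The sketch below shows how they can be combined to map a module name to its source file; resolve_source_file is a hypothetical name, and this is an illustration rather than the analyzer's actual code. Note that importlib.import_module() runs the imported module's top-level code.

```python
import importlib
import inspect


def resolve_source_file(module_name):
    """Return the path of a module's source file, or None if unavailable.

    Hypothetical helper illustrating importlib.import_module() followed by
    inspect.getsourcefile().
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # module cannot be imported in this environment
    try:
        return inspect.getsourcefile(module)
    except TypeError:
        return None  # built-in modules such as sys have no source file
```

A path under the project directory would suggest a local file, while a path under site-packages would suggest an external package.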

🔍 How It Works

  1. AST Parsing: Safely extracts import statements without executing code
  2. Module Resolution: Uses importlib.import_module() + inspect.getsourcefile()
  3. Smart Classification: Automatically categorizes modules:
    • 📦 External packages → Added to requirements
    • 🐍 Standard library → Ignored (built into Python)
    • 📁 Local files → Added to code_paths and analyzed recursively
  4. Dependency Discovery: Recursively follows imports to build complete dependency graph
  5. Path Optimization: Generates minimal file lists and package requirements
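
Steps 1, 3, and 4 above can be sketched with the standard library alone. This is a simplified illustration, not the library's implementation: find_imports, local_file, and analyze are hypothetical names, local modules are only looked up directly under repo_root, and the external names returned are import names (for example sklearn) rather than distribution names (scikit-learn).

```python
import ast
import sys
from pathlib import Path

# sys.stdlib_module_names exists on Python 3.10+; older versions fall back
# to the much smaller set of built-in module names.
STDLIB = getattr(sys, "stdlib_module_names", frozenset(sys.builtin_module_names))


def find_imports(source):
    """Return top-level module names imported by source, without running it."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])  # relative imports are skipped
    return names


def local_file(name, repo_root):
    """Return the file backing a top-level local module, or None."""
    for candidate in (repo_root / f"{name}.py", repo_root / name / "__init__.py"):
        if candidate.is_file():
            return candidate
    return None


def analyze(entry, repo_root):
    """Follow local imports from entry; return (external names, local files)."""
    repo_root = Path(repo_root)
    external, visited, queue = set(), set(), [Path(entry)]
    while queue:
        path = queue.pop()
        if path in visited:  # already analyzed; this also breaks import cycles
            continue
        visited.add(path)
        for name in find_imports(path.read_text(encoding="utf-8")):
            if name in STDLIB:
                continue  # standard library: ignored
            target = local_file(name, repo_root)
            if target is None:
                external.add(name)  # treated as an external package
            else:
                queue.append(target)  # local file: analyze recursively
    local = sorted(p.relative_to(repo_root).as_posix() for p in visited)
    return sorted(external), local
```

The visited set is what keeps two files that import each other from causing an endless loop.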

🌟 Advanced Usage

Complex Project Structure

from mlflow_dep_analyzer import UnifiedDependencyAnalyzer

# Analyze a complex project with src/ structure
analyzer = UnifiedDependencyAnalyzer(repo_root="/path/to/project")
result = analyzer.analyze_dependencies([
    "src/models/classifier.py",
    "src/models/preprocessor.py",
    "src/utils/data_loader.py"
])

print(f"Found {result['analysis']['total_modules']} total modules")
print(f"External packages: {result['analysis']['external_packages']}")
print(f"Local files: {result['analysis']['local_files']}")
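
For larger projects, the entry-point list could be collected with a glob instead of being typed out by hand. The sketch below is one way to do that; find_entry_points is a hypothetical helper, the default pattern and the choice to skip __init__.py files are arbitrary, and it assumes analyze_dependencies() accepts paths relative to repo_root as in the example above.

```python
from pathlib import Path


def find_entry_points(repo_root, pattern="src/**/*.py"):
    """Return sorted repo-relative paths of Python files matching pattern.

    Hypothetical helper: __init__.py files are skipped because they are
    rarely useful as entry points.
    """
    root = Path(repo_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.glob(pattern)
        if path.is_file() and path.name != "__init__.py"
    )
```

The returned list could then be passed to analyzer.analyze_dependencies() in place of the hard-coded paths.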

Advanced Analysis

from mlflow_dep_analyzer import UnifiedDependencyAnalyzer

# Get detailed analysis results
analyzer = UnifiedDependencyAnalyzer(repo_root=".")
result = analyzer.analyze_dependencies(["model.py"])

# Access detailed metrics
print(f"Total modules found: {result['analysis']['total_modules']}")
print(f"External packages: {result['analysis']['external_packages']}")
print(f"Local files: {result['analysis']['local_files']}")
print(f"Standard library modules: {result['analysis']['stdlib_modules']}")

Error Handling

from mlflow_dep_analyzer import analyze_model_dependencies

try:
    result = analyze_model_dependencies("model.py")
except FileNotFoundError:
    print("Model file not found")
except ImportError as e:
    print(f"Import resolution failed: {e}")

🧪 Examples

See the examples/ directory for complete working examples.

🛠️ Development

Setup

This project uses uv for dependency management:

git clone https://github.com/andrewgross/mlflow-dep-analyzer
cd mlflow-dep-analyzer
uv sync

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src/mlflow_dep_analyzer --cov-report=html

# Run a specific test file
uv run pytest tests/test_unified_analyzer.py -v

Code Quality

# Linting and formatting
uv run ruff check
uv run ruff format

# Type checking
uv run mypy src/

# Pre-commit hooks
uv run pre-commit run --all-files

Requirements

  • Python: 3.8+ (developed with 3.11.11 for Databricks Runtime 15.4 LTS compatibility)
  • Core dependencies: MLflow 2.0+
  • Development: pytest, ruff, mypy, pre-commit

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Quick Contribution Guide

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes with tests
  4. Run the test suite: uv run pytest
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built on MLflow's production-tested dependency resolution utilities
  • Inspired by the need for reliable, minimal MLflow model deployments
  • Thanks to the Python AST and importlib developers for robust introspection tools

📈 Roadmap

  • Configuration file support
  • Plugin system for custom analyzers
  • Integration with other ML frameworks
  • Dependency vulnerability scanning
  • Performance optimizations with caching

Documentation • Issues • Contributing

Made with ❤️ for the MLflow community
