Smart dependency analysis and minimal requirements generation for MLflow models

MLflow Dependency Analyzer

Python 3.8+ | License: MIT

Automatically detect and generate minimal code_paths and requirements lists for your MLflow models using safe AST-based analysis. Ensure portable and reproducible model deployments without dependency bloat.

🚀 Features

  • 🔍 Unified Analysis: Complete dependency analysis combining requirements and code paths
  • 🧠 Smart Detection: Uses Python's importlib and inspect for accurate module resolution
  • 🔒 Safe Analysis: AST-based import discovery - no code execution required
  • 📦 MLflow Integration: Built-in support for MLflow's production utilities
  • 🎯 Minimal Dependencies: Intelligent pruning eliminates unnecessary packages
  • 🔄 Recursive Discovery: Follows deep dependency chains automatically
  • 🛡️ Robust Error Handling: Graceful handling of circular dependencies and import errors
  • ⚡ Production Ready: Comprehensive test coverage with real-world scenarios

📦 Installation

pip install mlflow-dep-analyzer

🎯 Quick Start

Simple Model Analysis

from mlflow_dep_analyzer import analyze_model_dependencies

# Analyze a single model file
result = analyze_model_dependencies("model.py")

print("📦 External packages needed:")
print(result["requirements"])

print("📂 Local files needed:")
print(result["code_paths"])

MLflow Integration

import mlflow
import mlflow.sklearn
from mlflow_dep_analyzer import analyze_model_dependencies
from sklearn.ensemble import RandomForestClassifier

# Train your model
model = RandomForestClassifier()
# ... training code ...

# Analyze dependencies
deps = analyze_model_dependencies("model.py")

# Log with minimal dependencies
with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        "classifier",
        code_paths=deps["code_paths"],
        pip_requirements=deps["requirements"]
    )

📚 API Reference

The MLflow Dependency Analyzer provides a simple, unified interface for dependency analysis:

Main Interface

from mlflow_dep_analyzer import analyze_model_dependencies

# Analyze a single model file
result = analyze_model_dependencies("model.py")

# Analyze with explicit repo root
result = analyze_model_dependencies("model.py", repo_root="/path/to/project")

# Result structure
{
    "requirements": ["pandas", "scikit-learn"],  # External packages to install
    "code_paths": ["model.py", "utils.py"],      # Local files to include
    "analysis": {
        "total_modules": 15,
        "external_packages": 2,
        "local_files": 2,
        "stdlib_modules": 11
    }
}
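
Before logging a model, it can be useful to sanity-check a result of this shape. The sketch below is not part of the library: `check_result` is a hypothetical helper. It assumes (the example values suggest this, but the source does not confirm it) that `external_packages` and `local_files` count the entries of `requirements` and `code_paths`, and that `total_modules` is the sum of the three categories.

```python
from typing import Any, Dict, List


def check_result(result: Dict[str, Any]) -> List[str]:
    """Return a list of mismatches between the lists and the summary counts."""
    analysis = result.get("analysis", {})
    problems = []
    # Assumption: each count mirrors the length of the matching list.
    pairs = [("requirements", "external_packages"), ("code_paths", "local_files")]
    for list_key, count_key in pairs:
        actual = len(result.get(list_key, []))
        reported = analysis.get(count_key)
        if reported != actual:
            problems.append(f"{count_key}={reported} but {list_key} has {actual} entries")
    # Assumption: total_modules is the sum of the three categories.
    parts = [analysis.get(k, 0) for k in ("external_packages", "local_files", "stdlib_modules")]
    if analysis.get("total_modules") != sum(parts):
        problems.append("total_modules does not equal the sum of its categories")
    return problems


# The example values from the result structure above.
example = {
    "requirements": ["pandas", "scikit-learn"],
    "code_paths": ["model.py", "utils.py"],
    "analysis": {
        "total_modules": 15,
        "external_packages": 2,
        "local_files": 2,
        "stdlib_modules": 11,
    },
}
print(check_result(example))
```

An empty list means the counts and lists agree under those assumptions; if the library defines the counts differently, adjust the checks accordingly.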

Class-Based Interface

For advanced use cases or multiple analyses:

from mlflow_dep_analyzer import UnifiedDependencyAnalyzer

# Create analyzer instance
analyzer = UnifiedDependencyAnalyzer(repo_root=".")

# Analyze multiple entry points
result = analyzer.analyze_dependencies(["model.py", "train.py", "utils.py"])

Convenience Functions

from mlflow_dep_analyzer import get_model_requirements, get_model_code_paths

# Get just the requirements list
packages = get_model_requirements("model.py")
# Returns: ["pandas", "scikit-learn", "numpy"]

# Get just the code paths list
files = get_model_code_paths("model.py")
# Returns: ["model.py", "utils.py", "preprocessing.py"]

🏗️ Architecture

The library uses a single, unified analyzer that provides complete dependency analysis:

┌─────────────────────────────┐
│   UnifiedDependencyAnalyzer │
│    (Complete Analysis)      │
└─────────────────────────────┘
                │
                ├─── AST parsing (safe import discovery)
                ├─── importlib.import_module() (dynamic imports)
                ├─── inspect.getsourcefile() (accurate file paths)
                ├─── Smart classification:
                │    ├─── Standard library → ignored
                │    ├─── External packages → requirements
                │    └─── Local files → code_paths + recursive analysis
                └─── MLflow-compatible output
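
The classification branch of the diagram can be approximated in a few lines. This is a minimal sketch under stated assumptions, not the library's implementation: `classify_module` is a hypothetical name, and it swaps `importlib.import_module()` for `importlib.util.find_spec()` on the top-level name so that nothing is imported during the lookup. Treating anything under `site-packages` or `dist-packages` as external is also an assumption of this sketch.

```python
import importlib.util
import sys
import sysconfig
from pathlib import Path


def _is_under(path: Path, parent: Path) -> bool:
    """True if `path` is `parent` or lives somewhere below it."""
    try:
        path.relative_to(parent)
        return True
    except ValueError:
        return False


def classify_module(name: str, repo_root: str) -> str:
    """Classify a module as 'stdlib', 'external', 'local', or 'unresolved'."""
    top = name.split(".")[0]
    # sys.stdlib_module_names exists on Python 3.10+; older versions fall through.
    if top in sys.builtin_module_names or top in getattr(sys, "stdlib_module_names", ()):
        return "stdlib"
    try:
        # find_spec on a top-level name locates the module without importing it.
        spec = importlib.util.find_spec(top)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return "unresolved"
    if spec.origin in ("built-in", "frozen"):
        return "stdlib"
    # Regular modules have an origin file; namespace packages only have search locations.
    locations = [spec.origin] if spec.origin else list(spec.submodule_search_locations or [])
    if not locations:
        return "unresolved"
    path = Path(locations[0]).resolve()
    if "site-packages" in path.parts or "dist-packages" in path.parts:
        return "external"
    if _is_under(path, Path(repo_root).resolve()):
        return "local"
    if _is_under(path, Path(sysconfig.get_paths()["stdlib"]).resolve()):
        return "stdlib"
    return "external"


for module_name in ("json", "os.path"):
    print(module_name, "->", classify_module(module_name, repo_root="."))
```

Local modules are only found if the repo root is on `sys.path`, which is one reason an explicit `repo_root` argument is worth passing in real projects.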

🔍 How It Works

  1. AST Parsing: Safely extracts import statements without executing code
  2. Module Resolution: Uses importlib.import_module() + inspect.getsourcefile()
  3. Smart Classification: Automatically categorizes modules:
    • 📦 External packages → Added to requirements
    • 🐍 Standard library → Ignored (built into Python)
    • 📁 Local files → Added to code_paths and analyzed recursively
  4. Dependency Discovery: Recursively follows imports to build complete dependency graph
  5. Path Optimization: Generates minimal file lists and package requirements
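
Steps 1 and 4 can be illustrated with the standard `ast` module alone. The sketch below is a simplified illustration, not the library's code: `extract_imports` and `walk_local_imports` are hypothetical names, relative imports are skipped, and local modules are resolved by looking for `<name>.py` or `<name>/__init__.py` directly under the repo root instead of going through `importlib.import_module()`.

```python
import ast
from pathlib import Path
from typing import Set


def extract_imports(source: str) -> Set[str]:
    """Collect top-level module names imported by `source` without executing it."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level > 0 means a relative import; this sketch ignores those.
            names.add(node.module.split(".")[0])
    return names


def walk_local_imports(entry: Path, repo_root: Path) -> Set[Path]:
    """Follow imports that map to files under repo_root; `seen` stops import cycles."""
    seen: Set[Path] = set()
    stack = [entry.resolve()]
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        for name in extract_imports(path.read_text(encoding="utf-8")):
            for candidate in (repo_root / f"{name}.py", repo_root / name / "__init__.py"):
                if candidate.is_file():
                    stack.append(candidate.resolve())
    return seen


print(sorted(extract_imports("import numpy as np\nfrom os import path\n")))
# ['numpy', 'os']
```

Because visited files are tracked in a set, two modules that import each other are each read once, which is one simple way to handle the circular dependencies mentioned in the feature list.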

🌟 Advanced Usage

Complex Project Structure

from mlflow_dep_analyzer import UnifiedDependencyAnalyzer

# Analyze a complex project with src/ structure
analyzer = UnifiedDependencyAnalyzer(repo_root="/path/to/project")
result = analyzer.analyze_dependencies([
    "src/models/classifier.py",
    "src/models/preprocessor.py",
    "src/utils/data_loader.py"
])

print(f"Found {result['analysis']['total_modules']} total modules")
print(f"External packages: {result['analysis']['external_packages']}")
print(f"Local files: {result['analysis']['local_files']}")

Advanced Analysis

from mlflow_dep_analyzer import UnifiedDependencyAnalyzer

# Get detailed analysis results
analyzer = UnifiedDependencyAnalyzer(repo_root=".")
result = analyzer.analyze_dependencies(["model.py"])

# Access detailed metrics
print(f"Total modules found: {result['analysis']['total_modules']}")
print(f"External packages: {result['analysis']['external_packages']}")
print(f"Local files: {result['analysis']['local_files']}")
print(f"Standard library modules: {result['analysis']['stdlib_modules']}")

Error Handling

from mlflow_dep_analyzer import analyze_model_dependencies

try:
    result = analyze_model_dependencies("model.py")
except FileNotFoundError:
    print("Model file not found")
except ImportError as e:
    print(f"Import resolution failed: {e}")

🧪 Examples

See the examples/ directory for complete working examples.

🛠️ Development

Setup

This project uses uv for dependency management:

git clone https://github.com/andrewgross/mlflow-dep-analyzer
cd mlflow-dep-analyzer
uv sync

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src/mlflow_dep_analyzer --cov-report=html

# Run a specific test file
uv run pytest tests/test_unified_analyzer.py -v

Code Quality

# Linting and formatting
uv run ruff check
uv run ruff format

# Type checking
uv run mypy src/

# Pre-commit hooks
uv run pre-commit run --all-files

Requirements

  • Python: 3.8+ (developed with 3.11.11 for Databricks Runtime 15.4 LTS compatibility)
  • Core dependencies: MLflow 2.0+
  • Development: pytest, ruff, mypy, pre-commit

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Quick Contribution Guide

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes with tests
  4. Run the test suite: uv run pytest
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built on MLflow's production-tested dependency resolution utilities
  • Inspired by the need for reliable, minimal MLflow model deployments
  • Thanks to the Python AST and importlib developers for robust introspection tools

📈 Roadmap

  • Configuration file support
  • Plugin system for custom analyzers
  • Integration with other ML frameworks
  • Dependency vulnerability scanning
  • Performance optimizations with caching

Documentation • Issues • Contributing

Made with ❤️ for the MLflow community
