Smart dependency analysis and minimal requirements generation for MLflow models
MLflow Dependency Analyzer
Automatically detect and generate minimal code_paths and requirements lists for your MLflow models using safe AST-based analysis. Ensure portable and reproducible model deployments without dependency bloat.
Features
- Unified Analysis: Complete dependency analysis combining requirements and code paths
- Smart Detection: Uses Python's importlib and inspect for accurate module resolution
- Safe Analysis: AST-based import discovery that requires no code execution
- MLflow Integration: Built-in support for MLflow's production utilities
- Minimal Dependencies: Intelligent pruning eliminates unnecessary packages
- Recursive Discovery: Follows deep dependency chains automatically
- Robust Error Handling: Graceful handling of circular dependencies and import errors
- Production Ready: Comprehensive test coverage with real-world scenarios
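As an illustration of the AST-based import discovery mentioned in the feature list, here is a minimal sketch using only the standard library `ast` module. It is not the library's own code: the function name `extract_imports` is hypothetical, and the sketch only collects top-level names of absolute imports while skipping relative ones.

```python
import ast
from typing import Set


def extract_imports(source: str) -> Set[str]:
    """Hypothetical helper: top-level module names imported by `source`.

    The source is only parsed, never executed.
    """
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import such as "from . import x"
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules
```

Because the file is parsed rather than run, statements with side effects (network calls, `raise`, heavy initialisation) have no effect during this step.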
Installation
pip install mlflow-dep-analyzer
Quick Start
Simple Model Analysis
from mlflow_dep_analyzer import analyze_model_dependencies
# Analyze a single model file
result = analyze_model_dependencies("model.py")
print("External packages needed:")
print(result["requirements"])
print("Local files needed:")
print(result["code_paths"])
MLflow Integration
import mlflow
import mlflow.sklearn
from mlflow_dep_analyzer import analyze_model_dependencies
from sklearn.ensemble import RandomForestClassifier
# Train your model
model = RandomForestClassifier()
# ... training code ...
# Analyze dependencies
deps = analyze_model_dependencies("model.py")
# Log with minimal dependencies
with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        "classifier",
        code_paths=deps["code_paths"],
        pip_requirements=deps["requirements"],
    )
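The requirement names in this README's examples are unpinned (for example `"pandas"`). If that is what you get back and you want a reproducible environment, one option is to pin each name to the version installed in the training environment before passing the list to `pip_requirements`. The sketch below uses the standard library `importlib.metadata`; the helper name `pin_requirements` is hypothetical and not part of this package.

```python
from importlib import metadata
from typing import List


def pin_requirements(packages: List[str]) -> List[str]:
    """Hypothetical helper: append '==<installed version>' where one is found."""
    pinned = []
    for name in packages:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment: leave the name unpinned
            pinned.append(name)
    return pinned
```

Under these assumptions, `pip_requirements=pin_requirements(deps["requirements"])` would log pinned versions instead of bare names.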
API Reference
The MLflow Dependency Analyzer provides a simple, unified interface for dependency analysis:
Main Interface
from mlflow_dep_analyzer import analyze_model_dependencies
# Analyze a single model file
result = analyze_model_dependencies("model.py")
# Analyze with explicit repo root
result = analyze_model_dependencies("model.py", repo_root="/path/to/project")
# Result structure
{
    "requirements": ["pandas", "scikit-learn"],  # External packages to install
    "code_paths": ["model.py", "utils.py"],      # Local files to include
    "analysis": {
        "total_modules": 15,
        "external_packages": 2,
        "local_files": 2,
        "stdlib_modules": 11
    }
}
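When the result is consumed by other tooling, a small shape check can catch surprises early. The sketch below validates only the keys and value types shown in the structure above; the helper name `check_result_shape` is hypothetical, and the real result may carry additional fields.

```python
from typing import Any, Dict, List


def check_result_shape(result: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list problems in a result dict (empty list if none)."""
    problems = []
    for key in ("requirements", "code_paths"):
        value = result.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key!r} should be a list of strings")
    analysis = result.get("analysis")
    if not isinstance(analysis, dict):
        problems.append("'analysis' should be a dict")
    else:
        for key in ("total_modules", "external_packages", "local_files", "stdlib_modules"):
            if not isinstance(analysis.get(key), int):
                problems.append(f"analysis[{key!r}] should be an int")
    return problems
```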
Class-Based Interface
For advanced use cases or multiple analyses:
from mlflow_dep_analyzer import UnifiedDependencyAnalyzer
# Create analyzer instance
analyzer = UnifiedDependencyAnalyzer(repo_root=".")
# Analyze multiple entry points
result = analyzer.analyze_dependencies(["model.py", "train.py", "utils.py"])
Convenience Functions
from mlflow_dep_analyzer import get_model_requirements, get_model_code_paths
# Get just the requirements list
packages = get_model_requirements("model.py")
# Returns: ["pandas", "scikit-learn", "numpy"]
# Get just the code paths list
files = get_model_code_paths("model.py")
# Returns: ["model.py", "utils.py", "preprocessing.py"]
Architecture
The library uses a single, unified analyzer that provides complete dependency analysis:
┌─────────────────────────────┐
│  UnifiedDependencyAnalyzer  │
│     (Complete Analysis)     │
└─────────────────────────────┘
              │
              ├── AST parsing (safe import discovery)
              ├── importlib.import_module() (dynamic imports)
              ├── inspect.getsourcefile() (accurate file paths)
              ├── Smart classification:
              │   ├── Standard library → ignored
              │   ├── External packages → requirements
              │   └── Local files → code_paths + recursive analysis
              └── MLflow-compatible output
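The classification step in the diagram can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the analyzer's code: `classify_module` is a hypothetical name, it swaps in `importlib.util.find_spec` (which locates a top-level module without importing it) instead of `importlib.import_module()`, and it relies on `sys.stdlib_module_names`, which exists only on Python 3.10+.

```python
import importlib.util
import sys
from pathlib import Path


def classify_module(name: str, repo_root: str) -> str:
    """Hypothetical helper: return 'stdlib', 'local', 'external' or 'unresolved'."""
    top = name.split(".")[0]
    # sys.stdlib_module_names is Python 3.10+; older versions fall back to built-ins only
    stdlib_names = getattr(sys, "stdlib_module_names", frozenset())
    if top in sys.builtin_module_names or top in stdlib_names:
        return "stdlib"
    try:
        spec = importlib.util.find_spec(top)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return "unresolved"
    if spec.origin in ("built-in", "frozen"):
        return "stdlib"
    # Regular modules have an origin; namespace packages only have search locations
    locations = [spec.origin] if spec.origin else list(spec.submodule_search_locations or [])
    root = Path(repo_root).resolve()
    for location in locations:
        path = Path(location).resolve()
        if "site-packages" in path.parts or "dist-packages" in path.parts:
            continue  # installed package, even if the virtualenv lives inside the repo
        if root == path or root in path.parents:
            return "local"
    return "external"
```

The `site-packages` check is a deliberate design choice in this sketch: without it, a virtual environment stored inside the repository would make every installed package look like a local file.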
How It Works
- AST Parsing: Safely extracts import statements without executing code
- Module Resolution: Resolves each discovered module to its source file with importlib.import_module() + inspect.getsourcefile()
- Smart Classification: Automatically categorizes modules:
  - External packages → Added to requirements
  - Standard library → Ignored (built into Python)
  - Local files → Added to code_paths and analyzed recursively
- Dependency Discovery: Recursively follows imports to build a complete dependency graph
- Path Optimization: Generates minimal file lists and package requirements
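The recursive discovery step, including termination on circular imports, can be sketched as a worklist traversal with a visited set. This is a simplified illustration under stated assumptions, not the library's implementation: `discover_local_files` and its helpers are hypothetical names, only absolute imports are followed, and a module counts as local when `<repo_root>/<module>.py` or `<repo_root>/<package>/__init__.py` exists.

```python
import ast
from pathlib import Path
from typing import List, Optional, Set


def _resolve_local(module: str, root: Path) -> Optional[Path]:
    """Map a dotted module name to a file under root, if one exists."""
    base = root.joinpath(*module.split("."))
    for candidate in (base.with_suffix(".py"), base / "__init__.py"):
        if candidate.is_file():
            return candidate
    return None


def _imported_names(tree: ast.AST) -> List[str]:
    """Collect dotted names from absolute import statements."""
    names: List[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.append(node.module)
            # "from pkg import mod" may name a submodule, so try that too
            names.extend(f"{node.module}.{a.name}" for a in node.names if a.name != "*")
    return names


def discover_local_files(entry: str, repo_root: str) -> List[str]:
    """Hypothetical helper: local files reachable from `entry` via imports."""
    root = Path(repo_root).resolve()
    visited: Set[Path] = set()
    stack = [(root / entry).resolve()]
    while stack:
        path = stack.pop()
        if path in visited:  # already analyzed; this is what breaks import cycles
            continue
        visited.add(path)
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for name in _imported_names(tree):
            parts = name.split(".")
            # Importing "pkg.mod" also loads "pkg", so check every prefix
            for i in range(1, len(parts) + 1):
                target = _resolve_local(".".join(parts[:i]), root)
                if target is not None:
                    stack.append(target)
    return sorted(p.relative_to(root).as_posix() for p in visited)
```

With two files that import each other, the visited set ensures each file is parsed once and the traversal still terminates.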
Advanced Usage
Complex Project Structure
from mlflow_dep_analyzer import UnifiedDependencyAnalyzer
# Analyze a complex project with src/ structure
analyzer = UnifiedDependencyAnalyzer(repo_root="/path/to/project")
result = analyzer.analyze_dependencies([
    "src/models/classifier.py",
    "src/models/preprocessor.py",
    "src/utils/data_loader.py",
])
print(f"Found {result['analysis']['total_modules']} total modules")
print(f"External packages: {result['analysis']['external_packages']}")
print(f"Local files: {result['analysis']['local_files']}")
Advanced Analysis
from mlflow_dep_analyzer import UnifiedDependencyAnalyzer
# Get detailed analysis results
analyzer = UnifiedDependencyAnalyzer(repo_root=".")
result = analyzer.analyze_dependencies(["model.py"])
# Access detailed metrics
print(f"Total modules found: {result['analysis']['total_modules']}")
print(f"External packages: {result['analysis']['external_packages']}")
print(f"Local files: {result['analysis']['local_files']}")
print(f"Standard library modules: {result['analysis']['stdlib_modules']}")
Error Handling
from mlflow_dep_analyzer import analyze_model_dependencies
try:
    result = analyze_model_dependencies("model.py")
except FileNotFoundError:
    print("Model file not found")
except ImportError as e:
    print(f"Import resolution failed: {e}")
Examples
See the examples/ directory for complete working examples:
- Basic Usage: Complete MLflow integration demo
- MLflow Integration: Real-world MLflow projects
- Complex Projects: Multi-file analysis with auto-logging
Development
Setup
This project uses uv for dependency management:
git clone https://github.com/andrewgross/mlflow-dep-analyzer
cd mlflow-dep-analyzer
uv sync
Running Tests
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=src/mlflow_dep_analyzer --cov-report=html
# Run specific test categories
uv run pytest tests/test_unified_analyzer.py -v
Code Quality
# Linting and formatting
uv run ruff check
uv run ruff format
# Type checking
uv run mypy src/
# Pre-commit hooks
uv run pre-commit run --all-files
Requirements
- Python: 3.8+ (developed with 3.11.11 for Databricks Runtime 15.4 LTS compatibility)
- Core dependencies: MLflow 2.0+
- Development: pytest, ruff, mypy, pre-commit
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Quick Contribution Guide
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes with tests
- Run the test suite: uv run pytest
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built on MLflow's production-tested dependency resolution utilities
- Inspired by the need for reliable, minimal MLflow model deployments
- Thanks to the Python AST and importlib developers for robust introspection tools
Roadmap
- Configuration file support
- Plugin system for custom analyzers
- Integration with other ML frameworks
- Dependency vulnerability scanning
- Performance optimizations with caching
Documentation • Issues • Contributing
Made with ❤️ for the MLflow community