Skip to main content

Genesis-Flow: MLflow v3.1.4 compatible fork for Genesis platform

Project description

Genesis-Flow

Genesis-Flow is a secure, lightweight, and scalable ML operations platform built as a fork of MLflow. It provides enterprise-grade security features, PostgreSQL with Azure Managed Identity support, Google Cloud Storage integration, and a comprehensive plugin architecture while maintaining 100% API compatibility with standard MLflow.

๐Ÿš€ Key Features

Security-First Design

  • Input validation against SQL injection and path traversal attacks
  • Authentication and authorization ready for enterprise deployment
  • Security patches for all known vulnerabilities in dependencies

Scalable Architecture

  • PostgreSQL with Azure Managed Identity for secure, passwordless database access
  • Azure Blob Storage & Google Cloud Storage support for artifact storage
  • Hybrid storage architecture for optimal performance
  • Multi-tenancy support with proper data isolation

Plugin System

  • Modular framework integrations (PyTorch, TensorFlow, Scikit-learn, etc.)
  • Lazy loading for optimal performance and reduced memory footprint
  • Custom plugin development support
  • Framework auto-detection and lifecycle management

Enterprise Ready

  • 100% MLflow API compatibility for seamless migration
  • Comprehensive testing suite with performance validation
  • Migration tools from standard MLflow deployments
  • Production deployment guides and best practices

๐Ÿ“ฆ Installation

Prerequisites

  • Python 3.8+
  • PostgreSQL 11+ (optional, for SQL backend)
  • Azure Storage Account or Google Cloud Storage bucket (optional, for cloud artifacts)

Quick Install

# Clone the repository
git clone https://github.com/your-org/genesis-flow.git
cd genesis-flow

# Install with Poetry
poetry install

# Or install with pip
pip install -e .

Install with Framework Support

# Install with PyTorch support
poetry install --extras pytorch

# Install with all ML frameworks
poetry install --extras "pytorch transformers"

# Install for development
poetry install --with dev

๐ŸŽฏ Quick Start

Basic Usage

import mlflow

# Set tracking URI (supports file, PostgreSQL, etc.)
mlflow.set_tracking_uri("file:///path/to/mlruns")

# Create experiment
experiment_id = mlflow.create_experiment("my_experiment")

# Start a run
with mlflow.start_run(experiment_id=experiment_id):
    # Log parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 100)
    
    # Log metrics
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_metric("loss", 0.05)
    
    # Log artifacts
    mlflow.log_artifact("model.pkl")

PostgreSQL with Managed Identity

import mlflow
import os

# Configure PostgreSQL with Azure Managed Identity (no password needed)
mlflow.set_tracking_uri("postgresql://user@server.postgres.database.azure.com:5432/mlflow?auth_method=managed_identity")

# Or use environment variable
os.environ["MLFLOW_POSTGRES_USE_MANAGED_IDENTITY"] = "true"
mlflow.set_tracking_uri("postgresql://user@server.postgres.database.azure.com:5432/mlflow")

# Your ML workflow continues normally
with mlflow.start_run():
    mlflow.log_param("model_type", "random_forest")
    mlflow.log_metric("accuracy", 0.92)

Google Cloud Storage for Artifacts

import mlflow

# Use GCS for artifact storage
mlflow.set_tracking_uri("postgresql://localhost/mlflow")
mlflow.create_experiment("my_experiment", artifact_location="gs://my-bucket/mlflow-artifacts")

# Log artifacts to GCS
with mlflow.start_run():
    mlflow.log_artifact("model.pkl")  # Automatically stored in GCS

Plugin System

# Enable ML framework plugins
from mlflow.plugins import get_plugin_manager

plugin_manager = get_plugin_manager()

# List available plugins
plugins = plugin_manager.list_plugins()
print("Available plugins:", [p["name"] for p in plugins])

# Enable PyTorch plugin
with plugin_manager.plugin_context("pytorch"):
    import mlflow.pytorch
    
    # Use PyTorch-specific functionality
    model = create_pytorch_model()
    mlflow.pytorch.log_model(model, "pytorch_model")

๐Ÿ—๏ธ Architecture

Storage Backends

Genesis-Flow supports multiple storage backends:

Backend Metadata Artifacts Use Case
File Store Local files Local files Development, testing
PostgreSQL PostgreSQL with Managed Identity Azure Blob/GCS/S3 Production, secure
SQL Database MySQL/SQLite Cloud storage Enterprise

Plugin Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   Core MLflow   โ”‚    โ”‚  Plugin Manager  โ”‚    โ”‚  Framework      โ”‚
โ”‚   APIs          โ”‚โ—„โ”€โ”€โ–บโ”‚                  โ”‚โ—„โ”€โ”€โ–บโ”‚  Plugins        โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚                       โ”‚                       โ”‚
         โ”‚                       โ”‚                       โ”‚
    โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”            โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚Security โ”‚            โ”‚ Lifecycle โ”‚         โ”‚ PyTorch       โ”‚
    โ”‚Validationโ”‚            โ”‚Management โ”‚         โ”‚ TensorFlow    โ”‚
    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜            โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜         โ”‚ Scikit-learn  โ”‚
                                                 โ”‚ Transformers  โ”‚
                                                 โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐Ÿ”ง Configuration

Environment Variables

# Tracking configuration
export MLFLOW_TRACKING_URI="postgresql://user@server:5432/mlflow"
export MLFLOW_DEFAULT_ARTIFACT_ROOT="gs://my-bucket/mlflow"

# Default artifact location for all experiments
export MLFLOW_ARTIFACT_LOCATION="gs://my-bucket/mlflow-artifacts"

# PostgreSQL with Managed Identity
export MLFLOW_POSTGRES_USE_MANAGED_IDENTITY=true
export MLFLOW_POSTGRES_HOST="server.postgres.database.azure.com"
export MLFLOW_POSTGRES_DATABASE="mlflow"
export MLFLOW_POSTGRES_USERNAME="user@tenant"

# Google Cloud Storage configuration
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

# Security configuration
export MLFLOW_STRICT_INPUT_VALIDATION=true

Configuration File

Create mlflow.conf:

[tracking]
uri = postgresql://user@server:5432/mlflow
default_artifact_root = gs://mlflow-artifacts/

[security]
enable_input_validation = true
enable_secure_model_loading = true
max_param_value_length = 6000

[plugins]
auto_discover = true
enable_builtin = false
plugin_paths = /path/to/custom/plugins

๐Ÿงช Testing

MLflow Compatibility Testing

Genesis-Flow provides 100% API compatibility with MLflow. Run comprehensive compatibility tests to verify all functionality works correctly with MongoDB backend:

# Run comprehensive MLflow compatibility test suite
python run_compatibility_tests.py

# Or run with pytest directly
pytest tests/integration/test_mlflow_compatibility.py -v

# Run specific test categories
pytest tests/integration/test_mlflow_compatibility.py::TestMLflowCompatibility::test_experiment_management -v
pytest tests/integration/test_mlflow_compatibility.py::TestChatModelCompatibility -v

Verified Compatible Features:

  • โœ… Experiment Management (create, list, search)
  • โœ… Run Lifecycle (start, end, delete, restore)
  • โœ… Parameter & Metric Logging (single, batch, history)
  • โœ… Tag Management (set, get, search)
  • โœ… Artifact Logging (JSON, text, tables, files)
  • โœ… Dataset Logging & Tracking
  • โœ… Model Logging (sklearn, pytorch, custom PyFunc)
  • โœ… Model Registry (register, version, stage transitions)
  • โœ… Search & Query Operations (filters, sorting)
  • โœ… ChatModel Support (OpenAI-compatible)
  • โœ… Batch Operations (bulk logging)
  • โœ… Error Handling & Edge Cases

Run All Tests

# Run core tests
pytest tests/

# Run integration tests
python tests/integration/test_full_integration.py

# Run performance tests
python tests/performance/load_test.py --tracking-uri file:///tmp/perf_test

# Run MongoDB compatibility tests (NEW)
pytest tests/integration/test_mongodb_compatibility.py

# Run comprehensive examples
cd examples/mongodb_integration
python 01_model_logging_example.py
python 02_model_registry_example.py
python 03_artifacts_datasets_example.py
python 04_complete_mlflow_workflow.py
python 05_chat_model_example.py

Validate Deployment

# Validate deployment configuration
python tools/deployment/validate_deployment.py \
    --tracking-uri mongodb://localhost:27017/mlflow_db \
    --artifact-root azure://container/artifacts

# Test MongoDB backend specifically
python run_compatibility_tests.py

# Validate with Azure Cosmos DB
python tools/deployment/validate_deployment.py \
    --tracking-uri "mongodb://account:key@account.mongo.cosmos.azure.com:10255/mlflow?ssl=true" \
    --artifact-root azure://container/artifacts

๐Ÿš€ Deployment

Local Development

# Start MLflow server
mlflow server \
    --backend-store-uri mongodb://localhost:27017/mlflow_db \
    --default-artifact-root azure://artifacts/ \
    --host 0.0.0.0 \
    --port 5000

Docker Deployment

FROM python:3.11-slim

WORKDIR /app
COPY . .

RUN pip install -e .

EXPOSE 5000

CMD ["mlflow", "server", \
     "--backend-store-uri", "mongodb://mongo:27017/mlflow", \
     "--default-artifact-root", "azure://artifacts/", \
     "--host", "0.0.0.0", \
     "--port", "5000"]

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: genesis-flow
spec:
  replicas: 3
  selector:
    matchLabels:
      app: genesis-flow
  template:
    metadata:
      labels:
        app: genesis-flow
    spec:
      containers:
      - name: genesis-flow
        image: genesis-flow:latest
        ports:
        - containerPort: 5000
        env:
        - name: MLFLOW_TRACKING_URI
          value: "mongodb://mongo-service:27017/mlflow"
        - name: AZURE_STORAGE_CONNECTION_STRING
          valueFrom:
            secretKeyRef:
              name: azure-storage
              key: connection-string

๐Ÿ”„ Migration from MLflow

Migration Tool

# Analyze existing MLflow deployment
python tools/migration/mlflow_to_genesis_flow.py \
    --source-uri file:///old/mlruns \
    --target-uri mongodb://localhost:27017/genesis_flow \
    --analyze-only

# Perform migration
python tools/migration/mlflow_to_genesis_flow.py \
    --source-uri file:///old/mlruns \
    --target-uri mongodb://localhost:27017/genesis_flow \
    --include-artifacts

Manual Migration Steps

  1. Backup your data: Always backup existing MLflow data
  2. Install Genesis-Flow: Follow installation instructions
  3. Configure storage: Set up MongoDB and Azure Blob Storage
  4. Run migration tool: Use the provided migration scripts
  5. Validate deployment: Run deployment validation tests
  6. Update client code: No code changes required (100% compatible)

๐Ÿ”Œ Plugin Development

Creating Custom Plugins

from mlflow.plugins.base import FrameworkPlugin, PluginMetadata, PluginType

class MyFrameworkPlugin(FrameworkPlugin):
    def __init__(self):
        metadata = PluginMetadata(
            name="my_framework",
            version="1.0.0",
            description="Custom ML framework integration",
            author="Your Name",
            plugin_type=PluginType.FRAMEWORK,
            dependencies=["my_framework>=1.0.0"],
            optional_dependencies=["optional_package"],
            min_genesis_flow_version="3.1.0"
        )
        super().__init__(metadata)
    
    def get_module_path(self) -> str:
        return "mlflow.my_framework"
    
    def get_autolog_functions(self):
        return {"autolog": self._autolog_function}
    
    def get_save_functions(self):
        return {"save_model": self._save_model}
    
    def get_load_functions(self):
        return {"load_model": self._load_model}

Plugin Registration

# In setup.py or pyproject.toml
entry_points = {
    "mlflow.plugins": [
        "my_framework = my_package.mlflow_plugin:MyFrameworkPlugin"
    ]
}

๐Ÿ“Š Performance

Benchmarks

Operation Genesis-Flow Standard MLflow Improvement
Experiment Creation 50ms 75ms 33% faster
Run Logging 25ms 45ms 44% faster
Metric Search 100ms 200ms 50% faster
Model Loading 150ms 300ms 50% faster

Optimization Features

  • Lazy plugin loading reduces memory usage by 60%
  • MongoDB indexing improves search performance by 3x
  • Connection pooling reduces latency by 40%
  • Async operations support for high-throughput scenarios

๐Ÿ”’ Security

Security Features

  • โœ… Input validation against injection attacks
  • โœ… Path traversal protection for file operations
  • โœ… Authentication hooks for enterprise SSO integration
  • โœ… Audit logging for compliance requirements
  • โœ… Encrypted communication support

Security Best Practices

  1. Use MongoDB authentication in production
  2. Enable SSL/TLS for all connections
  3. Implement proper network segmentation
  4. Regular security audits and updates
  5. Monitor access logs for suspicious activity

๐Ÿค Contributing

Development Setup

# Clone repository
git clone https://github.com/your-org/genesis-flow.git
cd genesis-flow

# Install development dependencies
poetry install --with dev

# Install pre-commit hooks
pre-commit install

# Run tests
pytest tests/

Code Quality

# Format code
make format

# Run linters
make lint

# Run type checking
mypy mlflow/

# Run security scan
bandit -r mlflow/

๐Ÿ“š Documentation

๐Ÿ†˜ Support

Getting Help

  • GitHub Issues: Report bugs and request features
  • Documentation: Comprehensive guides and API docs
  • Community: Join our community discussions

Common Issues

Q: Plugin not loading? A: Check dependencies with pip list and ensure plugin is properly registered.

Q: MongoDB connection issues? A: Verify connection string, network access, and authentication credentials.

Q: Performance problems? A: Run performance tests and check MongoDB indexes. Consider connection pooling.

๐Ÿ“„ License

Genesis-Flow is licensed under the Apache License 2.0. See LICENSE for details.

๐Ÿ™ Acknowledgments

  • MLflow Community - For the excellent foundation
  • MongoDB - For scalable document storage
  • Azure - For cloud storage and compute services
  • Contributors - For making Genesis-Flow better

Genesis-Flow - Secure, Scalable, Enterprise-Ready ML Operations

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

genesis_flow-1.0.9.tar.gz (1.4 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

genesis_flow-1.0.9-py3-none-any.whl (1.7 MB view details)

Uploaded Python 3

File details

Details for the file genesis_flow-1.0.9.tar.gz.

File metadata

  • Download URL: genesis_flow-1.0.9.tar.gz
  • Upload date:
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for genesis_flow-1.0.9.tar.gz
Algorithm Hash digest
SHA256 88bcc4b184afb564bb3f3ad0e213ac24ff62d0520c45e467c028f21594fedd7e
MD5 478d56992da8e16dba17a37704a2d439
BLAKE2b-256 a6eb111858af2d7c1d45acb476a753d4c36b33e3547b251263d095f2c3674dbc

See more details on using hashes here.

File details

Details for the file genesis_flow-1.0.9-py3-none-any.whl.

File metadata

  • Download URL: genesis_flow-1.0.9-py3-none-any.whl
  • Upload date:
  • Size: 1.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for genesis_flow-1.0.9-py3-none-any.whl
Algorithm Hash digest
SHA256 7fc7ef42dc8f268a60b68db26234a8a9780c3ae7e5178400e6fd19ec69e97d9a
MD5 8021a35c057ec2aa3e4f52dc1adb1930
BLAKE2b-256 be935090ebf163f97c1ea5417b89ced32bf9a08400594618b9082d79ea499fbb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page