
ZenML: Write production-ready ML code.

Project description

MLOps for Reliable AI - From Classical ML to Agents

Your unified toolkit for shipping everything from decision trees to complex AI agents, built on the MLOps principles you already trust.

Features · Roadmap · Report Bug · Sign up for ZenML Pro · Blog · Podcast

🎉 For the latest release, see the release notes.


🚨 The Problem: MLOps Works for Models, But What About AI?


You're an ML engineer. You've perfected deploying scikit-learn models and wrangling TensorFlow jobs. Your MLOps stack is dialed in. But now, you're being asked to build and ship AI agents, and suddenly your trusted toolkit is starting to crack.

  • The Adaptation Struggle: Your MLOps habits (rigorous testing, versioning, CI/CD) don’t map cleanly onto agent development. How do you version a prompt? How do you regression-test a non-deterministic system? The tools that gave you confidence for models now create friction for agents.

  • The Divided Stack: To cope, teams are building a second, parallel stack just for LLM-based systems. Now you’re maintaining two sets of tools, two deployment pipelines, and two mental models. Your classical models live in one world, your agents in another. It's expensive, complex, and slows everyone down.

  • The Broken Feedback Loop: Getting an agent from your local environment to production is a slow, painful journey. By the time you get feedback on performance, cost, or quality, the requirements have already changed. Iteration is a guessing game, not a data-driven process.
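How do you version a prompt? One answer, borrowed from how artifacts are usually versioned, is to treat the prompt text and the settings that change its output as a single record and derive an ID from its content. The sketch below is plain standard-library Python and only illustrates the idea; `PromptVersion` and `prompt_version_id` are hypothetical names, and this is not a description of how ZenML tracks prompts internally.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PromptVersion:
    """A prompt template plus the settings that affect model output (hypothetical schema)."""
    template: str
    model: str
    temperature: float = 0.0


def prompt_version_id(prompt: PromptVersion) -> str:
    """Derive a short, deterministic version ID from the prompt's content.

    sort_keys=True makes the serialized form independent of field order, so
    identical prompts always map to the same ID and any edit yields a new one.
    """
    payload = json.dumps(asdict(prompt), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


if __name__ == "__main__":
    v1 = PromptVersion("You are a helpful assistant...", model="some-llm")
    v2 = PromptVersion("You are a concise, helpful assistant...", model="some-llm")
    print(prompt_version_id(v1), prompt_version_id(v2))
```

Because the ID is a pure function of content, two runs that used the same prompt and settings share an ID, while any edit (even a changed temperature) produces a new one that can be logged next to evaluation results.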

💡 The Solution: One Framework for your Entire AI Stack

Stop maintaining two separate worlds. ZenML is a unified MLOps framework that extends the battle-tested principles you rely on for classical ML to the new world of AI agents. It’s one platform to develop, evaluate, and deploy your entire AI portfolio.

# Morning: Your sklearn pipeline is still versioned and reproducible.
train_and_deploy_classifier()

# Afternoon: Your new agent evaluation pipeline uses the same logic.
evaluate_and_deploy_agent()

# Same platform. Same principles. New possibilities.

With ZenML, you're not replacing your knowledge; you're extending it. Use the pipelines and practices you already know to version, test, deploy, and monitor everything from classic models to the most advanced agents.

💻 See It In Action: Multi-Agent Architecture Comparison

The Challenge: Your team built three different customer service agents. Which one should go to production? With ZenML, you can build a reproducible pipeline to test them on real data and make a data-driven decision.

from zenml import pipeline, step
import pandas as pd

@step
def load_real_conversations() -> pd.DataFrame:
    """Load actual customer queries from a feature store."""
    return load_from_feature_store("customer_queries_sample_1k")

@step
def run_architecture_comparison(queries: pd.DataFrame) -> dict:
    """Test three different agent architectures on the same data."""
    architectures = {
        "single_agent": SingleAgentRAG(),
        "multi_specialist": MultiSpecialistAgents(),
        "hierarchical": HierarchicalAgentTeam()
    }
    
    results = {}
    for name, agent in architectures.items():
        # ZenML automatically versions the agent's code, prompts, and tools
        results[name] = agent.batch_process(queries)
    return results

@step
def evaluate_and_decide(results: dict) -> str:
    """Evaluate results and generate a recommendation report."""
    # Compare architectures on quality, cost, latency, etc.
    evaluation_df = evaluate_results(results)
    
    # Generate a rich report comparing the architectures
    report = create_comparison_report(evaluation_df)
    
    # Automatically tag the winning (highest-scoring) architecture for a staging deployment
    winner = evaluation_df.sort_values("overall_score", ascending=False).iloc[0]
    tag_for_staging(winner["architecture_name"])
    
    return report

@pipeline
def compare_agent_architectures():
    """Your new Friday afternoon ritual: data-driven agent decisions."""
    queries = load_real_conversations()
    results = run_architecture_comparison(queries)
    report = evaluate_and_decide(results)

if __name__ == "__main__":
    # Run locally, compare results in the ZenML dashboard
    compare_agent_architectures()

The Result: A clear winner is selected based on data, not opinions. You have full lineage from the test data and agent versions to the final report and deployment decision.
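The `evaluate_results` and `create_comparison_report` helpers above are left undefined. As one possible sketch of the scoring part, assume each architecture's batch results have already been reduced to a mean quality score in [0, 1], a mean latency in seconds, and a mean cost per query. A weighted `overall_score` can then be computed and the architecture with the highest score picked. The weights, the min-max normalization, and the `ArchitectureSummary` fields are illustrative assumptions, not ZenML APIs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ArchitectureSummary:
    """Aggregated evaluation results for one agent architecture (hypothetical fields)."""
    name: str
    quality: float         # mean judge score in [0, 1]; higher is better
    latency_s: float       # mean latency in seconds; lower is better
    cost_per_query: float  # mean cost per query; lower is better


def _normalize_inverted(values: List[float]) -> List[float]:
    """Min-max normalize so the lowest value maps to 1.0 and the highest to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]  # all equal: nobody is penalized
    return [(hi - v) / (hi - lo) for v in values]


def overall_scores(
    summaries: List[ArchitectureSummary],
    w_quality: float = 0.6,
    w_latency: float = 0.2,
    w_cost: float = 0.2,
) -> Dict[str, float]:
    """Weighted score per architecture; weights are assumed to sum to 1."""
    lat = _normalize_inverted([s.latency_s for s in summaries])
    cost = _normalize_inverted([s.cost_per_query for s in summaries])
    return {
        s.name: w_quality * s.quality + w_latency * l + w_cost * c
        for s, l, c in zip(summaries, lat, cost)
    }


def pick_winner(summaries: List[ArchitectureSummary]) -> str:
    """Return the name of the architecture with the HIGHEST overall score."""
    scores = overall_scores(summaries)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    candidates = [
        ArchitectureSummary("single_agent", 0.80, 1.0, 0.010),
        ArchitectureSummary("multi_specialist", 0.85, 2.0, 0.030),
        ArchitectureSummary("hierarchical", 0.70, 1.5, 0.020),
    ]
    print(overall_scores(candidates), pick_winner(candidates))
```

Whatever scoring you choose, the selection has to take the maximum: sorting a pandas DataFrame by `overall_score` in the default ascending order and taking `.iloc[0]` would pick the worst architecture instead.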

🔄 The AI Development Lifecycle with ZenML

From Chaos to Process


Your New Workflow

Monday: Quick Prototype

# Start with a local script, just like always
agent = LangGraphAgent(prompt="You are a helpful assistant...")
response = agent.chat("Help me with my order")

Tuesday: Make it a Pipeline

# Wrap your code in a ZenML step to make it reproducible
@step
def customer_service_agent(query: str) -> str:
    return agent.chat(query)

Wednesday: Add Evaluation

# Test on real data, not toy examples
@pipeline
def eval_pipeline():
    test_data = load_production_samples()
    responses = customer_service_agent.map(test_data)
    scores = evaluate_responses(responses)
    track_experiment(scores)
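A non-deterministic agent can give different answers to the same query, so an exact-match regression test is brittle. A common alternative is to sample several responses per query and assert on the pass rate. The sketch below is framework-agnostic: `make_flaky_agent` is a hypothetical stand-in for a real agent, and `mentions_refund` is a deliberately trivial check standing in for an LLM judge or business metric.

```python
import random
from typing import Callable, List


def pass_rate(
    agent: Callable[[str], str],
    queries: List[str],
    passes: Callable[[str, str], bool],
    samples_per_query: int = 5,
) -> float:
    """Fraction of sampled responses, across all queries, that pass the check."""
    total = passed = 0
    for query in queries:
        for _ in range(samples_per_query):
            total += 1
            if passes(query, agent(query)):
                passed += 1
    return passed / total if total else 0.0


def make_flaky_agent(seed: int = 0) -> Callable[[str], str]:
    """Hypothetical stand-in agent that answers on-topic roughly 80% of the time."""
    rng = random.Random(seed)  # seeded so repeated runs are reproducible

    def agent(query: str) -> str:
        if rng.random() < 0.8:
            return "Here is how refunds work."
        return "I'm not sure."

    return agent


def mentions_refund(query: str, response: str) -> bool:
    """Trivial placeholder check; a real pipeline would use an LLM judge or metric."""
    return "refund" in response.lower()


if __name__ == "__main__":
    rate = pass_rate(make_flaky_agent(), ["What's your refund policy?"], mentions_refund, 50)
    # In CI, fail the run when the rate drops below an agreed threshold.
    print(f"pass rate: {rate:.2f}")
```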

Thursday: Compare Architectures

# Make data-driven architecture decisions
results = compare_architectures(
    baseline="current_prod",
    challenger="new_multiagent_v2"
)
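`compare_architectures` is not defined in this walkthrough. One way such a comparison could work, assuming both agents have been scored query-by-query on the same evaluation set, is a paired comparison with an explicit promotion rule. The function name, the thresholds, and the rule itself are assumptions for illustration.

```python
from statistics import mean
from typing import Dict, Sequence


def compare_baseline_challenger(
    baseline_scores: Sequence[float],
    challenger_scores: Sequence[float],
    min_improvement: float = 0.02,
    max_regression_rate: float = 0.2,
) -> Dict[str, object]:
    """Paired comparison of per-query scores from two agents on the SAME queries.

    The challenger is promoted only if its mean score beats the baseline by at
    least `min_improvement` AND it scores worse on at most `max_regression_rate`
    of the individual queries.
    """
    if len(baseline_scores) != len(challenger_scores) or not baseline_scores:
        raise ValueError("need equal-length, non-empty score lists for a paired comparison")
    diffs = [c - b for b, c in zip(baseline_scores, challenger_scores)]
    mean_gain = mean(diffs)
    regression_rate = sum(d < 0 for d in diffs) / len(diffs)
    promote = mean_gain >= min_improvement and regression_rate <= max_regression_rate
    return {"mean_gain": mean_gain, "regression_rate": regression_rate, "promote": promote}


if __name__ == "__main__":
    result = compare_baseline_challenger(
        baseline_scores=[0.70, 0.80, 0.60, 0.90, 0.75],
        challenger_scores=[0.80, 0.85, 0.70, 0.88, 0.80],
    )
    print(result)
```

Pairing by query, rather than comparing two unrelated averages, keeps differences in query difficulty from masking a real improvement or regression.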

Friday: Ship with Confidence

# Deploy the new agent with the same command you use for ML models
zenml stack deploy agent-prod --model="customer_service:challenger"

🚀 Get Started (5 minutes)

For ML Engineers Ready to Tame AI

# You know this drill
pip install "zenml[llm]"  # Includes LangChain, LlamaIndex integrations

# Initialize (your ML pipelines still work!)
zenml init

# Pull our agent evaluation template
zenml init --template agent-evaluation-starter

Your First AI Pipeline

# look_familiar.py
from zenml import pipeline, step

@step
def run_my_agent(test_queries: list[str]) -> list[str]:
    """Your existing agent code, now with MLOps superpowers."""
    # Use ANY framework - LangGraph, CrewAI, raw OpenAI
    agent = YourExistingAgent()
    
    # Automatic versioning of prompts, tools, code, and configs
    return [agent.run(q) for q in test_queries]

@step
def evaluate_responses(queries: list[str], responses: list[str]) -> dict:
    """LLM judges + your custom business metrics."""
    quality = llm_judge(queries, responses)
    latency = measure_response_times()
    costs = calculate_token_usage()
    
    return {
        "quality": quality.mean(),
        "p95_latency": latency.quantile(0.95),
        "cost_per_query": costs.mean()
    }

@pipeline
def my_first_agent_pipeline():
    # Look ma, no YAML!
    queries = ["How do I return an item?", "What's your refund policy?"]
    responses = run_my_agent(queries)
    metrics = evaluate_responses(queries, responses)
    
    # Metrics are auto-logged, versioned, and comparable in the dashboard
    return metrics

if __name__ == "__main__":
    my_first_agent_pipeline()
    print("Check your dashboard, e.g. http://127.0.0.1:8237 for a local ZenML server")
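The helpers in `evaluate_responses` (`llm_judge`, `measure_response_times`, `calculate_token_usage`) are placeholders. Assuming each agent call records a judge score, its latency, and its token counts, the three metrics could be computed as below. `QueryRecord`, `summarize`, and the per-1k-token prices are hypothetical; plug in your provider's actual pricing.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class QueryRecord:
    """What we assume gets recorded per agent call (hypothetical schema)."""
    judge_score: float      # e.g. a 0-1 score from an LLM judge
    latency_s: float        # wall-clock seconds for the call
    prompt_tokens: int
    completion_tokens: int


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile for q in [0, 1] (pandas' default quantile rule)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def summarize(
    records: List[QueryRecord],
    usd_per_1k_prompt: float,
    usd_per_1k_completion: float,
) -> Dict[str, float]:
    """Reduce per-query records to the three metrics returned by `evaluate_responses`."""
    if not records:
        raise ValueError("no records to summarize")
    costs = [
        r.prompt_tokens / 1000 * usd_per_1k_prompt
        + r.completion_tokens / 1000 * usd_per_1k_completion
        for r in records
    ]
    return {
        "quality": mean(r.judge_score for r in records),
        "p95_latency": percentile([r.latency_s for r in records], 0.95),
        "cost_per_query": mean(costs),
    }
```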

📚 Learn More

🖼️ Getting Started Resources

The best way to learn about ZenML is through our comprehensive documentation and tutorials:

For visual learners, start with this 11-minute introduction:

Introductory YouTube Video

📖 Production Examples

  1. E2E Batch Inference - Complete MLOps pipeline with feature engineering
  2. LLM RAG Pipeline - Production RAG with evaluation loops
  3. Agentic Workflow (Deep Research) - Orchestrate your agents with ZenML
  4. Fine-tuning Pipeline - Fine-tune and deploy LLMs

🏢 Deployment Options

For Teams:

  • Self-hosted - Deploy on your infrastructure with Helm/Docker
  • ZenML Pro - Managed service with enterprise support (free trial)

Infrastructure Requirements:

  • Kubernetes cluster (or local Docker)
  • Object storage (S3/GCS/Azure)
  • PostgreSQL database
  • Complete requirements

🎓 Books & Resources

ZenML is featured in these comprehensive guides to production AI systems.

🤝 Join ML Engineers Building the Future of AI

You're Not Alone:

Real Engineers, Real Stories:

"Same platform for our sklearn models and our RAG pipeline. DevOps loves us now."

  • ML Platform Lead, European Bank

"We went from 'YOLO prompt updates' to proper evaluation pipelines. Game changer."

  • Senior ML Engineer, Fortune 500 Retailer

"Finally, I can explain to my PM why agent v2 is actually worse than v1. With data!"

  • Staff Engineer, Series B Startup

Contribute:

Stay Updated:

  • 🗺 Public Roadmap - See what's coming next
  • 📰 Blog - Best practices and case studies
  • 🎙 Podcast - Interviews with ML practitioners

❓ FAQs from ML Engineers Like You

Q: "Do I need to rewrite my agents or models to use ZenML?" A: No. Wrap your existing code in a @step. Keep using scikit-learn, PyTorch, LangGraph, LlamaIndex, or raw API calls. ZenML orchestrates your tools; it doesn't replace them.

Q: "How is this different from LangSmith/Langfuse?" A: They provide excellent observability for LLM applications. We orchestrate the full MLOps lifecycle for your entire AI stack. With ZenML, you manage both your classical ML models and your AI agents in one unified framework, from development and evaluation all the way to production deployment.

Q: "Can I use my existing MLflow/W&B setup?" A: Yes! We integrate with both. Your experiments, our pipelines.

Q: "Is this just MLflow with extra steps?" A: No. MLflow tracks experiments. We orchestrate the entire development process – from training and evaluation to deployment and monitoring – for both models and agents.

Q: "What about cost? I can't afford another platform." A: ZenML's open-source version is free forever. You likely already have the required infrastructure (like a Kubernetes cluster and object storage). We just help you make better use of it for MLOps.

🛠 VS Code Extension

Manage pipelines directly from your editor:

🖥️ VS Code Extension in Action!

Install it from the VS Code Marketplace.

📜 License

ZenML is distributed under the terms of the Apache License Version 2.0. See LICENSE for details.


Download files


Source Distribution

zenml_nightly-0.83.1.dev20250710.tar.gz (3.6 MB view details)

Uploaded Source

Built Distribution


zenml_nightly-0.83.1.dev20250710-py3-none-any.whl (4.5 MB view details)

Uploaded Python 3

File details

Details for the file zenml_nightly-0.83.1.dev20250710.tar.gz.

File metadata

  • Download URL: zenml_nightly-0.83.1.dev20250710.tar.gz
  • Upload date:
  • Size: 3.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: poetry/2.1.3 CPython/3.9.23 Linux/6.11.0-1015-azure

File hashes

Hashes for zenml_nightly-0.83.1.dev20250710.tar.gz
Algorithm Hash digest
SHA256 bb88919bc976cd9ed20c38c4ab2e95be7104424493e428e5dc01506f50de3953
MD5 b0c782ffeeda8b68e521468c15985eb1
BLAKE2b-256 ef84a43ca1ef93454b977a637abb57ca0bfe5760182a40cc55110b5e5273259a
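To check a downloaded archive against the SHA256 digest listed above, you can hash it locally and compare. A minimal standard-library sketch; `sha256_of_file` and `verify_sha256` are our own helper names, not part of ZenML or pip:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (source distribution):
# verify_sha256(
#     "zenml_nightly-0.83.1.dev20250710.tar.gz",
#     "bb88919bc976cd9ed20c38c4ab2e95be7104424493e428e5dc01506f50de3953",
# )
```

pip can also enforce this itself: a requirements line such as `zenml-nightly==0.83.1.dev20250710 --hash=sha256:<digest>` puts pip in hash-checking mode, and it refuses to install any file whose digest does not match.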


File details

Details for the file zenml_nightly-0.83.1.dev20250710-py3-none-any.whl.

File metadata

File hashes

Hashes for zenml_nightly-0.83.1.dev20250710-py3-none-any.whl
Algorithm Hash digest
SHA256 71fc2ac67897f9f5579868369fecc625839999951279e602699b75be42110434
MD5 e91c0f8a1e96cd5669a6a36c810791ca
BLAKE2b-256 85576d19a56e001611a8d3aeda550c48a4c1012894d39e1dfbf894167b7ffea3

