
Track metadata for AI pipeline


Common Metadata Framework (CMF)


Common Metadata Framework (CMF) is a metadata tracking and versioning system for ML pipelines. It tracks code, data, and pipeline metrics, offering Git-like metadata management across distributed environments.


🚀 Features

  • ✅ Track artifacts (datasets, models, metrics) using content-based hashes
  • ✅ Automatically log code versions (Git) and data versions (DVC)
  • ✅ Push/pull metadata via CLI across distributed sites
  • ✅ REST API for direct server interaction
  • ✅ Implicit & explicit tracking of pipeline execution
  • ✅ Fine-grained or coarse-grained metric logging
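
To make the first feature concrete, here is a minimal sketch of content-based hashing. It is not CMF's internal code: it assumes an MD5 digest over the raw file bytes (the hash DVC records by default), and `content_hash` is a hypothetical helper name introduced only for illustration.

```python
import hashlib
from pathlib import Path


def content_hash(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's bytes (hypothetical helper)."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read in fixed-size chunks so large datasets never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes produce the same digest regardless of name or location, which is what lets a tracker recognise an unchanged artifact and detect a modified one.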

🏛 Quick Start

Get started with CMF in minutes using our example ML pipeline:

📖 Try the Getting Started Example

This example demonstrates:

  • Initializing a CMF project
  • Tracking an ML pipeline with multiple stages (parse → featurize → train → test)
  • Versioning datasets and models
  • Pushing artifacts and metadata
  • Querying tracked metadata

📦 Installation

Requirements

  • Linux/Ubuntu/Debian
  • Python: Version 3.9 to 3.11 (3.10 recommended)
  • Git (latest)
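
Before installing, it can help to confirm that the active interpreter falls inside the supported range listed above. The check below is a small sketch; `is_supported`, `MIN_VERSION`, and `MAX_VERSION` are hypothetical names, not part of cmflib.

```python
import sys

# Supported range taken from the requirements list: Python 3.9 to 3.11.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 11)


def is_supported(version=None):
    """Return True if the (major, minor) version lies in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    status = "supported" if is_supported() else "outside the supported range"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```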

Virtual Environment

Conda

conda create -n cmf python=3.10
conda activate cmf

Virtualenv

virtualenv --python=3.10 .cmf
source .cmf/bin/activate

Install CMF

Latest from GitHub

pip install git+https://github.com/HewlettPackard/cmf

Stable from PyPI

pip install cmflib

Server Setup

📖 Follow the CMF Server Installation Guide


📘 Documentation


🧠 How It Works

CMF tracks pipeline stages, inputs/outputs, metrics, and code. It supports decentralized execution across datacenters, edge, and cloud.

  • Artifacts are versioned using DVC (.dvc files).
  • Code is tracked with Git.
  • Metadata is logged to a relational database (e.g., SQLite or PostgreSQL).
  • Metadata is synced with cmf metadata push and cmf metadata pull.
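
Because metadata lands in a relational database, a local store can be inspected with ordinary SQL tools. The sketch below assumes the local metadata file (for example, the `mlmd` file used under Sample Usage) is a SQLite database. It makes no assumption about the schema and only lists whichever tables exist. `list_tables` is a hypothetical helper, not a cmflib function.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the sorted table names stored in a SQLite database file."""
    # closing() releases the connection explicitly; sqlite3's own context
    # manager only commits or rolls back a transaction.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Note that `sqlite3.connect` creates an empty database when the path does not exist, so check the path first when pointing this at a real project.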

🏛 Architecture

CMF is composed of:

  • cmflib – Metadata library that provides an API to log and query metadata
  • CMF Client – CLI to sync metadata with the server, push/pull artifacts to the user-specified repo, and push/pull code with Git
  • CMF Server – REST API for metadata merge
  • Central Repositories – Git (code), DVC (artifacts), CMF (metadata)


🔧 Sample Usage

from cmflib.cmf import Cmf
from ml_metadata.proto import metadata_store_pb2 as mlpb

# Placeholder values for illustration; a real pipeline would load its own parameters.
split = 0.2
seed = 42

# Metadata writer backed by the local file "mlmd".
metawriter = Cmf(filepath="mlmd", pipeline_name="test_pipeline")

# Register the pipeline stage.
context: mlpb.Context = metawriter.create_context(
    pipeline_stage="prepare",
    custom_properties={"user-metadata1": "metadata_value"}
)

# Record one execution of that stage together with its parameters.
execution: mlpb.Execution = metawriter.create_execution(
    execution_type="Prepare",
    custom_properties={"split": split, "seed": seed}
)

# Log a dataset consumed as input by the execution.
artifact: mlpb.Artifact = metawriter.log_dataset(
    "artifacts/data.xml.gz", "input",
    custom_properties={"user-metadata1": "metadata_value"}
)

cmf                          # CLI to manage metadata and artifacts
cmf init                     # Initialize artifact repository
cmf init show                # Show current CMF config
cmf metadata push            # Push metadata to server
cmf metadata pull            # Pull metadata from server
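
When metadata sync needs to run from a pipeline script rather than a shell, the two metadata commands above can be wrapped with `subprocess`. This is a sketch under assumptions: `metadata_command` and `sync_metadata` are hypothetical helpers, and `extra_args` is a pass-through for whatever options your installation requires (see the Command Reference). No specific flags are assumed here.

```python
import shutil
import subprocess


def metadata_command(action, extra_args=()):
    """Build the argv list for `cmf metadata push` or `cmf metadata pull`."""
    if action not in ("push", "pull"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["cmf", "metadata", action, *extra_args]


def sync_metadata(action, extra_args=()):
    """Run the command; raise if the CLI is missing or exits non-zero."""
    if shutil.which("cmf") is None:
        raise FileNotFoundError("cmf CLI not found on PATH")
    return subprocess.run(metadata_command(action, extra_args), check=True)
```

Keeping command construction separate from execution makes the wrapper easy to unit-test without a CMF server available.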

➡️ For the complete list of commands, please refer to the Command Reference


✅ Benefits

  • Full ML pipeline observability
  • Unified metadata, artifact, and code tracking
  • Scalable metadata syncing
  • Team collaboration on metadata

🎤 Talks & Publications


🌐 Related Projects


🤝 Community


📄 License

Licensed under the Apache 2.0 License


© Hewlett Packard Enterprise. Built for reproducibility in ML.

