Track metadata for AI pipelines

Project description

Common Metadata Framework (CMF)

Common Metadata Framework (CMF) is a metadata tracking and versioning system for ML pipelines. It tracks code, data, and pipeline metrics, and offers Git-like metadata management across distributed environments.


🚀 Features

  • ✅ Track artifacts (datasets, models, metrics) using content-based hashes
  • ✅ Automatically log code versions (Git) and data versions (DVC)
  • ✅ Push/pull metadata via CLI across distributed sites
  • ✅ REST API for direct server interaction
  • ✅ Implicit & explicit tracking of pipeline execution
  • ✅ Fine-grained or coarse-grained metric logging
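
To make the first feature concrete, here is a minimal sketch of content-based hashing using only the Python standard library. It is not CMF's or DVC's code: the helper name `content_hash` is hypothetical, and MD5 is chosen only because it is the digest DVC conventionally records for files.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file's bytes (hypothetical helper)."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # Read in chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Two files with identical bytes get the same identifier, regardless of name.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "train_v1.csv"
    b = Path(tmp) / "copy_of_train.csv"
    a.write_bytes(b"x,y\n1,2\n")
    b.write_bytes(b"x,y\n1,2\n")
    same = content_hash(a) == content_hash(b)
    b.write_bytes(b"x,y\n1,3\n")  # change a single value
    differs = content_hash(a) != content_hash(b)

print(same, differs)  # True True
```

Because the identifier depends only on the bytes, renaming or copying an artifact does not create a new version, while any change to its content does.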

🏛 Quick Start

Get started with CMF in minutes using our example ML pipeline:

📖 Try the Getting Started Example

This example demonstrates:

  • Initializing a CMF project
  • Tracking an ML pipeline with multiple stages (parse → featurize → train → test)
  • Versioning datasets and models
  • Pushing artifacts and metadata
  • Querying tracked metadata
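
The value of tracking each stage's inputs and outputs is that lineage can be reconstructed afterwards. The sketch below models the four stages named above in plain Python and traces which stages contributed to an artifact. It does not use cmflib; the `Stage` class, the `lineage` function, and all artifact names are placeholders invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


# Artifact names are placeholders, not the example's real file names.
PIPELINE = [
    Stage("parse", ["raw.xml"], ["parsed.tsv"]),
    Stage("featurize", ["parsed.tsv"], ["features.pkl"]),
    Stage("train", ["features.pkl"], ["model.pkl"]),
    Stage("test", ["model.pkl", "features.pkl"], ["metrics.json"]),
]


def lineage(artifact: str, stages: list[Stage]) -> list[str]:
    """Return the names of stages that contributed to `artifact`, upstream first."""
    producers = {out: s for s in stages for out in s.outputs}
    ordered: list[str] = []

    def visit(name: str) -> None:
        stage = producers.get(name)
        if stage is None or stage.name in ordered:
            return  # a raw input, or a stage already recorded
        for upstream in stage.inputs:
            visit(upstream)
        ordered.append(stage.name)

    visit(artifact)
    return ordered


print(lineage("model.pkl", PIPELINE))  # ['parse', 'featurize', 'train']
```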

📦 Installation

Requirements

  • Linux (Ubuntu/Debian)
  • Python 3.9 to 3.11 (3.10 recommended)
  • Git (latest)
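
A quick way to confirm that the active interpreter falls inside the supported range is a small check like the following. This is only a sketch: the helper `is_supported` is hypothetical, and the bounds are copied from the requirements above.

```python
import sys

# Supported range taken from the requirements above: 3.9 through 3.11.
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 11)


def is_supported(version: tuple[int, int]) -> bool:
    """Return True if a (major, minor) version is within the supported range."""
    return MIN_SUPPORTED <= version <= MAX_SUPPORTED


current = (sys.version_info.major, sys.version_info.minor)
if not is_supported(current):
    print(f"Python {current[0]}.{current[1]} is outside the 3.9 to 3.11 range")
```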

Virtual Environment

Conda:

conda create -n cmf python=3.10
conda activate cmf

Virtualenv:

virtualenv --python=3.10 .cmf
source .cmf/bin/activate

Install CMF

Latest from GitHub:

pip install git+https://github.com/HewlettPackard/cmf

Stable from PyPI:

pip install cmflib
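
After installing, you may want to confirm which version ended up in the environment. The sketch below uses the standard library's `importlib.metadata`. It assumes the distribution is named `cmflib`, as in the PyPI command above, and `installed_version` is a hypothetical helper.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("cmflib")
print(version if version else "cmflib is not installed in this environment")
```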

Server Setup

📖 Follow the CMF Server Installation Guide


📘 Documentation


🧠 How It Works

CMF tracks pipeline stages, inputs/outputs, metrics, and code. It supports decentralized execution across datacenters, edge, and cloud.

  • Artifacts are versioned using DVC (.dvc files).
  • Code is tracked with Git.
  • Metadata is logged to a relational database (e.g., SQLite, PostgreSQL).
  • Metadata is synced with cmf metadata push and cmf metadata pull.
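
To show what logging metadata to a relational database can look like, the sketch below records one execution and one input artifact in an in-memory SQLite database and joins them back together. The three-table schema is a deliberate simplification invented for this example; it is not the schema that CMF or ML Metadata actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
conn.executescript(
    """
    CREATE TABLE execution (id INTEGER PRIMARY KEY, stage TEXT NOT NULL);
    CREATE TABLE artifact  (id INTEGER PRIMARY KEY, path TEXT, content_hash TEXT UNIQUE);
    CREATE TABLE event (
        execution_id INTEGER REFERENCES execution(id),
        artifact_id  INTEGER REFERENCES artifact(id),
        direction    TEXT CHECK (direction IN ('input', 'output'))
    );
    """
)

# One run of the "prepare" stage that consumed one dataset.
conn.execute("INSERT INTO execution (id, stage) VALUES (1, 'prepare')")
conn.execute(
    "INSERT INTO artifact (id, path, content_hash) VALUES (1, ?, ?)",
    ("artifacts/data.xml.gz", "placeholder-hash"),
)
conn.execute("INSERT INTO event VALUES (1, 1, 'input')")

# Query: which artifacts did each stage touch, and in which direction?
rows = conn.execute(
    """
    SELECT e.stage, a.path, ev.direction
    FROM event ev
    JOIN execution e ON e.id = ev.execution_id
    JOIN artifact  a ON a.id = ev.artifact_id
    """
).fetchall()
print(rows)  # [('prepare', 'artifacts/data.xml.gz', 'input')]
```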

🏛 Architecture

CMF is composed of:

  • cmflib – Metadata library that provides an API to log and query metadata
  • CMF Client – CLI to sync metadata with the server, push/pull artifacts to and from the user-specified repository, and push/pull code to and from Git
  • CMF Server – REST API for metadata merge
  • Central Repositories – Git (code), DVC (artifacts), CMF (metadata)
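
The idea of a server that merges metadata pushed from several sites can be sketched in a few lines. The code below is an illustration only: it models each site's metadata as a dictionary keyed by content hash, which is an assumption made for this example rather than CMF's actual data model or REST payload, and `merge_metadata` is a hypothetical function.

```python
from typing import Dict, List

# Each site's metadata is modeled as {content_hash: record}; this shape is
# an assumption made for the sketch, not CMF's wire format.
Record = Dict[str, str]


def merge_metadata(server: Dict[str, Record], pushed: Dict[str, Record]) -> List[str]:
    """Add records the server has not seen; return the hashes that were added."""
    added = []
    for content_hash, record in pushed.items():
        if content_hash not in server:
            server[content_hash] = dict(record)  # copy so callers stay independent
            added.append(content_hash)
    return added


server = {"aaa": {"path": "data.xml.gz", "site": "datacenter"}}
edge = {
    "aaa": {"path": "data.xml.gz", "site": "edge"},  # already known to the server
    "bbb": {"path": "model.pkl", "site": "edge"},    # new artifact
}

first = merge_metadata(server, edge)
second = merge_metadata(server, edge)  # pushing again changes nothing
print(first, second, sorted(server))  # ['bbb'] [] ['aaa', 'bbb']
```

Keying on the content hash makes the merge idempotent in this sketch: repeating a push adds nothing, which is the property that lets independent sites sync without coordinating.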


🔧 Sample Usage

from cmflib.cmf import Cmf
from ml_metadata.proto import metadata_store_pb2 as mlpb

# Placeholder values: the snippet assumes `split` and `seed` are defined
# by the surrounding pipeline code.
split = 0.2
seed = 42

# Create a metadata writer that stores metadata in the "mlmd" file.
metawriter = Cmf(filepath="mlmd", pipeline_name="test_pipeline")

# Register the pipeline stage (context).
context: mlpb.Context = metawriter.create_context(
    pipeline_stage="prepare",
    custom_properties={"user-metadata1": "metadata_value"}
)

# Record one run of that stage (execution) with its parameters.
execution: mlpb.Execution = metawriter.create_execution(
    execution_type="Prepare",
    custom_properties={"split": split, "seed": seed}
)

# Log a dataset consumed as input by this execution.
artifact: mlpb.Artifact = metawriter.log_dataset(
    "artifacts/data.xml.gz", "input",
    custom_properties={"user-metadata1": "metadata_value"}
)

Command line:

cmf                          # CLI to manage metadata and artifacts
cmf init                     # Initialize artifact repository
cmf init show                # Show current CMF config
cmf metadata push            # Push metadata to server
cmf metadata pull            # Pull metadata from server

➡️ For the complete list of commands, please refer to the Command Reference


✅ Benefits

  • Full ML pipeline observability
  • Unified metadata, artifact, and code tracking
  • Scalable metadata syncing
  • Team collaboration on metadata

🎤 Talks & Publications


🌐 Related Projects


🤝 Community


📄 License

Licensed under the Apache 2.0 License


© Hewlett Packard Enterprise. Built for reproducibility in ML.

Download files

Source Distribution

cmflib-0.1.0.tar.gz (139.3 kB)

Uploaded Source

Built Distribution

cmflib-0.1.0-py3-none-any.whl (193.7 kB)

Uploaded Python 3

File details

Details for the file cmflib-0.1.0.tar.gz.

File metadata

  • Download URL: cmflib-0.1.0.tar.gz
  • Upload date:
  • Size: 139.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.14

File hashes

Hashes for cmflib-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d598d3632f0ff74dd338f7b165110b9ccf4cedcdacd8f6ee6fedb5faf8e40f27
MD5 745a9d0ebc6854a60a30c13e5c9deacc
BLAKE2b-256 3fb7255e606438a65a200e488d7ef2f4b4244945f1842295a852497bd2c4aa3c

File details

Details for the file cmflib-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: cmflib-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 193.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.14

File hashes

Hashes for cmflib-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 92eecc51ab989b4113efd11d448f678e401811cd73c0674703c631a7a6f10167
MD5 b6638f0ab6dbbfc2aa1da352d1c30eb4
BLAKE2b-256 3826debad01cf2fa63f61a181895dc1daafa6266de6618b63782531a3b4612f1
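
The published digests can be used to check a downloaded file before installing it. The sketch below compares a local file's SHA256 digest with the value listed above for the source distribution. The helper `sha256_matches` is hypothetical, and the file is assumed to have been downloaded to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest equals `expected_hex`."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in chunks so the whole archive is never held in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


# Expected digest copied from the SHA256 entry for cmflib-0.1.0.tar.gz above.
EXPECTED = "d598d3632f0ff74dd338f7b165110b9ccf4cedcdacd8f6ee6fedb5faf8e40f27"

sdist = Path("cmflib-0.1.0.tar.gz")  # assumed download location
if sdist.exists():
    print("hash OK" if sha256_matches(sdist, EXPECTED) else "hash MISMATCH")
```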
