
Track metadata for AI pipeline


Common Metadata Framework (CMF)


Common Metadata Framework (CMF) is a metadata tracking and versioning system for ML pipelines. It tracks code, data, and pipeline metrics, and offers Git-like metadata management across distributed environments.


🚀 Features

  • ✅ Track artifacts (datasets, models, metrics) using content-based hashes
  • ✅ Automatically log code versions (Git) and data versions (DVC)
  • ✅ Push/pull metadata via CLI across distributed sites
  • ✅ REST API for direct server interaction
  • ✅ Implicit & explicit tracking of pipeline execution
  • ✅ Fine-grained or coarse-grained metric logging
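
The first feature relies on content-based hashing: an artifact's identity comes from its bytes, not its file name. The sketch below illustrates the idea with Python's standard `hashlib`; it is not CMF code. MD5 is an assumption here, chosen because DVC's `.dvc` files record an `md5` field, but the helper `content_hash` is hypothetical and is not guaranteed to reproduce the exact value DVC or CMF would store.

```python
import hashlib
from pathlib import Path
from typing import Union


def content_hash(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file's bytes (hypothetical helper, not a CMF API).

    The file is read in fixed-size chunks so large datasets never have to
    fit in memory. The digest depends only on the contents, so renaming
    or copying a file does not change it, while editing a single byte does.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp, "data_v1.csv")
        renamed = Path(tmp, "copy_of_data.csv")
        edited = Path(tmp, "data_v2.csv")
        original.write_text("id,label\n1,0\n2,1\n")
        renamed.write_text("id,label\n1,0\n2,1\n")
        edited.write_text("id,label\n1,0\n2,0\n")
        print(content_hash(original) == content_hash(renamed))  # True: same bytes
        print(content_hash(original) == content_hash(edited))   # False: one byte differs
```

Chunked reading keeps memory use constant regardless of dataset size, which matters for the large artifacts ML pipelines produce.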

🏛 Quick Start

Get started with CMF in minutes using our example ML pipeline:

📖 Try the Getting Started Example

This example demonstrates:

  • Initializing a CMF project
  • Tracking an ML pipeline with multiple stages (parse → featurize → train → test)
  • Versioning datasets and models
  • Pushing artifacts and metadata
  • Querying tracked metadata
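
Stage-by-stage tracking is what makes lineage queries possible. The following toy model, which does not use cmflib, records which artifacts each of the four example stages read and wrote, then walks backwards from a model to everything it was derived from. `StageRun`, `upstream`, and the file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class StageRun:
    """One execution of a pipeline stage and the artifacts it consumed/produced (toy model)."""
    stage: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def upstream(artifact: str, runs: List[StageRun]) -> Set[str]:
    """Return every artifact that `artifact` was (transitively) derived from."""
    # Map each artifact to the stage run that produced it.
    producer: Dict[str, StageRun] = {out: run for run in runs for out in run.outputs}
    seen: Set[str] = set()
    stack = [artifact]
    while stack:
        current = stack.pop()
        run = producer.get(current)
        if run is None:
            continue  # a raw input: no tracked stage produced it
        for parent in run.inputs:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# The four stages of the Getting Started example; file names are hypothetical.
runs = [
    StageRun("parse", ["data.xml.gz"], ["train.tsv", "test.tsv"]),
    StageRun("featurize", ["train.tsv", "test.tsv"], ["train.pkl", "test.pkl"]),
    StageRun("train", ["train.pkl"], ["model.pkl"]),
    StageRun("test", ["model.pkl", "test.pkl"], ["metrics.json"]),
]

print(sorted(upstream("model.pkl", runs)))
# ['data.xml.gz', 'test.tsv', 'train.pkl', 'train.tsv']
```

Because lineage here is recorded per stage, `model.pkl` also traces back to `test.tsv`: the featurize stage read both files in one run.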

📦 Installation

Requirements

  • Linux/Ubuntu/Debian
  • Python: Version 3.9 to 3.11 (3.10 recommended)
  • Git (latest)
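
A quick pre-flight check of these requirements before creating the environment might look like this. The script is not part of CMF; `python_supported` is a hypothetical helper, and the 3.9 to 3.11 range is copied from the list above.

```python
import shutil
import sys
from typing import Optional, Tuple

# Supported range from the requirements above (inclusive).
MIN_PYTHON: Tuple[int, int] = (3, 9)
MAX_PYTHON: Tuple[int, int] = (3, 11)


def python_supported(version: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if (major, minor) lies within MIN_PYTHON..MAX_PYTHON.

    Defaults to the running interpreter when no version is given.
    """
    major_minor = tuple(version) if version is not None else tuple(sys.version_info[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


if __name__ == "__main__":
    checks = {
        "Linux": sys.platform.startswith("linux"),
        f"Python {sys.version_info.major}.{sys.version_info.minor}": python_supported(),
        "git on PATH": shutil.which("git") is not None,
    }
    for name, ok in checks.items():
        print(f"{'ok  ' if ok else 'FAIL'}  {name}")
```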

Virtual Environment

Conda
conda create -n cmf python=3.10
conda activate cmf
Virtualenv
virtualenv --python=3.10 .cmf
source .cmf/bin/activate

Install CMF

Latest from GitHub
pip install git+https://github.com/HewlettPackard/cmf
Stable from PyPI
pip install cmflib

Server Setup

📖 Follow the CMF Server Installation Guide



🧠 How It Works

CMF tracks pipeline stages, inputs/outputs, metrics, and code. It supports decentralized execution across datacenters, edge, and cloud.

  • Artifacts are versioned using DVC (.dvc files).
  • Code is tracked with Git.
  • Metadata is logged to a relational database (e.g., SQLite, PostgreSQL).
  • Metadata is synced with the CMF Server using cmf metadata push and cmf metadata pull.
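
To make the relational-storage point concrete, here is a deliberately simplified schema in SQLite (Python's built-in `sqlite3`). It is loosely modeled on the artifact/execution/event concepts of ML Metadata, which the sample usage imports, but the tables, the helpers `log_run` and `consumers`, the commit id, and the hash strings are all hypothetical; CMF's real schema is managed by ML Metadata and differs from this one.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Simplified, hypothetical schema: artifacts, executions, and the events linking them.
SCHEMA = """
CREATE TABLE artifact  (id INTEGER PRIMARY KEY, uri TEXT, content_hash TEXT UNIQUE);
CREATE TABLE execution (id INTEGER PRIMARY KEY, pipeline TEXT, stage TEXT, git_commit TEXT);
CREATE TABLE event     (execution_id INTEGER REFERENCES execution(id),
                        artifact_id  INTEGER REFERENCES artifact(id),
                        kind TEXT CHECK (kind IN ('input', 'output')));
"""

Artifact = Tuple[str, str]  # (uri, content_hash)


def log_run(db: sqlite3.Connection, pipeline: str, stage: str, commit: str,
            inputs: Sequence[Artifact], outputs: Sequence[Artifact]) -> int:
    """Record one stage execution and its input/output events; return the execution id."""
    exec_id = db.execute(
        "INSERT INTO execution (pipeline, stage, git_commit) VALUES (?, ?, ?)",
        (pipeline, stage, commit),
    ).lastrowid
    for kind, artifacts in (("input", inputs), ("output", outputs)):
        for uri, digest in artifacts:
            # The same content is stored once, however many stages touch it.
            db.execute(
                "INSERT OR IGNORE INTO artifact (uri, content_hash) VALUES (?, ?)",
                (uri, digest),
            )
            (artifact_id,) = db.execute(
                "SELECT id FROM artifact WHERE content_hash = ?", (digest,)
            ).fetchone()
            db.execute(
                "INSERT INTO event (execution_id, artifact_id, kind) VALUES (?, ?, ?)",
                (exec_id, artifact_id, kind),
            )
    db.commit()
    return exec_id


def consumers(db: sqlite3.Connection, digest: str) -> List[str]:
    """Return the stages that read the artifact with this content hash."""
    rows = db.execute(
        """SELECT e.stage FROM execution e
           JOIN event ev ON ev.execution_id = e.id
           JOIN artifact a ON a.id = ev.artifact_id
           WHERE a.content_hash = ? AND ev.kind = 'input'
           ORDER BY e.id""",
        (digest,),
    ).fetchall()
    return [stage for (stage,) in rows]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
log_run(db, "test_pipeline", "prepare", "abc123",
        inputs=[("artifacts/data.xml.gz", "h-raw")],
        outputs=[("artifacts/train.tsv", "h-train")])
log_run(db, "test_pipeline", "featurize", "abc123",
        inputs=[("artifacts/train.tsv", "h-train")],
        outputs=[("artifacts/train.pkl", "h-feat")])
print(consumers(db, "h-train"))  # ['featurize']
```

Keying artifacts on their content hash is what lets two stages, or two sites, refer to the same dataset without agreeing on a file path.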

🏛 Architecture

CMF is composed of:

  • CMFLib – metadata library that provides an API to log and query metadata
  • CMF Client – CLI to sync metadata with the server, push/pull artifacts to the user-specified repository, and push/pull code from Git
  • CMF Server – REST API for merging metadata
  • Central Repositories – Git (code), DVC (artifacts), CMF (metadata)
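
The server's merge step is what lets several sites push to one store. The sketch below shows one plausible policy, not the CMF Server's actual algorithm: records carry a globally unique key (`uuid` here, an assumption), repeated pushes are ignored, and a record that reuses a key with different contents is rejected. `merge`, the record fields, and the site names are hypothetical.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def merge(store: Dict[str, Record], pushed: Iterable[Record], key: str = "uuid") -> List[str]:
    """Merge records pushed by one site into the server-side store (hypothetical policy).

    Returns the keys that were newly added. Raises ValueError if a pushed
    record reuses an existing key with different contents.
    """
    added: List[str] = []
    for record in pushed:
        k = record[key]
        existing = store.get(k)
        if existing is None:
            store[k] = dict(record)  # copy so later edits by the caller don't leak in
            added.append(k)
        elif existing != record:
            raise ValueError(f"conflicting metadata for {key}={k!r}")
        # identical record already present: nothing to do
    return added


server: Dict[str, Record] = {}

# Site A (an edge device) runs the first two stages and pushes.
site_a = [
    {"uuid": "e1", "stage": "parse", "site": "edge-01"},
    {"uuid": "e2", "stage": "featurize", "site": "edge-01"},
]
print(merge(server, site_a))  # ['e1', 'e2']

# Site B pulls e1/e2, runs training in the cloud, then pushes everything it has.
site_b = site_a + [{"uuid": "e3", "stage": "train", "site": "cloud-01"}]
print(merge(server, site_b))  # ['e3']
print(merge(server, site_b))  # []
```

Keying on a unique id makes pushes idempotent, so a site can safely re-push everything it has, including metadata it previously pulled from the server.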


🔧 Sample Usage

from cmflib.cmf import Cmf
from ml_metadata.proto import metadata_store_pb2 as mlpb

# Metadata is written to the local "mlmd" file under the pipeline name "test_pipeline"
cmf = Cmf(filepath="mlmd", pipeline_name="test_pipeline")

# A context groups the executions of one pipeline stage
context: mlpb.Context = cmf.create_context(
    pipeline_stage="prepare",
    custom_properties={"user-metadata1": "metadata_value"}
)

# Example stage parameters to record with the execution
split, seed = 0.20, 20170428

# An execution represents one run of the stage
execution: mlpb.Execution = cmf.create_execution(
    execution_type="Prepare",
    custom_properties={"split": split, "seed": seed}
)

# Log an input dataset; it is versioned with DVC and linked to this execution as an input
artifact: mlpb.Artifact = cmf.log_dataset(
    "artifacts/data.xml.gz", "input",
    custom_properties={"user-metadata1": "metadata_value"}
)

cmf                          # CLI to manage metadata and artifacts
cmf init                     # Initialize artifact repository
cmf init show                # Show current CMF config
cmf metadata push            # Push metadata to server
cmf metadata pull            # Pull metadata from server

➡️ For the complete list of commands, please refer to the Command Reference


✅ Benefits

  • Full ML pipeline observability
  • Unified metadata, artifact, and code tracking
  • Scalable metadata syncing
  • Team collaboration on metadata


📄 License

Licensed under the Apache 2.0 License


© Hewlett Packard Enterprise. Built for reproducibility in ML.

