Track metadata for AI pipeline

Common Metadata Framework (CMF)

Common Metadata Framework (CMF) is a metadata tracking and versioning system for ML pipelines. It tracks code, data, and pipeline metrics, and offers Git-like metadata management across distributed environments.


🚀 Features

  • ✅ Track artifacts (datasets, models, metrics) using content-based hashes
  • ✅ Automatically log code versions (Git) and data versions (DVC)
  • ✅ Push/pull metadata via CLI across distributed sites
  • ✅ REST API for direct server interaction
  • ✅ Implicit & explicit tracking of pipeline execution
  • ✅ Fine-grained or coarse-grained metric logging
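
To illustrate what "content-based hashes" means in the first feature above, here is a minimal sketch using only the Python standard library. It is not CMF's or DVC's implementation: the function name `content_hash` and the choice of SHA-256 are assumptions made for illustration. The point is that an artifact's identity depends on its bytes, not on its file name or location.

```python
import hashlib


def content_hash(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file's bytes.

    Hypothetical helper for illustration; the hash algorithm CMF/DVC
    actually use may differ.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two copies of the same dataset stored under different names produce the same digest, while a one-byte change produces a different one. That property is what lets a tracker recognize the same artifact across sites.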

🏛 Quick Start

Get started with CMF in minutes using our example ML pipeline:

📖 Try the Getting Started Example

This example demonstrates:

  • Initializing a CMF project
  • Tracking an ML pipeline with multiple stages (parse → featurize → train → test)
  • Versioning datasets and models
  • Pushing artifacts and metadata
  • Querying tracked metadata

📦 Installation

Requirements

  • Linux/Ubuntu/Debian
  • Python: Version 3.9 to 3.11 (3.10 recommended)
  • Git (latest)
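
Before creating an environment, it can help to confirm that the interpreter falls in the supported range listed above. The following is a small sketch, not part of CMF; the helper name `python_supported` is hypothetical.

```python
import sys

# Supported range taken from the requirements list above (3.9 to 3.11).
SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 11)


def python_supported(version_info=None):
    """Return True if the (major, minor) version lies within the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if __name__ == "__main__":
    status = "supported" if python_supported() else "outside the supported range"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```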

Virtual Environment

Conda
conda create -n cmf python=3.10
conda activate cmf
Virtualenv
virtualenv --python=3.10 .cmf
source .cmf/bin/activate

Install CMF

Latest from GitHub
pip install git+https://github.com/HewlettPackard/cmf
Stable from PyPI
pip install cmflib

Server Setup

📖 Follow the CMF Server Installation Guide


📘 Documentation


🧠 How It Works

CMF tracks pipeline stages, inputs/outputs, metrics, and code. It supports decentralized execution across datacenters, edge, and cloud.

  • Artifacts are versioned using DVC (.dvc files).
  • Code is tracked with Git.
  • Metadata is logged to a relational database (e.g., SQLite, PostgreSQL).
  • Sync metadata with cmf metadata push and cmf metadata pull.
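
As a rough mental model of the points above, the sketch below logs executions and their input/output artifacts to SQLite with Python's built-in `sqlite3` module. The table names, column names, and helper functions are all hypothetical; CMF's real schema is defined by its metadata store and is not shown in this README.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a toy metadata store (hypothetical schema, not CMF's)."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS execution (
            id INTEGER PRIMARY KEY,
            pipeline TEXT NOT NULL,
            stage TEXT NOT NULL,
            git_commit TEXT
        );
        CREATE TABLE IF NOT EXISTS artifact_event (
            execution_id INTEGER NOT NULL REFERENCES execution(id),
            path TEXT NOT NULL,
            content_hash TEXT NOT NULL,
            direction TEXT NOT NULL CHECK (direction IN ('input', 'output'))
        );
        """
    )
    return conn


def log_execution(conn, pipeline, stage, git_commit):
    """Record one run of a stage together with the code version it ran on."""
    cur = conn.execute(
        "INSERT INTO execution (pipeline, stage, git_commit) VALUES (?, ?, ?)",
        (pipeline, stage, git_commit),
    )
    conn.commit()
    return cur.lastrowid


def log_artifact(conn, execution_id, path, content_hash, direction):
    """Record that an execution read ('input') or wrote ('output') an artifact."""
    conn.execute(
        "INSERT INTO artifact_event VALUES (?, ?, ?, ?)",
        (execution_id, path, content_hash, direction),
    )
    conn.commit()


def consumers_of(conn, content_hash):
    """Return the stages that used the artifact with this hash as an input."""
    rows = conn.execute(
        """
        SELECT e.stage FROM execution e
        JOIN artifact_event a ON a.execution_id = e.id
        WHERE a.content_hash = ? AND a.direction = 'input'
        ORDER BY e.id
        """,
        (content_hash,),
    ).fetchall()
    return [row[0] for row in rows]
```

Because each event row links an execution to an artifact hash, a lineage question such as "which stages consumed this dataset?" becomes a single join.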

🏛 Architecture

CMF is composed of:

  • cmflib – Metadata library that provides an API to log and query metadata
  • CMF Client – CLI to sync metadata with server, push/pull artifacts to the user-specified repo, push/pull code from Git
  • CMF Server – REST API for metadata merge
  • Central Repositories – Git (code), DVC (artifacts), CMF (metadata)
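
This README does not describe how the CMF Server merges metadata. The toy sketch below only shows why content hashes make such a merge tractable: records pushed from different sites that refer to the same bytes collapse into one entry. The function name `merge_metadata` and the record layout (`{"paths": [...]}` keyed by hash) are assumptions for illustration, not CMF's data model.

```python
def merge_metadata(server, incoming):
    """Merge artifact records keyed by content hash (toy model, not CMF's logic).

    Both arguments map a content hash to a record of the form {"paths": [...]}.
    Returns a new mapping and leaves the inputs unmodified.
    """
    # Shallow-copy each record so updates below do not touch the caller's data.
    merged = {content_hash: dict(record) for content_hash, record in server.items()}
    for content_hash, record in incoming.items():
        if content_hash in merged:
            # Same content seen at both sites: keep one record, union the paths.
            existing = merged[content_hash]
            existing["paths"] = sorted(set(existing["paths"]) | set(record["paths"]))
        else:
            # Content not seen before: add it as a new record.
            merged[content_hash] = {"paths": sorted(record["paths"])}
    return merged
```

For example, if one site logged a dataset as `a.csv` and another logged identical bytes as `data/a.csv`, the merged view holds a single record listing both paths.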


🔧 Sample Usage

from cmflib.cmf import Cmf
from ml_metadata.proto import metadata_store_pb2 as mlpb

# Example parameter values; in a real pipeline these come from your own configuration.
split = 0.2
seed = 42

# Create a metadata writer for the pipeline, storing metadata at the path "mlmd".
metawriter = Cmf(filepath="mlmd", pipeline_name="test_pipeline")

# Register the pipeline stage (context).
context: mlpb.Context = metawriter.create_context(
    pipeline_stage="prepare",
    custom_properties={"user-metadata1": "metadata_value"}
)

# Record one run of the stage (execution) along with its parameters.
execution: mlpb.Execution = metawriter.create_execution(
    execution_type="Prepare",
    custom_properties={"split": split, "seed": seed}
)

# Log a dataset that this execution consumes as input.
artifact: mlpb.Artifact = metawriter.log_dataset(
    "artifacts/data.xml.gz", "input",
    custom_properties={"user-metadata1": "metadata_value"}
)

cmf                          # CLI to manage metadata and artifacts
cmf init                     # Initialize artifact repository
cmf init show                # Show current CMF config
cmf metadata push            # Push metadata to server
cmf metadata pull            # Pull metadata from server

➡️ For the complete list of commands, please refer to the Command Reference


✅ Benefits

  • Full ML pipeline observability
  • Unified metadata, artifact, and code tracking
  • Scalable metadata syncing
  • Team collaboration on metadata

🎤 Talks & Publications


🌐 Related Projects


🤝 Community


📄 License

Licensed under the Apache 2.0 License


© Hewlett Packard Enterprise. Built for reproducibility in ML.

Download files

Source Distribution

cmflib-0.0.98.tar.gz (128.3 kB)

Uploaded Source

Built Distribution

cmflib-0.0.98-py3-none-any.whl (177.3 kB)

Uploaded Python 3

File details

Details for the file cmflib-0.0.98.tar.gz.

File metadata

  • Download URL: cmflib-0.0.98.tar.gz
  • Upload date:
  • Size: 128.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.14

File hashes

Hashes for cmflib-0.0.98.tar.gz
Algorithm Hash digest
SHA256 47b0bd70fa6e935a239964ba3c4d5bffed1027866c5814df64ace1f5cc782c29
MD5 5fbd67e20f50ad5f3099f3f93683ed48
BLAKE2b-256 4092996aae3ac3b5cb70975fd913a484c0444d9e646540fa0755c62d8540d78a

File details

Details for the file cmflib-0.0.98-py3-none-any.whl.

File metadata

  • Download URL: cmflib-0.0.98-py3-none-any.whl
  • Upload date:
  • Size: 177.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.14

File hashes

Hashes for cmflib-0.0.98-py3-none-any.whl
Algorithm Hash digest
SHA256 629867e986b13f3b81eb0e839df0fc5f9edd52d38bfbdac3758eb47881a646a1
MD5 52d890f12f71e233026de42cf91d7e4c
BLAKE2b-256 6af2bbf566a719769fa8b37d59c23015fd7981d320e90618f195f0baf8de3854
