Track metadata for AI pipeline
Common Metadata Framework (CMF)
Common Metadata Framework (CMF) is a metadata tracking and versioning system for ML pipelines. It tracks code, data, and pipeline metrics, and offers Git-like metadata management across distributed environments.
🚀 Features
- ✅ Track artifacts (datasets, models, metrics) using content-based hashes
- ✅ Automatically log code versions (Git) and data versions (DVC)
- ✅ Push/pull metadata via CLI across distributed sites
- ✅ REST API for direct server interaction
- ✅ Implicit & explicit tracking of pipeline execution
- ✅ Fine-grained or coarse-grained metric logging
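The first feature above relies on content-based hashes: two files with identical bytes get the same identifier, wherever they live. A minimal sketch of that idea follows. It assumes MD5 as the digest (the field DVC records in .dvc files), and the helper name content_hash is hypothetical; it is not cmflib's own code.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file's bytes, reading it in chunks.

    Hypothetical helper for illustration only; not part of cmflib.
    """
    # usedforsecurity=False (Python 3.9+) marks this as a non-cryptographic use.
    digest = hashlib.md5(usedforsecurity=False)
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the identifier depends only on the bytes, renaming or copying a dataset does not change its hash, while any edit to its contents does.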
🏛 Quick Start
Get started with CMF in minutes using our example ML pipeline:
📖 Try the Getting Started Example
This example demonstrates:
- Initializing a CMF project
- Tracking an ML pipeline with multiple stages (parse → featurize → train → test)
- Versioning datasets and models
- Pushing artifacts and metadata
- Querying tracked metadata
📦 Installation
Requirements
- Linux/Ubuntu/Debian
- Python: Version 3.9 to 3.11 (3.10 recommended)
- Git (latest)
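To confirm that an interpreter falls inside the supported range listed above before installing, a small check like the following may help. The function name is_supported_python is an assumption made for this sketch and is not part of CMF.

```python
import sys

# Supported range taken from the requirements list: Python 3.9 to 3.11.
SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 11)


def is_supported_python(version=None) -> bool:
    """Return True if the (major, minor) version is within the supported range.

    Hypothetical helper; defaults to the running interpreter's version.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX
```

Calling is_supported_python() with no arguments checks the interpreter currently running the script.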
Virtual Environment
Conda
conda create -n cmf python=3.10
conda activate cmf
Virtualenv
virtualenv --python=3.10 .cmf
source .cmf/bin/activate
Install CMF
Latest from GitHub
pip install git+https://github.com/HewlettPackard/cmf
Stable from PyPI
pip install cmflib
Server Setup
📖 Follow the CMF Server Installation Guide
📘 Documentation
🧠 How It Works
CMF tracks pipeline stages, inputs/outputs, metrics, and code. It supports decentralized execution across datacenters, edge, and cloud.
- Artifacts are versioned using DVC (.dvc files).
- Code is tracked with Git.
- Metadata is logged to a relational database (e.g., SQLite, PostgreSQL).
- Sync metadata with cmf metadata push and cmf metadata pull.
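The points above can be pictured as rows in a relational store that link each stage execution to its Git commit and to the content hashes of the artifacts it read or wrote. The sketch below uses Python's built-in sqlite3 module with a deliberately tiny, made-up schema. The table names, column names, and helper functions are all assumptions for illustration; CMF's real schema is different.

```python
import sqlite3

# Toy schema, NOT the schema CMF uses: one row per stage execution,
# one row per artifact read ("input") or written ("output").
SCHEMA = """
CREATE TABLE IF NOT EXISTS execution (
    id INTEGER PRIMARY KEY,
    pipeline TEXT NOT NULL,
    stage TEXT NOT NULL,
    git_commit TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS artifact_event (
    execution_id INTEGER NOT NULL REFERENCES execution(id),
    path TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    event TEXT NOT NULL CHECK (event IN ('input', 'output'))
);
"""


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the toy metadata store."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def log_execution(conn, pipeline: str, stage: str, git_commit: str) -> int:
    """Record one run of a stage and return its row id."""
    cur = conn.execute(
        "INSERT INTO execution (pipeline, stage, git_commit) VALUES (?, ?, ?)",
        (pipeline, stage, git_commit),
    )
    return cur.lastrowid


def log_artifact(conn, execution_id: int, path: str, content_hash: str, event: str) -> None:
    """Attach an input or output artifact to an execution."""
    conn.execute(
        "INSERT INTO artifact_event VALUES (?, ?, ?, ?)",
        (execution_id, path, content_hash, event),
    )


def lineage(conn, content_hash: str):
    """Return (stage, event) pairs for every execution that touched this content."""
    rows = conn.execute(
        "SELECT e.stage, a.event FROM artifact_event a "
        "JOIN execution e ON e.id = a.execution_id "
        "WHERE a.content_hash = ? ORDER BY e.id",
        (content_hash,),
    )
    return rows.fetchall()
```

Keying the query on the content hash rather than the path is what lets lineage follow a dataset from the stage that produced it to every stage that consumed it.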
🏛 Architecture
CMF is composed of:
- cmflib – Metadata library that provides an API to log and query metadata
- CMF Client – CLI to sync metadata with server, push/pull artifacts to the user-specified repo, push/pull code from Git
- CMF Server – REST API for metadata merge
- Central Repositories – Git (code), DVC (artifacts), CMF (metadata)
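To give a rough feel for what a server-side metadata merge involves, the toy function below combines records pushed from a client with records the server already holds. It is a conceptual sketch only: the record layout, the use of a string key per execution, and the keep-existing policy are all assumptions, and none of it reflects how the CMF Server actually merges metadata.

```python
from typing import Dict

# Hypothetical record shape: a flat mapping of property names to values.
Record = Dict[str, str]


def merge_metadata(server: Dict[str, Record], incoming: Dict[str, Record]) -> Dict[str, Record]:
    """Return a new store with the server's records plus any unseen incoming ones.

    Assumed policy for this sketch: a record the server already has wins over
    an incoming record with the same key.
    """
    merged = dict(server)  # shallow copy so the server's store is not mutated
    for key, record in incoming.items():
        merged.setdefault(key, record)
    return merged
```

With this policy, pushing the same metadata twice leaves the store unchanged, which is the property that makes repeated syncs from several sites safe in this simplified model.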
🔧 Sample Usage
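from cmflib.cmf import Cmf
from ml_metadata.proto import metadata_store_pb2 as mlpb

# Illustrative parameter values; replace with your pipeline's own settings.
split = 0.2
seed = 42

# Create a metadata writer that stores metadata in the local "mlmd" file.
metawriter = Cmf(filepath="mlmd", pipeline_name="test_pipeline")

# Register the pipeline stage.
context: mlpb.Context = metawriter.create_context(
    pipeline_stage="prepare",
    custom_properties={"user-metadata1": "metadata_value"},
)

# Record one execution of that stage together with its parameters.
execution: mlpb.Execution = metawriter.create_execution(
    execution_type="Prepare",
    custom_properties={"split": split, "seed": seed},
)

# Log the dataset consumed by this execution as an input artifact.
artifact: mlpb.Artifact = metawriter.log_dataset(
    "artifacts/data.xml.gz", "input",
    custom_properties={"user-metadata1": "metadata_value"},
)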
cmf # CLI to manage metadata and artifacts
cmf init # Initialize artifact repository
cmf init show # Show current CMF config
cmf metadata push # Push metadata to server
cmf metadata pull # Pull metadata from server
➡️ For the complete list of commands, please refer to the Command Reference
✅ Benefits
- Full ML pipeline observability
- Unified metadata, artifact, and code tracking
- Scalable metadata syncing
- Team collaboration on metadata
🎤 Talks & Publications
🌐 Related Projects
🤝 Community
- 💬 Join CMF on Slack
- 📧 Contact: annmary.roy@hpe.com
📄 License
Licensed under the Apache 2.0 License
© Hewlett Packard Enterprise. Built for reproducibility in ML.
File details
Details for the file cmflib-0.0.98.tar.gz.
File metadata
- Download URL: cmflib-0.0.98.tar.gz
- Upload date:
- Size: 128.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 47b0bd70fa6e935a239964ba3c4d5bffed1027866c5814df64ace1f5cc782c29 |
| MD5 | 5fbd67e20f50ad5f3099f3f93683ed48 |
| BLAKE2b-256 | 4092996aae3ac3b5cb70975fd913a484c0444d9e646540fa0755c62d8540d78a |
File details
Details for the file cmflib-0.0.98-py3-none-any.whl.
File metadata
- Download URL: cmflib-0.0.98-py3-none-any.whl
- Upload date:
- Size: 177.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 629867e986b13f3b81eb0e839df0fc5f9edd52d38bfbdac3758eb47881a646a1 |
| MD5 | 52d890f12f71e233026de42cf91d7e4c |
| BLAKE2b-256 | 6af2bbf566a719769fa8b37d59c23015fd7981d320e90618f195f0baf8de3854 |