
cmind


Collective Mind toolkit (CM aka CK2)



Shortcuts

We are very glad that our technology has already helped organizations including Qualcomm, Krai, HPE, Dell, Alibaba and Lenovo automate their MLPerf inference benchmarking. Join our open workgroup to get involved in further developments!

Motivation

There are many great automation tools and workflow management frameworks - some are convenient for researchers and some for engineers. The Collective Mind toolkit (CM) is our community effort to develop a portable meta-framework that is convenient for both.

The goal of the CM framework is to help researchers and engineers wrap ad-hoc DevOps and MLOps automation scripts and artifacts with a simple, human-readable and platform-independent CLI, Python API and JSON/YAML meta description to make them more understandable, portable, reusable, interoperable, deterministic and reproducible across continuously changing hardware, software and data with minimal or no changes to existing projects.
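To make the idea of a meta description concrete, here is a minimal Python sketch of how scripts could be selected by tags, as in `--tags=app,image-classification,onnx,python`. It assumes that a query matches a script only when every requested tag appears in that script's meta. The aliases and the in-memory registry are hypothetical and are not read from a real CM repository.

```python
# Hypothetical sketch of tag-based script selection. The registry below is
# illustrative only; real CM scripts keep their meta in JSON/YAML files.
from typing import Dict, List

SCRIPTS: List[Dict] = [
    {"alias": "app-image-classification-onnx-py",
     "tags": ["app", "image-classification", "onnx", "python"]},
    {"alias": "get-python3", "tags": ["get", "python", "python3"]},
    {"alias": "detect-os", "tags": ["detect", "os"]},
]


def find_scripts(tags: str, registry: List[Dict] = SCRIPTS) -> List[str]:
    """Return aliases of scripts whose meta tags include every requested tag."""
    wanted = {t.strip() for t in tags.split(",") if t.strip()}
    return [s["alias"] for s in registry if wanted <= set(s["tags"])]


if __name__ == "__main__":
    print(find_scripts("app,image-classification,onnx,python"))
    print(find_scripts("get,python"))
```

Requiring all tags to match lets a short query such as `get,python` stay unambiguous while longer queries narrow the selection further.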

Such wrappers can be automatically connected together into powerful and portable workflows, applications and web services that shield developers and scientists from the rapidly evolving world of technology.
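One way to picture how wrappers are connected into a workflow is as a dependency graph that is resolved before execution. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to order a hypothetical dependency graph, loosely modeled on the scripts used later in this README, so that every script runs after the scripts it depends on. The graph is an assumption for illustration, not the actual dependency list of the CM image classification script.

```python
# Hypothetical dependency graph: each key lists the scripts it depends on.
from graphlib import TopologicalSorter
from typing import Dict, List

DEPS: Dict[str, List[str]] = {
    "app,image-classification,onnx,python": [
        "detect,os",
        "get,python",
        "get,ml-model-onnx,resnet50",
        "get,onnxruntime,python-lib",
    ],
    "get,onnxruntime,python-lib": ["get,python"],
    "get,python": ["detect,os"],
    "get,ml-model-onnx,resnet50": [],
    "detect,os": [],
}


def run_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return the scripts in an order where dependencies always come first."""
    # TopologicalSorter expects {node: predecessors} and yields predecessors first.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step in run_order(DEPS):
        print(step)
```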

See an example of a modular image classification application assembled from such portable CM scripts. It automatically detects, downloads, installs and builds all related artifacts and tools to adapt this workflow to a user platform with Linux, Windows or macOS:

python3 -m pip install cmind

cm pull repo mlcommons@ck

cm run script --tags=app,image-classification,onnx,python --quiet

or using Python scripting:

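import cmind

# Same as: cm run script --tags=app,image-classification,onnx,python --quiet
r = cmind.access({'action': 'run',
                  'automation': 'script',
                  'tags': 'app,image-classification,onnx,python',
                  'out': 'con',
                  'quiet': True})

# A non-zero 'return' value signals an error described in 'error'
if r['return'] > 0:
    print(r['error'])
else:
    print(r)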
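The CLI call and the Python call above carry the same information. The following hypothetical helper, `cli_to_input`, sketches that correspondence by turning `cm <action> <automation> --key=value --flag` arguments into an input dictionary of the kind passed to `cmind.access`. It is not the real cmind argument parser, which handles more cases.

```python
# Hypothetical sketch: map "cm run script --tags=... --quiet" style arguments
# onto an input dictionary. Not the actual cmind CLI implementation.
from typing import Any, Dict, List


def cli_to_input(argv: List[str]) -> Dict[str, Any]:
    """Convert [action, automation, --key=value, --flag, ...] into a dict."""
    action, automation, *rest = argv
    inp: Dict[str, Any] = {"action": action, "automation": automation}
    for arg in rest:
        if not arg.startswith("--"):
            raise ValueError(f"unexpected positional argument: {arg}")
        key, sep, value = arg[2:].partition("=")
        # "--key=value" keeps the string value; a bare "--flag" becomes True
        inp[key] = value if sep else True
    return inp


if __name__ == "__main__":
    print(cli_to_input(["run", "script",
                        "--tags=app,image-classification,onnx,python",
                        "--quiet"]))
```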

The first run of this workflow may take a few minutes while CM adapts it to your platform (depending on your Internet speed). Subsequent runs are much faster because CM automatically caches the output of portable CM scripts so that it can be reused in this and other CM workflows.
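The caching behavior described above can be illustrated with a small memoization sketch. The hypothetical `ScriptCache` class below assumes that a cache entry is identified by the script tags (in any order) plus the inputs that affect the result, such as a version. A real CM cache persists entries on disk and can be inspected with `cm show cache`; this sketch only keeps them in memory.

```python
# Hypothetical in-memory cache: run an expensive step once per unique
# (tags, inputs) combination and reuse the stored result afterwards.
import json
from typing import Any, Callable, Dict


class ScriptCache:
    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}

    @staticmethod
    def key(tags: str, inputs: Dict[str, Any]) -> str:
        # Sorting makes "get,onnxruntime" and "onnxruntime,get" share a key.
        return json.dumps({"tags": sorted(tags.split(",")), "inputs": inputs},
                          sort_keys=True)

    def run(self, tags: str, inputs: Dict[str, Any],
            step: Callable[..., Any]) -> Any:
        k = self.key(tags, inputs)
        if k not in self._entries:
            self._entries[k] = step(**inputs)  # first run: do the real work
        return self._entries[k]               # later runs: reuse the result


def fake_install(version: str) -> str:
    """Stand-in for a slow download/build step."""
    return f"/opt/onnxruntime-{version}"


if __name__ == "__main__":
    cache = ScriptCache()
    print(cache.run("get,onnxruntime", {"version": "1.12.0"}, fake_install))
    print(cache.run("onnxruntime,get", {"version": "1.12.0"}, fake_install))
```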

You can also install specific versions of ML artifacts (models, data sets, engines, libraries, tools, etc.) with individual CM scripts, and CM will automatically plug them into the above ML task (see the image classification dependencies in the CM database of scripts):

cm run script --tags=detect,os --out=json
cm run script --tags=get,python --version_min=3.9.1
cm run script --tags=install,python-venv --name=my-virtual-env
cm run script --tags=get,ml-model-onnx,resnet50
cm run script --tags=get,dataset,imagenet,original,_2012-500
cm run script --tags=get,onnxruntime,python-lib --version=1.12.0

cm show cache

cm run script --tags=app,image-classification,onnx,python
# or classify your own image:
cm run script --tags=app,image-classification,onnx,python --input=my-image.jpg
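Flags such as `--version_min=3.9.1` and `--version=1.12.0` require comparing dotted version strings numerically, because plain string comparison gets them wrong (`"3.10.4" < "3.9.1"` is `True` for strings). The sketch below is a hypothetical helper, not CM's own implementation. It assumes purely numeric versions (no suffixes such as `rc1`), treats trailing zeros as insignificant, and adds an illustrative `version_max` parameter that the commands above do not use.

```python
# Hypothetical version-constraint check for purely numeric dotted versions.
from typing import Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """'1.12.0' -> (1, 12); trailing zeros are dropped so 1.12 == 1.12.0."""
    parts = [int(p) for p in v.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version: str,
              version_min: Optional[str] = None,
              version_max: Optional[str] = None,
              exact: Optional[str] = None) -> bool:
    """Check a version against optional exact, minimum and maximum bounds."""
    v = parse_version(version)
    if exact is not None and v != parse_version(exact):
        return False
    if version_min is not None and v < parse_version(version_min):
        return False
    if version_max is not None and v > parse_version(version_max):
        return False
    return True


if __name__ == "__main__":
    print(satisfies("3.10.4", version_min="3.9.1"))  # True
    print(satisfies("1.12.0", exact="1.12"))         # True
```

Converting to integer tuples lets Python's built-in tuple ordering do the comparison component by component.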

A few more examples of detecting compilers and CUDA devices on Windows:

cm run script --tags=get,cl --path="C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin"
cm run script --tags=get,cuda --path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin"

cm show cache

cm run script --tags=get,cuda-devices
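The `--path` flag in the Windows examples points detection at a specific directory. A minimal sketch of that idea, using Python's standard `shutil.which`, searches the given directory first and falls back to the system `PATH`. The `detect_tool` helper is hypothetical. On Windows, `shutil.which` also tries the extensions listed in `PATHEXT`, so `detect_tool("cl", path=...)` would match `cl.exe`.

```python
# Hypothetical helper: locate an executable in an explicit directory first,
# then fall back to the directories listed in the PATH environment variable.
import shutil
from typing import Optional


def detect_tool(name: str, path: Optional[str] = None) -> Optional[str]:
    """Return the full path to `name`, or None if it cannot be found."""
    if path is not None:
        found = shutil.which(name, path=path)  # search only `path`
        if found:
            return found
    return shutil.which(name)  # search the regular PATH


if __name__ == "__main__":
    print(detect_tool("python3"))
```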

CM is motivated by our tedious and interesting experience reproducing 150+ ML and systems papers and validating them in the real world during various reproducibility initiatives and artifact evaluations.

The CM toolkit helps researchers and engineers transform their existing projects, Git repositories, Docker containers, Jupyter notebooks and internal directories into an open database of portable CM scripts with a common API, extensible meta descriptions and a simple portability and interoperability layer written in Python or shell scripts.

Such an evolutionary approach helps the community share their knowledge, experience, artifacts and scripts in a more unified, automated, portable, reusable and reproducible way while simplifying and automating the development and deployment of complex applications across rapidly evolving software and hardware stacks from the cloud to the edge.

The CM toolkit is the 2nd generation of the Collective Knowledge framework (CK) that was originally developed in collaboration with companies and universities to enable collaborative and reproducible development, optimization and deployment of Pareto-efficient ML Systems in terms of accuracy, latency, throughput, energy, size and costs across continuously changing software, hardware, user environments, settings, models and data.
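Pareto efficiency here means keeping only the configurations that no other configuration beats on every metric at once. A minimal sketch for two metrics (higher accuracy and lower latency are better) follows. The model names and numbers are made up for illustration; real CK/CM experiments track more dimensions such as energy, size and cost.

```python
# Keep configurations that are not dominated on (accuracy, latency).
# The data below is invented purely for illustration.
from typing import List, Tuple

Config = Tuple[str, float, float]  # (name, accuracy, latency in ms)

CONFIGS: List[Config] = [
    ("model-a", 0.76, 12.0),
    ("model-b", 0.74, 8.0),
    ("model-c", 0.73, 10.0),
    ("model-d", 0.78, 30.0),
]


def dominates(p: Config, q: Config) -> bool:
    """True if p is at least as good as q on both metrics and better on one."""
    _, acc_p, lat_p = p
    _, acc_q, lat_q = q
    return acc_p >= acc_q and lat_p <= lat_q and (acc_p > acc_q or lat_p < lat_q)


def pareto_front(configs: List[Config]) -> List[str]:
    """Return names of configurations that no other configuration dominates."""
    return [c[0] for c in configs if not any(dominates(o, c) for o in configs)]


if __name__ == "__main__":
    print(pareto_front(CONFIGS))  # → ['model-a', 'model-b', 'model-d']
```

In this toy data, model-c is dropped because model-b is both more accurate and faster, while the remaining three each represent a different accuracy/latency trade-off.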

Copyright

MLCommons 2022

News

Documentation

Tutorials

Community developments

CM core (database CLI and API)

We use GitHub issues prefixed with [CK2/CM core] to track improvements to the CM core, which helps organize projects as a collective database of reusable artifacts and automation scripts:

CM automation scripts

CM provides a common playground and a common language to help researchers and engineers discuss and learn how to connect numerous incompatible tools together and make them more deterministic, portable and reproducible across continuously changing software and hardware stacks. We continue these discussions and developments within our open workgroup:

Development meetings

Related resources

Contributing

The best way to contribute to this project is to join our open workgroup to help the community modularize AI, ML and other complex systems, share your ML artifacts and automations as reusable CM scripts and improve the core CM functionality.

References

Acknowledgments

We would like to thank MLCommons, OctoML, all contributors and collaborators for their support, fruitful discussions, and useful feedback! See more acknowledgments in the CK journal article and our ACM TechTalk.

Maintainers



Download files


Source Distribution

cmind-1.0.1.tar.gz (42.9 kB)

