Collective Mind toolkit (CM aka CK2)
Shortcuts
- Simple example to install and try our meta-framework for modular image classification
- Open workgroup developing automation of MLPerf and modularization of ML&AI Systems
- Demo to modularize MLPerf inference benchmark and automate submissions
- "Getting Started" tutorial
- Motivation and concept: journal article, ACM TechTalk
We are very glad that our technology has already helped different organizations, including Qualcomm, Krai, HPE, Dell, Alibaba and Lenovo, automate their MLPerf inference benchmarking. Join our open workgroup to get involved in further developments!
Motivation
There are many great automation tools and workflow management frameworks - some are convenient for researchers and some for engineers. The Collective Mind toolkit (CM) is our community effort to develop a portable meta-framework that is convenient for both.
The goal of the CM framework is to help researchers and engineers wrap ad-hoc DevOps and MLOps automation scripts and artifacts with a simple, human-readable and platform-independent CLI, Python API and JSON/YAML meta description to make them more understandable, portable, reusable, interoperable, deterministic and reproducible across continuously changing hardware, software and data with minimal or no changes to existing projects.
Such wrappers can be automatically connected together into powerful and portable workflows, applications and web-services to abstract developers and scientists from the rapidly evolving world of technology.
See an example of a modular image classification workflow assembled from such wrappers (portable CM scripts). It automatically detects, downloads, installs and builds all related artifacts and tools to adapt the workflow to a user platform running Linux, Windows or MacOS:
python3 -m pip install cmind
cm pull repo mlcommons@ck
cm run script --tags=app,image-classification,onnx,python --quiet
or using Python scripting:
import cmind
# Same request as the "cm run script" command line above, expressed as a dictionary
r = cmind.access({'action':'run', 'automation':'script',
                  'tags':'app,image-classification,onnx,python',
                  'out':'con',
                  'quiet':True})
print(r)
It may take a few minutes to run this workflow for the first time and adapt it to your platform (depending on the Internet speed). Note that all the subsequent runs will be much faster because CM automatically caches the output of all portable CM scripts to be quickly reused in this and other CM workflows.
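The dictionary printed above can also be checked programmatically before its contents are used. The following is only a minimal sketch: it assumes the CK/CM convention that the result carries an integer `return` key (0 on success) and an `error` string on failure, which is not stated in this description, and the helper name `check_cm_result` is hypothetical. It uses simulated result dictionaries so that it runs without `cmind` installed.

```python
def check_cm_result(r):
    """Return r unchanged on success, raise RuntimeError otherwise.

    Assumption: r follows the CK/CM convention {'return': 0, ...} on
    success and {'return': >0, 'error': '...'} on failure.
    A missing 'return' key is treated as success here.
    """
    code = r.get('return', 0)
    if code > 0:
        message = r.get('error', 'unknown error')
        raise RuntimeError('CM call failed ({}): {}'.format(code, message))
    return r


# Simulated results (hypothetical values, no CM call is made here)
ok_result = {'return': 0}
failed_result = {'return': 1, 'error': 'script not found'}

print(check_cm_result(ok_result))

try:
    check_cm_result(failed_result)
except RuntimeError as exc:
    print(exc)
```

With `cmind` installed, the same helper could wrap a real call, for example `check_cm_result(cmind.access({...}))`.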
You can also force the installation of specific versions of ML artifacts (models, data sets, engines, libraries, tools, etc.) using individual CM scripts that are automatically plugged into the above ML task (see image classification dependencies using CM database of scripts):
cm run script --tags=detect,os --out=json
cm run script --tags=get,python --version_min=3.9.1
cm run script --tags=install,python-venv --name=my-virtual-env
cm run script --tags=get,ml-model-onnx,resnet50
cm run script --tags=get,dataset,imagenet,original,_2012-500
cm run script --tags=get,onnxruntime,python-lib --version=1.12.0
cm show cache
cm run script --tags=app,image-classification,onnx,python (--input=my-image.jpg)
A few more examples to detect compilers and CUDA devices on Windows:
cm run script --tags=get,cl --path="C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin"
cm run script --tags=get,cuda --path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin"
cm show cache
cm run script --tags=get,cuda-devices
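The command lines above can likely be expressed as input dictionaries for the Python API as well, in the same way that `--quiet` became `'quiet':True` in the earlier Python example. The sketch below only builds such dictionaries and does not call CM. It assumes that every `--key=value` flag maps one-to-one to a dictionary entry, which this description shows only for `tags`, `out` and `quiet`; the helper name `cm_script_input` is hypothetical.

```python
def cm_script_input(tags, **flags):
    """Build an input dictionary for cmind.access that runs a CM script.

    Assumption: each CLI flag --key=value corresponds to a dictionary
    entry 'key': value, mirroring the Python example earlier in this text.
    """
    request = {'action': 'run',
               'automation': 'script',
               'tags': ','.join(tags)}
    request.update(flags)
    return request


# Dictionary forms of some of the command lines shown above
requests = [
    cm_script_input(['detect', 'os'], out='json'),
    cm_script_input(['get', 'python'], version_min='3.9.1'),
    cm_script_input(['get', 'onnxruntime', 'python-lib'], version='1.12.0'),
]

for request in requests:
    # With cmind installed, each request could be passed to cmind.access(request)
    print(request)
```

Keeping the request as a plain dictionary makes it easy to log or store a workflow step before running it.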
CM is motivated by our tedious and interesting experience reproducing 150+ ML and systems papers and validating them in the real world during different reproducibility initiatives and artifact evaluation.
The CM toolkit helps researchers and engineers transform their existing projects, Git repositories, Docker containers, Jupyter notebooks and internal directories into an open database of portable CM scripts with a common API, extensible meta descriptions and a simple portability and interoperability layer written in Python or shell scripts.
Such an evolutionary approach helps the community share their knowledge, experience, artifacts and scripts in a more unified, automated, portable, reusable and reproducible way while simplifying and automating the development and deployment of complex applications across rapidly evolving software and hardware stacks from the cloud to the edge.
The CM toolkit is the 2nd generation of the Collective Knowledge framework (CK) that was originally developed in collaboration with companies and universities to enable collaborative and reproducible development, optimization and deployment of Pareto-efficient ML Systems in terms of accuracy, latency, throughput, energy, size and costs across continuously changing software, hardware, user environments, settings, models and data.
Copyright
MLCommons 2022
News
- 2022 September 9: Subscribe to our public workgroup to participate in the development of automation workflows to simplify, modularize and automate ML Systems benchmarking.
- 2022 September 1: We have developed a CM workflow to automate and modularize the MLPerf inference benchmark. We continue these developments within a public MLPerf education workgroup.
- 2022 July 25: We updated the tutorial about CM scripts: https://github.com/mlcommons/ck/blob/master/cm/docs/tutorial-scripts.md .
- 2022 July 21: We have pre-released relatively stable scripts for portable DevOps and MLOps at https://github.com/mlcommons/ck/tree/master/cm-mlops/script .
- 2022 May 20: We brainstormed the minimal set of portable CM scripts to automate deployment of ML models across diverse hardware and software at OctoML in Seattle, WA.
- 2022 April 3: We presented our approach to bridge the growing gap between ML Systems research and production at the HPCA'22 workshop on benchmarking deep learning systems.
- 2022 March: We were invited to present our concept to enable collaborative and reproducible ML Systems R&D at the SIAM'22 workshop on "Research Challenges and Opportunities within Software Productivity, Sustainability, and Reproducibility".
- 2022 March: We have released the first prototype of the Collective Mind toolkit (aka CK2) based on your feedback and our practical experience reproducing 150+ ML and Systems papers and validating them in the real world.
Documentation
Tutorials
Community developments
CM core (database CLI and API)
We use GitHub tickets prefixed with [CK2/CM core] to improve and enhance the CM core that helps to organize projects as a collective database of reusable artifacts and automation scripts:
CM automation scripts
CM provides a common playground and a common language to help researchers and engineers discuss and learn how to connect numerous incompatible tools together and make them more deterministic, portable and reproducible across continuously changing software and hardware stacks. We continue these discussions and developments within our open workgroup:
Development meetings
Related resources
Contributing
The best way to contribute to this project is to join our open workgroup to help the community modularize AI, ML and other complex systems, share your ML artifacts and automations as reusable CM scripts and improve the core CM functionality.
References
- Journal article with CK/CM concepts and our long-term vision
- ACM TechTalk with CK/CM intro moderated by Peter Mattson (MLCommons president)
- HPCA'22 presentation "MLPerf design space exploration and production deployment"
Acknowledgments
We would like to thank MLCommons, OctoML, all contributors and collaborators for their support, fruitful discussions, and useful feedback! See more acknowledgments in the CK journal article and our ACM TechTalk.
Maintainers
- Grigori Fursin (CK&CM author)
- Arjun Suresh (author of CK and CM automation scripts for MLPerf)