Machine learning tools for computational chemistry and condensed matter physics
"a kit for camels"
cmlkit provides a clean and concise way to specify, tune, and evaluate machine learning models for computational chemistry and condensed matter physics, particularly for atomistic predictions.
qmmlpack, which is not yet publicly available.
- This is a "scientific code", i.e. development occurs infrequently and somewhat haphazardly. I'll try to not make breaking changes too often, and never in minor versions.
- This is very domain-specific project, so it is somewhat full of jargon. The
enginesub-modules are quite general, though!
If you use this code in any scientific work, please mention it in the publication and let me know. Thanks! 🐫
At its core,
cmlkit defines a unified
dict-based format to specify model components, which can be straightforwardly read and written as
yaml. It provides interfaces to implementations of popular methods in its domain using this format. Model components are implemented as pure-ish functions, which is conceptually satisfying and opens the door to easy pipelining and caching.
On this basis, it then implements parallel hyperparameter optimisation (using
hyperopt as backend), and provides tools to train models, make predictions, and evaluate those predictions. It is intended to be extensible and flexible enough for the demands of research. It is also "high-performance computing compatible", i.e. it can run in computing environments straight from the 90s. 🤓
Out of necessity, it also implements yet another dataset format, but makes up for it by providing automatic loading, which is neat.
At the moment, there are interfaces for:
- Many-Body Tensor Representation (MBTR) (Huo, Rupp, arXiv 1704.06439 (2017)) (
- Smooth Overlap of Atomic Positions (SOAP) representaton (Bartok, Kondor, Csanyi, PRB 87, 184115 (2013)) (
- Symmetry Functions (SF) representation (Behler, JCP 134, 074106 (2011)) (
- Kernel Ridge Regression (KRR) as implemented in
- Reasonably clean, composable, modern codebase with little magic ✨
The hyperparameter optimisation (
- Robust multi-core support (i.e. it can automatically kill timed out external code, even if it ignores
mongodbrequired (important for cough certain computing environments cough)
- Extensions to the
- Possibility to implement multi-step optimisation (experimental at the moment)
- Resumable/recoverable runs backed by a readable, atomically written history of the optimisation (backed by
- Search spaces can be defined entirely in text, i.e. they're easily writeable, portable and serialisable
On the roadmap, coming soon™:
- Thorough caching for computations (everything is prepared!)
- Plugin system (currently, custom objects need to be registered manually)
Frequently Asked Questions
(They are not actually frequently asked.)
I don't work in computational chemsitry/condensed matter physics. Should I care?
The short answer is regrettably probably no.
However, I think the architecture of this library is quite neat, so maybe it can provide some marginally interesting reading. The
tune component is very general and provides, in my opinion, a delightfully clean interface to
engine is also rather general and provides a somewhat nice way to serialise specific kinds of python objects to
Why should I use this?
If you need to use any of the libraries mentioned above it might be more convenient. If you need to do hyperparameter optimisation and are tired of plain
hyperopt it might be useful.
Release history Release notifications
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
|Filename, size & hash||File type||Python version||Upload date|
|cmlkit-2.0.0a16-py3-none-any.whl (193.0 kB) View hashes||Wheel||py3|
|cmlkit-2.0.0a16.tar.gz (62.6 kB) View hashes||Source||None|
Hashes for cmlkit-2.0.0a16-py3-none-any.whl