Machine learning tools for computational chemistry and condensed matter physics
cmlkit
"a kit for camels"
🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰
cmlkit provides a clean and concise way to specify, tune, and evaluate machine learning models for computational chemistry and condensed matter physics, particularly for atomistic predictions.
WARNINGS:
- cmlkit depends on qmmlpack, which is not yet publicly available.
- This is a "scientific code", i.e. development occurs infrequently and somewhat haphazardly. I'll try not to make breaking changes too often, and never in minor versions.
- This is a very domain-specific project, so it is somewhat full of jargon. The tune and engine sub-modules are quite general, though!
If you use this code in any scientific work, please mention it in the publication and let me know. Thanks! 🐫
What is cmlkit? 🐫🧰
At its core, cmlkit defines a unified dict-based format to specify model components, which can be straightforwardly read and written as yaml. It provides interfaces to implementations of popular methods in its domain using this format. Model components are implemented as pure-ish functions, which is conceptually satisfying and opens the door to easy pipelining and caching.
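To make the idea concrete, here is a minimal sketch of a dict-based component format. It is not cmlkit's actual API: the names REGISTRY, register, from_config, run_pipeline, Scale and Shift, and the {kind: {parameters}} layout, are assumptions made for illustration, and JSON stands in for yaml to keep the example free of dependencies.

```python
import json

# Hypothetical registry mapping a "kind" string to a component class.
REGISTRY = {}


def register(cls):
    """Make a component class discoverable by its `kind`."""
    REGISTRY[cls.kind] = cls
    return cls


class Component:
    """Base class: a component is fully described by a plain dict."""

    kind = "component"

    def get_config(self):
        # Assumed layout: {kind: {parameter: value, ...}}
        return {self.kind: dict(vars(self))}


@register
class Scale(Component):
    kind = "scale"

    def __init__(self, factor=1.0):
        self.factor = factor

    def __call__(self, values):
        # Output depends only on the input and the configuration.
        return [self.factor * v for v in values]


@register
class Shift(Component):
    kind = "shift"

    def __init__(self, offset=0.0):
        self.offset = offset

    def __call__(self, values):
        return [v + self.offset for v in values]


def from_config(config):
    """Instantiate a component from its single-key dict description."""
    (kind, params), = config.items()
    return REGISTRY[kind](**params)


def run_pipeline(configs, values):
    """Apply components in order; simple because each one is a pure function."""
    for config in configs:
        values = from_config(config)(values)
    return values


pipeline = [{"scale": {"factor": 2.0}}, {"shift": {"offset": 1.0}}]
text = json.dumps(pipeline)   # serialise the specification to text ...
restored = json.loads(text)   # ... and read it back unchanged
result = run_pipeline(restored, [1.0, 2.0])
```

Because every component can be rebuilt from its dict, a whole pipeline is just data: it can be stored next to results, diffed, or generated by an optimiser.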
On this basis, it then implements parallel hyperparameter optimisation (using hyperopt as backend), and provides tools to train models, make predictions, and evaluate those predictions. It is intended to be extensible and flexible enough for the demands of research. It is also "high-performance computing compatible", i.e. it can run in computing environments straight from the 90s. 🤓
Out of necessity, it also implements yet another dataset format, but makes up for it by providing automatic loading, which is neat.
Compatibility
At the moment, there are interfaces for:
Representations:
- Many-Body Tensor Representation (MBTR) (Huo, Rupp, arXiv 1704.06439 (2017)) (qmmlpack interface)
- Smooth Overlap of Atomic Positions (SOAP) representation (Bartok, Kondor, Csanyi, PRB 87, 184115 (2013)) (quippy interface)
- Symmetry Functions (SF) representation (Behler, JCP 134, 074106 (2011)) (RuNNer interface)
Regression methods:
- Kernel Ridge Regression (KRR) as implemented in qmmlpack
Features
- Reasonably clean, composable, modern codebase with little magic ✨
The hyperparameter optimisation (cmlkit.tune) boasts:
- Robust multi-core support (i.e. it can automatically kill timed-out external code, even if it ignores SIGTERM)
- No mongodb required (important for *cough* certain computing environments *cough*)
- Extensions to the hyperopt spaces (log grids)
- Possibility to implement multi-step optimisation (experimental at the moment)
- Resumable/recoverable runs backed by a readable, atomically written history of the optimisation (using son)
- Search spaces can be defined entirely in text, i.e. they're easily writeable, portable and serialisable
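As an illustration of a text-defined search space with a log grid, the sketch below resolves a plain nested structure into concrete parameters. The ["loggrid", low, high, num] and ["choice", ...] tokens, the base-2 convention, the kernel names, and the loggrid and sample helpers are all assumptions for this example. cmlkit's own syntax and its hyperopt integration may differ, and plain random sampling stands in for hyperopt here.

```python
import json
import random


def loggrid(low, high, num, base=2.0):
    """Return `num` values base**x with x evenly spaced in [low, high]."""
    if num == 1:
        return [base ** low]
    step = (high - low) / (num - 1)
    return [base ** (low + i * step) for i in range(num)]


def sample(space, rng):
    """Recursively replace search-space tokens with sampled values."""
    if isinstance(space, dict):
        return {key: sample(value, rng) for key, value in space.items()}
    if isinstance(space, list) and space and space[0] == "loggrid":
        _, low, high, num = space
        return rng.choice(loggrid(low, high, num))
    if isinstance(space, list) and space and space[0] == "choice":
        return sample(rng.choice(space[1:]), rng)
    if isinstance(space, list):
        return [sample(item, rng) for item in space]
    return space  # constants pass through untouched


# The whole search space is plain text, so it can live in a file.
space_text = """
{"krr": {"regularisation": ["loggrid", -4, 0, 5],
         "kernel": ["choice", "gaussian", "laplacian"]}}
"""

space = json.loads(space_text)
rng = random.Random(0)  # seeded for reproducibility
suggestion = sample(space, rng)
```

Keeping the space as data rather than as Python objects is what makes it portable: the same text can be versioned, copied to a cluster, or stored alongside the optimisation history.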
On the roadmap, coming soon™:
- Thorough caching for computations (everything is prepared!)
- Plugin system (currently, custom objects need to be registered manually)
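The caching item follows naturally from components being pure-ish functions of their configuration and inputs. The sketch below shows one way such a cache could be keyed. It is not the planned cmlkit implementation: cache_key, cached, expensive_representation and the in-memory _CACHE dict are hypothetical names chosen for illustration.

```python
import hashlib
import json

_CACHE = {}             # in-memory store; a real cache might live on disk
CALLS = {"count": 0}    # counts how often the costly function really runs


def cache_key(name, config, inputs):
    """Stable key built from the function name, config dict, and inputs."""
    payload = json.dumps(
        {"name": name, "config": config, "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def expensive_representation(config, inputs):
    """Stand-in for a costly computation; output depends only on arguments."""
    CALLS["count"] += 1
    return [x ** config["power"] for x in inputs]


def cached(function, config, inputs):
    """Return a stored result when the same config and inputs were seen before."""
    key = cache_key(function.__name__, config, inputs)
    if key not in _CACHE:
        _CACHE[key] = function(config, inputs)
    return _CACHE[key]


first = cached(expensive_representation, {"power": 2}, [1, 2, 3])
second = cached(expensive_representation, {"power": 2}, [1, 2, 3])  # cache hit
third = cached(expensive_representation, {"power": 3}, [1, 2, 3])   # new config
```

This only works if the function has no hidden state, which is why the "pure-ish" design matters: any dependence on something outside the config and inputs would make cached results silently wrong.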
Frequently Asked Questions
(They are not actually frequently asked.)
I don't work in computational chemistry/condensed matter physics. Should I care?
The short answer is, regrettably, probably no.
However, I think the architecture of this library is quite neat, so maybe it can provide some marginally interesting reading. The tune component is very general and provides, in my opinion, a delightfully clean interface to hyperopt. The engine is also rather general and provides a somewhat nice way to serialise specific kinds of python objects to yaml.
Why should I use this?
If you need to use any of the libraries mentioned above, it might be more convenient. If you need to do hyperparameter optimisation and are tired of plain hyperopt, it might be useful.
Hashes for cmlkit-2.0.0a18-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 4e32d81798b59719725366fd487f8a7729f4de7235241895f02289f621c0eca7
MD5 | b1e24ad633a044cfeba5ea97b8685d1d
BLAKE2b-256 | 6daf68b75426a7bf49f54f0f4048a4c2748b3fbfd86d027c917cc1c4dfa1def4