Skip to main content

Machine learning tools for computational chemistry and condensed matter physics

Project description

cmlkit

"a kit for camels"

🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰🐫🧰

PyPI - Python Version PyPI Code style: black

cmlkit provides a clean and concise way to specify, tune, and evaluate machine learning models for computational chemistry and condensed matter physics, particularly for atomistic predictions.

WARNINGS:

  • cmlkit depends on qmmlpack, which is not yet publicly available.
  • This is a "scientific code", i.e. development occurs infrequently and somewhat haphazardly. I'll try to not make breaking changes too often, and never in minor versions.
  • This is very domain-specific project, so it is somewhat full of jargon. The tune and engine sub-modules are quite general, though!

If you use this code in any scientific work, please mention it in the publication and let me know. Thanks! 🐫

What is cmlkit? 🐫🧰

At its core, cmlkit defines a unified dict-based format to specify model components, which can be straightforwardly read and written as yaml. It provides interfaces to implementations of popular methods in its domain using this format. Model components are implemented as pure-ish functions, which is conceptually satisfying and opens the door to easy pipelining and caching.

On this basis, it then implements parallel hyperparameter optimisation (using hyperopt as backend), and provides tools to train models, make predictions, and evaluate those predictions. It is intended to be extensible and flexible enough for the demands of research. It is also "high-performance computing compatible", i.e. it can run in computing environments straight from the 90s. πŸ€“

Out of necessity, it also implements yet another dataset format, but makes up for it by providing automatic loading, which is neat.

Compatibility

At the moment, there are interfaces for:

Representations:

  • Many-Body Tensor Representation (MBTR) (Huo, Rupp, arXiv 1704.06439 (2017)) (qmmlpack interface)
  • Smooth Overlap of Atomic Positions (SOAP) representaton (Bartok, Kondor, Csanyi, PRB 87, 184115 (2013)) (quippy interface)
  • Symmetry Functions (SF) representation (Behler, JCP 134, 074106 (2011)) (RuNNer interface)

Regression methods:

  • Kernel Ridge Regression (KRR) as implemented in qmmlpack

Features

  • Reasonably clean, composable, modern codebase with little magic ✨

The hyperparameter optimisation (cmlkit.tune) boasts:

  • Robust multi-core support (i.e. it can automatically kill timed out external code, even if it ignores SIGTERM)
  • No mongodb required (important for cough certain computing environments cough)
  • Extensions to the hyperopt spaces (log grids)
  • Possibility to implement multi-step optimisation (experimental at the moment)
  • Resumable/recoverable runs backed by a readable, atomically written history of the optimisation (backed by son)
  • Search spaces can be defined entirely in text, i.e. they're easily writeable, portable and serialisable

On the roadmap, coming soonβ„’:

  • Thorough caching for computations (everything is prepared!)
  • Plugin system (currently, custom objects need to be registered manually)

Frequently Asked Questions

(They are not actually frequently asked.)

I don't work in computational chemsitry/condensed matter physics. Should I care?

The short answer is regrettably probably no.

However, I think the architecture of this library is quite neat, so maybe it can provide some marginally interesting reading. The tune component is very general and provides, in my opinion, a delightfully clean interface to hyperopt. The engine is also rather general and provides a somewhat nice way to serialise specific kinds of python objects to yaml.

Why should I use this?

If you need to use any of the libraries mentioned above it might be more convenient. If you need to do hyperparameter optimisation and are tired of plain hyperopt it might be useful.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for cmlkit, version 2.0.0a16
Filename, size & hash File type Python version Upload date
cmlkit-2.0.0a16-py3-none-any.whl (193.0 kB) View hashes Wheel py3
cmlkit-2.0.0a16.tar.gz (62.6 kB) View hashes Source None

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page