A Python package providing a workflow for descriptor embedding and clustering for atomistic environments.

Descriptor Embedding and Clustering for Atomistic-environment Framework (DECAF)

https://gitlab.mpcdf.mpg.de/klai/decaf.git

Tutorials

For tutorials with examples, please visit the GitLab Pages site: https://klai.pages.mpcdf.de/decaf/

Description

This is a Python package that provides a workflow for clustering the local atomic environments in a dataset of structures.
Please refer to the methodology paper "A Fuzzy Classification Framework to Identify Equivalent Atoms in Complex Materials and Molecules"[1] for details.
It mainly provides the following functions:

  1. Computing the SOAP descriptor from an input atomic structure given as an ASE Atoms object.
  2. Applying classical multidimensional scaling (MDS) to a dataset of SOAP vectors.
  3. Differentiating atomic environments in the embedded dataset using mean shift clustering (MSC).
  4. Embedding and classifying environments outside of the MDS-MSC dataset.
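
As an illustration of steps 2 and 4, the following is a minimal NumPy sketch of classical MDS together with out-of-sample embedding of new descriptor vectors. It is not DECAF's own implementation (see get_cMDS and embed_cMDS in decaf/src/decaf.py for that): the function names fit_cmds and embed_new, the use of plain Euclidean distances between descriptors, and the random stand-in descriptors are all assumptions made for this example.

```python
import numpy as np


def squared_distances(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d2, 0.0)  # clip tiny negative values from round-off


def fit_cmds(X, n_components=2):
    """Classical MDS of the row vectors in X (hypothetical helper).

    Assumes the leading n_components eigenvalues are positive, which holds
    for Euclidean distances when n_components does not exceed the data rank.
    """
    n = X.shape[0]
    D2 = squared_distances(X, X)
    J = np.eye(n) - np.full((n, n), 1.0 / n)   # centering matrix
    B = -0.5 * J @ D2 @ J                      # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)
    top = np.argsort(evals)[::-1][:n_components]
    evals, evecs = evals[top], evecs[:, top]
    Y = evecs * np.sqrt(evals)                 # embedded coordinates
    return {"X": X, "Y": Y, "evals": evals, "mean_D2": D2.mean(axis=1)}


def embed_new(model, X_new):
    """Project new vectors into an existing embedding (Gower's formula)."""
    d2 = squared_distances(X_new, model["X"])  # shape (n_new, n_train)
    return 0.5 * (model["mean_D2"][None, :] - d2) @ model["Y"] / model["evals"]


# Random vectors standing in for SOAP descriptors (assumption for the demo).
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(20, 5))

model = fit_cmds(descriptors, n_components=2)
embedded = model["Y"]                              # shape (20, 2)
re_embedded = embed_new(model, descriptors[:3])    # shape (3, 2)
```

Embedding a vector that is already part of the training set reproduces its original coordinates, which is a convenient sanity check for any out-of-sample scheme of this kind.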

Optional functions are also provided:

  1. Applying kernel principal component analysis (kPCA), principal component analysis (PCA), or Sketch-Map[2] for embedding.
  2. Applying HDBSCAN[3] for clustering.
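
For orientation, a kPCA embedding of descriptor vectors can be sketched directly with scikit-learn, as below. This is only an illustration of the technique and not DECAF's interface: the RBF kernel, the gamma value, and the random stand-in descriptors are assumptions chosen for the example. The usage inside DECAF is shown in the kPCA Embedding block of decaf/examples/sample_code.ipynb.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Random vectors standing in for SOAP descriptors (assumption for the demo).
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(40, 6))

# Kernel choice and gamma are illustrative, not values taken from the paper.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1)
embedded = kpca.fit_transform(descriptors)          # shape (40, 2)

# Vectors outside the training set are projected with the fitted model.
new_embedded = kpca.transform(descriptors[:3] + 0.01)
```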

References linking the journal article[1] and the code

The locations in the code that implement the corresponding methods of the article[1] are listed below.
For details on how to use each function, please refer to decaf/examples/sample_code.ipynb or the comments in decaf/src/decaf.py.

Methodology involved in the main text[1]:

double-SOAP (Sec.2A[1]):
decaf/src/decaf.py : function get_SOAP

classical MDS (Sec.2B[1]):
decaf/src/decaf.py : function get_cMDS

Embedding any SOAP vector with the obtained model:
decaf/src/decaf.py : function embed_cMDS

MSC (Sec.2C[1]):
decaf/src/decaf.py : function get_MeanShift
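
The MSC step can be illustrated with scikit-learn's MeanShift applied to already embedded points. This is a hedged sketch rather than the body of get_MeanShift: the two synthetic blobs, the bandwidth of 0.3, and the variable names are assumptions for the example, and the hard nearest-center assignment shown here does not reproduce the fuzzy classification described in the article[1].

```python
import numpy as np
from sklearn.cluster import MeanShift

# Two well-separated synthetic blobs standing in for embedded environments.
rng = np.random.default_rng(0)
embedded = np.vstack([
    rng.normal(loc=0.0, scale=0.05, size=(30, 2)),
    rng.normal(loc=1.0, scale=0.05, size=(30, 2)),
])

# The bandwidth is the main hyperparameter; 0.3 is an illustrative value.
msc = MeanShift(bandwidth=0.3).fit(embedded)
labels = msc.labels_              # cluster index of each training point
centers = msc.cluster_centers_    # one row per cluster found

# Out-of-sample points are assigned to the nearest cluster center.
new_points = np.array([[0.02, -0.01], [0.97, 1.03]])
new_labels = msc.predict(new_points)
```

The bandwidth controls how finely environments are distinguished: a smaller value splits the data into more clusters, a larger one merges them.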

Demonstrations in the main text[1]:

PAH examples (Sec.3A[1]):
decaf/examples/sample_code.ipynb : block PAH Example

Pd Surfaces examples (Sec.3B[1]):
decaf/examples/sample_code.ipynb : block Pd Surfaces Demonstration

Out-of-sample classification of Pd nanoparticle (Sec.3C[1]):
decaf/examples/sample_code.ipynb : block Classification Demonstration

Demonstrations in the supplementary information (SI)[1]:

kPCA (Sec.S1A[1]):
decaf/examples/sample_code.ipynb : block kPCA Embedding

SketchMap (Sec.S1B[1]):
decaf/examples/sample_code.ipynb : block Sketch Map Embedding

HDBSCAN (Sec.S2A[1]):
decaf/examples/sample_code.ipynb : block HDBSCAN Clustering

The demonstrations in Sec.S3-4 can be reproduced by changing the (hyper)parameters according to the SI, using the functions in:
decaf/examples/sample_code.ipynb : block Pd Surfaces Demonstration

Demonstration in Sec.S5: the MD settings and analysis are given in the main text and can be reproduced from it, so they are omitted from the examples here.

Installation

You can install the package from PyPI with the following command:

pip install fhi-decaf

For development from a local clone, use an editable install:

pip install -e ".[test,docs]"

Then import the package in Python with:

import decaf

Dependencies:
NumPy, ASE, DScribe, scikit-learn, SciPy
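
A quick way to confirm that the environment is ready is to query the installed distribution and its dependencies from Python. The sketch below uses only the standard library. The helper names installed_version and missing_dependencies are hypothetical, and the module names in the table are the usual import names of the packages listed above. Note that the distribution is called fhi-decaf on PyPI while the module is imported as decaf.

```python
from importlib import metadata
from importlib.util import find_spec

# Display name -> importable module name (table assumed for this example).
DEPENDENCIES = {
    "NumPy": "numpy",
    "ASE": "ase",
    "DScribe": "dscribe",
    "scikit-learn": "sklearn",
    "SciPy": "scipy",
}


def installed_version(dist_name="fhi-decaf"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def missing_dependencies(modules=DEPENDENCIES):
    """Return the display names of modules that cannot be located."""
    return [name for name, module in modules.items() if find_spec(module) is None]


print("fhi-decaf version:", installed_version())
print("missing dependencies:", missing_dependencies())
```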

Repository Structure:

decaf
├── examples                            # Folder containing examples of applying DECAF
│   ├── Compiled_SketchMap              #     Folder containing compiled SketchMap if needed
│   ├── sample_code.ipynb               #     Sample code of DECAF applied on the demonstration cases
│   └── Structures                      #     Folder containing atomic structures for the demonstration cases
│       └── **.con
├── pyproject.toml                      # Setup code for installing DECAF
├── README.md                           # The readme you are reading now.
└── src                                 # Folder containing Source code of DECAF
    └── decaf.py                        #     Source code of DECAF

References

  1. K. C. Lai, S. Matera, C. Scheurer, and K. Reuter, “A Fuzzy Classification Framework to Identify Equivalent Atoms in Complex Materials and Molecules,” J. Chem. Phys. 159(2) (2023). DOI: 10.1063/5.0160369.
  2. M. Ceriotti, G. A. Tribello, and M. Parrinello, “Simplifying the representation of complex free-energy landscapes using sketch-map,” Proc. Natl. Acad. Sci. U.S.A. 108, 13023–13028 (2011).
  3. L. McInnes, J. Healy, and S. Astels, “hdbscan: Hierarchical density based clustering,” J. Open Source Softw. 2, 205 (2017).

Authors and Affiliation

Authors:
King Chun Lai, Sebastian Matera, Christoph Scheurer, Karsten Reuter

Affiliation:
Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, 14195 Berlin, Germany

Support

King Chun Lai : lai@fhi-berlin.mpg.de

License

Descriptor Embedding and Clustering for Atomistic-environment Framework by King Chun Lai, Sebastian Matera, Christoph Scheurer, Karsten Reuter is licensed under CC BY 4.0

