Toolkit for using topological data analysis representations.

moleculetda

A framework to use topological data analysis to extract topological information from a structure (e.g., molecule or crystal), which can then be used in downstream tasks.

Installation

The library can be installed as follows:

pip install moleculetda

Examples

As an example, we will start with the following metal-organic framework (MOF) and construct topological summaries of all the channels and voids in the structure:

Persistence diagrams can be generated from an example structure file such as a .cif file.

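from moleculetda.structure_to_vectorization import structure_to_pd
# assumption: plot_pds is importable from moleculetda.plotting (it is called below but was never imported)
from moleculetda.plotting import plot_pds
import matplotlib.pyplot as plt
import numpy as np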

filename = 'files/mof_structs/str_m4_o1_o1_acs_sym.10.cif'

# return a dict containing persistence diagrams for different dimensions (1d - channels, 2d - voids)
arr_dgms = structure_to_pd(filename, supercell_size=20)

# plot out the 1d and 2d diagrams
dgm_1d = arr_dgms['dim1']
dgm_2d = arr_dgms['dim2']

plot_pds(dgm_1d, dgm_2d)
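
To make the contents of a persistence diagram more concrete, here is a minimal sketch that works on a hand-made array of (birth, death) pairs instead of real output from structure_to_pd. The (n, 2) array layout, the toy values, and the helper name `lifetimes` are assumptions for illustration only; the diagrams returned by moleculetda may be stored differently.

```python
import numpy as np

# Hypothetical 1D diagram: each row is a (birth, death) pair.
# The values are made up for illustration, not computed from a structure.
toy_dgm = np.array([
    [0.5, 0.6],  # short-lived feature, likely noise
    [1.0, 3.0],  # long-lived feature, e.g. a prominent channel
    [1.2, 1.5],
])


def lifetimes(dgm):
    """Return death - birth for every point of an (n, 2) diagram array."""
    dgm = np.asarray(dgm, dtype=float)
    return dgm[:, 1] - dgm[:, 0]


persistence = lifetimes(toy_dgm)
# The point with the largest lifetime is the most persistent feature.
most_persistent = toy_dgm[np.argmax(persistence)]
print(persistence)
print(most_persistent)
```

Points far from the diagonal (large death minus birth) correspond to features that survive over a wide range of length scales, which is why lifetime is a common first summary of a diagram.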

Starting from arr_dgms (the dict storing the persistence diagrams), vectorized representations can be generated. The axis units are the same as the units of the original structure file:

# initialize parameters for the "image" representation:
# spread: Gaussian spread of the kernel, pixels: size of representation (n, n),
# weighting_type: how to weigh the persistence diagram points
# Optional: specs can be provided to give bounds on the representation
from moleculetda.vectorize_pds import PersImage, pd_vectorization
from moleculetda.plotting import plot_pers_images

pim = PersImage(spread=0.15,
                pixels=[50, 50],
                weighting_type='identity')

# get both the 1d and 2d representations
images = []
for dim in [1, 2]:
    dgm = arr_dgms[f"dim{dim}"]
    images.append(pd_vectorization(dgm, spread=0.15, weighting='identity', pixels=[50, 50]))

plot_pers_images(images, arr_dgms)
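
For intuition about what the spread, pixels, and weighting parameters control, the following is a rough NumPy sketch of a persistence image. It is not the moleculetda implementation: the function name `toy_persistence_image`, the `bounds` argument (a stand-in for the optional specs), and the (n, 2) diagram layout are all assumptions made for illustration.

```python
import numpy as np


def toy_persistence_image(dgm, spread=0.15, pixels=(50, 50), bounds=(0.0, 4.0)):
    """Rasterize (birth, death) pairs into a Gaussian-smoothed image.

    Each point is mapped to (birth, persistence) coordinates and contributes
    an isotropic Gaussian with standard deviation `spread` and weight 1
    (an "identity"-style weighting). `bounds` is a hypothetical parameter
    giving the range covered by both image axes.
    """
    dgm = np.asarray(dgm, dtype=float)
    births = dgm[:, 0]
    pers = dgm[:, 1] - dgm[:, 0]

    lo, hi = bounds
    # Pixel centres along the birth (x) and persistence (y) axes.
    xs = np.linspace(lo, hi, pixels[0])
    ys = np.linspace(lo, hi, pixels[1])
    gx, gy = np.meshgrid(xs, ys, indexing="ij")

    image = np.zeros(tuple(pixels), dtype=float)
    for b, p in zip(births, pers):
        image += np.exp(-((gx - b) ** 2 + (gy - p) ** 2) / (2.0 * spread ** 2))
    return image


# Made-up diagram with two points, used only to exercise the sketch.
toy_dgm = np.array([[1.0, 3.0], [1.2, 1.5]])
img = toy_persistence_image(toy_dgm)
print(img.shape)
```

A larger spread blurs nearby diagram points into one another, while more pixels give a finer grid at the cost of a longer feature vector.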

The resulting 1d and 2d image representations can be used for other tasks.
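
One common way to use such images downstream is to flatten them into a single feature vector per structure. The sketch below uses random arrays as stand-ins for the 1d and 2d images and assumes each image is a 2D array of shape (50, 50); the variable names are hypothetical.

```python
import numpy as np

# Stand-ins for the 1D and 2D images; real ones would come from pd_vectorization.
# Assumes each image is a 2D array of shape (50, 50).
rng = np.random.default_rng(0)
toy_images = [rng.random((50, 50)), rng.random((50, 50))]

# Flatten each image and concatenate into one descriptor for the structure.
feature_vector = np.concatenate([np.asarray(img).ravel() for img in toy_images])
print(feature_vector.shape)  # (5000,)
```

Stacking one such vector per structure gives a feature matrix that can be passed to a standard regression or classification model.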

Citation

Aditi S. Krishnapriyan, Maciej Haranczyk, Dmitriy Morozov. Topological Descriptors Help Predict Guest Adsorption in Nanoporous Materials. J. Phys. Chem. C (2020)

@article{doi:10.1021/acs.jpcc.0c01167,
author = {Krishnapriyan, Aditi S. and Haranczyk, Maciej and Morozov, Dmitriy},
title = {Topological Descriptors Help Predict Guest Adsorption in Nanoporous Materials},
journal = {The Journal of Physical Chemistry C},
volume = {124},
number = {17},
pages = {9360-9368},
year = {2020},
doi = {10.1021/acs.jpcc.0c01167}
}

Aditi S. Krishnapriyan, Joseph Montoya, Maciej Haranczyk, Jens Hummelshøj, Dmitriy Morozov. Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks. Scientific Reports (2021)

@article{krishnapriyan_machine_2021,
  title={Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks},
  author={Krishnapriyan, Aditi S and Montoya, Joseph and Haranczyk, Maciej and Hummelsh{\o}j, Jens and Morozov, Dmitriy},
  journal = {Scientific Reports},
  volume = {11},
  number = {1},
  issn = {2045-2322},
  pages = {8888},
  year={2021},
  doi = {10.1038/s41598-021-88027-8}
}
