Toolkit for using topological data analysis representations.
Project description
moleculetda
A framework to use topological data analysis to extract topological information from a structure (e.g., molecule or crystal), which can then be used in downstream tasks.
Installation
The library can be installed as follows:
pip install moleculetda
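After installing, it can be useful to confirm which version ended up in the environment. The sketch below uses only the standard library; the helper name `installed_version` is made up for illustration and is not part of moleculetda.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("moleculetda"))
```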
Examples
As an example, we start with a metal-organic framework (MOF) and construct topological summaries of all the channels and voids in the structure.
Persistence diagrams can be generated from a structure file, such as a .cif file:
from moleculetda.structure_to_vectorization import structure_to_pd
# plot_pds is used below; it is assumed here to live in the package's plotting module
from moleculetda.plotting import plot_pds
import matplotlib.pyplot as plt
import numpy as np

filename = 'files/mof_structs/str_m4_o1_o1_acs_sym.10.cif'

# returns a dict of persistence diagrams keyed by dimension (dim1: channels, dim2: voids)
arr_dgms = structure_to_pd(filename, supercell_size=20)

# plot the 1d and 2d diagrams
dgm_1d = arr_dgms['dim1']
dgm_2d = arr_dgms['dim2']
plot_pds(dgm_1d, dgm_2d)
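To make the contents of a persistence diagram concrete, the sketch below works on a small hand-made set of (birth, death) pairs rather than on output from structure_to_pd. The array layout, the values, and the helper name `lifetimes` are assumptions for illustration; the diagrams returned by the library may be stored differently.

```python
import numpy as np

# Hypothetical diagram: one (birth, death) pair per row,
# in the same length units as the structure file.
toy_dgm = np.array([
    [0.5, 0.75],
    [1.0, 3.0],
    [1.25, 1.5],
])


def lifetimes(dgm):
    """Return the persistence (death - birth) of each feature."""
    dgm = np.asarray(dgm, dtype=float)
    return dgm[:, 1] - dgm[:, 0]


persistence = lifetimes(toy_dgm)

# The feature with the longest lifetime is usually the most prominent
# channel or void; short-lived features sit close to the diagonal.
most_persistent = toy_dgm[np.argmax(persistence)]
print(persistence)
print(most_persistent)
```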
Starting from arr_dgms (dict storing the persistence diagrams), vectorized representations
can be generated. Axes units are the same as the units of the original structure file:
# initialize parameters for the "image" representation:
# spread: Gaussian spread of the kernel, pixels: size of representation (n, n),
# weighting_type: how to weigh the persistence diagram points
# Optional: specs can be provided to give bounds on the representation
from moleculetda.vectorize_pds import PersImage, pd_vectorization
# imported under the same name that is called below
from moleculetda.plotting import plot_pers_images

pim = PersImage(spread=0.15,
                pixels=[50, 50],
                weighting_type='identity')

# get both the 1d and 2d representations
images = []
for dim in [1, 2]:
    dgm = arr_dgms[f"dim{dim}"]
    images.append(pd_vectorization(dgm, spread=0.15, weighting='identity', pixels=[50, 50]))

plot_pers_images(images, arr_dgms)
The resulting 1d and 2d image representations can be used for other tasks.
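For intuition about what an "image" representation contains, here is a generic Gaussian-kernel persistence image written in plain NumPy. It is a sketch of the general technique, not moleculetda's implementation: the function name `toy_persistence_image`, the `bounds` argument, the (birth, persistence) coordinate choice, and the equal weighting of every point are all assumptions made for this example.

```python
import numpy as np


def toy_persistence_image(dgm, spread=0.15, pixels=(50, 50), bounds=(0.0, 4.0)):
    """Rasterize (birth, death) pairs into a grid of summed Gaussian bumps.

    Generic illustration only. Points are placed at (birth, persistence)
    and every point contributes with equal weight.
    """
    dgm = np.asarray(dgm, dtype=float)
    births = dgm[:, 0]
    pers = dgm[:, 1] - dgm[:, 0]

    lo, hi = bounds
    nx, ny = pixels
    # Pixel centres along each axis.
    xs = lo + (np.arange(nx) + 0.5) * (hi - lo) / nx
    ys = lo + (np.arange(ny) + 0.5) * (hi - lo) / ny
    grid_x, grid_y = np.meshgrid(xs, ys)  # both have shape (ny, nx)

    image = np.zeros((ny, nx))
    for b, p in zip(births, pers):
        sq_dist = (grid_x - b) ** 2 + (grid_y - p) ** 2
        image += np.exp(-sq_dist / (2.0 * spread ** 2))
    return image


# Hypothetical two-point diagram.
toy_dgm = np.array([[1.0, 3.0], [0.5, 0.75]])
image = toy_persistence_image(toy_dgm)

# Flattening gives a fixed-length vector regardless of how many
# points the diagram had.
features = image.ravel()
print(image.shape, features.shape)
```

Because the output size depends only on `pixels`, structures with different numbers of channels or voids map to vectors of the same length, which is what makes the representation convenient as model input.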
Citation
@article{doi:10.1021/acs.jpcc.0c01167,
author = {Krishnapriyan, Aditi S. and Haranczyk, Maciej and Morozov, Dmitriy},
title = {Topological Descriptors Help Predict Guest Adsorption in Nanoporous Materials},
journal = {The Journal of Physical Chemistry C},
volume = {124},
number = {17},
pages = {9360-9368},
year = {2020},
doi = {10.1021/acs.jpcc.0c01167},
}
@article{krishnapriyan_machine_2021,
title={Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks},
author={Krishnapriyan, Aditi S and Montoya, Joseph and Haranczyk, Maciej and Hummelsh{\o}j, Jens and Morozov, Dmitriy},
journal = {Scientific Reports},
volume = {11},
number = {1},
issn = {2045-2322},
pages = {8888},
year={2021},
doi = {10.1038/s41598-021-88027-8}
}
Download files
File details
Details for the file moleculetda-0.2.1.tar.gz.
File metadata
- Download URL: moleculetda-0.2.1.tar.gz
- Upload date:
- Size: 30.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 43e2e2d21abbb1011421e98090013559c31658b350ce281ea874ec2a6c1dc1bb |
| MD5 | 3da41244451c855657e507f2f4c962f5 |
| BLAKE2b-256 | e40301e9b51a873e281fdfcea5d52c81a2a6d9abe2328965c89e2046e20ea60a |
File details
Details for the file moleculetda-0.2.1-py3-none-any.whl.
File metadata
- Download URL: moleculetda-0.2.1-py3-none-any.whl
- Upload date:
- Size: 14.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a40151128066e37516b2cb1f554e11e180dc6707f5037d4aba629837e99d3964 |
| MD5 | 1984c25918f6c5ba07024e0ec308b5e6 |
| BLAKE2b-256 | 859b34b29be33bd498ed7fefe1f52e3f49f47be2f53f6d829db4a6fdf56b8bfc |
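A downloaded archive can be checked against the published SHA256 digests with the standard library. The sketch below is one way to do it; the helper name `sha256_of_file` and the local path are assumptions, and the path should be adjusted to wherever the file was saved.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical local path to the source distribution.
archive = Path("moleculetda-0.2.1.tar.gz")
# SHA256 digest listed for moleculetda-0.2.1.tar.gz.
expected = "43e2e2d21abbb1011421e98090013559c31658b350ce281ea874ec2a6c1dc1bb"

if archive.exists():
    print("match" if sha256_of_file(archive) == expected else "MISMATCH")
```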