gridrdf·PyPI

Grouped representation of interatomic distances

gridrdf

Grouped representation of interatomic distances (GRID)

This package is designed to compute GRID descriptions of crystal structures and use them to train ML models, currently based on properties extracted from the Materials Project. In addition, it contains a number of tools for computing earth mover’s distance (EMD) between distributions such as GRID or RDF, and using the resulting dissimilarities for further calculations.

This code accompanies the following paper. We would appreciate it if you cited the paper in any work that uses this code or method.

Grouped Representation of Interatomic Distances as a Similarity Measure for Crystal Structures

Installation

The latest stable version of gridrdf can be installed using pip:

pip install gridrdf

If you are using conda, you may find it easier to create a new environment with the required dependencies first, before installing gridrdf using pip:

conda env create -n gridrdf_env -f environment.yml
conda activate gridrdf_env
pip install gridrdf

Alternatively, the most recent development version can be installed by cloning the git repository, and then installing in ‘development’ mode:

git clone https://git.ecdf.ed.ac.uk/funcmatgroup/gridrdf.git
pip install -e gridrdf/

Using conda with this approach, you can install the dependencies from requirements.txt:

git clone https://git.ecdf.ed.ac.uk/funcmatgroup/gridrdf.git
conda env create -n gridrdf_env --file gridrdf/requirements.txt -c defaults -c conda-forge
conda activate gridrdf_env
pip install -e gridrdf

Testing

Once the code is downloaded or installed, it is recommended to check that it operates correctly. From a terminal, navigate to the gridrdf directory and run:

python -m unittest discover -s tests

Using the Code

All modules contained in gridrdf have documentation describing their intended use, and are grouped into ‘data preparation’ (gridrdf.data_prepare), ‘similarity calculation’ (gridrdf.earth_mover_distance) and ‘model training’ (gridrdf.train) steps. Other utility modules are also included.

Submodules of gridrdf can be imported and used interactively in a Python environment, but the main steps outlined above can also be run as command-line scripts by calling the module directly (--help gives more details of usage):

python -m gridrdf.MODULE_NAME --help

Intended Workflow

To reproduce the results presented in the publication, which predict bulk modulus using a kNN model and EMD dissimilarity, the procedure is as follows:

1. Import data from the Materials Project with calculated elastic moduli
```
import json

# APIkey is your (legacy) Materials Project API key
data = gridrdf.data_prepare.get_MP_bulk_modulus_data(APIkey)
with open('MP_modulus.json', 'w') as f:
    json.dump(data, f)
```
NOTE: gridrdf currently relies on the legacy Materials Project API, so it needs a legacy API key.

2. Calculate the GRID representation for each structure (up to a number of GRID shells) and save to files.

gridrdf.data_prepare.batch_rdf(
    data,
    num_neighbours=100,
    bin_size=0.1,
    method='kde',
    output_dir='./GRIDS',
    normalize=True,
)

or from a terminal:

python -m gridrdf.data_prepare --data_source MP_modulus.json --output_dir ./GRIDS/ --tasks grid_rdf_kde
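
As a rough illustration of what arguments such as `bin_size` and `normalize` control, the sketch below bins a list of interatomic distances into fixed-width bins and scales the counts to sum to 1. It is plain histogram binning written for this example, not gridrdf's implementation (the call above uses `method='kde'` instead); the function name `bin_distances`, the `max_dist` cut-off and the sample distances are all hypothetical.

```python
def bin_distances(distances, bin_size=0.1, max_dist=10.0, normalize=True):
    """Histogram distances into fixed-width bins covering [0, max_dist)."""
    n_bins = int(round(max_dist / bin_size))
    counts = [0.0] * n_bins
    for d in distances:
        idx = int(d / bin_size)
        if d >= 0 and idx < n_bins:  # ignore distances outside the range
            counts[idx] += 1.0
    total = sum(counts)
    if normalize and total > 0:
        counts = [c / total for c in counts]
    return counts


# Hypothetical neighbour distances (in angstrom) for one atom
shell = [2.81, 2.83, 2.85, 2.85, 3.97, 3.99]
hist = bin_distances(shell, bin_size=0.1, max_dist=5.0)
print(len(hist), round(sum(hist), 6))
```

Normalising makes histograms from structures with different numbers of atoms directly comparable, which matters when they are later compared with a distance such as EMD.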

3. Filter out structures with negative bulk moduli

# Build a new list: removing items from a list while iterating over it skips elements
data = [d for d in data if d['elasticity.K_VRH'] >= 0]

or from a terminal:

python -m gridrdf.data_prepare --data_source MP_modulus.json --output_dir ./GRIDS/ --output_file MP_subset.json --tasks subset_property --prop_filter elasticity.K_VRH 0 np.inf

4. Filter out structures containing elements with an atomic number greater than that of Bi:

# First, generate internal list of 78 elements (as gridrdf.composition.periodic_table_78)
gridrdf.composition.element_indice()
data = gridrdf.data_prepare.elements_selection(data, gridrdf.composition.periodic_table_78, mode='consist')

NOTE: not currently implemented for command line script

Steps 2-4 can be combined into a single function call (similarly through terminal script by specifying tasks in order):

import numpy as np

data_quick = gridrdf.data_prepare.main(
    data_source='./MP_modulus.json',
    tasks=['subset_grid_len', 'subset_composition', 'subset_property'],
    output_dir='./GRIDS',
    output_file='subset.json',
    max_dist=10,
    min_grid_groups=100,
    composition={'elem': gridrdf.composition.periodic_table_78, 'type': 'consist'},
    data_property=('elasticity.K_VRH', 0, np.inf),
)

5. Calculate the pairwise dissimilarity matrix between structures using EMD (time-consuming)

similarity = gridrdf.earth_mover_distance.rdf_similarity_matrix(data, all_GRID, method='emd')
similarity.to_csv('GRID_sim_whole_matrix.csv')

or from a terminal:

python -m gridrdf.earth_mover_distance --input_file MP_modulus.json --rdf_dir ./GRIDS/ --output_file GRID_sim --task rdf_similarity_matrix
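
To make the idea of a pairwise EMD matrix concrete, here is a minimal pure-Python sketch. For two one-dimensional histograms on the same equal-width bins and with equal total mass, the EMD equals the bin width times the summed absolute difference of their cumulative sums. The helper names `emd_1d` and `pairwise_emd` and the toy histograms are assumptions made for this illustration; how gridrdf combines several GRID shells is not shown here.

```python
from itertools import accumulate


def emd_1d(p, q, bin_size=1.0):
    """EMD between two histograms defined on the same equal-width bins.

    Both histograms must carry the same total mass (e.g. both sum to 1).
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    cdf_p = accumulate(p)
    cdf_q = accumulate(q)
    return bin_size * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def pairwise_emd(histograms, bin_size=1.0):
    """Symmetric matrix of EMD values; only the upper triangle is computed."""
    n = len(histograms)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = emd_1d(histograms[i], histograms[j], bin_size)
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Three toy normalised histograms (hypothetical data)
hists = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
for row in pairwise_emd(hists):
    print(row)
```

The number of comparisons grows quadratically with the number of structures, which is why this step is the time-consuming one; exploiting symmetry halves the work.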

6. Use a simplified kNN model to predict bulk modulus

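import numpy as np
import sklearn.neighbors

# Target values: the bulk modulus of each structure
K_data = np.array([x['elasticity.K_VRH'] for x in data])
# 'precomputed' tells the model that X_data holds dissimilarities, not features
model = sklearn.neighbors.KNeighborsRegressor(n_neighbors=1, metric='precomputed')
gridrdf.train.calc_obs_vs_pred_2D(
    funct=model,
    X_data=similarity,
    y_data=K_data,
    test_size=0.2,
    outdir='./',
)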

or from a terminal:

python -m gridrdf.train --input_file MP_modulus.json --rdf_dir ./GRIDS/ --input_features distance_matrix --dist_matrix GRID_sim_whole_matrix.csv --out_dir ./ --funct knn_reg --target bulk_modulus --metrics emd --task obs_vs_pred
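
With a precomputed dissimilarity matrix, a 1-nearest-neighbour model never sees feature vectors: each test structure simply takes the target value of the training structure it is closest to. The sketch below shows that logic in plain Python, assuming a hypothetical 4x4 matrix and made-up bulk moduli; `predict_1nn` is an illustrative helper, not part of gridrdf or scikit-learn.

```python
def predict_1nn(dist_matrix, y, train_idx, test_idx):
    """Predict each test target as the target of its nearest training item."""
    predictions = []
    for t in test_idx:
        # Look only at the distances from test item t to the training items
        nearest = min(train_idx, key=lambda i: dist_matrix[t][i])
        predictions.append(y[nearest])
    return predictions


# Hypothetical dissimilarity matrix and bulk moduli (GPa)
dist = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
]
bulk_modulus = [100.0, 110.0, 40.0, 45.0]

train, test = [0, 2], [1, 3]
print(predict_1nn(dist, bulk_modulus, train, test))
```

The same indexing applies when a scikit-learn estimator is used with `metric='precomputed'`: it is fitted on the train-by-train block of the matrix and predicts from the test-by-train block.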

Issues

If you have any questions, comments or problems with the code, please feel free to post them as issues on the project's issue tracker.

Note

This project has been set up using PyScaffold 4.3.1. For details and usage information on PyScaffold see https://pyscaffold.org/.

Release history

0.3.0 (this version): Mar 15, 2023
0.2.0: Nov 1, 2022
0.1.3: Jul 1, 2022

Download files

Source Distribution

gridrdf-0.3.0.tar.gz (1.9 MB)

Uploaded Mar 15, 2023 Source

Built Distribution

gridrdf-0.3.0-py3-none-any.whl (55.5 kB)

Uploaded Mar 15, 2023 Python 3

File details

Details for the file gridrdf-0.3.0.tar.gz.

File metadata

Download URL: gridrdf-0.3.0.tar.gz
Upload date: Mar 15, 2023
Size: 1.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for gridrdf-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`b8cf4849c98522b7a410ebaf2ee995614091e8c3f9706cc8b9aeb6960501877d`
MD5	`abe714b2f54861663aec26cba30c5e8c`
BLAKE2b-256	`d16f371292a8290d54142a319676eb5774d6b1136491550d93cbe3cbd407dc51`
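
A downloaded archive can be checked against the published SHA256 digest with Python's standard `hashlib` module. The sketch below is one way to do it; the helper names are illustrative, and it assumes the archive has been saved under its original file name in the current directory.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex value."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # SHA256 digest listed above for the source distribution
    expected = "b8cf4849c98522b7a410ebaf2ee995614091e8c3f9706cc8b9aeb6960501877d"
    archive = "gridrdf-0.3.0.tar.gz"  # assumed location of the download
    if os.path.exists(archive):
        ok = matches_published_hash(archive, expected)
        print("hash OK" if ok else "hash MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of the archive size.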

File details

Details for the file gridrdf-0.3.0-py3-none-any.whl.

File metadata

Download URL: gridrdf-0.3.0-py3-none-any.whl
Upload date: Mar 15, 2023
Size: 55.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for gridrdf-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`25ba72f70a9cffc6c0715e694934c82ef840803919d6a0ce6680d7c72d38268e`
MD5	`4c45868cf1ea4099f4e732a130acbbb5`
BLAKE2b-256	`88361d12607e9f3992b075b1a04762d42beae3ac76957c423a5d20fa5eb52373`
