Grouped Representation of Interatomic Distances (GRID)
This package computes GRID descriptions of crystal structures and uses them to train ML models, currently based on properties extracted from the Materials Project. In addition, it contains tools for computing the earth mover's distance (EMD) between distributions such as GRID or the RDF, and for using the resulting dissimilarities in further calculations.
This code accompanies the following paper, which should be cited if you use it for any future publications:
Grouped Representation of Interatomic Distances as a Similarity Measure for Crystal Structures
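To illustrate what an EMD between two distance distributions looks like, here is a minimal sketch using the pyemd dependency directly (not gridrdf's own API; the toy histograms and bin grid are invented for illustration):

import numpy as np
from pyemd import emd

# Two toy, normalised distance distributions (e.g. RDF-like histograms) on a shared bin grid.
bins = np.arange(0.0, 10.0, 0.1)
hist_a = np.exp(-(bins - 2.0) ** 2)
hist_b = np.exp(-(bins - 2.5) ** 2)
hist_a /= hist_a.sum()
hist_b /= hist_b.sum()

# Ground-distance matrix between histogram bins (absolute difference of bin positions).
dist_matrix = np.abs(bins[:, None] - bins[None, :])

print(emd(hist_a, hist_b, dist_matrix))    # scalar EMD between the two distributions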
Installation
The latest stable version of gridrdf can be installed using pip:
pip install gridrdf
If you are using conda, you may find it easier to create a new environment with the required dependencies first, before installing gridrdf using pip:
conda create -n gridrdf_env python=3 numpy pandas scikit-learn pymatgen scipy pyemd matminer -c defaults -c conda-forge
conda activate gridrdf_env
pip install gridrdf
Alternatively, the most recent development version can be installed by cloning the git repository, and then installing in 'development' mode:
git clone https://git.ecdf.ed.ac.uk/funcmatgroup/gridrdf.git
pip install -e gridrdf
If you are using conda with this approach, the dependencies can first be installed from requirements.txt:
git clone https://git.ecdf.ed.ac.uk/funcmatgroup/gridrdf.git
conda create -n gridrdf_env --file gridrdf/requirements.txt -c defaults -c conda-forge
conda activate gridrdf_env
pip install -e gridrdf
Testing
Once downloaded or installed, it is recommended to check that the code operates correctly. From a terminal, navigate to the gridrdf directory and run:
python -m unittest discover -s tests
Using the Code
All modules contained in gridrdf have documentation describing their intended use, and are grouped into 'data preparation' (gridrdf.data_prepare), 'similarity calculation' (gridrdf.earth_mover_distance) and 'model training' (gridrdf.train) steps. Other utility modules are also included.
Submodules of gridrdf can be imported and used interactively in a Python environment, but the main steps outlined above can also be accessed as command-line scripts by calling the relevant module directly (--help gives more details of usage):
python -m gridrdf.MODULE_NAME --help
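For interactive use, the documented submodules can be imported directly; a minimal sketch that simply prints each module's docstring:

from gridrdf import data_prepare, earth_mover_distance, train

# Each workflow stage is an ordinary Python module whose docstring summarises its intended use.
for module in (data_prepare, earth_mover_distance, train):
    print(module.__name__)
    print(module.__doc__)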
Intended Workflow
To re-create the results presented in the accompanying publication (predicting bulk modulus with a kNN model and EMD dissimilarities), the procedure is as follows. The Python snippets assume an interactive session in which the required packages have already been imported (import json, import numpy as np, import sklearn.neighbors, import gridrdf).
1. Import data from the Materials Project with calculated elastic moduli:

data = get_MP_bulk_modulus_data(APIkey)
with open('MP_modulus.json', 'w') as f:    # open in write mode and use json.dump to save the data
    json.dump(data, f)
2. Calculate the GRID representation for each structure (generates a GRID file for each structure):

gridrdf.data_prepare.batch_rdf(data[:2], max_dist=10, bin_size=0.1, method='kde', output_dir='./GRIDS', normalize=True)

or from a terminal:

python -m gridrdf.data_prepare --data_source MP_modulus.json --output_dir ./GRIDS/ --tasks grid_rdf_kde
3. Remove any structures with fewer than 100 GRID shells:

all_GRID = gridrdf.dataio.rdf_read_parallel(data, rdf_dir='./GRIDS/')
for i, d in enumerate(data[:]):    # iterate over a copy so that entries can be removed safely
    if len(all_GRID[i]) < 100:
        data.remove(d)
with open('MP_subset.json', 'w') as f:
    json.dump(data, f, indent=1)

or from a terminal:

python -m gridrdf.data_prepare --data_source MP_modulus.json --output_dir ./GRIDS/ --tasks subset_grid_len --output_file MP_subset.json
4. Filter out structures with negative bulk moduli:

for d in data[:]:    # iterate over a copy so that entries can be removed safely
    if d['elasticity.K_VRH'] < 0:
        data.remove(d)

or from a terminal:

python -m gridrdf.data_prepare --data_source MP_modulus.json --output_dir ./GRIDS/ --output_file MP_subset.json --tasks subset_property --prop_filter elasticity.K_VRH 0 np.inf
5. Remove structures containing elements with atomic number greater than Bi:

# First, generate the internal list of 78 elements (stored as gridrdf.composition.periodic_table_78)
gridrdf.composition.element_indice()
data = gridrdf.data_prepare.elements_selection(data, gridrdf.composition.periodic_table_78, mode='consist')

NOTE: this step is not currently implemented in the command-line script.
Steps 2-5 can be combined into a single function call (and similarly in the terminal script, by specifying the tasks in order):
data_quick = gridrdf.data_prepare.main(data_source = './MP_modulus.json',
tasks = ['subset_grid_len', 'subset_composition', 'subset_property'],
output_dir = './GRIDS',
output_file = 'subset.json',
max_dist=10,
min_grid_groups = 100,
composition = {'elem': gridrdf.composition.periodic_table_78, 'type':'consist'},
data_property = ('elasticity.K_VRH', 0, np.inf)
)
6. Calculate the pair-wise dissimilarity matrix between structures using EMD (time-consuming):

similarity = gridrdf.earth_mover_distance.rdf_similarity_matrix(data, all_GRID, method='emd')
similarity.to_csv('GRID_sim_whole_matrix.csv')

or from a terminal:

python -m gridrdf.earth_mover_distance --input_file MP_modulus.json --rdf_dir ./GRIDS/ --output_file GRID_sim --task rdf_similarity_matrix

Note: the data can also be processed in smaller chunks using indice (or --data_indice as a script option) to allow parallel processing.

7. Use a simplified kNN model to predict bulk modulus:

K_data = np.array([x['elasticity.K_VRH'] for x in data])
model = sklearn.neighbors.KNeighborsRegressor(n_neighbors=1, metric='precomputed')
gridrdf.train.calc_obs_vs_pred_2D(funct=model, X_data=similarity, y_data=K_data, test_size=0.2, outdir='./')

or from a terminal:

python -m gridrdf.train --input_file MP_modulus.json --rdf_dir ./GRIDS/ --input_features distance_matrix --dist_matrix GRID_sim_whole_matrix.csv --out_dir ./ --funct knn_reg --target bulk_modulus --metrics emd --task obs_vs_pred
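For reference, the sketch below shows how a precomputed dissimilarity matrix feeds into scikit-learn's kNN regressor with metric='precomputed' (this is only an illustration of the scikit-learn idiom, not the internals of gridrdf.train.calc_obs_vs_pred_2D; the random distance matrix and target values are made up):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical symmetric dissimilarity matrix and target values for 50 structures.
rng = np.random.default_rng(0)
dist = rng.random((50, 50))
dist = (dist + dist.T) / 2          # symmetrise
np.fill_diagonal(dist, 0.0)         # zero self-distance
K_data = rng.random(50) * 300       # stand-in for bulk moduli (GPa)

# With metric='precomputed', X holds distances rather than features:
# train-vs-train distances for fit(), test-vs-train distances for predict().
idx_train, idx_test = train_test_split(np.arange(50), test_size=0.2, random_state=0)
model = KNeighborsRegressor(n_neighbors=1, metric='precomputed')
model.fit(dist[np.ix_(idx_train, idx_train)], K_data[idx_train])
predictions = model.predict(dist[np.ix_(idx_test, idx_train)])
print(predictions[:5])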
Issues
If you have any questions, comments or problems with the code, please feel free to post them as issues here!