# DiSCoVeR

Data-driven materials discovery based on composition: a materials discovery algorithm geared towards exploring high-performance candidates in new chemical spaces using only chemical composition.
## Citing

The preprint is hosted on ChemRxiv:

> Baird S, Diep T, Sparks T. DiSCoVeR: a Materials Discovery Screening Tool for High Performance, Unique Chemical Compositions. ChemRxiv 2021. doi:10.33774/chemrxiv-2021-5l2f8-v2. This content is a preprint and has not been peer-reviewed.

The BibTeX citation is as follows:
```bibtex
@article{baird_diep_sparks_2021,
  place={Cambridge},
  title={DiSCoVeR: a Materials Discovery Screening Tool for High Performance, Unique Chemical Compositions},
  DOI={10.33774/chemrxiv-2021-5l2f8-v2},
  journal={ChemRxiv},
  publisher={Cambridge Open Engage},
  author={Baird, Sterling and Diep, Tran and Sparks, Taylor},
  year={2021}
}
```
## Installation

I recommend that you run `mat_discover` in a separate conda environment. After installing Anaconda or Miniconda, you can create a new environment via:

```bash
conda create --name mat_discover
```

There are three ways to install `mat_discover`: Anaconda (`conda`), PyPI (`pip`), and from source.
### Anaconda

The Anaconda `mat_discover` package is hosted on the `@sgbaird` channel and can be installed via:

```bash
conda install -c sgbaird mat_discover
```
### Pip

You need to update `pip`, install the appropriate version of PyTorch, and then install `mat_discover`.

#### Update pip

```bash
pip install -U pip
```
#### Install PyTorch

Due to limitations of PyPI distributions of CUDA/PyTorch, you will need to install PyTorch separately via the command that's most relevant to you (see [PyTorch Getting Started](https://pytorch.org/get-started/locally/)). For example, for Stable/Windows/Conda/Python/CUDA-11.1:

```bash
conda install pytorch cudatoolkit=11.1 -c pytorch -c conda-forge
```
#### Install mat_discover

```bash
pip install mat_discover
```
### From Source

```bash
conda install pytorch cudatoolkit=11.1 -c pytorch -c conda-forge # or use the pip command specific to you from https://pytorch.org/get-started/locally/
git clone --recurse-submodules https://github.com/sparks-baird/mat_discover.git
cd mat_discover
pip install -e . # or `conda env create --file environment.yml` or `flit install` (after installing `flit` via e.g. `conda install flit`)
```
## Basic Usage

### Fit/Predict

```python
from mat_discover.mat_discover_ import Discover

disc = Discover()
disc.fit(train_df)  # DataFrames should have at minimum "formula" and "target" columns
scores = disc.predict(val_df)
disc.plot()
disc.save()
print(disc.dens_score_df.head(10), disc.peak_score_df.head(10))
```
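The `train_df` and `val_df` above are ordinary Pandas DataFrames. As a hedged sketch (the toy formulas, target values, split fraction, and seed below are all made up for illustration, not part of `mat_discover`), one simple way to produce them from a single dataset is a random split:

```python
import pandas as pd

# Illustrative only: split one DataFrame with "formula" and "target" columns
# into training and validation sets via a random sample.
df = pd.DataFrame(
    {
        "formula": ["Al2O3", "SiC", "MgB2", "TiO2", "ZnO", "GaN"],
        "target": [252.0, 220.0, 145.0, 210.0, 140.0, 190.0],
    }
)
train_df = df.sample(frac=0.67, random_state=42)  # ~2/3 of the rows
val_df = df.drop(train_df.index)                  # the remaining rows
print(len(train_df), len(val_df))  # 4 2
```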
See mat_discover_example.py for a complete example.
### Load Data

If you're using your own dataset, you will need to supply a Pandas DataFrame that contains `formula` and `target` columns. If you have a `train.csv` file (located in the current working directory) with these two columns, it can be converted to a DataFrame via:

```python
import pandas as pd

df = pd.read_csv("train.csv")
```
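Equivalently, a suitable DataFrame can be constructed directly; the formulas and target values below are made-up placeholders, just to show the expected two-column shape:

```python
import pandas as pd

# Toy dataset: each row pairs a chemical formula string with a target
# property value (illustrative numbers, not real training data).
train_df = pd.DataFrame(
    {
        "formula": ["Al2O3", "SiC", "MgB2", "TiO2"],
        "target": [252.0, 220.0, 145.0, 210.0],
    }
)
print(train_df.columns.tolist())  # ['formula', 'target']
```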
Note that you can load any of the datasets within `CrabNet/data/`, which includes `matbench` data, other datasets from the CrabNet paper, and a recent (as of Oct 2021) snapshot of `K_VRH` bulk modulus data from Materials Project. For example, to load the bulk modulus snapshot:

```python
from mat_discover.CrabNet.data.materials_data import elasticity

train_df, val_df = disc.data(elasticity, "train.csv")  # note that `val.csv` within `elasticity` is every other Materials Project compound (i.e. "target" column filled with zeros)
```
The built-in data directories are as follows:

```python
{'benchmark_data', 'benchmark_data.CritExam__Ed', 'benchmark_data.CritExam__Ef',
 'benchmark_data.OQMD_Bandgap', 'benchmark_data.OQMD_Energy_per_atom',
 'benchmark_data.OQMD_Formation_Enthalpy', 'benchmark_data.OQMD_Volume_per_atom',
 'benchmark_data.aflow__Egap', 'benchmark_data.aflow__ael_bulk_modulus_vrh',
 'benchmark_data.aflow__ael_debye_temperature', 'benchmark_data.aflow__ael_shear_modulus_vrh',
 'benchmark_data.aflow__agl_thermal_conductivity_300K', 'benchmark_data.aflow__agl_thermal_expansion_300K',
 'benchmark_data.aflow__energy_atom', 'benchmark_data.mp_bulk_modulus',
 'benchmark_data.mp_e_hull', 'benchmark_data.mp_elastic_anisotropy',
 'benchmark_data.mp_mu_b', 'benchmark_data.mp_shear_modulus',
 'element_properties', 'matbench', 'materials_data',
 'materials_data.elasticity', 'materials_data.example_materials_property'}
```
To see what `.csv` files are available (e.g. `train.csv`), you will probably need to navigate to `CrabNet/data/` and explore. Finally, to download data from Materials Project directly, see generate_elasticity_data.py.
## Developing

This project was developed primarily in "Python in Visual Studio Code" using `black`, `mypy`, `pydocstyle`, `kite`, other tools, and various community extensions. Some other notable tools used in this project are:

- Miniconda
- `pipreqs` was used as a starting point for `requirements.txt`
- `flit` is used to create `pyproject.toml` and publish to PyPI
- `conda env export --from-history -f environment.yml` was used as a starting point for `environment.yml`
- `grayskull` is used to generate `meta.yaml` for publishing to `conda-forge`
- `conda-smithy` is used to create a feedstock for `conda-forge`
- A variety of GitHub actions are used (see workflows)
- `pytest` is used for testing
- `numba` is used to accelerate the Wasserstein distance matrix computations via CPU or GPU
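For intuition, the 1D Wasserstein (earth mover's) distance that those matrix computations are built on can be sketched in pure Python. This is not `mat_discover`'s implementation (which uses numba-compiled, weighted routines over element-property distributions); it is just the underlying idea for the simplest case of two equal-length, uniformly weighted samples, where the distance reduces to the mean absolute difference of the sorted samples:

```python
def wasserstein_1d(u, v):
    """Toy 1D Wasserstein distance between two equal-length samples
    with uniform weights: sort both samples and average the absolute
    differences of matched order statistics."""
    if len(u) != len(v):
        raise ValueError("this sketch assumes equal-length samples")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)

print(wasserstein_1d([0.0, 1.0], [1.0, 2.0]))  # 1.0
```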
Note that when using a `conda` environment (recommended), you may avoid certain issues down the road by opening VS Code via an Anaconda command prompt and entering the command `code` (at least until the VS Code devs fix some of the issues associated with opening it "normally"). For example, in Windows, press the "Windows" key, type "anaconda", and open "Anaconda Powershell Prompt (miniconda3)" or similar. Then type `code` and press Enter.
## Bugs, Questions, and Suggestions

If you find a bug or have suggestions for documentation, please open an issue. If you're reporting a bug, please include a simplified reproducer. If you have questions or feature suggestions/requests, or are interested in extending/improving `mat_discover` and would like to discuss, please use the Discussions tab under the appropriate category ("Ideas", "Q&A", etc.). Pull requests are welcome and encouraged.