Predict the mechanical properties of multi-component transition metal carbides (MTMCs).
Project description
Elastic net
Machine learning model for predicting multi-component transition metal carbides (MTMCs)
This is the manual for reproducing the results and supporting the conclusions of Lattice Distortion Informed Exceptional Multi-Component Transition Metal Carbides Discovered by Machine Learning.
We recommend running the following examples on a Linux or Windows operating system, from the current directory.
Installation
Install under conda environment
- Create a new environment
conda create -n ElasticNet python==3.10
- Activate the environment
conda activate ElasticNet
- Install package
pip install elasticnet
Alternatively, you can install with pip.
- Install the package. Use the --user option if you don't have root permission.
pip install elasticnet --user
- If your IP address is located in mainland China, you may need to install from the Tsinghua mirror.
pip install elasticnet -i https://pypi.tuna.tsinghua.edu.cn/simple
Requirements file: requirements.txt
Key modules
numpy==1.25.0
scikit-learn==1.2.2
tensorflow==2.10.0
ase==3.22.1
pandas==1.5.3
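Before running the examples, you may want to confirm that the installed versions match the pinned requirements above. This quick check is not part of the elasticnet package; it only queries the installed package metadata.

import importlib.metadata as md

# Compare installed versions against the pinned requirements listed above.
for pkg in ['numpy', 'scikit-learn', 'tensorflow', 'ase', 'pandas']:
    try:
        print(f'{pkg}: {md.version(pkg)}')
    except md.PackageNotFoundError:
        print(f'{pkg}: not installed')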
Example of using the well-trained model
- Download the well-trained parameters: checkpoint
- Run the following python code:
from elasticnet import predict_formula
pf = predict_formula(config='input_config.json',ckpt_file='checkpoint')
pf.predict(*['VNbTa', 'TiNbTa'])
- The mechanical properties of (VNbTa)C3 and (TiNbTa)C3 will be shown on the screen. The columns are, in order: B, G, E, Hv, C11, and C44 (a sketch that attaches these labels to the output follows the array below).
array([[294.43195 , 203.70157 , 496.67032 , 25.989697, 632.3356 ,
175.50716 ],
[283.17245 , 201.96506 , 489.7816 , 26.824062, 607.07336 ,
178.52579 ]], dtype=float32)
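If you prefer labeled output, the returned array can be wrapped in a pandas DataFrame. This is only a convenience sketch that assumes pf.predict returns the array shown above; the column names come from the order documented in the previous bullet.

import pandas as pd
from elasticnet import predict_formula

pf = predict_formula(config='input_config.json', ckpt_file='checkpoint')

formulas = ['VNbTa', 'TiNbTa']
columns = ['B', 'G', 'E', 'Hv', 'C11', 'C44']   # column order documented above

# Assumes predict() returns the (n_formulas, 6) float32 array shown above.
preds = pf.predict(*formulas)
print(pd.DataFrame(preds, index=formulas, columns=columns).round(1))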
Train a new model from scratch
Prepare DFT calculations
- Bulk optimization.
- Elastic constants calculation.
Collect DFT results
- Collect the elastic constants into a file with the csv extension. See example: files/HECC_properties_over_sample.CSV.
- You may refer to these papers to calculate the moduli from C11, C12, and C44: Physical Review B 87, 094114 (2013) and Journal of the European Ceramic Society 41 (2021) 6267-6274. A worked sketch of such a conversion follows this list.
- The csv file should contain at least these columns: nominal_formula, C11, C12, C44, B, G, E, Hv, and real_formula. See example: files/HECC_properties_over_sample.CSV.
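If you only need a quick conversion from cubic elastic constants to the moduli columns, the sketch below applies the standard Voigt-Reuss-Hill averages for a cubic crystal together with Chen's hardness model. It is an illustration, not part of the elasticnet package; consult the papers cited above for the exact scheme used for the reference data. The input numbers here are made up.

def cubic_moduli(C11, C12, C44):
    """Voigt-Reuss-Hill moduli for a cubic crystal (all values in GPa) plus Chen's hardness model."""
    B = (C11 + 2.0 * C12) / 3.0                                       # bulk modulus (Voigt = Reuss for cubic)
    G_V = (C11 - C12 + 3.0 * C44) / 5.0                               # Voigt shear modulus
    G_R = 5.0 * C44 * (C11 - C12) / (4.0 * C44 + 3.0 * (C11 - C12))   # Reuss shear modulus
    G = 0.5 * (G_V + G_R)                                             # Hill average
    E = 9.0 * B * G / (3.0 * B + G)                                   # Young's modulus
    Hv = 2.0 * (G * (G / B) ** 2) ** 0.585 - 3.0                      # Chen et al. hardness estimate
    return B, G, E, Hv

print(cubic_moduli(600.0, 130.0, 170.0))   # illustrative numbers only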
Prepare configuration files
- input_config.json: defines how to generate input features and labels. You are recommended to download this file and then modify it. A hedged example is sketched after the table.

Variable | Type | Meaning |
---|---|---|
include_more | bool | If True, the bulk_energy_per_formula and volume_per_formula are also included in the input features. |
split_test | bool | If True, a new test set will be split from the dataset. For cross validation, it is OK to set this to False. |
clean_by_pearson_r | bool | Clean the input features. Highly correlated features will be removed if this is True. |
reduce_dimension_by_pca | bool | Clean the input features by PCA. Choose one of clean_by_pearson_r and reduce_dimension_by_pca. |
prop_precursor_path | str | A file storing the properties of the precursory binary carbides. The file extension can be *.csv or *.json. See example: files/HECC_precursors.csv |
model_save_path | str | Path for storing the PCA model and other information produced when generating input features and labels. |
props | list | A list of properties that are encoded into the input features. Choose among the column names of files/HECC_precursors.csv. |
operators | list | A list of operators used to expand the input dimension. Choose among: ['cube', 'exp_n', 'exp', 'plus', 'minus', 'multiply', 'sqrt', 'log10', 'log', 'square']. |
HECC_properties_path | str | A file containing the collected properties of MTMCs. |
labels | list | A list of label names to fit/learn. |
soap_features | bool | Whether to use the SOAP descriptor. |
soap_config | dict | A python dict that defines the configuration of the SOAP descriptor. input_structure_type: 'POSCAR' or 'CONTCAR', the structure file used to generate SOAP features. Explanations for the other keys can be found in SOAP.__init__. |
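For orientation, here is a minimal sketch of what an input_config.json might contain, written out from Python. The keys follow the table above, but every value is only a plausible placeholder; download the file shipped with the project and modify it rather than relying on this sketch.

import json

# Illustrative values only; the file distributed with the project is the reference.
config = {
    "include_more": True,
    "split_test": False,
    "clean_by_pearson_r": False,
    "reduce_dimension_by_pca": True,      # choose either this or clean_by_pearson_r
    "prop_precursor_path": "files/HECC_precursors.csv",
    "model_save_path": "checkpoint",
    "props": ["B", "G", "E", "Hv"],       # must match column names in HECC_precursors.csv
    "operators": ["plus", "minus", "multiply", "square"],
    "HECC_properties_path": "files/HECC_properties_over_sample.CSV",
    "labels": ["B", "G", "E", "Hv", "C11", "C44"],
    "soap_features": False,
    "soap_config": {"input_structure_type": "POSCAR"}
}

with open("input_config.json", "w") as f:
    json.dump(config, f, indent=4)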
- train.json: defines how to train the machine-learning model. A hedged example is sketched after the table.

Variable | Type | Meaning |
---|---|---|
Nodes_per_layer | list | Number of nodes in each hidden layer. |
Number_of_fold | int | Number of cross-validation folds. Normally 5 or 10. |
feature_file | str | A file containing the input features. |
label_file | str | A file containing the labels of the samples. |
Activation_function | str | Activation function of the hidden layers. Alternatives: 'relu', 'softmax', 'sigmoid', 'tanh'. |
Output_activation | str | Activation function of the output layer. Alternatives: 'relu', 'softmax', 'sigmoid', 'tanh'. |
Number_of_out_node | int/'auto' | Number of nodes of the output layer. If there is only one column in the label_file, this variable should be 1; 'auto' is for multiple columns. |
Optimizer | str | The name of the optimizer. Examples: tf.keras.optimizers |
Cost_function | str | Name of the cost function in TensorFlow. Examples: tf.keras.losses |
Metrics | list | A list of metrics to evaluate the model. Examples: tf.keras.metrics |
Batch_size | int | The batch size. See tf.keras.Model.fit |
Epochs | int | Number of epochs for training. See tf.keras.Model.fit |
Verbose | int | Verbosity mode. See tf.keras.Model.fit |
Regularization | bool | Whether to use L2 regularization. See tf.keras.regularizers.L2 |
Model_save_path | str | A folder to store the well-trained NN model. |
Log_save_path | str | A folder to store the training log. |
Prediction_save_path | str | A folder to store the predictions of the input features after training. |
SEED | int | Random seed for shuffling the input dataset. |
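Likewise, a hedged sketch of a train.json. The keys follow the table above; the values are placeholders chosen to be consistent with the examples later in this manual (for instance the 4_layer-80_80_80_80 file name) and may differ from the defaults shipped with the project.

import json

# Placeholder values; adjust them to your dataset and hardware.
train_config = {
    "Nodes_per_layer": [80, 80, 80, 80],
    "Number_of_fold": 10,
    "feature_file": "x_data_after_pca.txt",
    "label_file": "y_data.txt",
    "Activation_function": "relu",
    "Output_activation": "relu",
    "Number_of_out_node": "auto",
    "Optimizer": "Adam",                   # any name from tf.keras.optimizers
    "Cost_function": "MeanSquaredError",   # any name from tf.keras.losses
    "Metrics": ["mae"],                    # names from tf.keras.metrics
    "Batch_size": 32,
    "Epochs": 3000,
    "Verbose": 0,
    "Regularization": True,
    "Model_save_path": "checkpoint",
    "Log_save_path": "log",
    "Prediction_save_path": "pred",
    "SEED": 42,
}

with open("train.json", "w") as f:
    json.dump(train_config, f, indent=4)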
Run main function
python -m elasticnet
The following python code will be executed.
def main():
    # prepare dataset
    from elasticnet.prepare_input import x_main, y_main
    x_main('input_config.json', load_PCA=False, save_PCA=True)
    y_main('input_config.json')

    # train
    from elasticnet.ann import CV_ML_RUN, load_and_pred
    CV_ML_RUN('train.json')
    load_and_pred('train.json', 'x_data_after_pca.txt', write_pred_log=True, drop_cols=None)

main()
You may want to prepare the dataset and train the model in separate steps; see below.
Collect input features and labels
from elasticnet.prepare_input import x_main, y_main
x_main('input_config.json', load_PCA=False, save_PCA=True)
y_main('input_config.json')
Three files will be generated:
- x_data_init.txt: input features without PCA.
- x_data_after_pca.txt: input features after PCA.
- y_data.txt: labels.
Train
- Run the following python code.
from elasticnet import CV_ML_RUN, load_and_pred

if __name__ == '__main__':
    CV_ML_RUN('train.json')
    load_and_pred('train.json', 'x_data_after_pca.txt', write_pred_log=True, drop_cols=None)
- You can also execute python -m elasticnet directly in the console. See Run main function.
Check training results
- Generated files/folders:
  - checkpoint: a folder for the PCA model, the NN model, and other information needed to generate input features.
    - cp.ckpt: location of the NN model.
  - log: learning curves and weights of all CV models.
    - The file with the extension *.global.acc.loss summarizes the model performance. Example: 4_layer-80_80_80_80_nodes.global.acc.loss
  - pred (see Prediction_save_path): predictions of the input features.
    - prediction_all.txt: predictions from all CV models.
    - prediction_mean.txt: average over the CV models.
  - pca_model.joblib: the PCA model.
  - scale_range.json: range used to rescale the input features.
  - scale_range_1.json: range used to rescale the input features again.
- A quick consistency check of these outputs is sketched below.
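As a quick consistency check, the averaged cross-validation predictions can be compared with the labels. The sketch below assumes that y_data.txt and prediction_mean.txt are whitespace-delimited text files with matching row and column order, and that prediction_mean.txt sits in the folder given by Prediction_save_path (here assumed to be pred); adjust paths and parsing if your files differ.

import numpy as np

# Both files are assumed to be plain whitespace-delimited text with the same layout.
y_true = np.loadtxt('y_data.txt')
y_pred = np.loadtxt('pred/prediction_mean.txt')

# Mean absolute error per label column (e.g. B, G, E, Hv, ...).
mae = np.mean(np.abs(y_pred - y_true), axis=0)
print('MAE per label column:', np.round(mae, 2))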
Predict
- After training, run the following python code:
from elasticnet import predict_formula
pf = predict_formula(config='input_config.json',ckpt_file='checkpoint')
pf.predict(*['VNbTa', 'TiNbTa'])
- The mechanical properties of (VNbTa)C3 and (TiNbTa)C3 will be shown on the screen. The columns are, in order: B, G, E, Hv, C11, and C44.
array([[294.43195 , 203.70157 , 496.67032 , 25.989697, 632.3356 ,
175.50716 ],
[283.17245 , 201.96506 , 489.7816 , 26.824062, 607.07336 ,
178.52579 ]], dtype=float32)
High-throughput predict
- Run the following python code:
from elasticnet import high_throughput_predict
high_throughput_predict()
- Output: ANN_predictions.xlsx
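To inspect the result, the spreadsheet can be opened with pandas (reading xlsx files requires openpyxl). The column layout is whatever high_throughput_predict writes, so this is only a viewing aid.

import pandas as pd

# Load the spreadsheet written by high_throughput_predict().
df = pd.read_excel('ANN_predictions.xlsx')
print(df.shape)
print(df.head())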
Ternary plot
- Run the following python code:
from elasticnet import ternary_plot
ternary_plot(elements = ['Ti', 'Nb', 'Ta'])
- Alternatively, elements = ['VNbTa', 'Ti', 'Hf'].
- Output: phase_diagrams/**_diagram.csv
- Plot.
Other scripts
Get ROM
- Run the following python code:
from elasticnet import get_rom
ROM = get_rom(config='input_config.json', formulas='formulas.txt', props=['B', 'G', 'E', 'Hv', 'VEC'])
print(ROM)
- Output, if formulas.txt contains only ['VNbTa', 'TiNbTa']:
array([[310.33922223, 210.80075867, 515.61666613, 26.20022487,
9. ],
[291.74733333, 199.9075404 , 488.11937417, 25.52194014,
8.66666667]])
Get VEC
- VEC is simply the last column of the Get ROM output; a small sketch reproducing it is given below.
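Conceptually, each rule-of-mixtures value is the equimolar average of the corresponding precursor binary-carbide property from files/HECC_precursors.csv, and VEC follows the same averaging of valence electron counts. The sketch below is a hand-rolled illustration, not the internal implementation of get_rom; it assumes the VEC convention is the average metal valence electron count plus the 4 electrons of carbon, which reproduces the 9.0 and 8.667 in the example output above.

import numpy as np

# Valence electron counts of the elements involved.
VALENCE = {'Ti': 4, 'V': 5, 'Nb': 5, 'Ta': 5, 'C': 4}

def rom_vec(metals):
    """Rule-of-mixtures VEC for an equimolar (M1M2...)C carbide:
    average metal valence electron count plus the 4 electrons of carbon."""
    return np.mean([VALENCE[m] for m in metals]) + VALENCE['C']

print(rom_vec(['V', 'Nb', 'Ta']))    # 9.0       -> matches the last column above
print(rom_vec(['Ti', 'Nb', 'Ta']))   # 8.666...  -> matches the last column above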
Abbreviations
Abbr. | Full name |
---|---|
MTMC | Multi-component transition metal carbides |
HECC | High-entropy carbide ceramic |
HEC | High-entropy ceramic |
ML | Machine learning |
SOAP | Smooth overlap of atomic positions |
NN | Neural networks |
CV | Cross validation |
ROM | Rule of mixtures |
VEC | Valence electron concentration |