Predict the mechanical properties of multi-component transition metal carbides (MTMCs).

Elastic net

Machine learning model for predicting multi-component transition metal carbides (MTMCs)

This manual explains how to reproduce the results and support the conclusions of "Lattice Distortion Informed Exceptional Multi-Component Transition Metal Carbides Discovered by Machine Learning".

We recommend running the following examples on Linux or Windows, from the current working directory.

[Figure: ML workflow]

Installation

Install under conda environment

  • Create a new environment
conda create -n ElasticNet python==3.10
  • Activate the environment
conda activate ElasticNet
  • Install package
pip install elasticnet

Alternatively, you can install with pip directly, without creating a conda environment.

  • Install the package. Use the --user option if you do not have root permission.
pip install elasticnet --user
  • If you are located in mainland China, you may need to install the package from the Tsinghua mirror.
pip install elasticnet -i https://pypi.tuna.tsinghua.edu.cn/simple

Requirements file: requirements.txt

Key modules

numpy==1.25.0    
scikit-learn==1.2.2   
tensorflow==2.10.0   
ase==3.22.1  
pandas==1.5.3
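
To check an existing environment against the pinned versions listed above, a minimal sketch using only the standard library is shown here. The helper name check_pins is hypothetical and not part of elasticnet; the pins are copied from the list above.

```python
from importlib import metadata

# Pins copied from the "Key modules" list above.
PINNED = {
    "numpy": "1.25.0",
    "scikit-learn": "1.2.2",
    "tensorflow": "2.10.0",
    "ase": "3.22.1",
    "pandas": "1.5.3",
}


def check_pins(pins):
    """Return {package: (pinned, installed_or_None, matches)}."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_pins(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:<14} pinned {wanted:<8} installed {found}  {status}")
```

A mismatch does not necessarily mean the package will fail; it only flags where the environment differs from the versions listed here.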

Example of using the well-trained model

  • Download the well-trained parameters: checkpoint
  • Run the following python code:
from elasticnet import predict_formula  
pf = predict_formula(config='input_config.json',ckpt_file='checkpoint')  
pf.predict(*['VNbTa', 'TiNbTa'])  
  • The predicted mechanical properties of (VNbTa)C3 and (TiNbTa)C3 are printed to the screen, one row per formula. The columns are, in order: B, G, E, Hv, C11, C44.
array([[294.43195 , 203.70157 , 496.67032 ,  25.989697, 632.3356  ,
        175.50716 ],
       [283.17245 , 201.96506 , 489.7816  ,  26.824062, 607.07336 ,
        178.52579 ]], dtype=float32)
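
The returned array carries no column names, so it can help to attach them explicitly. The sketch below is a plain-Python illustration, not part of the elasticnet API: label_predictions is a hypothetical helper, and the rows are the values printed above.

```python
# Column order as stated above.
COLUMNS = ["B", "G", "E", "Hv", "C11", "C44"]


def label_predictions(formulas, rows, columns=COLUMNS):
    """Pair each formula with a {property: value} dict."""
    if len(formulas) != len(rows):
        raise ValueError("need exactly one row per formula")
    labeled = {}
    for formula, row in zip(formulas, rows):
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values for {formula}")
        labeled[formula] = dict(zip(columns, (float(v) for v in row)))
    return labeled


formulas = ["VNbTa", "TiNbTa"]
rows = [
    [294.43195, 203.70157, 496.67032, 25.989697, 632.3356, 175.50716],
    [283.17245, 201.96506, 489.7816, 26.824062, 607.07336, 178.52579],
]
table = label_predictions(formulas, rows)
print(table["TiNbTa"]["Hv"])
```

The same helper should accept the array returned by pf.predict directly, since it only iterates over rows and values.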

Train a new model from scratch

Prepare DFT calculations

  • Bulk optimization.
  • Elastic constants calculation.

Collect DFT results

Prepare configurations files

  • input_config.json: defines how to generate input features and labels. We recommend downloading this file and modifying it.

    | Variable | Type | Meaning |
    | --- | --- | --- |
    | include_more | bool | If True, bulk_energy_per_formula and volume_per_formula are also included in the input features. |
    | split_test | bool | If True, a new test set is split from the dataset. For cross-validation, it is fine to set this to False. |
    | clean_by_pearson_r | bool | Clean input features. If True, highly correlated features are removed. |
    | reduce_dimension_by_pca | bool | Clean input features by PCA. Enable only one of clean_by_pearson_r and reduce_dimension_by_pca. |
    | prop_precursor_path | str | A file storing the properties of the precursor binary carbides. The file extension can be *.csv or *.json. See example: file/HECC_precursors.csv |
    | model_save_path | str | Path for storing the PCA model and other information produced when generating input features and labels. |
    | props | list | A list of properties that are encoded into the input features. Choose among the column names of files/HECC_precursors.csv. |
    | operators | list | A list of operators used to expand the input dimension. Choose among: ['cube', 'exp_n', 'exp', 'plus', 'minus', 'multiply', 'sqrt', 'log10', 'log', 'square']. |
    | HECC_properties_path | str | A file containing the collected properties of MTMCs. |
    | labels | list | A list of label names to fit. |
    | soap_features | bool | Whether to use the SOAP descriptor. |
    | soap_config | dict | A Python dict that defines the configuration of the SOAP descriptor. input_structure_type: 'POSCAR' or 'CONTCAR', the structure file used to generate SOAP features. For the other settings, see SOAP.__init__ |
  • train.json: defines how to train the machine-learning model.

    | Variable | Type | Meaning |
    | --- | --- | --- |
    | Nodes_per_layer | list | Number of nodes in each hidden layer. |
    | Number_of_fold | int | Number of cross-validation folds. Normally 5 or 10. |
    | feature_file | str | A file containing the input features. |
    | label_file | str | A file containing the labels of the samples. |
    | Activation_function | str | Activation function of the hidden layers. Options: 'relu', 'softmax', 'sigmoid', 'tanh' |
    | Output_activation | str | Activation function of the output layer. Options: 'relu', 'softmax', 'sigmoid', 'tanh' |
    | Number_of_out_node | int/'auto' | Number of nodes in the output layer. If there is only one column in the label_file, this variable should be 1. 'auto' is for multiple columns. |
    | Optimizer | str | The name of the optimizer. See tf.keras.optimizers |
    | Cost_function | str | Name of the cost function in TensorFlow. See tf.keras.losses |
    | Metrics | list | A list of metrics to evaluate the model. See tf.keras.metrics |
    | Batch_size | int | The batch size. See tf.keras.Model.fit |
    | Epochs | int | Number of epochs for training. See tf.keras.Model.fit |
    | Verbose | int | Verbosity mode. See tf.keras.Model.fit |
    | Regularization | bool | Whether to use L2 regularization. See tf.keras.regularizers.L2. |
    | Model_save_path | str | A folder to store the trained NN model. |
    | Log_save_path | str | A folder to store the training log. |
    | Prediction_save_path | str | A folder to store the predictions of input features after training. |
    | SEED | int | Random seed for shuffling the input dataset. |
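
Two constraints stated in the tables above are easy to get wrong when editing the JSON files by hand: only one of clean_by_pearson_r and reduce_dimension_by_pca should be enabled, and operators must come from the listed set. The sketch below checks both before a run. The helper names and the example values are hypothetical, and resolving 'auto' to the number of label columns is an assumption about what "for multiple columns" means, not confirmed behaviour of elasticnet.

```python
import json

# Operator names copied from the input_config.json table above.
ALLOWED_OPERATORS = {
    "cube", "exp_n", "exp", "plus", "minus",
    "multiply", "sqrt", "log10", "log", "square",
}


def check_input_config(cfg):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if cfg.get("clean_by_pearson_r") and cfg.get("reduce_dimension_by_pca"):
        problems.append(
            "enable only one of clean_by_pearson_r and reduce_dimension_by_pca"
        )
    unknown = sorted(set(cfg.get("operators", [])) - ALLOWED_OPERATORS)
    if unknown:
        problems.append(f"unknown operators: {unknown}")
    return problems


def resolve_out_nodes(train_cfg, n_label_columns):
    """Assumption: 'auto' means one output node per label column."""
    value = train_cfg["Number_of_out_node"]
    return n_label_columns if value == "auto" else int(value)


# Hypothetical values for illustration only; not the authors' settings.
input_cfg = {
    "clean_by_pearson_r": False,
    "reduce_dimension_by_pca": True,
    "operators": ["square", "sqrt", "log"],
    "labels": ["B", "G", "E", "Hv", "C11", "C44"],
}
train_cfg = {"Number_of_fold": 10, "Number_of_out_node": "auto", "SEED": 0}

print(check_input_config(input_cfg))
print(resolve_out_nodes(train_cfg, len(input_cfg["labels"])))
print(json.dumps(train_cfg, indent=2))
```

To check real files, load them with json.load and pass the resulting dicts to the same functions.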

Run main function

python -m elasticnet

The following python code will be executed.

def main():
    # prepare dataset
    from elasticnet.prepare_input import x_main, y_main
    x_main('input_config.json', load_PCA=False, save_PCA=True)
    y_main('input_config.json')

    # train
    from elasticnet.ann import CV_ML_RUN, load_and_pred
    CV_ML_RUN('train.json')
    load_and_pred('train.json', 'x_data_after_pca.txt', write_pred_log=True, drop_cols=None)

main()

To prepare the dataset and train the model in separate steps, follow the sections below.

Collect input features and labels

from elasticnet.prepare_input import x_main, y_main
x_main('input_config.json', load_PCA=False, save_PCA=True)
y_main('input_config.json')

Three files will be generated:

  • x_data_init.txt: input features without PCA.
  • x_data_after_pca.txt: input features after PCA.
  • y_data.txt: labels

Train

  • Run the following python code.
from elasticnet import CV_ML_RUN, load_and_pred
if __name__ == '__main__':
    CV_ML_RUN('train.json')
    load_and_pred('train.json', 'x_data_after_pca.txt', write_pred_log=True, drop_cols=None)
  • You can also execute python -m elasticnet directly in the console. See Run main function.
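
For readers unfamiliar with cross-validation, the sketch below shows how a seeded shuffle followed by a k-fold split produces the train/validation partitions that Number_of_fold and SEED in train.json refer to. It is a generic illustration in plain Python; kfold_indices is a hypothetical helper, and the way CV_ML_RUN actually splits the data is not documented here.

```python
import random


def kfold_indices(n_samples, n_folds, seed):
    """Return [(train_indices, validation_indices), ...], one pair per fold."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # same seed gives the same shuffle
    # Deal the shuffled indices into n_folds groups of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for k, validation in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, validation))
    return splits


splits = kfold_indices(n_samples=23, n_folds=5, seed=0)
for k, (train, validation) in enumerate(splits):
    print(f"fold {k}: {len(train)} train, {len(validation)} validation")
```

Each sample is used for validation exactly once, so one model is trained per fold.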

Check training results

  • Generated files/folders
    • checkpoint: A folder for PCA model, NN model, and other information for generating input features.
      • cp.ckpt: Location of NN model.
      • log: Learning curves and weights of all CV models.
      • pred: Predictions of input features.
        • prediction_all.txt: all CV models.
        • prediction_mean.txt: average of CV models.
      • pca_model.joblib: PCA model.
      • scale_range.json: Range to rescale input features.
      • scale_range_1.json: Range to rescale input features again.
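
The relation between prediction_all.txt and prediction_mean.txt is an element-wise average over the CV models. The sketch below illustrates that averaging with nested lists and made-up numbers; the on-disk layout of the two files is not described here, so no file parsing is attempted, and mean_over_models is a hypothetical helper.

```python
def mean_over_models(predictions):
    """Average predictions[m][s][p] over models m.

    m indexes the CV model, s the sample, p the predicted property.
    """
    n_models = len(predictions)
    n_samples = len(predictions[0])
    n_props = len(predictions[0][0])
    return [
        [
            sum(predictions[m][s][p] for m in range(n_models)) / n_models
            for p in range(n_props)
        ]
        for s in range(n_samples)
    ]


# Made-up numbers: 2 CV models, 2 samples, 2 properties.
all_preds = [
    [[300.0, 200.0], [280.0, 190.0]],
    [[310.0, 204.0], [284.0, 194.0]],
]
mean_preds = mean_over_models(all_preds)
print(mean_preds)
```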

Predict

  • After training, run the following python code:
from elasticnet import predict_formula  
pf = predict_formula(config='input_config.json',ckpt_file='checkpoint')  
pf.predict(*['VNbTa', 'TiNbTa'])   
  • The predicted mechanical properties of (VNbTa)C3 and (TiNbTa)C3 are printed to the screen, one row per formula. The columns are, in order: B, G, E, Hv, C11, C44.
array([[294.43195 , 203.70157 , 496.67032 ,  25.989697, 632.3356  ,
        175.50716 ],
       [283.17245 , 201.96506 , 489.7816  ,  26.824062, 607.07336 ,
        178.52579 ]], dtype=float32)

High-throughput predict

  • Run the following python code:
from elasticnet import high_throughput_predict
high_throughput_predict() 
  • Output: ANN_predictions.xlsx
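
This README does not state which formulas high_throughput_predict() screens. As an illustration of what a high-throughput candidate list can look like, the sketch below enumerates all equimolar three-metal combinations from a pool of group 4 to 6 transition metals. The metal pool and the helper name are assumptions, not taken from the package.

```python
from itertools import combinations

# Assumed metal pool (groups 4-6); the set used by the package is not documented here.
METALS = ["Ti", "Zr", "Hf", "V", "Nb", "Ta", "Cr", "Mo", "W"]


def candidate_formulas(metals, n_components):
    """Concatenate element symbols for every n-component combination."""
    return ["".join(combo) for combo in combinations(metals, n_components)]


ternaries = candidate_formulas(METALS, 3)
print(len(ternaries), ternaries[:3])
```

Formulas written this way follow the same convention as the 'VNbTa' and 'TiNbTa' inputs passed to pf.predict in the Predict section.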

Ternary plot

  • Run the following python code:
from elasticnet import ternary_plot
ternary_plot(elements = ['Ti', 'Nb', 'Ta'])
  • A multi-element group can also be used as one corner of the diagram, for example elements = ['VNbTa', 'Ti', 'Hf'].

  • Output: phase_diagrams/**_diagram.csv

  • Plot: [Figure: ternary plot]
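
A ternary diagram is evaluated on a grid of compositions whose three fractions sum to one. The sketch below generates such a grid with exact fractions. It illustrates the idea only: the grid spacing used by ternary_plot and the columns of the generated CSV are not documented here, and ternary_grid is a hypothetical helper.

```python
from fractions import Fraction


def ternary_grid(steps):
    """Return all (a, b, c) with a + b + c == 1 on a grid of spacing 1/steps."""
    points = []
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j  # remainder goes to the third component
            points.append(
                (Fraction(i, steps), Fraction(j, steps), Fraction(k, steps))
            )
    return points


grid = ternary_grid(10)
print(len(grid))
print([float(x) for x in grid[5]])
```

Using Fraction avoids floating-point drift in the sum; convert to float only when writing or plotting.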

Other scripts

Get ROM

  • Run the following python code:
from elasticnet import get_rom
ROM = get_rom(config='input_config.json', formulas='formulas.txt', props=['B', 'G', 'E', 'Hv', 'VEC'])
print(ROM)
  • Output, if formulas.txt contains only ['VNbTa', 'TiNbTa']:
array([[310.33922223, 210.80075867, 515.61666613,  26.20022487,
          9.        ],
       [291.74733333, 199.9075404 , 488.11937417,  25.52194014,
          8.66666667]])

Get VEC

  • VEC is simply the last column of the Get ROM output when 'VEC' is the last entry of props, as in the example above.
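
The rule of mixtures for an equimolar carbide is the average of the corresponding binary-carbide values. The sketch below applies it to VEC, taking the VEC of each binary carbide as the metal's valence electron count plus 4 for carbon, and reproduces the last column of the Get ROM output above. It is a standalone illustration: the helper names are hypothetical, equal metal fractions are assumed, and get_rom itself reads its values from the precursor file rather than from this table.

```python
import re

# VEC per formula unit of the binary carbide MC: valence electrons of M plus 4 for C.
BINARY_VEC = {"Ti": 8, "Zr": 8, "Hf": 8, "V": 9, "Nb": 9, "Ta": 9}


def split_elements(formula):
    """Split a formula such as 'VNbTa' into ['V', 'Nb', 'Ta']."""
    return re.findall(r"[A-Z][a-z]?", formula)


def rule_of_mixtures(formula, binary_property):
    """Equimolar average of a binary-carbide property over the metals."""
    elements = split_elements(formula)
    return sum(binary_property[el] for el in elements) / len(elements)


for formula in ["VNbTa", "TiNbTa"]:
    print(formula, rule_of_mixtures(formula, BINARY_VEC))
```

The same function works for any property, given a dict of binary-carbide values keyed by metal symbol.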

Abbreviations

| Abbr. | Full name |
| --- | --- |
| MTMC | Multi-component transition metal carbides |
| HECC | High-entropy carbide ceramic |
| HEC | High-entropy ceramic |
| ML | Machine learning |
| SOAP | Smooth overlap of atomic positions |
| NN | Neural networks |
| CV | Cross-validation |
| ROM | Rule of mixtures |
| VEC | Valence electron concentration |
