
A package that uses pretrained graph neural network models and symbolic regression formulas on material descriptors as featurizers for interpretable predictions in materials science.


Materials Feature Extraction via Interpretable Artificial Learning

Overview

MatterVial is a featurizer tool for materials science that combines graph neural networks (GNNs) with traditional feature engineering to extract chemical information from material structures and compositions. It aims to improve materials property prediction models by generating meaningful features for a variety of machine learning tasks. The name stands for MATerials feaTuRe Extraction Via Interpretable Artificial Learning and evokes a vial holding knowledge distilled from materials data.

Available Featurizers

MatterVial offers a diverse set of feature extraction tools, each designed to capture different aspects of materials' chemical and structural information:

Graph Neural Network (GNN) Featurizers

  • DescriptorMEGNetFeaturizer: Unified descriptor-oriented MEGNet featurizer, as described in the methods section of the accompanying paper. Provides a single interface for retrieving encoded features from pretrained MEGNet models:

    • base_descriptor='l-MM_v1' (default): Extracts 758 ℓ-MM (latent MatMiner) features
    • base_descriptor='l-OFM_v1': Extracts 188 ℓ-OFM (latent Orbital Field Matrix) features
  • AdjacentGNNFeaturizer: Implements the adjacent GNN featurizer described in the methods section of the accompanying paper. Trains task-specific GNN models on the fly on the user's dataset, once for each fold of the train-test split:

    • base_model='MEGNet' (default): Uses MEGNet architecture
    • base_model='coGN': Uses coGN architecture (coming soon)
  • MVLFeaturizer: Extracts features from pretrained MEGNet models provided by the Materials Virtual Lab, trained on diverse Materials Project datasets. These models cover formation energy, Fermi energy, elastic constants (KVRH and GVRH), and band gap properties. Users can select different intermediate layers (16-neuron or 32-neuron) from the regression head of the MEGNet models, or combine features from multiple layers.

  • ORBFeaturizer: Extracts features using the ORB-v3 machine learning interatomic potential (MLIP). ORB models provide high-quality structural embeddings based on orbital-based representations and require a specialized conda environment.
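The MVLFeaturizer feature counts quoted in the usage example (160, 80, and 240) are consistent with concatenating one regression-head layer per pretrained model. The sketch below only reproduces that arithmetic. It assumes five MVL models, one per property listed above, and the helper name `expected_mvl_width` and the property labels are hypothetical, not part of the MatterVial API.

```python
# Properties covered by the pretrained MVL MEGNet models (labels are illustrative).
MVL_PROPERTIES = ["formation_energy", "fermi_energy", "K_VRH", "G_VRH", "band_gap"]

# Width of each selectable regression-head layer.
LAYER_WIDTHS = {"layer32": 32, "layer16": 16}


def expected_mvl_width(layers):
    """Return the feature count when the given layers are concatenated across all models."""
    per_model = sum(LAYER_WIDTHS[name] for name in layers)
    return per_model * len(MVL_PROPERTIES)


print(expected_mvl_width(["layer32"]))             # 160
print(expected_mvl_width(["layer16"]))             # 80
print(expected_mvl_width(["layer32", "layer16"]))  # 240
```

A check like this can be useful for validating the shape of a returned feature table before merging it with other featurizer outputs.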

Legacy Featurizers (Deprecated)

  • LatentMMFeaturizer (ℓ-MM): ⚠️ Deprecated - Use DescriptorMEGNetFeaturizer(base_descriptor='l-MM_v1') instead. This featurizer extracts latent space features from MatMiner descriptors using a pretrained MEGNet model.

  • LatentOFMFeaturizer (ℓ-OFM): ⚠️ Deprecated - Use DescriptorMEGNetFeaturizer(base_descriptor='l-OFM_v1') instead. This featurizer extracts latent space features from Orbital Field Matrix (OFM) descriptors.

  • AdjacentMEGNetFeaturizer: ⚠️ Deprecated - Use AdjacentGNNFeaturizer(base_model='MEGNet') instead. This featurizer trains a MEGNet model on-the-fly using the user's dataset.

Composition-Based Featurizers

  • RoostModelFeaturizer: Featurizer for ROOST (Representation Learning from Stoichiometry) models, which extract features directly from material compositions without requiring structural information. Available pretrained models:
    • roost_mpgap: ROOST model pretrained on the Materials Project band gap dataset
    • roost_oqmd_eform: ROOST model pretrained on the OQMD formation energy dataset
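Composition-based models work from stoichiometry alone. As a rough illustration of what that input looks like, the sketch below converts a flat formula string into atomic fractions using only the standard library. It is not how RoostModelFeaturizer parses its input; the function name `fractional_composition` is hypothetical here, and the parser ignores parentheses and charges. In practice a library such as pymatgen would normally handle this step.

```python
import re
from collections import defaultdict

# An element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def fractional_composition(formula):
    """Parse a flat formula such as 'Fe2O3' into a dict of element -> atomic fraction."""
    amounts = defaultdict(float)
    for element, count in _TOKEN.findall(formula):
        # A missing amount means one atom of that element.
        amounts[element] += float(count) if count else 1.0
    total = sum(amounts.values())
    return {element: amount / total for element, amount in amounts.items()}


print(fractional_composition("Fe2O3"))  # {'Fe': 0.4, 'O': 0.6}
```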

SISSO-based Feature Augmentation

  • SISSO Formula Featurizer: Implements the SISSO-based formula featurizer described in the methods section of the accompanying paper. It applies symbolic expressions generated by the SISSO++ framework to create new features from existing MatMiner descriptors. The expressions combine descriptors using mathematical operators (addition, subtraction, multiplication, division, sine, cosine, exponential, logarithm) and were generated for 15 different datasets, with 20 paired-feature formulas per dataset.
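To make the idea of a paired-feature formula concrete, here is a minimal sketch that evaluates two symbolic expressions over rows of descriptors. The descriptor names, formula names, and the formulas themselves are invented for illustration; the actual expressions shipped with MatterVial are not reproduced here.

```python
import math

# Hypothetical descriptor rows; real inputs would be MatMiner feature columns.
rows = [
    {"mean_electronegativity": 2.0, "mean_atomic_radius": 1.0},
    {"mean_electronegativity": 3.0, "mean_atomic_radius": 1.5},
]

# Hypothetical paired-feature formulas built from the operator set listed above.
formulas = {
    "SISSO_example_1": lambda r: r["mean_electronegativity"] / r["mean_atomic_radius"],
    "SISSO_example_2": lambda r: math.exp(-r["mean_atomic_radius"]) * r["mean_electronegativity"],
}


def apply_formulas(rows, formulas):
    """Evaluate every formula on every row and return one dict of new features per row."""
    return [{name: formula(row) for name, formula in formulas.items()} for row in rows]


sisso_rows = apply_formulas(rows, formulas)
for new_features in sisso_rows:
    print(new_features)
```

Because each new feature is an explicit expression of two named descriptors, it can be inspected directly, which is the interpretability benefit this featurizer targets.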

Feature Interpretation

MatterVial provides comprehensive interpretability tools to understand the extracted features:

  • Interpreter: The main interpretability class that bridges the gap between high-level latent representations and interpretable chemical descriptors. It uses surrogate XGBoost models trained on the MP2018-stable dataset to predict each latent feature based on interpretable MatMiner and OFM features.

  • SHAP Analysis: For each feature, the top 30 most influential interpretable descriptors are identified using SHAP values, providing insights into which chemical properties drive the latent representations.

  • Symbolic Regression: The SHAP-identified features are forwarded to SISSO++ for symbolic regression, retrieving symbolic formulas that correlate with latent features and provide chemical interpretability.

  • Visualization: SVG plots and decomposition visualizations help users understand feature importance and chemical relationships.
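The SHAP step above selects the most influential descriptors for each latent feature. The source does not state the exact ranking criterion, so the sketch below uses one common choice, the mean absolute SHAP value across samples, on a toy matrix. The function name and descriptor names are hypothetical.

```python
def top_k_by_mean_abs_shap(shap_values, feature_names, k=30):
    """Rank descriptors by mean |SHAP| across samples and return the top k as (name, score)."""
    n_samples = len(shap_values)
    scores = []
    for j, name in enumerate(feature_names):
        mean_abs = sum(abs(row[j]) for row in shap_values) / n_samples
        scores.append((name, mean_abs))
    # Largest mean absolute contribution first.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Toy SHAP matrix: rows are samples, columns are descriptors.
shap_values = [
    [0.10, -0.50, 0.00],
    [-0.30, 0.40, 0.05],
]
feature_names = ["descriptor_a", "descriptor_b", "descriptor_c"]

ranking = top_k_by_mean_abs_shap(shap_values, feature_names, k=2)
print([name for name, _ in ranking])  # ['descriptor_b', 'descriptor_a']
```

The shortlisted descriptors would then be the candidate inputs passed to symbolic regression.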

Installation

To install MatterVial, clone the repository and run the following command from the repository root:

pip install -r requirements.txt

Ensure that all required dependencies are installed, including TensorFlow, scikit-learn, the MEGNet library, and PyTorch.

Usage

Basic Examples

MEGNet Featurizers

Here's an example of how to use MatterVial to extract features from a list of structures:

import pandas as pd
from mattervial import (MVLFeaturizer, AdjacentGNNFeaturizer, DescriptorMEGNetFeaturizer, ORBFeaturizer)

# Initialize featurizers using new unified classes
mvl32 = MVLFeaturizer(layers='layer32')  # Single layer
mvl16 = MVLFeaturizer(layers='layer16')  # Single layer
mvl_all = MVLFeaturizer()  # Both layers (default)

# Initialize descriptor-oriented MEGNet featurizer (new unified approach)
desc_mm = DescriptorMEGNetFeaturizer(base_descriptor='l-MM_v1')  # ℓ-MM featurizer
desc_ofm = DescriptorMEGNetFeaturizer(base_descriptor='l-OFM_v1')  # ℓ-OFM featurizer

# Initialize adjacent GNN featurizer (new unified approach)
adj_gnn = AdjacentGNNFeaturizer(base_model='MEGNet', layers='layer32')

# Initialize ORB featurizer
orb_featurizer = ORBFeaturizer(model_name="ORB_v3")

# Example structures: structure1, structure2, and structure3 are pymatgen
# Structure objects that must be defined or loaded beforehand
structures = pd.Series([structure1, structure2, structure3])

# Extract features using MVLFeaturizer
features_32 = mvl32.get_features(structures)  # 160 features from layer32
features_16 = mvl16.get_features(structures)  # 80 features from layer16
features_combined = mvl_all.get_features(structures)  # 240 features (160+80)

# Extract descriptor-oriented MEGNet features (ℓ-MM and ℓ-OFM)
l_mm_features = desc_mm.get_features(structures)  # 758 ℓ-MM features
l_ofm_features = desc_ofm.get_features(structures)  # 188 ℓ-OFM features

# Extract ORB features (requires ORB environment)
orb_features = orb_featurizer.get_features(structures)

# Train the AdjacentGNNFeaturizer on the fly
targets = [1.2, 2.3, 0.8]  # Target property values
adj_gnn.train_adjacent_model(structures, targets=targets, adjacent_model_path='./models/')

# Extract features using the trained AdjacentGNNFeaturizer
features_adj = adj_gnn.get_features(structures)

# Legacy usage (deprecated but still works with warnings)
# from mattervial import LatentMMFeaturizer, LatentOFMFeaturizer, AdjacentMEGNetFeaturizer
# l_mm_legacy = LatentMMFeaturizer()  # Issues deprecation warning
# l_ofm_legacy = LatentOFMFeaturizer()  # Issues deprecation warning
# adj_legacy = AdjacentMEGNetFeaturizer(layers='layer32')  # Issues deprecation warning
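Since the adjacent model is meant to be trained separately for each fold of the train-test split, a cross-validation loop should create and train a fresh featurizer on the training portion only, then featurize both portions with it. The sketch below shows that pattern with a stand-in class so it runs without MatterVial installed. `StubAdjacentFeaturizer` and `k_fold_indices` are hypothetical; only the method names `train_adjacent_model` and `get_features` are taken from the example above.

```python
import random


class StubAdjacentFeaturizer:
    """Hypothetical stand-in exposing the two methods used in the example above."""

    def __init__(self):
        self.seen = None

    def train_adjacent_model(self, structures, targets, adjacent_model_path):
        # A real featurizer would fit a GNN here; the stub only records its training set.
        self.seen = set(structures)

    def get_features(self, structures):
        if self.seen is None:
            raise RuntimeError("train_adjacent_model must be called first")
        # One toy feature: 1.0 if the structure was in the training set, else 0.0.
        return [[float(s in self.seen)] for s in structures]


def k_fold_indices(n_items, n_folds, seed=0):
    """Shuffle the indices and split them into n_folds disjoint test sets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


structures = [f"structure_{i}" for i in range(6)]  # placeholders for pymatgen Structures
targets = [0.1 * i for i in range(6)]

fold_results = []
for fold_id, test_idx in enumerate(k_fold_indices(len(structures), n_folds=3)):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(structures)) if i not in held_out]

    featurizer = StubAdjacentFeaturizer()  # fresh model for every fold
    featurizer.train_adjacent_model(
        [structures[i] for i in train_idx],
        targets=[targets[i] for i in train_idx],
        adjacent_model_path=f"./models/fold_{fold_id}/",
    )
    train_features = featurizer.get_features([structures[i] for i in train_idx])
    test_features = featurizer.get_features([structures[i] for i in test_idx])
    fold_results.append((train_features, test_features))
```

The stub's toy feature makes the point visible: every test-fold row comes back as 0.0, confirming that no held-out structure was used during training.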

ROOST Featurizers

Here's an example of how to use MatterVial to extract features from a list of compositions:

import pandas as pd
from mattervial import RoostModelFeaturizer

# Initialize ROOST featurizers
roost_mpgap = RoostModelFeaturizer(model_type='mpgap')
roost_oqmd_eform = RoostModelFeaturizer(model_type='oqmd_eform')

# Example compositions
compositions = pd.Series(["Fe2O3", "Al2O3"])

# Extract features using roost_mpgap
features_mpgap = roost_mpgap.get_features(compositions)

# Extract features using roost_oqmd_eform
features_oqmd_eform = roost_oqmd_eform.get_features(compositions)

SISSO Featurization

Here's an example of how to use MatterVial to extract features with SISSO:

from mattervial.featurizers import get_sisso_features
# Assuming 'dataset_MatMinerFeaturized.csv' contains your initial featurized data
# and has a 'target' column and other feature columns.
sisso_features_df = get_sisso_features(input_csv_path="dataset_MatMinerFeaturized.csv", type="SISSO_FORMULAS_v1")

# Now 'sisso_features_df' contains only the newly generated SISSO_ features.
# You can merge this DataFrame with your original data if needed.

Feature Interpretation

Here's an example of how to use MatterVial's interpretability tools:

from mattervial.interpreter import Interpreter
import json

# Initialize the interpreter
interpreter = Interpreter()

# Get formula for a latent feature (ℓ-OFM or ℓ-MM)
formula_info = interpreter.get_formula("l-OFM_v1_1")
print("Formula information:")
print(json.dumps(formula_info, indent=2))

# Get SHAP values for feature importance analysis
shap_data = interpreter.get_shap_values("MEGNet_MatMiner_1")
print("\nSHAP analysis:")
print(json.dumps(shap_data, indent=2))

# Get SISSO formula interpretation
sisso_formula = interpreter.get_formula("SISSO_matbench_dielectric_1")
print("\nSISSO formula:")
print(json.dumps(sisso_formula, indent=2))

# Display SVG visualization (if available)
try:
    interpreter.display_svg("MEGNet_MatMiner_1", plot_type="shap_summary")
except Exception as e:
    print(f"Visualization not available: {e}")

⚙️ Environment Setup

MatterVial utilizes specialized models for certain featurizers, some of which have unique and conflicting dependencies. To manage this, we provide specific Conda environment files in the envs/ folder. Please install the environment that corresponds to the features you intend to use.

Primary Environment

This environment supports the main featurizers used in our paper, including MEGNet-based models (MVL featurizers, descriptor-oriented l-MM and l-OFM, and adjacent MEGNet), and ROOST.

1. Create the environment:

conda env create -f envs/env_primary.yml

2. Activate the environment:

conda activate env_primary

ORB Environment

This is a specialized environment required only for using the ORB-v3 MLIP-based featurizer.

1. Create the environment:

conda env create -f envs/env_orb.yml

2. Activate the environment:

conda activate env_orb

KGCNN Environment

This environment is required only for using featurizers based on coGN or coGNN models.

1. Create the environment:

conda env create -f envs/env_kgcnn.yml

2. Activate the environment:

conda activate env_kgcnn
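After activating an environment, it can help to confirm which optional backends are importable before constructing a featurizer. The sketch below is one way to do that with the standard library. The module names in `ENVIRONMENT_MODULES` are assumptions about what each environment file installs and should be adjusted to match the files in envs/.

```python
from importlib.util import find_spec

# Assumed top-level module names per environment; verify against the files in envs/.
ENVIRONMENT_MODULES = {
    "primary": ["tensorflow", "megnet", "torch"],
    "orb": ["orb_models"],
    "kgcnn": ["kgcnn"],
}


def available_environments(module_map=ENVIRONMENT_MODULES):
    """Return, for each environment, whether all of its modules can be found."""
    return {
        env: all(find_spec(module) is not None for module in modules)
        for env, modules in module_map.items()
    }


for env, ready in available_environments().items():
    print(f"{env}: {'ready' if ready else 'missing dependencies'}")
```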

Contributions

We welcome contributions to improve the MatterVial tool, including adding more pretrained models, enhancing the featurization techniques, or improving the feature interpretation capabilities. Please feel free to submit pull requests or create issues for discussion.

License

This project is licensed under the MIT License.

Acknowledgments

The MatterVial tool is built on top of other software packages and publicly available GNN models such as MEGNet, ROOST, ORB-v3, and coGN. We also acknowledge the developers of the SISSO package, which was used to augment MatMiner featurizers via symbolic regression and to decompose the feature space of GNN models.
