
SpbNet

SpbNet is the official implementation of the paper.

Demo: fine-tuning SpbNet

An example of fine-tuning SpbNet has been uploaded to Figshare. To run this demo, download the demo.tar.gz file and extract it. The directory should look like

Root
├── demo
|  ├── ckpt
|  ├── config.example.yaml
|  ├── data
|  |  ├── benchmark.csv
|  |  ├── benchmark.filter.csv
|  |  ├── benchmark.test.csv
|  |  ├── benchmark.train.csv
|  |  ├── benchmark.validate.csv
|  |  ├── cif
|  |  └── spbnet
|  ├── logs
|  |  └── hmof
|  └── main.py
└── demo.tar.gz

To finetune, first download the pretrained weight from Figshare and put it in the demo/ckpt directory. Then install spbnet and run the fine-tuning scripts.

cd demo
# optional: conda create -n spbtest python=3.10
# optional: conda activate spbtest
pip install spbnet
python ./finetune.py
python ./predict.py
python ./feat.py

Logs will be written to the logs directory. We have provided the expected result in the logs/hmof/CO2-298-2.5/version_0 directory.

After fine-tuning for 30 epochs, the result should look like:

---------------------------------------
	Test metric		DataLoader 0
---------------------------------------
	test_mae		1.5358973344167073
	test_mse		3.877110533444727
	test_r2			0.3504098369900308
---------------------------------------

The predicted results can be found in the log directory, at a path like logs/hmof/CO2-298-2.5/version_0/test_result.csv.

Prepare

To finetune SpbNet, we recommend creating a new working directory.

mkdir test
cd test

Install

Dependencies

All code has been tested on Linux. We recommend using Linux to reproduce the results or to use SpbNet.

SpbNet depends on

python>=3.8

Installation

SpbNet is available on PyPI. Simply run

pip install spbnet

Download weight

The pretrained weight has been uploaded to Figshare.

Save the weight in your working directory, such as ./ckpt/spbnet.180k.ckpt

Your directory should look like:

- test
    - ckpt
        spbnet.180k.ckpt

Data Preprocessing

SpbNet needs to preprocess the CIF files to obtain the structure and potential-energy embeddings.

Install GRIDAY

The code that generates the energy grid depends on make and g++. Use the following commands to install and build it.

spbnet install-make
spbnet make-griday

The first command installs make and the C++ compiler tools via conda. The second command compiles the energy-grid generation code.

Build Data

SpbNet provides a command to preprocess the data.

To use SpbNet, first prepare your dataset. It should look like

- test
    - ckpt
        spbnet.180k.ckpt
    - data
        - cif
            mof1.cif
            mof2.cif
            ...
        benchmark.csv

The benchmark.csv contains the label data, which should look like

cifid,CO2
mof1,1.29
mof2,3.81
...

The cifid column is required, while the other columns represent the tasks to be predicted.
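A label file in this format can also be written with pandas; this is a minimal sketch, and the cifid values and the CO2 labels below are placeholders for your own data:

import pandas as pd

# Placeholder labels: map each cifid (CIF file name without the .cif suffix) to its target value.
labels = {"mof1": 1.29, "mof2": 3.81}
df = pd.DataFrame({"cifid": list(labels), "CO2": list(labels.values())})
df.to_csv("./data/benchmark.csv", index=False)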

Then use the following command to build data:

spbnet build-data --root-dir ./data

SpbNet will produce the following files:

- test
    - ckpt
        spbnet.180k.ckpt
    - data
        - cif
            - mof1.cif
            - mof2.cif
            ...
        - spbnet
            - graphdata
                - mof1.graphdata
                - mof2.graphdata
                ...
            - grid
                - mof1.grid
                - mof2.grid
                ...
            - griddata8
                - mof1.npy
                - mof2.npy
                ...
        benchmark.csv
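If you want to verify these outputs yourself (in addition to the check-data command described below), a minimal sketch with pathlib, assuming the layout shown above, is:

from pathlib import Path

# Check that every CIF has its graphdata, grid and griddata8 counterparts.
root = Path("./data")
missing = []
for cif in (root / "cif").glob("*.cif"):
    for p in (root / "spbnet" / "graphdata" / f"{cif.stem}.graphdata",
              root / "spbnet" / "grid" / f"{cif.stem}.grid",
              root / "spbnet" / "griddata8" / f"{cif.stem}.npy"):
        if not p.exists():
            missing.append(p)
print(f"{len(missing)} missing preprocessed files")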

Split Labels

You can split the label data to evaluate SpbNet's performance. We provide a command to do this.

spbnet filter-data --root-dir ./data

This command looks up benchmark.csv to check whether all the CIF files have been correctly preprocessed. In addition, filter-data filters out outlier points that fall outside [Q1 - outlier * IQR, Q3 + outlier * IQR]. By default, outlier is set to 5. You can run spbnet filter-data --root-dir PATH/TO/YOUR/ROOT_DIR --outlier -1 to disable this behavior.
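As an illustration of the filtering rule only (not the spbnet implementation itself), the equivalent logic in pandas, assuming a task column named CO2, is roughly:

import pandas as pd

# Hedged sketch of the IQR-based outlier filter described above.
df = pd.read_csv("./data/benchmark.csv")
outlier = 5  # same default as spbnet filter-data
q1, q3 = df["CO2"].quantile(0.25), df["CO2"].quantile(0.75)
iqr = q3 - q1
kept = df[(df["CO2"] >= q1 - outlier * iqr) & (df["CO2"] <= q3 + outlier * iqr)]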

If correctly preprocessed, the directory should look like

- test
    - ckpt
        spbnet.180k.ckpt
    - data
        - cif
            xxx.cif
        - spbnet
            - graphdata
                xxx.graphdata
            - grid
                xxx.grid
            - griddata8
                xxx.npy
        benchmark.csv
        benchmark.filter.csv
        benchmark.train.csv
        benchmark.validate.csv
        benchmark.test.csv

You can check if the data is correctly preprocessed by

spbnet check-data --root-dir ./data

This command will automatically check the benchmark.train.csv, benchmark.validate.csv and benchmark.test.csv.

Finetune

After data preprocessing, you can finetune SpbNet.

To configure spbnet, provide a configuration file, such as config.example.yaml. The file should look like

ckpt: './ckpt/spbnet.180k.ckpt'
data_dir: './data'
id_prop: './data/benchmark.csv'
task: 'CO2-298-2.5'
log_dir: './logs/hmof'

  • ckpt: Path to the checkpoint to finetune
  • data_dir: The root directory of the data
  • id_prop: The label data. If it is set to benchmark.csv, spbnet will automatically find benchmark.train.csv, benchmark.validate.csv and benchmark.test.csv
  • task: The task to train. Should be one of the columns in benchmark.csv
  • log_dir: The logger directory, used by pytorch-lightning

NOTE: More configuration options can be found in the GitHub repository. An important option is max_graph_len, the maximum token length, since SpbNet is based on the Transformer architecture. To choose max_graph_len, you can first estimate the average number of atoms with this command:

spbnet calc-atomnum --root-dir ./data/cif

We recommend choosing, from 512, 768 and 1024, the value closest to the average number of atoms in your dataset. For the hMOF and CoREMOF datasets, we recommend 512 (the default). For the tobacco dataset, we recommend 1024.
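If you prefer to estimate the average atom count yourself, a minimal sketch using ase (an assumption; ase is not a spbnet dependency) reads every CIF and averages the number of atoms per unit cell:

from pathlib import Path
from ase.io import read  # assumes ase is installed: pip install ase

counts = [len(read(p)) for p in sorted(Path("./data/cif").glob("*.cif"))]
print("average atoms per structure:", sum(counts) / len(counts))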

Thus, your directory should look like:

- test
    - ckpt
        spbnet.180k.ckpt
    - data
        - cif
            xxx.cif
        - spbnet
            - graphdata
                xxx.graphdata
            - grid
                xxx.grid
            - griddata8
                xxx.npy
        benchmark.csv
        benchmark.filter.csv
        benchmark.train.csv
        benchmark.validate.csv
        benchmark.test.csv
    config.example.yaml
    main.py

The main.py file should contain the code to finetune spbnet, which should look like:

import spbnet

spbnet.finetune("./config.example.yaml")

After finetuning, the result (checkpoint and test result) should be saved in the directory specified by log_dir. The path may look like: ./logs/hmof/CO2-298-2.5/version_0.

Your directory should look like:

- test
    - ckpt
        spbnet.180k.ckpt
    - data
        ...
    - logs
        ...
    config.example.yaml
    main.py

Predict

After finetuning, the checkpoints and hyperparameters can be found in a directory like ./logs/hmof/CO2-298-2.5/version_0. During training, SpbNet automatically normalizes the training targets with mean-variance normalization to improve training stability. The mean and std used can be found in .../version_0/hparams.yaml.

Based on the finetuned checkpoint and the mean and std, you can predict the target property. First provide the data_dir and the id_prop CSV file. The id_prop file should look like

cifid
mof1
mof2
...
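If you only have a directory of CIF files, an id_prop file like this can be generated with a short sketch; the output path predict.csv is just an example name:

import csv
from pathlib import Path

# Write one row per CIF file, using the file name (without .cif) as the cifid.
with open("./data/predict.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["cifid"])
    for cif in sorted(Path("./data/cif").glob("*.cif")):
        writer.writerow([cif.stem])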

Then prepare a configuration yaml file like

ckpt: './logs/hmof/CO2-298-2.5/version_0/checkpoints/last.ckpt'
data_dir: './data/spbnet'
id_prop: './data/benchmark.test.csv' # the id_prop file
log_dir: './predict'

mean: 5.325830404166666
std: 2.6947958848152913
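The mean and std do not have to be copied by hand. A hedged sketch using PyYAML, assuming hparams.yaml stores the values under the keys mean and std, reads them and writes the predict configuration:

import yaml  # PyYAML

# Assumption: hparams.yaml stores the normalization constants under "mean" and "std".
with open("./logs/hmof/CO2-298-2.5/version_0/hparams.yaml") as f:
    hparams = yaml.safe_load(f)

config = {
    "ckpt": "./logs/hmof/CO2-298-2.5/version_0/checkpoints/last.ckpt",
    "data_dir": "./data/spbnet",
    "id_prop": "./data/benchmark.test.csv",
    "log_dir": "./predict",
    "mean": hparams["mean"],
    "std": hparams["std"],
}
with open("./config.predict.yaml", "w") as f:
    yaml.safe_dump(config, f)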

Then predict

import spbnet

spbnet.predict("./config.predict.yaml")

The test_results.csv file will be saved in a directory like predict/version_0. It should look like

cifid,predict
hMOF-4000155,12.504342079162598
hMOF-5024342,2.8985860347747803
hMOF-25731,4.6314473152160645
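If the id_prop file also contains ground-truth labels (as benchmark.test.csv does), you can compare the predictions against them with a short pandas sketch; the column names follow the examples above, and this check is not part of spbnet:

import pandas as pd

pred = pd.read_csv("./predict/version_0/test_results.csv")  # columns: cifid, predict
true = pd.read_csv("./data/benchmark.test.csv")             # columns: cifid, CO2
merged = pred.merge(true, on="cifid")
mae = (merged["predict"] - merged["CO2"]).abs().mean()
print("MAE:", mae)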

Visualize

SpbNet provides visualization of attention scores and the atom grid.

To visualize a CIF file, make a new directory and prepare a CIF file, for example:

- test
    ...
    - vis
        - cif
            mof1.cif

Change to the vis directory and build modal data:

cd vis
spbnet build-modal-data --cif-path ./cif/mof1.cif

By default, SpbNet will create a modal directory under the vis directory. The directory should look like:

- test
    ...
    - vis
        - cif
            mof1.cif
        - modal
            - attn
            - energycell
                mof1.cif
            - graphdata
                mof1.graphdata
            - grid
                mof1.grid
            - griddata
                mof1.griddata
            - griddata8
                mof1.griddata8
            - mol
            - supercell
                mof1.cif
            - total

Now you can compute the attention scores.

spbnet attn --cif-dir ./cif/mof1.cif --modal-dir ./modal --ckpt PATH/TO/YOUR/CKPT

SpbNet will create an attn directory under your current directory, containing the mof1.html file.

Now open mof1.html; you should see an interactive 3Dmol visualization of the structure with its attention scores.
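For example, the file can be opened from Python (or simply by double-clicking it in a file manager):

import webbrowser
from pathlib import Path

# Open the generated attention visualization in the default browser.
webbrowser.open(Path("./attn/mof1.html").resolve().as_uri())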

