SpbNet
SpbNet is the official implementation of the paper.
Fine-tuning demo
An example for fine-tuning SpbNet has been uploaded to Figshare. To run the demo, download the demo.tar.gz file and extract it. The directory should look like:
Root
├── demo
│   ├── ckpt
│   ├── config.example.yaml
│   ├── data
│   │   ├── benchmark.csv
│   │   ├── benchmark.filter.csv
│   │   ├── benchmark.test.csv
│   │   ├── benchmark.train.csv
│   │   ├── benchmark.validate.csv
│   │   ├── cif
│   │   └── spbnet
│   ├── logs
│   │   └── hmof
│   └── main.py
└── demo.tar.gz
To fine-tune, first download the pretrained weights from Figshare and put them in the demo/ckpt directory. Then install spbnet and run the demo scripts:
cd demo
# optional: conda create -n spbtest python=3.10
# optional: conda activate spbtest
pip install spbnet
python ./finetune.py
python ./predict.py
python ./feat.py
The logs will be written to the logs directory. We have provided the expected result in the logs/CO2-298-2.5/version_0 directory.
After fine-tuning for 30 epochs, the result should look like:
---------------------------------------
Test metric DataLoader 0
---------------------------------------
test_mae 1.5358973344167073
test_mse 3.877110533444727
test_r2 0.3504098369900308
---------------------------------------
The predicted results can be found in the log directory, at a path like logs/hmof/CO2-298-2.5/version_0/test_result.csv.
Prepare
To fine-tune SpbNet, we recommend creating a new working directory:
mkdir test
cd test
Install
Dependencies
All the code has been tested on Linux. We recommend using Linux to reproduce or use SpbNet.
SpbNet depends on
python>=3.8
Installation
SpbNet is published on PyPI. Simply use
pip install spbnet
Download weights
The pretrained weights have been uploaded to Figshare.
Save the weights to your directory, e.g. ./ckpt/spbnet.180k.ckpt. Your directory should look like:
- test
- ckpt
spbnet.180k.ckpt
Data Preprocessing
SpbNet needs to preprocess the cif files to obtain structure and potential-energy embeddings.
Install GRIDAY
The code that generates the energy grid depends on make and g++. Use the following commands to install and build it:
spbnet install-make
spbnet make-griday
The first command installs make and C++ compiler tools via conda.
The second compiles the energy-grid generation code.
Build Data
SpbNet provides a command to preprocess data. First prepare your dataset; it should look like:
- test
- ckpt
spbnet.180k.ckpt
- data
- cif
mof1.cif
mof2.cif
...
benchmark.csv
The benchmark.csv file contains the label data and should look like:
cifid,CO2
mof1,1.29
mof2,3.81
...
The cifid column is required; the other columns represent the tasks to be predicted.
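If you assemble the label file programmatically, a minimal pandas sketch (the cifid values and the CO2 column are illustrative) is:
import pandas as pd

# A minimal sketch: the label file needs a `cifid` column matching the
# cif file stems, plus one column per property to predict.
labels = pd.DataFrame({
    "cifid": ["mof1", "mof2"],  # stems of data/cif/mof1.cif, data/cif/mof2.cif
    "CO2": [1.29, 3.81],        # one task column per target property
})
labels.to_csv("data/benchmark.csv", index=False)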
Then use the following command to build data:
spbnet build-data --root-dir ./data
SpbNet will produce the following files:
- test
- ckpt
spbnet.180k.ckpt
- data
- cif
- mof1.cif
- mof2.cif
...
- spbnet
- graphdata
- mof1.graphdata
- mof2.graphdata
...
- grid
- mof1.grid
- mof2.grid
...
- griddata8
- mof1.npy
- mof2.npy
...
benchmark.csv
Split Labels
You can split the label data to evaluate SpbNet's performance. We provide a command for this:
spbnet filter-data --root-dir ./data
This command looks up benchmark.csv to check that all the cif files have been correctly preprocessed.
In addition, filter-data filters out outlier points, i.e. values below Q1 - outlier * IQR or above Q3 + outlier * IQR.
By default, outlier is set to 5. Use spbnet filter-data --root-dir PATH/TO/YOUR/ROOT_DIR --outlier -1 to disable this behavior.
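The filtering rule is equivalent to the following pandas sketch (the task column CO2 is illustrative):
import pandas as pd

# Sketch of the IQR rule described above, for a task column "CO2".
df = pd.read_csv("data/benchmark.csv")
outlier = 5  # the default multiplier; --outlier -1 disables filtering
q1, q3 = df["CO2"].quantile(0.25), df["CO2"].quantile(0.75)
iqr = q3 - q1
kept = df[df["CO2"].between(q1 - outlier * iqr, q3 + outlier * iqr)]
kept.to_csv("data/benchmark.filter.csv", index=False)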
If correctly preprocessed, the directory should look like
- test
- ckpt
spbnet.180k.ckpt
- data
- cif
xxx.cif
- spbnet
- graphdata
xxx.graphdata
- grid
xxx.grid
- griddata8
xxx.npy
benchmark.csv
benchmark.filter.csv
benchmark.train.csv
benchmark.validate.csv
benchmark.test.csv
You can check that the data has been correctly preprocessed with:
spbnet check-data --root-dir ./data
This command automatically checks benchmark.train.csv, benchmark.validate.csv and benchmark.test.csv.
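If you want to run a similar sanity check yourself, a minimal sketch (assuming the directory layout above) is:
from pathlib import Path
import pandas as pd

# Sketch: verify every cifid in each split has its preprocessed artifacts.
root = Path("data")
for split in ("train", "validate", "test"):
    for cifid in pd.read_csv(root / f"benchmark.{split}.csv")["cifid"]:
        for rel in (f"spbnet/graphdata/{cifid}.graphdata",
                    f"spbnet/grid/{cifid}.grid",
                    f"spbnet/griddata8/{cifid}.npy"):
            assert (root / rel).exists(), f"missing {rel}"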
Finetune
After data preprocessing, you can fine-tune SpbNet. To configure SpbNet, provide a configuration file such as config.example.yaml. The file should look like:
ckpt: './ckpt/spbnet.180k.ckpt'
data_dir: './data'
id_prop: './data/benchmark.csv'
task: 'CO2-298-2.5'
log_dir: './logs/hmof'
- ckpt: Path to the checkpoint to fine-tune
- data_dir: The root directory of the data
- id_prop: The label data. If set to benchmark.csv, spbnet will automatically find benchmark.train.csv, benchmark.validate.csv and benchmark.test.csv
- task: The task to train. Should be one of the columns in benchmark.csv
- log_dir: The logger directory, used by pytorch-lightning
NOTE: More configuration options can be found in the GitHub repository. An important option is max_graph_len, the maximum token length, since SpbNet is based on the Transformer architecture. To choose max_graph_len, first estimate the average number of atoms with:
spbnet calc-atomnum --root-dir ./data/cif
We recommend choosing, from 512, 768 and 1024, the value closest to the average number of atoms in your dataset. For the hMOF and CoREMOF datasets, we recommend 512 (the default). For the ToBaCCo dataset, we recommend 1024.
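If you prefer to estimate the count without the CLI, a sketch using ase (an assumption on our part; spbnet calc-atomnum is the supported route) is:
from pathlib import Path
from ase.io import read  # pip install ase

# Average number of atoms per structure across the dataset.
counts = [len(read(str(path))) for path in Path("data/cif").glob("*.cif")]
print(sum(counts) / len(counts))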
Thus, your directory should look like:
- test
- ckpt
spbnet.180k.ckpt
- data
- cif
xxx.cif
- spbnet
- graphdata
xxx.graphdata
- grid
xxx.grid
- griddata8
xxx.npy
benchmark.csv
benchmark.filter.csv
benchmark.train.csv
benchmark.validate.csv
benchmark.test.csv
config.example.yaml
main.py
The main.py file should contain the code to fine-tune SpbNet, like:
import spbnet
spbnet.finetune("./config.example.yaml")
After fine-tuning, the results (checkpoint and test results) are saved in the directory specified by log_dir. The path may look like ./logs/hmof/CO2-298-2.5/version_0.
Your directory should look like:
- test
- ckpt
spbnet.180k.ckpt
- data
...
- logs
...
config.example.yaml
main.py
Predict
After fine-tuning, the checkpoints and hyperparameters can be found in a directory like ./tests/logs/hmof/CO2-298-2.5/version_0. Note that SpbNet automatically normalizes the training targets during training to improve stability, using mean-variance normalization. The mean and std that were used can be found in .../version_0/hparams.yaml.
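For reference, a normalized model output z maps back to the original units as z * std + mean. A minimal sketch reading the stored constants (the key names "mean" and "std" in hparams.yaml are assumed here):
import yaml

# Sketch: read the normalization constants back from hparams.yaml
# (the key names "mean" and "std" are assumed).
with open("logs/hmof/CO2-298-2.5/version_0/hparams.yaml") as f:
    hparams = yaml.safe_load(f)
mean, std = hparams["mean"], hparams["std"]

z = 0.5  # example normalized model output
print(z * std + mean)  # back in the original property units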
Based on the fine-tuned checkpoint and the mean and std, you can predict the target property. First provide the data_dir and the id_prop csv file. The id_prop file should look like:
cifid
mof1
mof2
...
Then prepare a configuration yaml file like:
ckpt: './logs/hmof/CO2-298-2.5/version_0/checkpoints/last.ckpt'
data_dir: './data/spbnet'
id_prop: './data/benchmark.test.csv' # the id_prop file
log_dir: './predict'
mean: 5.325830404166666
std: 2.6947958848152913
Then predict:
import spbnet
spbnet.predict("./config.predict.yaml")
The test_results.csv file will be saved in a directory like predict/version_0. It should look like:
cifid,predict
hMOF-4000155,12.504342079162598
hMOF-5024342,2.8985860347747803
hMOF-25731,4.6314473152160645
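If your id_prop file also carries ground-truth labels (e.g. benchmark.test.csv), you can score the predictions yourself; a sketch, where the label column name CO2 is illustrative:
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score

# Sketch: join predictions with ground truth on cifid and score them.
pred = pd.read_csv("predict/version_0/test_results.csv")
true = pd.read_csv("data/benchmark.test.csv")
merged = pred.merge(true, on="cifid")
print("MAE:", mean_absolute_error(merged["CO2"], merged["predict"]))
print("R2:", r2_score(merged["CO2"], merged["predict"]))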
Visualize
SpbNet provides visualization of attention scores and atom grids.
To visualize a cif file, make a new directory and prepare a cif file, such as:
- test
...
- vis
- cif
mof1.cif
Change to the vis directory and build the modal data:
cd vis
spbnet build-modal-data --cif-path ./cif/mof1.cif
By default, SpbNet creates a modal directory under the vis directory. The directory should look like:
- test
...
- vis
- cif
mof1.cif
- modal
- attn
- energycell
mof1.cif
- graphdata
mof1.graphdata
- grid
mof1.grid
- griddata
mof1.griddata
- griddata8
mof1.griddata8
- mol
- supercell
mof1.cif
- total
Now you can compute the attention scores:
spbnet attn --cif-dir ./cif/mof1.cif --modal-dir ./modal --ckpt PATH/TO/YOUR/CKPT
SpbNet will create an attn directory under your current directory containing a mof1.html file.
Open mof1.html; you should see a 3dmol visualization like the following.
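To open the page from a script, a minimal sketch:
import webbrowser
from pathlib import Path

# Open the generated attention visualization in the default browser.
webbrowser.open(Path("attn/mof1.html").resolve().as_uri())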