FARScore: A Synthetic Accessibility Predictor Based on Fragment Assembly autoRegressive Pretraining
FARScore: Molecular Synthetic Accessibility Predictor
A synthetic accessibility scorer based on Fragment Assembly autoRegressive pretraining, built to accelerate drug discovery
🎯 What Makes FARScore Different
FARScore predicts synthetic accessibility using Fragment Assembly autoRegressive pretraining. Rather than learning synthesis patterns directly, as traditional approaches do, FARScore first learns the fundamentals of molecular construction (how molecules are assembled from fragments) and then applies that knowledge to predict synthetic accessibility.
Two-Stage Learning:
- Stage 1: Pretrain on 9.2M unlabeled molecules to learn molecular assembly patterns
- Stage 2: Finetune on 800K labeled molecules for synthetic accessibility prediction
This mirrors human chemical intuition: experienced chemists understand molecular construction before assessing synthetic difficulty.
✨ Key Features
- Easy Integration - Simple CSV input/output format
- Batch Prediction - One-click synthetic accessibility scoring
- High Accuracy - Achieves state-of-the-art performance on multiple test sets, measured by accuracy, AUROC, and specificity
🌐 Online Service
Instant molecular synthesis prediction in the cloud. Simply upload your CSV file with SMILES and receive AI-powered synthetic accessibility scores in seconds.
🚀 Quick Start
1. Installation
# Clone repository
git clone https://github.com/simmzx/FARScore.git
cd FARScore
# Create environment and install dependencies
conda create -n FARScore python=3.8
conda activate FARScore
pip install -r requirements.txt
2. Prepare Data
Create a CSV file with a "smiles" column:
| molecule_id | smiles |
|---|---|
| Palbociclib | CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C |
| (+)-Eburnamonine | [C@]12(C3=C4CCN1CCC[C@@]2(CC(=O)N3C1C4=CC=CC=1)CC)[H] |
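If your molecules live in Python rather than in a spreadsheet, the input file can be written with the standard `csv` module. This is a minimal sketch: the helper name `write_input_csv` is hypothetical, and it assumes the tool reads a plain comma-separated file with a header row matching the table above.

```python
import csv

# The two example molecules from the table above.
molecules = [
    ("Palbociclib", "CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C"),
    ("(+)-Eburnamonine", "[C@]12(C3=C4CCN1CCC[C@@]2(CC(=O)N3C1C4=CC=CC=1)CC)[H]"),
]


def write_input_csv(path, rows):
    """Write (molecule_id, smiles) pairs to a CSV file with a header row.

    Hypothetical helper, not part of FARScore itself.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["molecule_id", "smiles"])
        writer.writerows(rows)


write_input_csv("example.csv", molecules)
```

Using `csv.writer` rather than string concatenation keeps the file valid even if an identifier ever contains a comma or a quote.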
3. Run Prediction
CSV File Mode
python farscore.py --input_file example.csv
Direct SMILES Mode
# Single molecule
python farscore.py --smiles "CCO"
# Multiple molecules
python farscore.py --smiles "CCO" "CC(=O)O" "c1ccccc1"
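To call either mode from a Python script, one option is to build the argument list and hand it to `subprocess`. The sketch below uses only the two flags shown above (`--input_file` and `--smiles`); the wrapper `build_farscore_command` is a hypothetical helper, and it assumes `farscore.py` is in the current working directory.

```python
import shlex
import sys


def build_farscore_command(smiles=None, input_file=None, script="farscore.py"):
    """Return the argument list for one farscore.py invocation.

    Hypothetical wrapper: exactly one of `smiles` (a list of SMILES strings)
    or `input_file` (a CSV path) must be given.
    """
    if (smiles is None) == (input_file is None):
        raise ValueError("pass exactly one of smiles or input_file")
    command = [sys.executable, script]
    if input_file is not None:
        command += ["--input_file", input_file]
    else:
        command += ["--smiles", *smiles]
    return command


# Print the command that would be run, quoted for a POSIX shell.
print(shlex.join(build_farscore_command(smiles=["CCO", "CC(=O)O", "c1ccccc1"])))

# To actually run it from the repository root:
#   import subprocess
#   subprocess.run(build_farscore_command(input_file="example.csv"), check=True)
```

Passing the SMILES as separate list items avoids shell quoting problems with characters such as parentheses and `=`.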
4. View Results
The output file contains a FARScore value for each molecule:
| molecule_id | smiles | farscore |
|---|---|---|
| Palbociclib | CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C | 0.9453 |
| (+)-Eburnamonine | [C@]12(C3=C4CCN1CCC[C@@]2(CC(=O)N3C1C4=CC=CC=1)CC)[H] | 0.0286 |
FARScore Interpretation:
- Close to 1: Easy to synthesize
- Close to 0: Hard to synthesize
- Threshold 0.5: Binary classification cutoff
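The interpretation rules above can be applied to an output file in a few lines. This sketch makes several assumptions: that the output is comma-separated with the column names shown in the results table, and that a score of exactly 0.5 counts as easy (the rule above does not say which side the boundary falls on). The function name `label_scores` is hypothetical.

```python
import csv
import io

THRESHOLD = 0.5  # binary classification cutoff from the list above


def label_scores(csv_text, threshold=THRESHOLD):
    """Return (molecule_id, score, label) tuples from FARScore output text.

    Assumption: scores >= threshold are labelled "easy", the rest "hard".
    """
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        score = float(row["farscore"])
        label = "easy" if score >= threshold else "hard"
        results.append((row["molecule_id"], score, label))
    return results


# The two result rows from the table above, written as CSV text.
sample = (
    "molecule_id,smiles,farscore\n"
    "Palbociclib,CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C,0.9453\n"
    "(+)-Eburnamonine,[C@]12(C3=C4CCN1CCC[C@@]2(CC(=O)N3C1C4=CC=CC=1)CC)[H],0.0286\n"
)

for molecule_id, score, label in label_scores(sample):
    print(f"{molecule_id}: {score:.4f} -> {label}")
```

For a real output file, read its contents with `open(path, encoding="utf-8").read()` and pass the text to `label_scores`.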
📖 Advanced Usage
Custom Pretraining and Finetuning
Pretrain Model
python farscore_pretrain.py \
--dataset smiles.txt \
--vocab fragment.txt
Note: smiles.txt contains unlabeled molecules. fragment.txt is the fragment vocabulary that ./scripts/utils/mol/cls.py generates from smiles.txt; it is used for fragment assembly autoregressive pretraining.
Finetune Model
python farscore_finetune.py \
--input_model_file gnn_pretrained.pth \
--dataset dataset.csv
Note: gnn_pretrained.pth is the model saved during the pretraining stage. dataset.csv contains the labeled molecules used to finetune on a specific downstream task.
🔧 Requirements
- Python 3.8-3.10
- CUDA-enabled GPU (recommended)
- Key dependencies: PyTorch, RDKit, DGL, DeepChem
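Before installing the dependencies, it can help to confirm that the active interpreter falls in the supported range. The check below is a small sketch based only on the Python 3.8-3.10 range listed above; the function name `python_supported` is hypothetical, and it does not check for CUDA or for any of the listed packages.

```python
import sys

# Inclusive (major, minor) bounds taken from the requirements list above.
SUPPORTED = ((3, 8), (3, 10))


def python_supported(version=None):
    """Return True if `version` (default: the running interpreter) is in range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED[0] <= major_minor <= SUPPORTED[1]


if not python_supported():
    print("Warning: this interpreter is outside the Python 3.8-3.10 range.")
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where "3.10" sorts before "3.8".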
📄 Citation
If this program is useful to you, please cite our paper:
📧 Contact
For questions, please contact: Xiang Zhang (Email: zhangxiang@simm.ac.cn)
🌟 Like this project? Give us a Star
File details
Details for the file farscore-1.0.1.tar.gz.
File metadata
- Download URL: farscore-1.0.1.tar.gz
- Upload date:
- Size: 14.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ca8661958402a80cdbb8871fabcf55efe7b62681bd6189919c765fd165d2dfc8 |
| MD5 | 8438d52ddfb0c6179b46cec3b45b99cf |
| BLAKE2b-256 | c10bf993e662e46b807836999f90673dee05a063c25086da7d742e6780a03265 |
File details
Details for the file farscore-1.0.1-py3-none-any.whl.
File metadata
- Download URL: farscore-1.0.1-py3-none-any.whl
- Upload date:
- Size: 14.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bfe16e50c201b4a0f88eb4d1b3b049312a9bbf574f237afeda1b8e2016878f0c |
| MD5 | 3da9072359ae8e87692963b970ce7b75 |
| BLAKE2b-256 | a2716951823083d7be1284fd10c8cf4c1267a42a1a63273ef48671731d672425 |