FARScore: A Synthetic Accessibility Predictor Based on Fragment Assembly autoRegressive Pretraining

License: MIT

FARScore: Molecular Synthetic Accessibility Predictor

A synthetic accessibility scorer based on Fragment Assembly autoRegressive pretraining, built to accelerate drug discovery

🎯 What Makes FARScore Different

FARScore predicts synthetic accessibility using Fragment Assembly autoRegressive pretraining. Rather than learning synthesis patterns directly from labeled data, FARScore first learns how molecules are assembled from fragments, then applies that knowledge to predict synthetic accessibility.

Two-Stage Learning:

  • Stage 1: Pretrain on 9.2M unlabeled molecules to learn molecular assembly patterns
  • Stage 2: Finetune on 800K labeled molecules for synthetic accessibility prediction

This mirrors human chemical intuition: experienced chemists understand molecular construction before assessing synthetic difficulty.

✨ Key Features

  • Easy Integration - Simple CSV input/output format
  • Batch Prediction - One-click synthetic accessibility scoring
  • High Accuracy - Reports state-of-the-art accuracy, AUROC, and specificity on multiple test sets

🌐 Online Service

Synthetic accessibility prediction in the cloud. Upload a CSV file containing SMILES and receive synthetic accessibility scores in seconds.

🚀 Quick Start

1. Installation

    # Clone repository
    git clone https://github.com/simmzx/FARScore.git
    cd FARScore

    # Create environment and install dependencies
    conda create -n FARScore python=3.8
    conda activate FARScore
    pip install -r requirements.txt

2. Prepare Data

Create a CSV file with a "smiles" column:

    molecule_id       smiles
    Palbociclib       CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C
    (+)-Eburnamonine  [C@]12(C3=C4CCN1CCC[C@@]2(CC(=O)N3C1C4=CC=CC=1)CC)[H]
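The input file can also be generated programmatically. The following is a minimal sketch using only the Python standard library; the helper name `to_csv_text` is hypothetical, and it assumes the file is comma-separated with a header row, which the README implies but does not state explicitly.

```python
import csv
import io

# The two example molecules from the table above.
MOLECULES = [
    ("Palbociclib", "CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C"),
    ("(+)-Eburnamonine", "[C@]12(C3=C4CCN1CCC[C@@]2(CC(=O)N3C1C4=CC=CC=1)CC)[H]"),
]


def to_csv_text(molecules):
    """Render (molecule_id, smiles) pairs as CSV text with a header row.

    Hypothetical helper: the comma delimiter is an assumption.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["molecule_id", "smiles"])
    writer.writerows(molecules)
    return buffer.getvalue()


if __name__ == "__main__":
    # Write the file name used in the prediction step.
    with open("example.csv", "w", newline="", encoding="utf-8") as handle:
        handle.write(to_csv_text(MOLECULES))
```

Using the `csv` module rather than string concatenation keeps the file valid if an identifier ever contains a comma or quote.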

3. Run Prediction

CSV File Mode

    python farscore.py --input_file example.csv

Direct SMILES Mode

    # Single molecule
    python farscore.py --smiles "CCO"
    # Multiple molecules
    python farscore.py --smiles "CCO" "CC(=O)O" "c1ccccc1"
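When calling the script from another Python program, the two modes above can be wrapped in a small helper. This is a sketch under assumptions: `build_command` and `run_farscore` are hypothetical names, only the `--input_file` and `--smiles` flags shown above are used, and the script is assumed to be run from the repository root.

```python
import subprocess
import sys


def build_command(input_file=None, smiles=None, script="farscore.py"):
    """Build the argument list for one of the two documented modes.

    Hypothetical helper: exactly one of input_file or smiles must be given.
    """
    if (input_file is None) == (smiles is None):
        raise ValueError("pass exactly one of input_file or smiles")
    command = [sys.executable, script]
    if input_file is not None:
        # CSV File Mode
        command += ["--input_file", str(input_file)]
    else:
        # Direct SMILES Mode: one or more SMILES strings
        smiles = list(smiles)
        if not smiles:
            raise ValueError("smiles must contain at least one entry")
        command += ["--smiles", *smiles]
    return command


def run_farscore(**kwargs):
    """Run farscore.py and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(**kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with SMILES characters such as parentheses, `=`, and `@`.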

4. View Results

The output file contains a FARScore value for each molecule:

    molecule_id       smiles                                                            farscore
    Palbociclib       CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C  0.9453
    (+)-Eburnamonine  [C@]12(C3=C4CCN1CCC[C@@]2(CC(=O)N3C1C4=CC=CC=1)CC)[H]             0.0286

FARScore Interpretation:

  • Close to 1: Easy to synthesize
  • Close to 0: Hard to synthesize
  • Threshold 0.5: Binary classification cutoff
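The interpretation rules above can be applied to the output with a few lines of Python. This sketch assumes the output is comma-separated with the column names shown in the results table, and it treats a score of exactly 0.5 as "easy"; the README does not say which side of the cutoff that boundary falls on. The function name `classify` and the labels are illustrative.

```python
import csv
import io

THRESHOLD = 0.5  # binary classification cutoff from the interpretation list

# Sample output mirroring the results table (comma delimiter is an assumption).
SAMPLE_OUTPUT = """molecule_id,smiles,farscore
Palbociclib,CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C,0.9453
(+)-Eburnamonine,[C@]12(C3=C4CCN1CCC[C@@]2(CC(=O)N3C1C4=CC=CC=1)CC)[H],0.0286
"""


def classify(rows, threshold=THRESHOLD):
    """Label each row 'easy' or 'hard' to synthesize from its farscore.

    Assumption: scores at or above the threshold count as 'easy'.
    """
    labeled = []
    for row in rows:
        score = float(row["farscore"])
        label = "easy" if score >= threshold else "hard"
        labeled.append((row["molecule_id"], score, label))
    return labeled


if __name__ == "__main__":
    for name, score, label in classify(csv.DictReader(io.StringIO(SAMPLE_OUTPUT))):
        print(f"{name}: {score:.4f} ({label})")
```

To process a real output file, replace the `io.StringIO(SAMPLE_OUTPUT)` argument with an open file handle.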

📖 Advanced Usage

Custom Pretraining and Finetuning Tasks

Pretrain Model

    python farscore_pretrain.py \
        --dataset smiles.txt \
        --vocab fragment.txt 

Note: smiles.txt contains unlabeled molecules. fragment.txt is the fragment vocabulary generated from smiles.txt by ./scripts/utils/mol/cls.py and is used for fragment assembly autoregressive pretraining.

Finetune Model

    python farscore_finetune.py \
        --input_model_file gnn_pretrained.pth \
        --dataset dataset.csv

Note: gnn_pretrained.pth is the model saved during the pretraining stage. dataset.csv contains labeled molecules for finetuning on a specific downstream task.

🔧 Requirements

  • Python 3.8-3.10
  • CUDA-enabled GPU (recommended)
  • Key dependencies: PyTorch, RDKit, DGL, DeepChem
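A quick way to check an environment against the list above is sketched below. The function names are hypothetical; the module names `torch`, `rdkit`, `dgl`, and `deepchem` are the standard import names for the listed packages. The check only looks for the modules and does not import them, so it does not verify versions or GPU support.

```python
import importlib.util
import sys

# Import names for PyTorch, RDKit, DGL, and DeepChem.
REQUIRED_MODULES = ("torch", "rdkit", "dgl", "deepchem")


def python_supported(version=sys.version_info):
    """Return True if the (major, minor) version is within 3.8 to 3.10."""
    return (3, 8) <= tuple(version[:2]) <= (3, 10)


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_supported():
        print("Python version is outside the 3.8-3.10 range")
    missing = missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All key dependencies were found")
```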

📄 Citation

If this program is useful to you, please cite our paper:

📧 Contact

For questions, please contact: Xiang Zhang (Email: zhangxiang@simm.ac.cn)


🌟 Like this project? Give us a Star
