ANNalog — a SMILES-to-SMILES seq2seq model for medchem analogue generation
ANNalog
ANNalog, a SMILES-to-SMILES generative model for medicinal chemistry analogue design.
Introduction
ANNalog is a transformer-based sequence-to-sequence (Seq2Seq) model designed to generate medicinal-chemistry-relevant analogues of an input molecule. It supports:
- local chemical-space exploration (small, SAR-like modifications), and
- scaffold hopping (changing the core scaffold while remaining chemically relevant).
The accompanying preprint describes training on pairs of molecules drawn from the same bioactivity assay (extracted from ChEMBL), Levenshtein distance–guided SMILES alignment to improve learning of transformations, and a prefix-control feature to constrain generation.
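To make the Levenshtein idea concrete, here is a minimal sketch of the edit distance between two SMILES strings. It works on single characters, which is an assumption made for illustration: the preprint may tokenise SMILES differently (for example, treating "Cl" or "Br" as one token), and this is not the alignment procedure used for training.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance between two strings (classic DP)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Two SMILES that differ only in the halogen: "Br" vs "I"
distance = levenshtein("CC(Cl)Br", "CC(Cl)I")
print(distance)
```

A small distance means the two strings share most of their characters in order, which is the property a distance-guided alignment can exploit when choosing among equivalent SMILES for the same molecule.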
PAPER (ChemRxiv)
https://chemrxiv.org/doi/10.26434/chemrxiv-2025-9c1v6
INSTALLATION (Conda, recommended)
This repository includes a conda environment file (e.g. seq2seq_environment.yml).
- Create the environment: conda env create -f seq2seq_environment.yml
- Activate it (the env name comes from the yml, e.g. "annalog"): conda activate annalog
- Install ANNalog into the environment: pip install -e .
Note:
- If conda solving fails due to strict channel priority, run the following and then re-run the environment creation: conda config --set channel_priority flexible
GENERATION (generation.py)
generation.py generates candidate SMILES strings from an input SMILES using a trained checkpoint + vocab.
RESOURCES (checkpoint + vocab)
By default, the script looks for these files relative to generation.py:
ckpt_and_vocab/Lev_extended.pt
ckpt_and_vocab/stereo_experiment_vocab_ttf.pkl
If your files are elsewhere, use --resources-dir or override --checkpoint/--vocab.
QUICK START
Single SMILES (sampling, 10 outputs):
python generation.py -i "CC(Cl)Br" -m sampling -n 10 -p 0 -f tsv -o gen_single.tsv --temperature 1.2 --seed 42
Batch file (.smi, one SMILES per line):
python generation.py -i inputs.smi -m beam -n 100 -o gen_batch.tsv
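If you prefer to drive the script from Python, the sketch below assembles the same kind of command line as an argv list. The helper name build_generation_command is hypothetical (it is not part of ANNalog); it only uses flags documented in this README and assumes generation.py is in the current working directory.

```python
import shlex
import sys


def build_generation_command(input_value, n, method="beam", prefix="0",
                             out="-", fmt="tsv", script="generation.py"):
    """Assemble an argv list for generation.py from documented flags only."""
    if method not in {"beam", "BF-beam", "sampling"}:
        raise ValueError(f"unknown method: {method!r}")
    return [
        sys.executable, script,
        "-i", input_value,
        "-m", method,
        "-n", str(n),
        "-p", str(prefix),
        "-f", fmt,
        "-o", out,
    ]


cmd = build_generation_command("CC(Cl)Br", 10, method="sampling",
                               out="gen_single.tsv")
print(shlex.join(cmd))  # shell-quoted form, handy for logging

# To actually run it (requires generation.py and its resources on disk):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with SMILES characters such as parentheses, "#" and "\".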
REQUIRED ARGUMENTS
- -i, --input Input SMILES string OR a path to a .smi file (one SMILES per line).
- -n, --generation-number Number to generate (beam width or number of samples). REQUIRED.
OPTIONAL ARGUMENTS
Generation:
- -m, --method {beam, BF-beam, sampling} (default: beam)
- --temperature FLOAT (sampling only; default: 1.2)
- --seed INT (sampling only; default: 42)
- -p, --prefix PREFIX (default: 0)
  - 0 = no prefix constraint
  - an integer such as 5 = use the first 5 characters of the input as the prefix
  - a string such as "CC" = literal prefix (must match the start of the input)
- -k, --keep-invalid Keep invalid SMILES (disables invalid filtering). By default, invalid filtering is ON.
- --max-length INT (default: 102)
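The three forms of --prefix described above can be read as one small resolution rule. The sketch below is an illustration of that rule, not the code in generation.py: the function name resolve_prefix is hypothetical, and it assumes that "first 5 characters" means plain string characters rather than model tokens.

```python
def resolve_prefix(input_smiles: str, prefix) -> str:
    """Turn a --prefix value into the literal prefix string (illustrative)."""
    spec = str(prefix)
    if spec.isdigit():
        # Integer form: take the first N characters; 0 yields no constraint.
        count = int(spec)
        if count > len(input_smiles):
            raise ValueError("prefix length exceeds input length")
        return input_smiles[:count]
    # String form: the literal prefix must match the start of the input.
    if not input_smiles.startswith(spec):
        raise ValueError(f"{spec!r} is not a prefix of {input_smiles!r}")
    return spec


print(repr(resolve_prefix("CC(Cl)Br", 0)))     # no constraint
print(repr(resolve_prefix("CC(Cl)Br", 5)))     # first 5 characters
print(repr(resolve_prefix("CC(Cl)Br", "CC")))  # literal prefix
```

Note that a character-based cut can end in the middle of a branch or a two-letter atom, so the prefix itself need not be a valid SMILES.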
Model/resources:
- --resources-dir PATH (default: <script_dir>/ckpt_and_vocab)
- --checkpoint PATH / --ckpt PATH (default: Lev_extended.pt inside the resources directory)
- --vocab PATH (default: stereo_experiment_vocab_ttf.pkl inside the resources directory)
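The way these three options interact can be sketched as follows. This is an assumption-labelled illustration of the documented defaults, not the script's actual code; resolve_resources is a hypothetical helper, and it assumes that an explicit --checkpoint or --vocab always wins over --resources-dir.

```python
from pathlib import Path


def resolve_resources(script_dir, resources_dir=None, checkpoint=None, vocab=None):
    """Return (checkpoint_path, vocab_path) following the documented defaults."""
    # Default resources directory sits next to generation.py.
    base = Path(resources_dir) if resources_dir else Path(script_dir) / "ckpt_and_vocab"
    # Explicit file paths override the files inside the resources directory.
    ckpt_path = Path(checkpoint) if checkpoint else base / "Lev_extended.pt"
    vocab_path = Path(vocab) if vocab else base / "stereo_experiment_vocab_ttf.pkl"
    return ckpt_path, vocab_path


# "/opt/annalog" and "/data/models" are made-up locations for illustration.
default_paths = resolve_resources("/opt/annalog")
custom_paths = resolve_resources("/opt/annalog", resources_dir="/data/models")
print(default_paths)
print(custom_paths)
```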
Output:
- -f, --format {tsv,csv} (default: tsv)
- -o, --out PATH output path, or '-' for stdout (default: -)
Device:
- --device {cpu,cuda} force device (default: auto-detect)
OUTPUT FORMAT
The output includes a header row with: input_smiles, rank (1-based), generated_smiles, score
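A file with this header can be loaded with Python's standard csv module. The sketch below assumes the score column is numeric and uses a made-up sample row (the SMILES and score are placeholders, not real model output); for CSV output, pass delimiter=",".

```python
import csv
import io


def read_generated(handle, delimiter="\t"):
    """Parse generation output into a list of typed dictionaries."""
    rows = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        rows.append({
            "input_smiles": row["input_smiles"],
            "rank": int(row["rank"]),      # 1-based rank
            "generated_smiles": row["generated_smiles"],
            "score": float(row["score"]),  # assumed to be numeric
        })
    return rows


# Placeholder content standing in for a real output file.
sample = (
    "input_smiles\trank\tgenerated_smiles\tscore\n"
    "CC(Cl)Br\t1\tCC(Cl)I\t-0.52\n"
    "CC(Cl)Br\t2\tCC(F)Br\t-0.87\n"
)
rows = read_generated(io.StringIO(sample))
best = min(rows, key=lambda r: r["rank"])
print(best["generated_smiles"])
```

For a file on disk, open it with newline="" as recommended by the csv module, for example: open("gen_single.tsv", newline="").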