A framework for ingesting, validating, canonicalizing, and adapting retrosynthesis model outputs to a unified benchmark standard.

Maintainers

ischemist

RetroCast: A Unified Format for Multistep Retrosynthesis

The Problem

Every multistep retrosynthesis model returns routes in a different format. AiZynthFinder uses bipartite molecule-reaction graphs. Retro* outputs precursor maps. DirectMultiStep produces recursive dictionaries. SynPlanner has its own schema. This fragmentation makes it unnecessarily difficult to compare, evaluate, or reuse routes across models.

The Solution

RetroCast provides:

A canonical data model for retrosynthesis routes (schemas.py) - a simple, recursive Molecule/ReactionStep/Route structure that any model output can be cast into.
Tested adapters for every major model - AiZynthFinder, Retro*, DirectMultiStep, SynPlanner, Syntheseus, ASKCOS, RetroChimera, DreamRetro, MultiStepTTL, SynLlama, PaRoutes (14 adapters and counting).
Reproducible infrastructure - UV-managed dependencies with conflict resolution, locked versions, and deterministic processing with cryptographic hashing.
Curated evaluation sets - Subsets of the PaRoutes n=1 and n=5 test sets (100, 200, 500, 1k, 2k targets) designed to preserve statistical properties while enabling faster benchmarking.

Quick Start

Install

git clone https://github.com/ischemist/project-procrustes
cd project-procrustes

No need to manage virtual environments - UV handles everything.

Run Any Model in Three Commands

Example: AiZynthFinder with MCTS

# 1. Download model assets (once)
uv run scripts/aizynthfinder/1-download-assets.py data/models/aizynthfinder

# 2. Prepare stock file (once)
uv run --extra aizyn scripts/aizynthfinder/2-prepare-stock.py \
    --files data/models/assets/retrocast-bb-stock-v3-canon.csv \
    --source plain \
    --output data/models/assets/retrocast-bb-stock-v3.hdf5 \
    --target hdf5

# 3. Run predictions
uv run --extra aizyn scripts/aizynthfinder/3-run-aizyn-mcts.py --target-name "uspto-190"

Example: DirectMultiStep

# 1. Download model checkpoint
bash scripts/directmultistep/1-download-assets.sh

# 2. Run predictions
uv run --extra dms --extra torch-gpu scripts/directmultistep/2-run-dms.py \
    --model-name "explorer-xl" \
    --use-fp16 \
    --target-name "uspto-190"

Each model follows the same pattern: numbered scripts in scripts/<model-name>/. UV automatically handles conflicting dependencies (PyTorch versions, NumPy pinning, etc.) via optional dependency groups.

Convert to Unified Format

Once you have raw model outputs, convert them to the canonical RetroCast format:

# Process a single model run
uv run scripts/process-predictions.py process --model aizynthfinder-mcts --dataset uspto-190

# List available models
uv run scripts/process-predictions.py list

# Show configuration for a specific model
uv run scripts/process-predictions.py info --model directmultistep

This will:

Validate the raw output using model-specific schemas
Transform it via the appropriate adapter to Route objects
Deduplicate routes
Save canonical output with a deterministic hash
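
The deduplication step in the list above can be illustrated with a minimal sketch. It assumes routes are nested dicts with `smiles` and `children` keys, that the SMILES strings are already canonical, and that two routes count as duplicates when their trees match regardless of reactant order. The names `route_signature` and `deduplicate` are hypothetical and are not part of RetroCast's API; the criterion RetroCast actually uses may differ.

```python
import json


def route_signature(node: dict) -> str:
    """Return an order-independent string signature for a nested route dict."""
    # Sort child signatures so that reactant order does not affect the result.
    children = sorted(route_signature(child) for child in node.get("children", []))
    return json.dumps({"smiles": node["smiles"], "children": children})


def deduplicate(routes: list[dict]) -> list[dict]:
    """Keep the first occurrence of each structurally distinct route."""
    seen: set[str] = set()
    unique: list[dict] = []
    for route in routes:
        sig = route_signature(route)
        if sig not in seen:
            seen.add(sig)
            unique.append(route)
    return unique


route_a = {"smiles": "CC(=O)Oc1ccccc1C(=O)O", "children": [
    {"smiles": "Oc1ccccc1C(=O)O", "children": []},
    {"smiles": "CC(=O)Cl", "children": []},
]}
# The same route with its reactants listed in the opposite order.
route_b = {"smiles": "CC(=O)Oc1ccccc1C(=O)O", "children": [
    {"smiles": "CC(=O)Cl", "children": []},
    {"smiles": "Oc1ccccc1C(=O)O", "children": []},
]}

unique_routes = deduplicate([route_a, route_b])
print(len(unique_routes))  # 1
```

Keeping the first occurrence preserves the model's original ranking among the surviving routes.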

Use as a Python Library

You can also use RetroCast programmatically to adapt individual routes from any supported model:

from retrocast import adapt_single_route, TargetIdentity

# Define your target
target = TargetIdentity(id="aspirin", smiles="CC(=O)Oc1ccccc1C(=O)O")

# Your model's raw prediction (e.g., DMS format)
raw_route = {
    "smiles": "CC(=O)Oc1ccccc1C(=O)O",
    "children": [
        {"smiles": "Oc1ccccc1C(=O)O", "children": []},
        {"smiles": "CC(=O)Cl", "children": []}
    ]
}

# Adapt to unified format - works with both route-centric (DMS, AiZynth)
# and target-centric (RetroChimera, ASKCOS) adapter formats
route = adapt_single_route(raw_route, target, adapter_name="dms")

if route:
    print(f"Route depth: {route.length}")
    print(f"Starting materials: {len(route.leaves)}")

See docs/api_usage.md for complete API documentation and examples.

Available Models

Adapters are implemented and tested for:

AiZynthFinder (MCTS, Retro*)
Retro* (original implementation)
DirectMultiStep (Flash, Explorer variants)
SynPlanner
Syntheseus (BFS, Retro-0)
ASKCOS
RetroChimera
DreamRetro
MultiStepTTL
SynLlama
PaRoutes

See retrocast-config.yaml for full configuration details.

Evaluation Sets

We provide the USPTO-190 test set along with curated subsets of the PaRoutes benchmark:

uspto-190: Full USPTO test set (190 targets)
paroutes-n1-{100,200,500,1k,2k}: Stratified subsets of the n=1 test set
paroutes-n5-{100,200,500,1k,2k}: Stratified subsets of the n=5 test set

Each subset is:

Hashed for reproducibility
Balanced across route lengths and complexities
Small enough for rapid iteration (100 targets in ~10 min vs. 10k targets in ~10 hours)

Subsets are selected such that top-k accuracy on the subset is within 0.05-1% of the full set, depending on size.
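
To make "balanced across route lengths" concrete, here is a rough sketch of proportional stratified sampling with a fixed seed. It is not the procedure used to build the RetroCast subsets, which is not described here. The function name `stratified_subset`, the `(target_id, route_length)` input shape, and the seed value are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_subset(targets: list[tuple[str, int]], size: int, seed: int = 0) -> list[str]:
    """Sample about `size` target ids, keeping each route-length stratum's share.

    `targets` holds (target_id, route_length) pairs. Because quotas are rounded,
    the result can differ from `size` by a few items.
    """
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    strata: dict[int, list[str]] = defaultdict(list)
    for target_id, route_length in targets:
        strata[route_length].append(target_id)

    chosen: list[str] = []
    total = len(targets)
    for route_length in sorted(strata):
        members = sorted(strata[route_length])  # stable order before sampling
        # Proportional allocation, rounded, with at least one target per stratum.
        quota = max(1, round(size * len(members) / total))
        chosen.extend(rng.sample(members, min(quota, len(members))))
    return chosen


# Toy data: 60 targets with 2-step routes and 40 targets with 4-step routes.
targets = [(f"t{i:03d}", 2) for i in range(60)] + [(f"t{i:03d}", 4) for i in range(60, 100)]
lengths = dict(targets)

subset = stratified_subset(targets, size=10, seed=42)
print(len(subset))                               # 10
print(sum(1 for t in subset if lengths[t] == 2))  # 6
```

Sorting the strata and their members before sampling means the same seed yields the same subset regardless of input order.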

The Canonical Format

At the core of RetroCast is a clean recursive schema (src/retrocast/schemas.py):

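# Imports needed to run this excerpt on its own.
from __future__ import annotations  # lets Molecule reference ReactionStep before it is defined

from typing import Any

from pydantic import BaseModel

# SmilesStr, InchiKeyStr, and ReactionSmilesStr are string types from the
# RetroCast codebase; their definitions are not shown in this excerpt.

class Molecule(BaseModel):
    smiles: SmilesStr
    inchikey: InchiKeyStr
    synthesis_step: ReactionStep | None  # None = leaf (starting material)
    metadata: dict[str, Any]

class ReactionStep(BaseModel):
    reactants: list[Molecule]
    mapped_smiles: ReactionSmilesStr | None
    template: str | None
    reagents: list[SmilesStr] | None
    solvents: list[SmilesStr] | None
    metadata: dict[str, Any]

class Route(BaseModel):
    target: Molecule
    rank: int
    solvability: dict[str, bool]  # per building block set
    metadata: dict[str, Any]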

Every route from every model gets cast into this structure. No ambiguity, no special cases.
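
Because the structure is recursive, most route properties reduce to a short tree walk. The sketch below mirrors only the `smiles` and `synthesis_step` fields using standard-library dataclasses rather than the Pydantic models, so it runs without RetroCast installed. The helpers `route_depth` and `leaf_smiles` are hypothetical illustrations and not the library's own implementation of `route.length` or `route.leaves`.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    smiles: str
    synthesis_step: "ReactionStep | None" = None  # None = leaf (starting material)


@dataclass
class ReactionStep:
    reactants: list[Molecule] = field(default_factory=list)


def route_depth(mol: Molecule) -> int:
    """Number of reaction steps on the longest path from `mol` down to a leaf."""
    if mol.synthesis_step is None:
        return 0
    return 1 + max((route_depth(r) for r in mol.synthesis_step.reactants), default=0)


def leaf_smiles(mol: Molecule) -> list[str]:
    """SMILES of all starting materials beneath `mol`, in traversal order."""
    if mol.synthesis_step is None:
        return [mol.smiles]
    leaves: list[str] = []
    for reactant in mol.synthesis_step.reactants:
        leaves.extend(leaf_smiles(reactant))
    return leaves


aspirin = Molecule(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    synthesis_step=ReactionStep(
        reactants=[Molecule("Oc1ccccc1C(=O)O"), Molecule("CC(=O)Cl")]
    ),
)
print(route_depth(aspirin))  # 1
print(leaf_smiles(aspirin))  # ['Oc1ccccc1C(=O)O', 'CC(=O)Cl']
```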

Architecture

RetroCast is built on three principles:

Adapters are the air gap - All model-specific logic is isolated in pluggable adapters. The core pipeline never touches raw formats directly.
Contracts, not handshakes - Pydantic schemas enforce validation at every boundary. Invalid data is rejected immediately.
Deterministic & auditable - Every output is identified by a cryptographic hash of its inputs. Results are reproducible and traceable.
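
The "air gap" idea can be pictured as a registry of conversion functions behind a single entry point. The sketch below is illustrative only and assumes two input shapes: a molecule-only recursive dict, and a bipartite tree in which `mol` and `reaction` nodes alternate. The dict keys (`type`, `smiles`, `children`), the `ADAPTERS` registry, and all function names are assumptions and do not reflect RetroCast's actual adapter interface.

```python
from typing import Callable

# Hypothetical registry mapping adapter names to conversion functions.
ADAPTERS: dict[str, Callable[[dict], dict]] = {}


def register(name: str):
    """Decorator that registers an adapter function under `name`."""
    def wrap(func):
        ADAPTERS[name] = func
        return func
    return wrap


@register("recursive")
def adapt_recursive(raw: dict) -> dict:
    """Input is already molecule-only: copy the tree, keeping the fields we need."""
    return {
        "smiles": raw["smiles"],
        "children": [adapt_recursive(c) for c in raw.get("children", [])],
    }


@register("bipartite")
def adapt_bipartite(raw: dict) -> dict:
    """Collapse alternating mol/reaction nodes into a molecule-only tree."""
    if raw.get("type") != "mol":
        raise ValueError("expected a 'mol' node at this level")
    reactants: list[dict] = []
    for reaction in raw.get("children", []):
        if reaction.get("type") != "reaction":
            raise ValueError("expected a 'reaction' node below a 'mol' node")
        reactants.extend(adapt_bipartite(m) for m in reaction.get("children", []))
    return {"smiles": raw["smiles"], "children": reactants}


def adapt(raw: dict, adapter_name: str) -> dict:
    """The only entry point the rest of the pipeline would call."""
    return ADAPTERS[adapter_name](raw)


bipartite = {
    "type": "mol", "smiles": "CC(=O)Oc1ccccc1C(=O)O", "children": [
        {"type": "reaction", "children": [
            {"type": "mol", "smiles": "Oc1ccccc1C(=O)O", "children": []},
            {"type": "mol", "smiles": "CC(=O)Cl", "children": []},
        ]},
    ],
}
recursive = {
    "smiles": "CC(=O)Oc1ccccc1C(=O)O", "children": [
        {"smiles": "Oc1ccccc1C(=O)O", "children": []},
        {"smiles": "CC(=O)Cl", "children": []},
    ],
}
# Both raw formats end up as the same canonical tree.
print(adapt(bipartite, "bipartite") == adapt(recursive, "recursive"))  # True
```

Rejecting malformed nodes inside the adapter keeps invalid data from reaching later stages, which is the point of validating at every boundary.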

The pipeline:

load raw data → adapter → Route → deduplicate → save + manifest
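
The final "save + manifest" stage can be sketched as hashing a canonical JSON encoding of the output and recording it next to the run's identifiers. This is a minimal illustration under assumptions: the manifest field names and the helpers `content_hash` and `build_manifest` are hypothetical, and the hashing scheme RetroCast actually uses is not specified in this README.

```python
import hashlib
import json


def content_hash(payload: object) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, fixed separators)."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_manifest(model: str, dataset: str, routes: list[dict]) -> dict:
    """Collect the identifiers and content hash that describe one processed run."""
    return {
        "model": model,
        "dataset": dataset,
        "n_routes": len(routes),
        "routes_sha256": content_hash(routes),
    }


routes = [{"smiles": "CC(=O)Oc1ccccc1C(=O)O", "rank": 1}]
# Same content with keys in a different insertion order.
routes_reordered = [{"rank": 1, "smiles": "CC(=O)Oc1ccccc1C(=O)O"}]

manifest = build_manifest("aizynthfinder-mcts", "uspto-190", routes)
print(manifest["n_routes"])                                          # 1
print(content_hash(routes) == content_hash(routes_reordered))        # True
```

Sorting keys before hashing is what makes the digest independent of dict insertion order, so identical content always maps to the same identifier.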

See docs/adapters.md for details on adding new adapters.

Citation

If you use RetroCast in your research, please cite:

# ArXiv citation - TODO: add link

License

MIT License - see LICENSE for details.

Release history

0.5.3 - Jan 17, 2026
0.5.2 - Jan 4, 2026
0.5.1 - Dec 9, 2025
0.4.0 - Nov 29, 2025
0.3.1 - Nov 22, 2025
0.3.0 - Nov 21, 2025 (this version)
0.2.0 - Nov 15, 2025
0.1.0 - Nov 15, 2025

Dec 1, 2025

File details

Details for the file retrocast-0.3.0.tar.gz.

File metadata

Download URL: retrocast-0.3.0.tar.gz
Upload date: Nov 21, 2025
Size: 609.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.9.11 {"installer":{"name":"uv","version":"0.9.11"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for retrocast-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`01f7d08f046151e220cc23c728af48978aed3fd1f6e11e7112a86df53a49b629`
MD5	`f1022076c09b7818cb4082c5f7657e0c`
BLAKE2b-256	`d68598182eb2d90104bb1e6954fb8c342d4eef754dfc50fcb48ace29203b4d47`

File details

Details for the file retrocast-0.3.0-py3-none-any.whl.

File metadata

Download URL: retrocast-0.3.0-py3-none-any.whl
Upload date: Nov 21, 2025
Size: 97.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.9.11 {"installer":{"name":"uv","version":"0.9.11"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for retrocast-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`05ea68977eaef7f4b3dea5b07d78ddd419ed2fb4d86a1eaca46b00f7bce1ef03`
MD5	`67c4a354d13152b08956d61ae2089555`
BLAKE2b-256	`7f371c8a7db0049f280837c59641a70d29d9a6ead6bf00b617bf1238a354adc1`
