Retrosynthesis route finder — AiZynthFinder + Rxn-INSIGHT + Chemistry by Design

path_finder

Yara Chahda · Corentin Portmann · Inès Ouchen Laksiri — EPFL 2026


User installation

1. Install RDKit

RDKit is installed separately with conda from the conda-forge channel; this is the only step that is not handled by pip.

conda install -c conda-forge rdkit

2. Install Path Finder

pip install path-finder-retrosynthesis

3. Run the setup wizard

path-finder-setup

This automatically:

  • copies the bundled datasets into data/
  • downloads the AiZynthFinder model files (~500 MB) via the official AiZynthFinder downloader
  • generates data/config.yml with the correct paths

If the automatic download fails, download the model files manually from https://github.com/MolecularAI/aizynthfinder/releases and place them in data/aizynthfinder/.

4. Download the Rxn-INSIGHT USPTO database

Download uspto_rxn_insight.gzip from the Rxn-INSIGHT article.

Place it in data/uspto_rxn_insight.gzip.

This file enables reaction condition prediction for novel routes (predicted routes section). Without it, only dataset and validated routes are shown.
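The rule above can be sketched as a small check on the data/ directory. This is a minimal sketch and not the package's own code: the function name `visible_sections` and the section labels are hypothetical, and only the file name uspto_rxn_insight.gzip comes from the steps above.

```python
from pathlib import Path


def visible_sections(data_dir: str = "data") -> list[str]:
    """Return the result sections expected to be shown (hypothetical helper)."""
    # Dataset and validated routes do not depend on the Rxn-INSIGHT database.
    sections = ["dataset", "validated"]
    # The predicted section needs the manually downloaded database file.
    if (Path(data_dir) / "uspto_rxn_insight.gzip").is_file():
        sections.append("predicted")
    return sections
```

Calling `visible_sections()` from the project root before launching is a quick way to confirm that the manual download from step 4 landed in the right place.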

5. Launch

path-finder

Open http://localhost:8501 in your browser.


Summary

conda install -c conda-forge rdkit
pip install path-finder-retrosynthesis
path-finder-setup
# → place uspto_rxn_insight.gzip in data/
path-finder

What the app does

Path Finder finds and ranks retrosynthesis routes for a target molecule using three sources:

| Section | Source | Conditions | Yield in scoring |
|---|---|---|---|
| 📚 Dataset | Curated Chemistry by Design routes | Real | Yes |
| ✅ Validated | AiZynthFinder + generic reactions (USPTO) | Real | Yes |
| 🤖 Predicted | AiZynthFinder + Rxn-INSIGHT | Predicted | No |

Routes are scored with a weighted 1/i² scheme across three criteria, which the user picks from five options: steps, yield, atom economy, E-factor, and safety.
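One plausible reading of the 1/i² scheme is that the criterion the user ranks i-th gets a weight proportional to 1/i², so three criteria are weighted 1, 1/4 and 1/9 before normalisation. The sketch below follows that reading. The function names, the normalisation step, and the assumption that every criterion is already scaled to [0, 1] with higher meaning better are illustrative and not taken from the package.

```python
def rank_weights(n: int = 3) -> list[float]:
    """Weights proportional to 1/i**2 for ranks i = 1..n, normalised to sum to 1."""
    raw = [1 / (i * i) for i in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def route_score(values: dict[str, float], priority: list[str]) -> float:
    """Combine criterion values in the user's priority order (assumed scaled to [0, 1])."""
    weights = rank_weights(len(priority))
    return sum(w * values[name] for w, name in zip(weights, priority))


# Hypothetical route: the user ranks yield first, then steps, then safety.
example = route_score(
    {"yield": 0.8, "steps": 0.5, "safety": 1.0},
    ["yield", "steps", "safety"],
)
```

Under this reading the first-ranked criterion carries 36/49 of the total weight, so the order in which the criteria are chosen matters far more than the third choice.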


Data files

| File | Bundled | Description |
|---|---|---|
| reaction_dataset.json | Yes | Curated synthesis routes |
| toxicity_dataset.json | Yes | Safety scores for reagents and solvents |
| generic_reactions.json | Yes | 10 000 USPTO reactions for step validation |
| data/aizynthfinder/ | No | AiZynthFinder model files, downloaded by the wizard |
| data/config.yml | No | Generated by the wizard; do not commit |
| data/uspto_rxn_insight.gzip | No | Rxn-INSIGHT USPTO database; download manually |

Troubleshooting

| Problem | Solution |
|---|---|
| config.yml not found | Run path-finder-setup |
| AiZynthFinder crash | Check that all paths in data/config.yml are absolute |
| No routes found | Try Galanthamine (OC1C=C[C@@]23c4cc(OC)ccc4CN(C)C[C@@H]2[C@@H]1O3) |
| Predicted routes disabled | Add data/uspto_rxn_insight.gzip (see step 4 above) |
| Slow search (~2 min) | Normal: AiZynthFinder MCTS is computationally intensive |
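The absolute-path check in the table can be automated once data/config.yml has been parsed into a dictionary, for example with PyYAML's `yaml.safe_load`. The sketch below is an assumption-laden illustration: the helper name `relative_paths` is hypothetical, the layout of the example dictionary is invented and does not reflect the real AiZynthFinder schema, and a string is treated as a path only if it contains a path separator.

```python
import os


def relative_paths(config, prefix=""):
    """Yield (key path, value) for string values that look like relative file paths."""
    if isinstance(config, dict):
        for key, value in config.items():
            yield from relative_paths(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(config, list):
        for index, value in enumerate(config):
            yield from relative_paths(value, f"{prefix}[{index}]")
    elif isinstance(config, str) and ("/" in config or "\\" in config):
        # Heuristic: only strings containing a separator are treated as paths.
        if not os.path.isabs(config):
            yield prefix, config


# Hypothetical config layout, for illustration only.
sample = {
    "policy": {"files": ["/abs/model.onnx", "data/templates.csv"]},
    "stock": "/abs/stock.hdf5",
    "name": "uspto",
}
offenders = list(relative_paths(sample))
```

An empty result suggests every path-like value is absolute; any entry returned points at the key to fix, or at a reason to re-run path-finder-setup.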

Developer setup

git clone https://github.com/YaraChahda/path_finder.git
cd path_finder
conda install -c conda-forge rdkit
pip install -e .
path-finder-setup
path-finder

Running tests

pytest tests/

Publishing a new version

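# Bump the version in pyproject.toml (replace X.Y.Z and X.Y.Z+1 with the real numbers).
# `sed -i ''` is the macOS/BSD form; with GNU sed on Linux, use `sed -i` without the empty string.
sed -i '' 's/version = "X.Y.Z"/version = "X.Y.Z+1"/' pyproject.toml
git add pyproject.toml
git commit -m "release: vX.Y.Z+1"
git tag vX.Y.Z+1
git push origin clone_optimise_app_clean --tags
# GitHub Actions publishes to PyPI automatically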
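Because in-place sed syntax differs between macOS and Linux, the version bump can also be done with a few lines of Python. This is a minimal sketch, assuming the version is stored in pyproject.toml as a plain `version = "X.Y.Z"` line, as the sed command implies; the helper name `bump_patch` is hypothetical and not part of the package.

```python
import re

# Matches a line such as: version = "1.0.5"
VERSION_RE = re.compile(r'^(version\s*=\s*")(\d+)\.(\d+)\.(\d+)(")', re.MULTILINE)


def bump_patch(pyproject_text: str) -> str:
    """Return the text with the patch number of the first version line increased by one."""

    def _bump(match: re.Match) -> str:
        major, minor, patch = match.group(2), match.group(3), int(match.group(4))
        return f"{match.group(1)}{major}.{minor}.{patch + 1}{match.group(5)}"

    new_text, count = VERSION_RE.subn(_bump, pyproject_text, count=1)
    if count == 0:
        raise ValueError('no version = "X.Y.Z" line found')
    return new_text
```

To apply it, read pyproject.toml, pass its contents through `bump_patch`, and write the result back before running the git commands above.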

Repository structure

This section describes the purpose of every file and folder so that new contributors can orient themselves quickly.

Root-level files

| File | Purpose |
|---|---|
| pyproject.toml | Package metadata, dependencies, and entry points for pip install |
| path_finder-env.yml | Conda environment; use this to recreate the full dev environment |
| README.md | This file |
| LICENSE | MIT licence |
| mypy.ini | Type-checking configuration; ignores RDKit and Rxn-INSIGHT stubs |
| .gitignore | Files excluded from git (config.yml, model files, pycache, dist/) |
| .pre-commit-config.yaml | Pre-commit hooks; checks for large files and merge conflicts |
| .readthedocs.yml | Automatic documentation build on readthedocs.org |
| tox.ini | Test automation configuration |

.github/workflows/

| File | Purpose |
|---|---|
| publish.yml | Automatically builds and uploads to PyPI when a git tag is pushed |

assets/

| File | Purpose |
|---|---|
| banner.png | Project banner displayed at the top of this README |
| coverage-badge.svg | Test coverage badge auto-generated by the CI pipeline |

data/

Working data directory — none of these files are committed to git.

| File / Folder | Purpose |
|---|---|
| aizynthfinder/ | AiZynthFinder model files downloaded by path-finder-setup |
| config.yml | AiZynthFinder config created by path-finder-setup; contains absolute paths specific to each machine |
| reaction_dataset.json | Main curated dataset (also bundled in the pip package) |
| toxicity_dataset.json | Safety scores (also bundled in the pip package) |
| generic_reactions.json | 10 000 USPTO reactions for step validation (also bundled) |
| uspto_rxn_insight.gzip | Rxn-INSIGHT USPTO database; download manually, enables predicted routes |

docs/

| Folder / File | Purpose |
|---|---|
| source/conf.py | Sphinx documentation configuration |
| source/index.md | Documentation home page |
| source/api/ | Auto-generated API reference pages |
| docs_out/ | Generated Sphinx HTML output; do not edit manually, rebuilt with make html |

notebooks/

| File | Purpose |
|---|---|
| report.ipynb | Project report: introduction, methods, results, discussion, and live code demonstrations |
| screenshots/ | Screenshots of the Streamlit interface used as figures in the report |

scripts/

| File | Purpose |
|---|---|
| convert_orderly_to_generic.py | Converts the ORDerly-cleaned USPTO Parquet file into generic_reactions.json; run once to rebuild the generic dataset from scratch |

src/path_finder/

The installable Python package. All application logic lives here.

| File | Purpose |
|---|---|
| __init__.py | Package entry point; exposes find_best_routes and __version__ |
| _about_.py | Single source of truth for the version number |
| app.py | Streamlit front-end: tab layout, sidebar, route cards, search logic |
| app_utensils.py | UI helper functions used by app.py: score tables, scheme HTML, chart builders |
| route_engine.py | Chemistry backend: dataset loading, AiZynthFinder, Rxn-INSIGHT, scoring, and the main find_best_routes() entry point |
| molecule_rendering.py | RDKit Cairo rendering: mol_png() for Streamlit images, mol_b64_or_text_svg() for embedded HTML schemes |
| localization.py | All user-visible strings in English and French, plus the colour palette |
| report_builder.py | PDF generation using PIL: multi-page A4 report with metric cards, score table, and reaction step images |
| launch.py | Console entry points: path-finder (launches Streamlit) and path-finder-setup (setup wizard) |
| assets/banner.png | Banner bundled in the pip package |
| data/ | Datasets and config template bundled in the pip package, copied to data/ on first run |

tests/

| File | Purpose |
|---|---|
| conftest.py | Shared pytest fixtures (sample routes, mock datasets) |
| test_route_engine.py | Tests for scoring functions, dataset loading, SMILES canonicalisation, and route ranking |
| test_molecule_rendering.py | Tests for PNG rendering, base64 encoding, and fallback data URIs |
| test_localization.py | Tests that all required keys exist in both EN and FR dictionaries |
| test_report_builder.py | Tests for PDF generation |
| test_app_utensils.py | Tests for UI helper functions |
| test_app.py | Integration tests for the Streamlit app |
| test_launch.py | Tests for the CLI entry points |
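The key-parity check described for test_localization.py can be illustrated with a small helper. This is a sketch of the general technique only: the function name `missing_keys` and the sample dictionaries are hypothetical, and the real structure of localization.py may differ.

```python
def missing_keys(translations: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map each language to the keys it lacks relative to the union of all languages."""
    all_keys: set[str] = set()
    for table in translations.values():
        all_keys |= set(table)
    gaps = {lang: all_keys - set(table) for lang, table in translations.items()}
    # Keep only languages that are actually missing something.
    return {lang: keys for lang, keys in gaps.items() if keys}


# Hypothetical string tables: the French table lacks the "search" key.
sample_strings = {
    "en": {"title": "Path Finder", "search": "Search"},
    "fr": {"title": "Path Finder"},
}
gaps = missing_keys(sample_strings)
```

A test built on this idea would simply assert that the returned dictionary is empty for the real EN and FR tables.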

Citation
