path_finder
Retrosynthesis route finder — AiZynthFinder · Rxn-INSIGHT · Chemistry by Design
Yara Chahda · Corentin Portmann · Inès Ouchen Laksiri — EPFL 2026
User installation
1. Install RDKit
RDKit is installed from conda-forge rather than through pip, so conda is required for this one step.
conda install -c conda-forge rdkit
2. Install Path Finder
pip install path-finder-retrosynthesis
3. Run the setup wizard
path-finder-setup
This automatically:
- copies the bundled datasets into `data/`
- downloads the AiZynthFinder model files (~500 MB) via the official AiZynthFinder downloader
- generates `data/config.yml` with the correct paths
If the automatic download fails, download the model files manually from https://github.com/MolecularAI/aizynthfinder/releases and place them in `data/aizynthfinder/`.
4. Download the Rxn-INSIGHT USPTO database
Download `uspto_rxn_insight.gzip` from the Rxn-INSIGHT article and place it at `data/uspto_rxn_insight.gzip`.
This file enables reaction condition prediction for novel routes (predicted routes section). Without it, only dataset and validated routes are shown.
5. Launch
path-finder
Open http://localhost:8501 in your browser.
Summary
conda install -c conda-forge rdkit
pip install path-finder-retrosynthesis
path-finder-setup
# → place uspto_rxn_insight.gzip in data/
path-finder
What the app does
Path Finder finds and ranks retrosynthesis routes for a target molecule using three sources:
| Section | Source | Conditions | Yield in scoring |
|---|---|---|---|
| 📚 Dataset | Curated Chemistry by Design routes | Real | Yes |
| ✅ Validated | AiZynthFinder + generic reactions (USPTO) | Real | Yes |
| 🤖 Predicted | AiZynthFinder + Rxn-INSIGHT | Predicted | No |
Routes are scored with a weighted 1/i² scheme over three criteria that the user picks from steps, yield, atom economy, E-factor, and safety.
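One plausible reading of the 1/i² scheme is sketched below. It is an assumption, not a description of what `route_engine.py` actually does: the criterion ranked i-th by the user receives a raw weight of 1/i², and the weights are normalised to sum to 1. The function names `rank_weights` and `weighted_score`, and the premise that every criterion has already been rescaled to [0, 1] with higher meaning better, are hypothetical.

```python
from typing import Dict, List


def rank_weights(n_criteria: int) -> List[float]:
    """Return normalised 1/i**2 weights for criteria ranked 1..n_criteria."""
    raw = [1.0 / (i * i) for i in range(1, n_criteria + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_score(scores: Dict[str, float], ranked_criteria: List[str]) -> float:
    """Combine per-criterion scores (assumed in [0, 1], higher is better).

    `ranked_criteria` lists the chosen criteria from most to least important.
    """
    weights = rank_weights(len(ranked_criteria))
    return sum(w * scores[name] for w, name in zip(weights, ranked_criteria))


# Hypothetical route with already-normalised scores for three chosen criteria.
example_route = {"steps": 0.8, "yield": 0.5, "safety": 0.9}
example_score = weighted_score(example_route, ["steps", "yield", "safety"])
```

Under this reading, three criteria get weights of 36/49, 9/49, and 4/49, so the first choice dominates the ranking while the other two mostly act as tie-breakers.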
Data files
| File | Bundled | Description |
|---|---|---|
| `reaction_dataset.json` | ✅ | Curated synthesis routes |
| `toxicity_dataset.json` | ✅ | Safety scores for reagents and solvents |
| `generic_reactions.json` | ✅ | 10 000 USPTO reactions for step validation |
| `data/aizynthfinder/` | ❌ | AiZynthFinder model files (downloaded by the wizard) |
| `data/config.yml` | ❌ | Generated by the wizard; do not commit |
| `data/uspto_rxn_insight.gzip` | ❌ | Rxn-INSIGHT USPTO database (download manually) |
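To confirm that the working directory is complete after setup, a small check along the following lines could be used. It is a sketch rather than part of the package: the helper name `missing_entries` is hypothetical, and the list of expected entries is simply copied from the table above under the assumption that everything ends up directly under `data/`.

```python
from pathlib import Path
from typing import List, Union

# Entries taken from the data-files table; the flat layout under data/ is assumed.
EXPECTED_ENTRIES = [
    "reaction_dataset.json",
    "toxicity_dataset.json",
    "generic_reactions.json",
    "aizynthfinder",
    "config.yml",
    "uspto_rxn_insight.gzip",
]


def missing_entries(data_dir: Union[str, Path] = "data") -> List[str]:
    """Return the expected files or folders that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list means every entry is present. If only `uspto_rxn_insight.gzip` is reported, the app should still run with the predicted-routes section disabled, as described in step 4.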
Troubleshooting
| Problem | Solution |
|---|---|
| `config.yml` not found | Run `path-finder-setup` |
| AiZynthFinder crash | Check that all paths in data/config.yml are absolute |
| No routes found | Try Galanthamine (OC1C=C[C@@]23c4cc(OC)ccc4CN(C)C[C@@H]2[C@@H]1O3) |
| Predicted routes disabled | Add data/uspto_rxn_insight.gzip (see step 4 above) |
| Slow search (~2 min) | Normal — AiZynthFinder MCTS is computationally intensive |
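The absolute-path check from the table can be automated once the config has been parsed into a Python dictionary (for example with PyYAML's `yaml.safe_load`, which is not shown here). The sketch below is illustrative only: the helper names, the heuristic for spotting path-like strings, and the config keys in the example are all assumptions, not the real AiZynthFinder schema.

```python
import os
from typing import Any, Iterator, Tuple

# Assumed heuristic: a string is path-like if it contains a separator
# or ends with one of these extensions.
_PATH_SUFFIXES = (".onnx", ".hdf5", ".gz", ".csv", ".yml")


def _looks_like_path(value: str) -> bool:
    return "/" in value or "\\" in value or value.endswith(_PATH_SUFFIXES)


def relative_paths(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (location, value) for path-like strings that are not absolute."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from relative_paths(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from relative_paths(value, f"{prefix}[{index}]")
    elif isinstance(node, str) and _looks_like_path(node) and not os.path.isabs(node):
        yield prefix, node


# Hypothetical config fragment; the keys are made up for illustration.
example_config = {
    "expansion": {"uspto": [os.path.abspath("model.onnx"), "data/templates.csv.gz"]},
    "stock": {"zinc": os.path.abspath("stock.hdf5")},
}
problems = list(relative_paths(example_config))
```

Each reported pair names the location of the offending entry in the config and the relative value found there; re-running `path-finder-setup` regenerates the file with absolute paths.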
Developer setup
git clone https://github.com/YaraChahda/path_finder.git
cd path_finder
conda install -c conda-forge rdkit
pip install -e .
path-finder-setup
path-finder
Running tests
pytest tests/
Publishing a new version
sed -i '' 's/version = "X.Y.Z"/version = "X.Y.Z+1"/' pyproject.toml
git add pyproject.toml
git commit -m "release: vX.Y.Z+1"
git tag vX.Y.Z+1
git push origin clone_optimise_app_clean --tags
# GitHub Actions publishes to PyPI automatically
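The `sed -i ''` form above is BSD/macOS syntax; GNU sed on Linux expects `-i` without the empty argument. A portable alternative is sketched below in Python. It assumes, as the `sed` command does, that `pyproject.toml` contains a literal `version = "X.Y.Z"` line; the function name `bump_patch` is hypothetical.

```python
import re

# Matches the first line of the form: version = "X.Y.Z"
_VERSION_LINE = re.compile(r'^(version\s*=\s*")(\d+)\.(\d+)\.(\d+)(")', re.MULTILINE)


def bump_patch(pyproject_text: str) -> str:
    """Return the text with the first version line bumped from X.Y.Z to X.Y.(Z+1)."""

    def _replace(match: "re.Match[str]") -> str:
        major, minor = match.group(2), match.group(3)
        patch = int(match.group(4)) + 1
        return f"{match.group(1)}{major}.{minor}.{patch}{match.group(5)}"

    new_text, count = _VERSION_LINE.subn(_replace, pyproject_text, count=1)
    if count == 0:
        raise ValueError('no `version = "X.Y.Z"` line found')
    return new_text
```

To apply it, read `pyproject.toml` with `pathlib.Path.read_text()`, pass the contents through `bump_patch`, and write the result back before running the `git` commands.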
Repository structure
This section describes the purpose of every file and folder so that new contributors can orient themselves quickly.
Root-level files
| File | Purpose |
|---|---|
| `pyproject.toml` | Package metadata, dependencies, and entry points for pip install |
| `path_finder-env.yml` | Conda environment file; use it to recreate the full dev environment |
| `README.md` | This file |
| `LICENSE` | MIT licence |
| `mypy.ini` | Type-checking configuration; ignores RDKit and Rxn-INSIGHT stubs |
| `.gitignore` | Files excluded from git (`config.yml`, model files, `__pycache__`, `dist/`) |
| `.pre-commit-config.yaml` | Pre-commit hooks that check for large files and merge conflicts |
| `.readthedocs.yml` | Automatic documentation build on readthedocs.org |
| `tox.ini` | Test automation configuration |
.github/workflows/
| File | Purpose |
|---|---|
| `publish.yml` | Automatically builds and uploads to PyPI when a git tag is pushed |
assets/
| File | Purpose |
|---|---|
| `banner.png` | Project banner displayed at the top of this README |
| `coverage-badge.svg` | Test coverage badge auto-generated by the CI pipeline |
data/
Working data directory — none of these files are committed to git.
| File / Folder | Purpose |
|---|---|
| `aizynthfinder/` | AiZynthFinder model files downloaded by `path-finder-setup` |
| `config.yml` | AiZynthFinder config created by `path-finder-setup`; contains absolute paths specific to each machine |
| `reaction_dataset.json` | Main curated dataset (also bundled in the pip package) |
| `toxicity_dataset.json` | Safety scores (also bundled in the pip package) |
| `generic_reactions.json` | 10 000 USPTO reactions for step validation (also bundled) |
| `uspto_rxn_insight.gzip` | Rxn-INSIGHT USPTO database; download manually to enable predicted routes |
docs/
| Folder / File | Purpose |
|---|---|
| `source/conf.py` | Sphinx documentation configuration |
| `source/index.md` | Documentation home page |
| `source/api/` | Auto-generated API reference pages |
| `docs_out/` | Generated Sphinx HTML output; do not edit manually, rebuild with `make html` |
notebooks/
| File | Purpose |
|---|---|
| `report.ipynb` | Project report: introduction, methods, results, discussion, and live code demonstrations |
| `screenshots/` | Screenshots of the Streamlit interface used as figures in the report |
scripts/
| File | Purpose |
|---|---|
| `convert_orderly_to_generic.py` | Converts the ORDerly-cleaned USPTO Parquet file into `generic_reactions.json`; run once to rebuild the generic dataset from scratch |
src/path_finder/
The installable Python package. All application logic lives here.
| File | Purpose |
|---|---|
| `__init__.py` | Package entry point; exposes `find_best_routes` and `__version__` |
| `_about_.py` | Single source of truth for the version number |
| `app.py` | Streamlit front-end: tab layout, sidebar, route cards, search logic |
| `app_utensils.py` | UI helper functions used by `app.py`: score tables, scheme HTML, chart builders |
| `route_engine.py` | Chemistry backend: dataset loading, AiZynthFinder, Rxn-INSIGHT, scoring, and the main `find_best_routes()` entry point |
| `molecule_rendering.py` | RDKit Cairo rendering: `mol_png()` for Streamlit images, `mol_b64_or_text_svg()` for embedded HTML schemes |
| `localization.py` | All user-visible strings in English and French, plus the colour palette |
| `report_builder.py` | PDF generation using PIL: multi-page A4 report with metric cards, score table, and reaction step images |
| `launch.py` | Console entry points: `path-finder` (launches Streamlit) and `path-finder-setup` (setup wizard) |
| `assets/banner.png` | Banner bundled in the pip package |
| `data/` | Datasets and config template bundled in the pip package, copied to `data/` on first run |
tests/
| File | Purpose |
|---|---|
| `conftest.py` | Shared pytest fixtures (sample routes, mock datasets) |
| `test_route_engine.py` | Tests for scoring functions, dataset loading, SMILES canonicalisation, and route ranking |
| `test_molecule_rendering.py` | Tests for PNG rendering, base64 encoding, and fallback data URIs |
| `test_localization.py` | Tests that all required keys exist in both EN and FR dictionaries |
| `test_report_builder.py` | Tests for PDF generation |
| `test_app_utensils.py` | Tests for UI helper functions |
| `test_app.py` | Integration tests for the Streamlit app |
| `test_launch.py` | Tests for the CLI entry points |
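As an illustration of the kind of check `test_localization.py` is described as performing, the sketch below reports keys that are present in one language table but missing from another. The data layout (a dictionary of per-language dictionaries) and the name `missing_keys` are assumptions made for the example; the real structure in `localization.py` may differ.

```python
from typing import Dict, Set


def missing_keys(strings: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Map each language code to the keys it lacks relative to the other languages."""
    all_keys: Set[str] = set()
    for table in strings.values():
        all_keys.update(table)
    return {lang: all_keys - set(table) for lang, table in strings.items()}


# Hypothetical string tables; the keys and translations are invented.
example_strings = {
    "en": {"title": "Routes", "search": "Search"},
    "fr": {"title": "Routes"},
}
gaps = missing_keys(example_strings)
```

A pytest test built on this idea would assert that every set in the result is empty, so a string added in English without its French counterpart fails the suite.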
Citation
- AiZynthFinder: Genheden et al., J. Cheminf. 2020 — doi:10.1186/s13321-020-00472-1
- Rxn-INSIGHT: Thakkar et al., J. Cheminf. 2023 — doi:10.1186/s13321-023-00744-4
- Open Reaction Database: Kearnes et al., JACS 2021 — doi:10.1021/jacs.1c09820
- Chemistry by Design: chemistrybydesign.oia.arizona.edu