Minimal standalone RDKit synthon-OR search.
synthonor
Open-source synthon similarity search with a bitwise OR strategy and support for generic fingerprints.
synthonor supports a simple workflow:
- build packed synthon fingerprints once per TSV + fingerprint setting
- memory-map the packed cache when searching
- reuse a valid cache automatically on later runs
- search with load_synthon_or_index(...), search_smiles(...), and search_fingerprint(...)
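The bitwise OR strategy named in the description can be illustrated with toy fingerprints. The sketch below is not the package's implementation; it assumes that a product fingerprint can be approximated by OR-ing the fingerprints of its synthons and that candidates are scored by Tanimoto similarity to the query. The helper names (`popcount`, `tanimoto`, `approx_product_score`) and the 8-bit fingerprints are made up for illustration.

```python
from itertools import product


def popcount(bits: int) -> int:
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(bits).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit fingerprints."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def approx_product_score(query_fp: int, synthon_fps) -> float:
    """OR the synthon fingerprints together, then compare to the query."""
    combined = 0
    for fp in synthon_fps:
        combined |= fp
    return tanimoto(query_fp, combined)


# Toy 8-bit fingerprints for a two-slot reaction (illustrative values only).
slot_1 = {"s1": 0b00001111, "s2": 0b00000011}
slot_2 = {"s3": 0b11110000, "s4": 0b00110000}
query = 0b00111111

# Score every synthon combination without ever enumerating real products.
scores = {
    (a, b): approx_product_score(query, [slot_1[a], slot_2[b]])
    for a, b in product(slot_1, slot_2)
}
best = max(scores, key=scores.get)
print(best, scores[best])
```

The score is approximate because a real product fingerprint can contain bits that neither synthon sets on its own, and can lack bits tied to the attachment points.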
Install
pip install synthonor
Database Availability
SynthonOR expects a tab-separated table with these columns:
smiles synthon_id position reaction_id
NC(=O)[C@@H]1CCCN1[U] 100000003125 1 11a
C[C@@H](O)[C@H](N[U])C(N)=O 100000003557 1 11a
CCCN([U])C(C)C(=O)Nc1ccccc1C 100000003669 1 11a
O=C1CN([U])[C@@H](c2ccccc2)CO1 100000005368 1 11a
The package ships with a bundled example synthon slice (synthon_space_1M.tsv).
The repo also includes the matching reaction schema table used by the exact
benchmark script.
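Before building a cache for a custom database, it can help to check that the TSV has the four columns shown above. The following is a minimal sketch using only the standard library; it assumes the file has a header row with exactly these column names and that `position` is an integer, neither of which is confirmed here. `read_synthon_rows` is a hypothetical helper, not part of synthonor.

```python
import csv
import io

REQUIRED_COLUMNS = ("smiles", "synthon_id", "position", "reaction_id")


def read_synthon_rows(handle):
    """Parse a tab-separated synthon table and check the expected columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    present = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        rows.append(
            {
                "smiles": row["smiles"],
                "synthon_id": row["synthon_id"],
                "position": int(row["position"]),  # assumed to be an integer
                "reaction_id": row["reaction_id"],
            }
        )
    return rows


# Two rows copied from the example table, joined with real tab characters.
sample = (
    "smiles\tsynthon_id\tposition\treaction_id\n"
    "NC(=O)[C@@H]1CCCN1[U]\t100000003125\t1\t11a\n"
    "C[C@@H](O)[C@H](N[U])C(N)=O\t100000003557\t1\t11a\n"
)
rows = read_synthon_rows(io.StringIO(sample))
print(len(rows), rows[0]["reaction_id"])
```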
Quick Start (Python)
from synthonor import (
    build_synthon_fingerprint_cache,
    example_space_path,
    load_synthon_or_index,
    search_smiles,
)
data_path = example_space_path()
cache_info = build_synthon_fingerprint_cache(data_path)
index = load_synthon_or_index(data_path)
hits = search_smiles(
    "CCOc1ccc(NC(=O)N2CCN(CC2)C)cc1",
    index,
    top_n=25,
)
print(cache_info.cache_prefix)
print(hits[0].reaction_id, hits[0].synthon_ids, round(hits[0].approx_score, 3))
example_space_path() returns a normal writable local path to the bundled
example TSV, so the first cache build can live right next to it. The default
fingerprint is ecfp4.
Fingerprint-based search:
from synthonor import query_fingerprint_from_smiles, search_fingerprint
query_fp = query_fingerprint_from_smiles("CCN1CCN(CC1)C(=O)c1ccccc1", index.fingerprint_spec)
hits = search_fingerprint(query_fp, index, min_score=0.35, preset="very_accurate")
CLI
Build or validate cache only:
synthonor path/to/synthons.tsv \
    --fingerprint ecfp4 \
    --build-cache-only
Run search:
synthonor path/to/synthons.tsv \
    --query "CCOc1ccc(NC(=O)N2CCN(CC2)C)cc1" \
    --top-n 25 \
    --output synthonor_hits.jsonl
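The `.jsonl` extension suggests one JSON object per line, so the output can probably be read back with the standard library. The sketch below assumes that layout; the keys in the demo records (`rank`, `approx_score`) are placeholders and may not match the fields the CLI actually writes.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo on a throwaway file with hypothetical records (not real CLI output).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "synthonor_hits.jsonl"
    demo_path.write_text(
        '{"rank": 1, "approx_score": 0.71}\n'
        '{"rank": 2, "approx_score": 0.64}\n',
        encoding="utf-8",
    )
    records = read_jsonl(demo_path)

print(len(records), sorted(records[0]))
```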
Run explicit self-test mode:
synthonor path/to/synthons.tsv --test --preset fast --top-n 5
Search Contract
- top_n=N: return at most N hits, sorted by descending approximate score.
- min_score=S: return every hit with approximate score >= S.
- min_score=S, top_n=N: apply score cutoff first, then cap to N.
- max_score=T: optionally bound score from above.
- returned rank values are ranks within the filtered output.
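The contract above can be sketched as a small post-filter over scored candidates. This is an illustration of the stated ordering (cutoff first, then cap, then rank), not the package's code; `apply_search_contract` is a hypothetical helper, and treating `max_score` as an inclusive bound is an assumption.

```python
def apply_search_contract(scored, top_n=None, min_score=None, max_score=None):
    """Filter (label, approx_score) pairs, sort, cap, and assign ranks."""
    kept = [
        (label, score)
        for label, score in scored
        if (min_score is None or score >= min_score)
        and (max_score is None or score <= max_score)  # inclusive: an assumption
    ]
    # Sort by descending approximate score before applying the cap.
    kept.sort(key=lambda item: item[1], reverse=True)
    if top_n is not None:
        kept = kept[:top_n]
    # Ranks are assigned within the filtered output, starting at 1.
    return [(rank, label, score) for rank, (label, score) in enumerate(kept, start=1)]


candidates = [("a", 0.90), ("b", 0.50), ("c", 0.30), ("d", 0.70)]
result = apply_search_contract(candidates, top_n=2, min_score=0.35, max_score=0.80)
print(result)
```

Here `a` is dropped by the upper bound and `c` by the cutoff, so `d` receives rank 1 even though it is not the highest-scoring candidate overall.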
Config precedence:
- use preset="fast" | "accurate" | "very_accurate" for standard workflows
- pass config=SearchConfig(...) for explicit control
- explicit config overrides preset defaults
- explicit top_n overrides config.topk_products
Search Presets
- fast: default setting; up to 8 reaction routes, 64 candidates per slot, 50k exhaustive tuple limit
- accurate: searches all prescreened reactions with 192 candidates per slot and a 250k exhaustive tuple limit
- very_accurate: same route coverage as accurate, with 256 candidates per slot and a 500k exhaustive tuple limit
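The preset values above and the precedence rules from the Search Contract section can be sketched together. The `SketchConfig` class below is a stand-in for illustration, not the real `SearchConfig`; apart from `topk_products`, its field names are invented, and `None` is used here to mean "all prescreened reactions".

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for SearchConfig (field names are assumptions)."""
    max_routes: Optional[int]  # None = all prescreened reactions
    candidates_per_slot: int
    exhaustive_tuple_limit: int
    topk_products: Optional[int] = None


# Values taken from the preset descriptions above.
PRESETS = {
    "fast": SketchConfig(8, 64, 50_000),
    "accurate": SketchConfig(None, 192, 250_000),
    "very_accurate": SketchConfig(None, 256, 500_000),
}


def resolve(preset="fast", config=None, top_n=None):
    """Explicit config beats the preset; explicit top_n beats topk_products."""
    resolved = config if config is not None else PRESETS[preset]
    if top_n is not None:
        resolved = replace(resolved, topk_products=top_n)
    return resolved


default_cfg = resolve()
custom_cfg = resolve(
    preset="accurate",
    config=SketchConfig(4, 32, 10_000, topk_products=100),
    top_n=5,
)
print(default_cfg)
print(custom_cfg)
```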
Fingerprints
Packed on-disk synthon caches are used for bit fingerprint families:
- ecfp4
- ecfp6
- rdkit
- patternfp
- atom_pair
- topological_torsion
Package Contents
After pip install synthonor, installed artifacts include:
- Python package code under synthonor
- bundled example TSV exposed via synthonor.example_space_path(), which materializes a writable local copy
- bundled benchmark reaction schema table under synthonor.data
Repo-only artifacts that are not installed by default:
- local cache files you generate such as *.synthon_fp_cache.*
- notebooks in notebooks/
- local test/result outputs
Benchmark Snapshot
Headline results below come from the exact full-product benchmark on the
bundled synthon_space_1M.tsv slice (6273 synthons, 42 reactions), using
the matching bundled reaction schema table and 10 deterministic queries.
| fingerprint | fast overlap | fast wall time / query (s) | accurate overlap | accurate wall time / query (s) |
|---|---|---|---|---|
| ecfp4 | 56.2 | 0.963 | 63.9 | 7.337 |
| ecfp6 | 49.7 | 0.925 | 56.6 | 7.376 |
| topological_torsion | 30.2 | 0.874 | 33.4 | 7.455 |
| rdkit | 20.9 | 1.110 | 25.4 | 7.520 |
| atom_pair | 9.8 | 1.202 | 13.8 | 7.572 |
| patternfp | 2.2 | 1.215 | 2.2 | 7.509 |
- fast is now the default because it captures most of the retrieval quality at roughly 1 s/query on this bundled example.
- accurate improves overlap further, but costs about 7.3-7.6 s/query.
- ecfp4 remains the strongest overall default fingerprint on this benchmark.
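The trade-off between the two presets can be made concrete with a little arithmetic on the benchmark numbers. The snippet below only recomputes values from the table (overlap difference and wall-time ratio per fingerprint); it does not rerun any search.

```python
# fingerprint: (fast overlap, fast s/query, accurate overlap, accurate s/query)
benchmark = {
    "ecfp4": (56.2, 0.963, 63.9, 7.337),
    "ecfp6": (49.7, 0.925, 56.6, 7.376),
    "topological_torsion": (30.2, 0.874, 33.4, 7.455),
    "rdkit": (20.9, 1.110, 25.4, 7.520),
    "atom_pair": (9.8, 1.202, 13.8, 7.572),
    "patternfp": (2.2, 1.215, 2.2, 7.509),
}

tradeoff = {}
for name, (fast_overlap, fast_time, acc_overlap, acc_time) in benchmark.items():
    tradeoff[name] = {
        "overlap_gain": acc_overlap - fast_overlap,  # accurate minus fast
        "slowdown": acc_time / fast_time,            # accurate time / fast time
    }

for name, values in tradeoff.items():
    print(f"{name}: +{values['overlap_gain']:.1f} overlap, "
          f"{values['slowdown']:.1f}x wall time")
```

For ecfp4, for example, the accurate preset gains 7.7 overlap at about 7.6 times the wall time per query.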
Reproducible Scripts
From the repo root you can materialize the bundled synthon caches for every bit-fingerprint family:
python scripts/build_example_fingerprint_caches.py
And you can run the exact full-product retrieval benchmark on the bundled
synthon_space_1M slice without relying on sibling repos:
python scripts/run_exact_full_product_retrieval.py
Outputs land under results/ by default.
Layout
- src/synthonor/fingerprints.py: fingerprint and similarity helpers
- src/synthonor/synthon_or_rdkit.py: cache build/load, index loading, search implementation
- src/synthonor/resources.py: bundled data helpers (example_space_path)
- notebooks/001_minimal_implementation.py: minimal end-to-end implementation
- notebooks/002_basic_usage.py: bundled database workflow
- notebooks/003_cli_quickstart.py: command-line quickstart
- notebooks/004_adding_databases.py: preparing custom TSV databases
- notebooks/005_different_fingerprints.py: comparing bit fingerprint families