Reaction fingerprint
SynRFP
SynRFP (Synthesis Reaction FingerPrint) is a mapping-free, graph-invariant fingerprinting framework for chemical reactions. It represents transformations by:
- Extracting local graph tokens
  - Weisfeiler–Lehman (WL) subtree hashes
  - Canonical ego-subgraph hashes (via pynauty)
- Computing a signed multiset difference
  - Δ = tokens(product) − tokens(reactant)
- Compressing into compact sketches
  - ParityFold: binary parity-fold into B bits
  - MinHashSketch: classical MinHash with m permutations
  - CWSketch: weighted MinHash for signed deltas
This approach requires no atom-mapping or reactant/reagent distinction, is permutation-invariant, and scales linearly with graph size.
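The three steps above can be illustrated with a minimal, standard-library sketch. This is not the package's implementation: the helper names (`h64`, `wl_tokens`, `signed_delta`, `parity_fold`), the hashing scheme, and the toy graphs are all assumptions made for illustration, and bond labels are ignored for brevity.

```python
from collections import Counter
import hashlib


def h64(text: str) -> int:
    """Stable 64-bit hash of a string (illustrative stand-in, not the package's hash)."""
    return int.from_bytes(hashlib.blake2b(text.encode(), digest_size=8).digest(), "big")


def wl_tokens(labels, adjacency, radius=1):
    """Collect WL subtree hashes of every node at iterations 0..radius."""
    current = {node: h64(label) for node, label in labels.items()}
    tokens = Counter(current.values())
    for _ in range(radius):
        # Each node's new label hashes its old label plus the sorted neighbour labels,
        # so the result does not depend on node numbering.
        current = {
            node: h64(f"{current[node]}|{sorted(current[n] for n in adjacency[node])}")
            for node in labels
        }
        tokens.update(current.values())
    return tokens


def signed_delta(reactant_tokens, product_tokens):
    """Delta = tokens(product) - tokens(reactant), keeping negative counts."""
    delta = Counter(product_tokens)
    delta.subtract(reactant_tokens)
    return {tok: count for tok, count in delta.items() if count != 0}


def parity_fold(delta, bits=1024):
    """XOR-fold: a token toggles bit (token mod bits) when its |count| is odd."""
    out = [0] * bits
    for tok, count in delta.items():
        if abs(count) % 2:
            out[tok % bits] ^= 1
    return out


# Toy heavy-atom graphs for CCO>>C=C.O (bond orders omitted in this sketch).
ethanol = ({0: "C", 1: "C", 2: "O"}, {0: [1], 1: [0, 2], 2: [1]})
products = ({0: "C", 1: "C", 2: "O"}, {0: [1], 1: [0], 2: []})

delta = signed_delta(wl_tokens(*ethanol), wl_tokens(*products))
bits = parity_fold(delta, bits=1024)
```

Because the radius-0 tokens (plain atom labels) are identical on both sides, they cancel in the delta; only the radius-1 environments that changed survive, which is why no atom mapping is needed.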
📁 Repository Structure
synrfp/
├── __init__.py # package exports & version
├── synrfp.py # core driver: convenience builders & similarity functions, rsmi_to_fingerprint
├── encoder.py # SynRFPEncoder: batch‐encode RSMI list → 2D bit arrays
├── graph/
│ ├── __init__.py
│ ├── graph_data.py # GraphData container & utilities
│ └── reaction.py # Reaction.from_rsmi / from_graph, Reaction collection API
├── tokenizers/
│ ├── __init__.py
│ ├── base.py # BaseTokenizer interface
│ ├── utils.py # _h64, atom_label_tuple, bond_label_tuple helpers
│ ├── wl.py # WLTokenizer implementation
│ └── nauty.py # NautyTokenizer implementation
└── sketchers/
    ├── __init__.py
    ├── base.py # BaseSketch & WeightedSketch interfaces
    ├── parity_fold.py # ParityFold sketcher
    ├── minhash_sketch.py # MinHashSketch sketcher
    └── cw_sketch.py # CWSketch sketcher
⚙️ Installation
# 1) Clone the repository
git clone https://github.com/TieuLongPhan/synrfp.git
cd synrfp
# 2) Install the package (with optional extras)
pip install . # core functionality
pip install ".[all]" # with datasketch and pynauty support (quoted so shells such as zsh do not expand the brackets)
Alternatively, install the released package from PyPI:
pip install synrfp
🔧 Quick Start
1. Single‐reaction fingerprint
from synrfp.graph.reaction import Reaction
from synrfp import SynRFP
from synrfp.tokenizers.wl import WLTokenizer
from synrfp.sketchers.parity_fold import ParityFold
# Parse RSMI into GraphData
reactant_G, product_G = Reaction.from_rsmi("CCO>>C=C.O")
# Build engine: WL at radius 1 + 1024-bit parity-fold
fp_engine = SynRFP(
    tokenizer=WLTokenizer(),
    radius=1,
    sketch=ParityFold(bits=1024, seed=42),
)
# Compute fingerprint
res = fp_engine.fingerprint(reactant_G, product_G)
print(res) # SynRFPResult(tokens_R=3 tokens, tokens_P=3 tokens, support=0, sketch_type=bytearray)
bits = res.to_binary() # [0,1,0,0, …]
2. One‐line wrapper
from synrfp import synrfp
# Generate a 1024-bit binary fingerprint in one call
bits = synrfp(
    "CCO>>C=C.O",
    tokenizer="wl",
    radius=1,
    sketch="parity",
    bits=1024,
    seed=42,
)
print(len(bits), bits[:16]) # e.g. 1024 [0, 1, 0, 0, …]
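Binary fingerprints such as `bits` are usually compared with a Tanimoto (Jaccard) score. The repository structure lists similarity functions in `synrfp.py`, but their names are not shown here, so the `tanimoto` helper below is a hypothetical stand-alone sketch rather than the package's API. Returning 1.0 for two all-zero fingerprints is a convention chosen for this example.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two equal-length 0/1 sequences."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Assumed convention: two empty fingerprints count as identical.
    return both / either if either else 1.0


# Toy 4-bit fingerprints: one shared bit out of three set bits overall.
score = tanimoto([1, 1, 0, 0], [1, 0, 1, 0])
```

The same function can be applied to any two outputs of `synrfp(...)` generated with identical `bits` and `seed` settings; fingerprints built with different settings are not comparable.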
3. Batch encoding
from synrfp.encoder import SynRFPEncoder
rxn_smiles = [
    "CO.O[C@@H]1CCNC1.[C-]#[N+]CC(=O)OC>>[C-]#[N+]CC(=O)N1CC[C@@H](O)C1",
    "CCOC(=O)C(CC)c1cccnc1.Cl.O>>CCC(C(=O)O)c1cccnc1",
]
# Encode two reactions into a 2×1024 array of bits
fps = SynRFPEncoder.encode(
    rxn_smiles,
    tokenizer="wl",
    radius=1,
    sketch="parity",
    bits=1024,
    seed=42,
)
print(fps.shape) # (2, 1024)
print(fps[0][:16]) # first 16 bits of the first fingerprint
Contributing
License
This project is licensed under the MIT License; see the License file for details.
Acknowledgments
This project has received funding from the European Union's Horizon Europe Doctoral Network programme under the Marie Skłodowska-Curie grant agreement No. 101072930 (TACsy -- Training Alliance for Computational)