Reaction fingerprint

SynRFP

SynRFP (Synthesis Reaction FingerPrint) is a mapping-free, graph-invariant fingerprinting framework for chemical reactions. It represents transformations by:

  1. Extracting local graph tokens

    • Weisfeiler–Lehman (WL) subtree hashes
    • Canonical ego-subgraph hashes (via pynauty)
  2. Computing a signed multiset difference

    • Δ = tokens(product) − tokens(reactant)
  3. Compressing into compact sketches

    • ParityFold: binary parity-fold into B bits
    • MinHashSketch: classical MinHash with m permutations
    • CWSketch: weighted MinHash for signed deltas
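Of the three sketchers listed, classical MinHash is the easiest to illustrate in isolation. The snippet below is a minimal sketch using only the standard library, not the package's `MinHashSketch` code: the names `seeded_hash`, `minhash`, and `estimate_jaccard` are hypothetical, and seeded BLAKE2b hashes stand in for the m random permutations.

```python
import hashlib


def seeded_hash(seed: int, token: int) -> int:
    """64-bit hash of a token under a given seed (stands in for one permutation)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(tokens, m=256):
    """MinHash signature: the minimum seeded hash over the set, for each of m seeds."""
    tokens = set(tokens)
    if not tokens:
        raise ValueError("cannot sketch an empty token set")
    return [min(seeded_hash(seed, t) for t in tokens) for seed in range(m)]


def estimate_jaccard(sig_a, sig_b) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Two toy token sets with 40 shared elements out of 120 (true Jaccard = 1/3).
set_a = set(range(0, 80))
set_b = set(range(40, 120))
estimate = estimate_jaccard(minhash(set_a), minhash(set_b))
print(round(estimate, 2))
```

The estimate approaches the true Jaccard similarity as m grows, at the cost of a longer signature.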

This approach requires no atom-mapping or reactant/reagent distinction, is permutation-invariant, and scales linearly with graph size.
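The pipeline can be illustrated with a small standard-library sketch. It is a simplified reading of the three steps, not the package's implementation: `h64`, `wl_tokens`, `signed_delta`, and `parity_fold` are hypothetical helpers, graphs are plain dictionaries rather than `GraphData`, hydrogens are ignored, and the folding rule (flip one bit per token whose count in Δ is odd) is an assumption about what a parity fold does.

```python
import hashlib
from collections import Counter


def h64(*parts) -> int:
    """Stable 64-bit hash of arbitrary parts (illustrative helper)."""
    digest = hashlib.blake2b(repr(parts).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def wl_tokens(labels, bonds, radius=1) -> Counter:
    """Multiset of WL subtree hashes for iterations 0..radius.

    labels: {node: atom label}; bonds: {node: [(neighbor, bond order), ...]}.
    """
    current = {node: h64(label) for node, label in labels.items()}
    tokens = Counter(current.values())
    for _ in range(radius):
        # Sorting neighbor signatures makes each hash independent of node order.
        current = {
            node: h64(
                current[node],
                tuple(sorted((order, current[nbr]) for nbr, order in bonds[node])),
            )
            for node in labels
        }
        tokens.update(current.values())
    return tokens


def signed_delta(reactant: Counter, product: Counter) -> dict:
    """Delta = tokens(product) - tokens(reactant), keeping nonzero entries only."""
    keys = set(reactant) | set(product)
    return {k: product[k] - reactant[k] for k in keys if product[k] != reactant[k]}


def parity_fold(delta: dict, bits=64) -> list:
    """Flip bit (token mod bits) once for every token with an odd count."""
    out = [0] * bits
    for token, count in delta.items():
        if abs(count) % 2 == 1:
            out[token % bits] ^= 1
    return out


# Ethanol (C-C-O) >> ethene (C=C) + water (O), heavy atoms only.
ethanol = ({0: "C", 1: "C", 2: "O"}, {0: [(1, 1)], 1: [(0, 1), (2, 1)], 2: [(1, 1)]})
products = ({0: "C", 1: "C", 2: "O"}, {0: [(1, 2)], 1: [(0, 2)], 2: []})
# The same ethanol graph with different node ids and neighbor order.
shuffled = ({7: "O", 3: "C", 9: "C"}, {7: [(3, 1)], 3: [(9, 1), (7, 1)], 9: [(3, 1)]})

delta = signed_delta(wl_tokens(*ethanol), wl_tokens(*products))
fp = parity_fold(delta, bits=64)
print(wl_tokens(*ethanol) == wl_tokens(*shuffled))  # True
print(sum(delta.values()))  # 0
```

Radius-0 tokens cancel here because both sides contain the same atoms, so Δ only records the changed neighborhoods. No atom correspondence between the two sides is used at any point.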

[Figure: SynRFP workflow]


📁 Repository Structure

synrfp/
├── __init__.py           # package exports & version
├── synrfp.py             # core driver: convenience builders & similarity functions, rsmi_to_fingerprint
├── encoder.py            # SynRFPEncoder: batch‐encode RSMI list → 2D bit arrays
├── graph/
│   ├── __init__.py
│   ├── graph_data.py     # GraphData container & utilities
│   └── reaction.py       # Reaction.from_rsmi / from_graph, Reaction collection API
├── tokenizers/
│   ├── __init__.py
│   ├── base.py           # BaseTokenizer interface
│   ├── utils.py          # _h64, atom_label_tuple, bond_label_tuple helpers
│   ├── wl.py             # WLTokenizer implementation
│   └── nauty.py          # NautyTokenizer implementation
└── sketchers/
    ├── __init__.py
    ├── base.py           # BaseSketch & WeightedSketch interfaces
    ├── parity_fold.py    # ParityFold sketcher
    ├── minhash_sketch.py # MinHashSketch sketcher
    └── cw_sketch.py      # CWSketch sketcher

⚙️ Installation

# 1) Clone the repository
git clone https://github.com/TieuLongPhan/synrfp.git
cd synrfp

# 2) Install the package (with optional extras)
pip install .                  # core functionality
pip install ".[all]"           # with datasketch and pynauty support (quotes keep shells such as zsh from globbing the brackets)

Alternatively, install the released package from PyPI:

pip install synrfp

🔧 Quick Start

1. Single‐reaction fingerprint

from synrfp.graph.reaction import Reaction
from synrfp import SynRFP
from synrfp.tokenizers.wl import WLTokenizer
from synrfp.sketchers.parity_fold import ParityFold

# Parse RSMI into GraphData
reactant_G, product_G = Reaction.from_rsmi("CCO>>C=C.O")

# Build engine: WL at radius 1 + 1024-bit parity-fold
fp_engine = SynRFP(
    tokenizer=WLTokenizer(),
    radius=1,
    sketch=ParityFold(bits=1024, seed=42),
)

# Compute fingerprint
res = fp_engine.fingerprint(reactant_G, product_G)
print(res)               # SynRFPResult(tokens_R=3 tokens, tokens_P=3 tokens, support=0, sketch_type=bytearray)
bits = res.to_binary()   # [0,1,0,0, …]

2. One‐line wrapper

from synrfp import synrfp

# Generate a 1024-bit binary fingerprint in one call
bits = synrfp(
    "CCO>>C=C.O",
    tokenizer="wl",
    radius=1,
    sketch="parity",
    bits=1024,
    seed=42,
)
print(len(bits), bits[:16])  # e.g. 1024 [0, 1, 0, 0, …]

3. Batch encoding

from synrfp.encoder import SynRFPEncoder

rxn_smiles = [
    "CO.O[C@@H]1CCNC1.[C-]#[N+]CC(=O)OC>>[C-]#[N+]CC(=O)N1CC[C@@H](O)C1",
    "CCOC(=O)C(CC)c1cccnc1.Cl.O>>CCC(C(=O)O)c1cccnc1",
]

# Encode two reactions into a 2×1024 array of bits
fps = SynRFPEncoder.encode(
    rxn_smiles,
    tokenizer="wl",
    radius=1,
    sketch="parity",
    bits=1024,
    seed=42,
)

print(fps.shape)    # (2, 1024)
print(fps[0][:16])  # first 16 bits of the first fingerprint
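Binary fingerprints like these are usually compared with Tanimoto (Jaccard) similarity. The package description mentions its own similarity functions, whose names are not shown here, so the following is a standalone sketch with a hypothetical `tanimoto` helper. It assumes fingerprints are equal-length sequences of 0/1 values, such as the rows of `fps`.

```python
def tanimoto(a, b) -> float:
    """Tanimoto similarity of two equal-length 0/1 sequences."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Two all-zero fingerprints are treated as identical by convention.
    return both / either if either else 1.0


fp_a = [0, 1, 1, 0, 1, 0, 0, 1]
fp_b = [0, 1, 0, 0, 1, 1, 0, 1]
print(tanimoto(fp_a, fp_b))  # 0.6
```

With the batch output above, the call would be `tanimoto(fps[0], fps[1])`.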

Contributing

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

This project has received funding from the European Union's Horizon Europe Doctoral Network programme under the Marie Skłodowska-Curie grant agreement No 101072930 (TACsy -- Training Alliance for Computational).
