Reaction fingerprint

SynRFP

SynRFP (Synthesis Reaction FingerPrint) is a mapping-free, graph-invariant fingerprinting framework for chemical reactions. It represents transformations by:

  1. Extracting local graph tokens
    • Weisfeiler–Lehman (WL) subtree hashes
    • Canonical ego-subgraph hashes
  2. Computing a signed multiset difference
    • Δ = tokens(product) − tokens(reactant)
  3. Compressing into compact sketches
    • ParityFold: binary parity-fold into B bits
    • MinHashSketch: classical MinHash with m permutations
    • CWSketch: weighted MinHash for signed deltas

This approach requires no atom-mapping or reactant/reagent distinction, is permutation-invariant, and scales linearly with graph size.
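The three steps above can be illustrated with a small standalone sketch. It does not use SynRFP's own classes: the helper names (`h64`, `wl_tokens`, `signed_delta`, `parity_fold`), the toy graph encoding (a label dict plus an edge list), and the rule that a token flips one bit when its signed count is odd are all assumptions made for illustration, and the library's actual hashing and folding may differ.

```python
from collections import Counter
import hashlib


def h64(obj) -> int:
    """Stable 64-bit hash of a Python object via its repr (illustrative helper)."""
    digest = hashlib.blake2b(repr(obj).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def wl_tokens(labels, edges, radius=1) -> Counter:
    """Multiset of WL subtree hashes for iterations 0..radius.

    labels: dict mapping node id -> atom label
    edges:  list of (u, v, bond_order) tuples, treated as undirected
    """
    neighbors = {n: [] for n in labels}
    for u, v, bond in edges:
        neighbors[u].append((v, bond))
        neighbors[v].append((u, bond))

    current = {n: h64(("atom", lab)) for n, lab in labels.items()}
    tokens = Counter((0, h) for h in current.values())
    for it in range(1, radius + 1):
        # New label = hash of own label plus the sorted neighbor environment,
        # so the result does not depend on node numbering.
        current = {
            n: h64((current[n], sorted((bond, current[m]) for m, bond in neighbors[n])))
            for n in labels
        }
        tokens.update((it, h) for h in current.values())
    return tokens


def signed_delta(tokens_r: Counter, tokens_p: Counter) -> dict:
    """Delta = tokens(product) - tokens(reactant), keeping only nonzero counts."""
    delta = Counter(tokens_p)
    delta.subtract(tokens_r)
    return {t: c for t, c in delta.items() if c != 0}


def parity_fold(delta: dict, bits=64, seed=0) -> list:
    """Fold a signed delta into `bits` bits: tokens with an odd count flip one bit."""
    out = [0] * bits
    for token, count in delta.items():
        if count % 2:  # in Python, -3 % 2 == 1, so negative odd counts qualify too
            out[h64((seed, token)) % bits] ^= 1
    return out


# Toy heavy-atom graphs for CCO>>C=C.O (ethanol to ethene and water).
reactant = ({0: "C", 1: "C", 2: "O"}, [(0, 1, 1), (1, 2, 1)])
product = ({0: "C", 1: "C", 2: "O"}, [(0, 1, 2)])

delta = signed_delta(wl_tokens(*reactant), wl_tokens(*product))
fp = parity_fold(delta, bits=64, seed=42)
print(len(delta), sum(fp))
```

Because each atom contributes exactly one token per WL iteration and this reaction is atom-balanced, the signed counts in `delta` sum to zero. Renumbering the atoms of either graph leaves the token multisets unchanged, which is the permutation invariance the paragraph above refers to.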

Figure: SynRFP workflow

📁 Repository Structure

synrfp/
├── __init__.py           # package exports & version
├── synrfp.py             # core driver: convenience builders & similarity functions, rsmi_to_fingerprint
├── graph/
│   ├── __init__.py
│   ├── graph_data.py     # GraphData container & utilities
│   └── reaction.py       # Reaction.from_rsmi / from_graph, Reaction collection API
├── tokenizers/
│   ├── __init__.py
│   ├── base.py           # BaseTokenizer interface
│   ├── utils.py          # _h64, atom_label_tuple, bond_label_tuple helpers
│   ├── wl.py             # WLTokenizer implementation
│   └── nauty.py          # NautyTokenizer implementation
└── sketchers/
    ├── __init__.py
    ├── base.py           # BaseSketch & WeightedSketch interfaces
    ├── parity_fold.py    # ParityFold sketcher
    ├── minhash_sketch.py # MinHashSketch sketcher
    └── cw_sketch.py      # CWSketch sketcher
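
The sketchers listed above include a classical MinHash. As a rough illustration of the idea only, the following standalone sketch builds a MinHash signature from a set of tokens and estimates Jaccard similarity from two signatures. It does not call `MinHashSketch` or `datasketch`; the names `seeded_hash`, `minhash`, and `jaccard_estimate` are hypothetical, and seeded hash functions stand in for true random permutations, which is a common approximation.

```python
import hashlib


def seeded_hash(seed: int, index: int, item) -> int:
    """64-bit hash of `item` under the hash function number `index` (illustrative)."""
    payload = repr((seed, index, item)).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(payload, digest_size=8).digest(), "big")


def minhash(items, m: int = 64, seed: int = 0) -> list:
    """Signature of length m: the minimum hash of the items under each of m hash functions."""
    items = list(items)
    if not items:
        raise ValueError("cannot sketch an empty token set")
    return [min(seeded_hash(seed, i, x) for x in items) for i in range(m)]


def jaccard_estimate(sig_a: list, sig_b: list) -> float:
    """Fraction of signature slots that agree; an estimate of the Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


tokens_a = {"t1", "t2", "t3", "t4"}
tokens_b = {"t3", "t4", "t5", "t6"}
print(jaccard_estimate(minhash(tokens_a), minhash(tokens_b)))
```

The two example sets share 2 of 6 distinct tokens, so the true Jaccard similarity is 1/3; the printed estimate approaches that value as `m` grows. Both signatures must be built with the same `m` and `seed` to be comparable.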

⚙️ Installation

# 1) Clone the repository
git clone https://github.com/TieuLongPhan/synrfp.git
cd synrfp

# 2) Install the package (with optional extras)
pip install .                  # core functionality
pip install ".[all]"           # with datasketch and pynauty support (quoted so shells such as zsh do not expand the brackets)

Alternatively, install the released package from PyPI:

pip install synrfp

🔧 Quick Start

1. Single‐reaction fingerprint

from synrfp.graph.reaction import Reaction
from synrfp import SynRFP
from synrfp.tokenizers.wl import WLTokenizer
from synrfp.sketchers.parity_fold import ParityFold

# Parse a reaction SMILES (RSMI) into reactant and product GraphData objects
reactant_G, product_G = Reaction.from_rsmi("CCO>>C=C.O")

# Build engine: WL at radius 1 + 1024-bit parity-fold
fp_engine = SynRFP(
    tokenizer=WLTokenizer(),
    radius=1,
    sketch=ParityFold(bits=1024, seed=42),
)

# Compute fingerprint
res = fp_engine.fingerprint(reactant_G, product_G)
print(res)               # SynRFPResult(tokens_R=3 tokens, tokens_P=3 tokens, support=0, sketch_type=bytearray)
bits = res.to_binary()   # [0,1,0,0, …]

2. One‐line wrapper

from synrfp import synrfp

# Generate a 1024-bit binary fingerprint in one call
bits = synrfp(
    "CCO>>C=C.O",
    tokenizer="wl",
    radius=1,
    sketch="parity",
    bits=1024,
    seed=42,
)
print(len(bits), bits[:16])  # e.g. 1024 [0, 1, 0, 0, …]

3. Batch encoding

from synrfp import BatchEncoder

rxn_smiles = [
    "CO.O[C@@H]1CCNC1.[C-]#[N+]CC(=O)OC>>[C-]#[N+]CC(=O)N1CC[C@@H](O)C1",
    "CCOC(=O)C(CC)c1cccnc1.Cl.O>>CCC(C(=O)O)c1cccnc1",
]

# Encode two reactions into a 2×1024 array of bits
fps = BatchEncoder.encode(
    rxn_smiles,
    tokenizer="wl",
    radius=1,
    sketch="parity",
    bits=1024,
    seed=42,
    batch_size=2
)

print(fps.shape)    # (2, 1024)
print(fps[0][:16])  # first 16 bits of the first fingerprint
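
Binary fingerprints such as the ones above are typically compared with Tanimoto (Jaccard) similarity. The sketch below is a standalone helper, not SynRFP's own similarity API; the name `tanimoto` and the convention of returning 1.0 for two all-zero fingerprints are assumptions made for illustration. It accepts any two equal-length sequences of 0/1 values, for example two rows of `fps`.

```python
def tanimoto(a, b) -> float:
    """Tanimoto similarity of two equal-length 0/1 sequences.

    Returns |a AND b| / |a OR b|. Two all-zero fingerprints are treated
    as identical (similarity 1.0); this convention is an assumption.
    """
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


fp_a = [1, 0, 1, 1, 0, 0, 1, 0]
fp_b = [1, 1, 0, 1, 0, 0, 1, 0]
print(tanimoto(fp_a, fp_b))  # 3 shared bits out of 5 set in either: 0.6
```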

Contributing

License

This project is licensed under the MIT License; see the LICENSE file for details.

Acknowledgments

This project has received funding from the European Union's Horizon Europe Doctoral Network programme under the Marie Skłodowska-Curie grant agreement No 101072930 (TACsy -- Training Alliance for Computational).
