Hybrid-strict hierarchical reaction classification (LLM-derived taxonomy + deterministic template matching).

ReactionClassifier

Hierarchical reaction classification. Given a reaction SMILES, ReactionClassifier predicts a class in an LLM-derived reaction taxonomy and then confirms it symbolically. An MLP gate over a Morgan difference–product (MDP) fingerprint proposes a class; the exact retrosynthetic templates in that class's tier-3 subtree are applied to the reaction; and a label is returned only if a template reproduces the recorded product.
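The propose-then-confirm flow described above can be sketched in plain Python. This is only an illustration of the control flow, not the package's implementation: `confirm_proposal`, `subtree_templates` and `reproduces_product` are hypothetical names, the template strings are placeholders rather than real SMIRKS, and the class code '2.1.2.3' is made up for the example. How the subtree is selected and how a template is applied are left to the caller.

```python
from typing import Callable, Iterable, Optional, Tuple


def confirm_proposal(
    neural_code: str,
    confidence: float,
    subtree_templates: Iterable[Tuple[str, str]],
    reproduces_product: Callable[[str], bool],
) -> dict:
    """Keep the gate's proposal, but only emit a label a template confirms.

    subtree_templates yields (class_code, template) pairs for the subtree
    selected by the gate; reproduces_product is a hypothetical callback that
    applies one template and compares the outcome with the recorded product.
    """
    confirmed: Optional[str] = None
    for class_code, template in subtree_templates:
        if reproduces_product(template):
            confirmed = class_code  # first template that fires wins
            break
    return {
        "reaction_code": confirmed,  # None when nothing fired
        "neural_code": neural_code,  # always available
        "confidence": confidence,
    }


# Toy run with placeholder template strings instead of real SMIRKS.
pairs = [("2.1.2.3", "template_a"), ("2.1.2.1", "template_b")]
hit = confirm_proposal("2.1.2.1", 0.97, pairs, lambda t: t == "template_b")
miss = confirm_proposal("2.1.2.1", 0.97, pairs, lambda t: False)
```

In the toy run, `hit` carries a confirmed code because one placeholder template "fires", while `miss` keeps only the neural proposal.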

Install

pip install reactionclassifier          # requires rdkit, torch, numpy

Quickstart

from reactionclassifier import ReactionClassifier

clf = ReactionClassifier()              # loads the bundled gate + templates + taxonomy
r = clf.classify("CC(=O)O.NCc1ccccc1>>CC(=O)NCc1ccccc1")
r.reaction_code     # '2.1.2.1'   (deterministically confirmed)
r.reaction_name     # 'Amidation using Carboxylic Acids | Primary Amine + Carboxylic Acid to Secondary Amide'
r.tier_path         # ['2.1', '2.1.2', '2.1.2.1']
r.confidence        # top-1 softmax probability from the neural gate

classify() returns a ClassificationResult:

| Field | Meaning |
| --- | --- |
| reaction_code | The deterministically confirmed class code: a template fired and reproduced the product. None if unconfirmed. |
| reaction_name | Pipe-separated level 3/4/5 names of reaction_code (tiers 1-2 omitted). |
| neural_code / neural_name | The neural-gate prediction (always available); use as a fallback when reaction_code is None. Same name format as reaction_name. |
| confidence | Neural-gate softmax confidence (top-1 probability). |
| tier_path | Codes on the path down to and including reaction_code, e.g. ['2.1', '2.1.2', '2.1.2.1']. |

So: if reaction_code is populated you have a verified label; otherwise neural_code/neural_name give the model's best (unverified) guess.
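A small helper makes this fallback rule explicit. The sketch below is an assumption-laden illustration: `ResultStub` is a stand-in dataclass that only mirrors the field names documented above (it is not the package's ClassificationResult), and `best_label` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ResultStub:
    """Stand-in with the documented field names; not the package's class."""
    reaction_code: Optional[str]
    reaction_name: Optional[str]
    neural_code: str
    neural_name: str
    confidence: float


def best_label(result) -> Tuple[str, str, bool]:
    """Return (code, name, verified), preferring the confirmed label."""
    if result.reaction_code is not None:
        return result.reaction_code, result.reaction_name, True
    # No template confirmed the proposal: fall back to the neural guess.
    return result.neural_code, result.neural_name, False


name = ("Amidation using Carboxylic Acids | "
        "Primary Amine + Carboxylic Acid to Secondary Amide")
verified = best_label(ResultStub("2.1.2.1", name, "2.1.2.1", name, 0.97))
unverified = best_label(ResultStub(None, None, "2.1.2.1", name, 0.41))
```

Carrying the boolean alongside the label lets downstream code keep verified and unverified predictions apart, for example when building a training set from confirmed labels only.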

Taxonomy and granularity examples

from reactionclassifier import load_taxonomy, name_for, full_class_name, tier_path, load_granularity
load_taxonomy()["1.3.6"]      # -> single class name
name_for("2.1.2.1")           # -> single-level name
full_class_name("2.1.2.1")    # -> pipe-joined L3|L4|L5 names, e.g.
                              #    'Amidation using Carboxylic Acids | Primary Amine + Carboxylic Acid to Secondary Amide'
tier_path("1.3.6.2")          # -> ['1.3', '1.3.6', '1.3.6.2']
load_granularity()            # the two granularity-comparison tables
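The relationship between a code and its tier path can be reproduced with a few lines of string handling. The sketch below is a hypothetical reimplementation (`ancestor_codes` is not part of the package), and it assumes that paths start at the two-component prefix, as in the examples above.

```python
def ancestor_codes(code: str, start_depth: int = 2) -> list[str]:
    """List the dotted prefixes of a class code, shortest first.

    start_depth=2 is an assumption taken from the documented examples,
    where the path begins at a two-component code such as '1.3'.
    """
    parts = code.split(".")
    return [".".join(parts[:n]) for n in range(start_depth, len(parts) + 1)]


path = ancestor_codes("1.3.6.2")
```

For "1.3.6.2" this yields the same list shown for tier_path above; behaviour for shorter or malformed codes in the real function is not documented here and may differ.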

What's included

| Component | File |
| --- | --- |
| MDP-gate MLP (full-data, 6,962 classes) | data/gate/ |
| Exact rr0rp1_ring0 template library | data/class_to_templates.json |
| Full taxonomy (14,060 code→name) | data/taxonomy.json |
| Granularity examples (+ a small illustrative SMIRKS subset) | data/granularity_examples*.json |

Full reaction database

The full labelled reaction database (≈666k reactions) is hosted externally (Zenodo) rather than shipped in the wheel:

from reactionclassifier.database import download_database
path = download_database()     # downloads + caches the parquet, returns its path
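For readers unfamiliar with the download-and-cache pattern, the sketch below shows one common way to implement it with the standard library. It is not the package's code: `fetch_cached`, the cache location and the file name are all hypothetical, and a local file:// URL stands in for the remote archive so the example runs offline.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_cached(url: str, cache_dir: Path, filename: str) -> Path:
    """Download url to cache_dir/filename unless that file already exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if target.exists():  # cache hit: skip the download entirely
        return target
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)  # rename only after a complete download
    return target


# Demo: a local file:// URL stands in for the remote archive.
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.bin"
source.write_bytes(b"demo payload")
first = fetch_cached(source.as_uri(), workdir / "cache", "reactions.parquet")
source.unlink()  # the second call must be served from the cache
second = fetch_cached(source.as_uri(), workdir / "cache", "reactions.parquet")
```

Writing to a temporary ".part" file and renaming at the end means an interrupted download never leaves a truncated file that later calls would mistake for a valid cache entry.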

The released database excludes NameRXN-derived columns (NAME, CLASS); NameRXN is proprietary and its labels are not redistributed.

Detail

  • The MDP fingerprint is a Morgan difference (reactant⊕product bit-unions) concatenated with the product fingerprint. Each half is a 2048-bit RDKit Morgan fingerprint of radius 2, giving a 4096-dimensional input vector.
  • The full generalised-SMIRKS library is not released; only the small subset embedded in the granularity examples is included.
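The MDP layout from the first bullet can be illustrated on toy bit vectors. This sketch rests on one possible reading of "reactant⊕product bit-unions": bits are OR-ed across all reactant molecules and across all product molecules, and the two unions are then XOR-ed. That reading, and the names `bit_union` and `mdp_fingerprint`, are assumptions; the real fingerprints would come from RDKit rather than hand-written lists.

```python
from typing import List, Sequence


def bit_union(fingerprints: Sequence[Sequence[int]]) -> List[int]:
    """OR equal-length 0/1 vectors together, position by position."""
    return [int(any(column)) for column in zip(*fingerprints)]


def mdp_fingerprint(
    reactant_fps: Sequence[Sequence[int]],
    product_fps: Sequence[Sequence[int]],
) -> List[int]:
    """Concatenate the XOR difference block with the product block."""
    reactant_bits = bit_union(reactant_fps)
    product_bits = bit_union(product_fps)
    difference = [r ^ p for r, p in zip(reactant_bits, product_bits)]
    return difference + product_bits


# 4-bit toy vectors; with 2048-bit inputs the output would have 4096 entries.
toy = mdp_fingerprint([[1, 0, 0, 0], [0, 1, 0, 0]], [[0, 1, 1, 0]])
```

In the toy case the difference block marks bits that appear on only one side of the reaction, and the product block is appended unchanged.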

License

MIT (code and bundled data). See LICENSE.

Download files

Source Distribution

reactionclassifier-0.1.0.tar.gz (22.1 MB view details)

Uploaded Source

Built Distribution

reactionclassifier-0.1.0-py3-none-any.whl (22.1 MB view details)

Uploaded Python 3

File details

Details for the file reactionclassifier-0.1.0.tar.gz.

File metadata

  • Download URL: reactionclassifier-0.1.0.tar.gz
  • Upload date:
  • Size: 22.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for reactionclassifier-0.1.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | d0c0067cd4018fbb855c398fd8b0f04e8dce22ef63b3665f729ebaf23e4bf241 |
| MD5 | 36f146716a471166f193aa5df10b16b3 |
| BLAKE2b-256 | 134d32f17e3a14ce57eceb9bc55ea81d9343a284c8a6ed3f30249593514f2e97 |

Provenance

The following attestation bundles were made for reactionclassifier-0.1.0.tar.gz:

Publisher: publish.yml on schwallergroup/ReactionClassifier

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file reactionclassifier-0.1.0-py3-none-any.whl.

File hashes

Hashes for reactionclassifier-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 3f780b86919b40e45d2532167d3f4ec8e53f0923e815f00328d96b504f50ad4f |
| MD5 | c64bb971fd155ea345fa47c3f507b1cb |
| BLAKE2b-256 | 68a82859d07b9f4c03e2e7e27879209b6b2786c1ae95b234cce85808d81a6fb0 |

Provenance

The following attestation bundles were made for reactionclassifier-0.1.0-py3-none-any.whl:

Publisher: publish.yml on schwallergroup/ReactionClassifier

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
