Reaction classification and atom-mapping

AgaveChem

AgaveChem is an open-source Python library for atom-to-atom mapping (AAM) of chemical reactions. It provides four composable mappers, ranging from deterministic graph-based methods to a supervised neural mapper, that can be used individually or combined into a pipeline.

The primary contribution is a supervised ALBERT-based neural mapper trained without any per-reaction manual annotation. Ground-truth atom maps are generated automatically by applying a curated expert template library and an MCS mapper to a filtered subset of the Lowe USPTO dataset, which yields a labeled training corpus orders of magnitude larger than direct annotation could provide.

Neural mapper

  • Supervised ALBERT-based mapper: Trained in two phases: unsupervised masked language model (MLM) pre-training, followed by supervised fine-tuning with a direct attention alignment objective against generated ground-truth maps
  • Template- and MCS-derived supervision: Ground truth is generated automatically from ~0.97M filtered USPTO reactions; the deterministic pipeline fully maps ~60% of reactions and covers ~90% of product atoms
  • State-of-the-art accuracy: Outperforms existing neural mappers on held-out benchmark reactions
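
As a rough illustration of attention-based alignment (a minimal sketch, not AgaveChem's actual training or decoding code), suppose each product atom has one attention row that is a probability distribution over reactant atoms. The cross-entropy form of the loss and the greedy one-to-one decoding are assumptions, and `alignment_loss` and `decode_mapping` are hypothetical names.

```python
import math

def alignment_loss(attention, target):
    """Mean negative log-probability the attention assigns to the true reactant atom.

    attention[p][r] is the weight product atom p puts on reactant atom r;
    target[p] is the ground-truth reactant index for product atom p.
    """
    return -sum(math.log(row[t]) for row, t in zip(attention, target)) / len(target)

def decode_mapping(attention):
    """Greedy one-to-one decoding: accept the most confident pairs first."""
    pairs = sorted(
        ((p, r, w) for p, row in enumerate(attention) for r, w in enumerate(row)),
        key=lambda item: -item[2],
    )
    mapping, used_reactants = {}, set()
    for p, r, _ in pairs:
        if p not in mapping and r not in used_reactants:
            mapping[p] = r
            used_reactants.add(r)
    return mapping

# Toy example: 3 product atoms attending over 4 reactant atoms.
attention = [
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.60, 0.10, 0.10],
    [0.55, 0.05, 0.35, 0.05],  # prefers reactant 0, which atom 0 claims more strongly
]
mapping = decode_mapping(attention)
loss = alignment_loss(attention, target=[0, 1, 2])
print(mapping)  # {0: 0, 1: 1, 2: 2}
```

In this toy case the third product atom falls back to its second choice because its first choice is already taken, which is the kind of conflict a one-to-one constraint has to resolve.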

Identical fragment mapper

  • Spectator molecule handling: Fragments appearing structurally unchanged on both sides of the reaction (counter-ions, solvents, spectator reagents) are detected and atom-mapped before any other mapper is invoked
  • Collision-free numbering: Pre-assigned atoms use a reserved numbering range to avoid conflicts with downstream mappers
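
The idea of setting spectators aside first can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: fragments are compared as strings (so the input must already be canonical SMILES in `reactants>>products` form), and `split_spectators`, `reserve_map_numbers`, and the starting value 900 are hypothetical, not part of the AgaveChem API.

```python
from collections import Counter
from itertools import count

RESERVED_MAP_START = 900  # hypothetical start of the reserved numbering range

def split_spectators(reaction_smiles):
    """Return (spectators, remaining_reactants, remaining_products).

    A fragment counts as a spectator when the same string appears on both
    sides; multiset intersection handles repeated fragments correctly.
    """
    reactants, products = reaction_smiles.split(">>")
    r = Counter(reactants.split("."))
    p = Counter(products.split("."))
    shared = r & p
    return (
        sorted(shared.elements()),
        sorted((r - shared).elements()),
        sorted((p - shared).elements()),
    )

def reserve_map_numbers(atom_counts, start=RESERVED_MAP_START):
    """Hand out consecutive map numbers from the reserved range, one list per fragment."""
    counter = count(start)
    return [[next(counter) for _ in range(n)] for n in atom_counts]

spectators, reactants, products = split_spectators(
    "CC(=O)Cl.OCC.[Na+].[Cl-]>>CC(=O)OCC.[Na+].[Cl-]"
)
print(spectators)                   # ['[Cl-]', '[Na+]']
print(reserve_map_numbers([1, 1]))  # [[900], [901]]
```

Keeping spectator numbers in their own range means a downstream mapper can number the remaining atoms from 1 without checking for clashes.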

MCS mapper

  • Environment fingerprint matching: Identifies invariant atoms using a bond-radius fingerprinting scheme rather than solving the full NP-hard MCS problem, enabling efficient partial mapping
  • Configurable radius: A min_radius_to_anchor_new_mapping parameter controls how close to the reaction center the mapping extends, yielding conservative partial maps that avoid incorrectly assigning atoms near bond-breaking events
  • Tautomer and charge normalization: Molecules are normalized prior to fingerprinting to prevent spurious mismatches from charge variants or tautomers
  • Anchor-extend strategy: Alternates between propagating mappings from already-assigned anchor atoms and seeding new anchors, ensuring consistent multi-fragment mapping
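
To make the environment-matching idea concrete, here is a minimal sketch on toy molecular graphs. It is not the library's algorithm: it uses a simple Morgan-style neighborhood signature, skips normalization and the anchor-extend loop, and `environment_signatures` and `match_unique_environments` are hypothetical names. Atoms are paired only when their radius-limited environment is identical and unique on both sides.

```python
from collections import Counter

def environment_signatures(atoms, bonds, radius):
    """Return one hashable signature per atom describing its environment out to `radius` bonds.

    atoms: list of element symbols; bonds: {(i, j): bond_order}.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for (i, j), order in bonds.items():
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))
    sigs = list(atoms)  # radius 0: the element symbol alone
    for _ in range(radius):
        # Grow by one bond: own signature plus the sorted neighbor signatures.
        sigs = [
            (sigs[i], tuple(sorted(((order, sigs[j]) for j, order in neighbors[i]), key=repr)))
            for i in range(len(atoms))
        ]
    return sigs

def match_unique_environments(reactant, product, radius):
    """Map reactant atom index -> product atom index for unambiguous identical environments."""
    r_sigs = environment_signatures(*reactant, radius)
    p_sigs = environment_signatures(*product, radius)
    r_counts, p_counts = Counter(r_sigs), Counter(p_sigs)
    p_index = {sig: i for i, sig in enumerate(p_sigs)}
    return {
        i: p_index[sig]
        for i, sig in enumerate(r_sigs)
        if r_counts[sig] == 1 and p_counts[sig] == 1
    }

# Heavy-atom graphs: acetyl chloride CC(=O)Cl and acetic acid CC(=O)O.
acetyl_chloride = (["C", "C", "O", "Cl"], {(0, 1): 1, (1, 2): 2, (1, 3): 1})
acetic_acid = (["C", "C", "O", "O"], {(0, 1): 1, (1, 2): 2, (1, 3): 1})

print(match_unique_environments(acetyl_chloride, acetic_acid, radius=1))  # {0: 0, 2: 2}
print(match_unique_environments(acetyl_chloride, acetic_acid, radius=2))  # {}
```

With radius 1 the methyl carbon and carbonyl oxygen match, while the carbonyl carbon (whose neighbor changed) does not. With radius 2 nothing matches, because every atom is within two bonds of the changed substituent. This shows in miniature how a larger required radius produces a more conservative map.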

Expert template mapper

  • Curated SMIRKS library: Reaction SMIRKS templates sourced from ReactionFlash, Rxn-INSIGHT, and manual curation are applied to classify and map reactions
  • Hierarchical templates: Templates are organized with parent-child priority relationships for fine-grained reaction class coverage
  • Custom template support: User-supplied SMIRKS patterns can supplement or replace the built-in library via custom_smirks_patterns
  • MCS-guided focus: Uses the MCS mapper internally to identify probable reaction centers, improving template matching efficiency
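
The parent-child priority idea can be sketched without any chemistry toolkit. In this hedged example the `Template` class, the `classify` function, and the two template names are all hypothetical, and crude substring checks stand in for real SMIRKS matching. The only point is the selection rule: among all matching templates, prefer the one deepest in the hierarchy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Template:
    name: str
    matches: Callable[[str], bool]  # placeholder for real SMIRKS matching
    parent: Optional[str] = None

def depth(name: str, library: Dict[str, Template]) -> int:
    """Number of ancestors between this template and the root of its hierarchy."""
    d = 0
    while library[name].parent is not None:
        name = library[name].parent
        d += 1
    return d

def classify(reaction: str, library: Dict[str, Template]) -> Optional[str]:
    """Return the most specific (deepest) matching template name, or None."""
    hits = [t.name for t in library.values() if t.matches(reaction)]
    return max(hits, key=lambda n: depth(n, library), default=None)

def has_acyl_chloride(rxn: str) -> bool:
    return "C(=O)Cl" in rxn.split(">>")[0]

library = {
    t.name: t
    for t in [
        Template("acylation", has_acyl_chloride),
        Template(
            "amide formation",
            lambda rxn: has_acyl_chloride(rxn) and "C(=O)N" in rxn.split(">>")[1],
            parent="acylation",
        ),
    ]
}

print(classify("CC(=O)Cl.NCC>>CC(=O)NCC", library))  # amide formation
print(classify("CC(=O)Cl.OCC>>CC(=O)OCC", library))  # acylation
```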

Mapping scorer

  • Multi-metric evaluation: Scores atom-mapped reactions across bond energy cost, bond change count, fragment changes, stereochemistry changes, and ring opening/closing events
  • Configurable weights: Each scoring component carries an adjustable weight for custom ranking strategies
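
A weighted multi-metric score of this kind can be sketched as follows. This is an illustration, not the `MappingScorer` implementation: the component names, the weights, and the convention that a lower score is better are all assumptions. Bonds are keyed by the pair of atom-map numbers they connect, so a bond change is any pair whose bond order differs between the two sides.

```python
def bond_changes(reactant_bonds, product_bonds):
    """Count atom pairs whose bond order differs (formed, broken, or changed order)."""
    pairs = set(reactant_bonds) | set(product_bonds)
    return sum(1 for pair in pairs if reactant_bonds.get(pair) != product_bonds.get(pair))

def weighted_score(components, weights):
    """Weighted sum of scoring components; components without a weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in components.items())

# Bonds keyed by frozenset of atom-map numbers, for
# [CH3:1][C:2](=[O:3])[Cl:4] >> [CH3:1][C:2](=[O:3])[OH:5]
reactant_bonds = {frozenset({1, 2}): 1, frozenset({2, 3}): 2, frozenset({2, 4}): 1}
product_bonds = {frozenset({1, 2}): 1, frozenset({2, 3}): 2, frozenset({2, 5}): 1}

components = {
    "bond_changes": bond_changes(reactant_bonds, product_bonds),  # C-Cl broken, C-O formed
    "ring_changes": 0,
    "fragment_changes": 1,
}
weights = {"bond_changes": 1.0, "ring_changes": 2.0, "fragment_changes": 0.5}  # hypothetical
print(weighted_score(components, weights))  # 2.5
```

Changing a weight rescales how strongly that component influences the ranking of candidate mappings.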

Requirements

  • Python (version >= 3.10)
  • RDKit
  • rdchiral-plus
  • PyTorch
  • Transformers (Hugging Face)

Installation

Install AgaveChem with pip directly from this repo:

pip install git+https://github.com/denovochem/agave_chem.git

Or clone and install locally:

git clone https://github.com/denovochem/agave_chem.git
cd agave_chem
pip install .

Basic usage

Neural mapper (recommended for general use)

from agave_chem import NeuralReactionMapper

mapper = NeuralReactionMapper("my_mapper")
# Input is an unmapped reaction SMILES string in "reactants>>products" form
result = mapper.map_reaction("CC(Cl)(Cl)OC(C)(Cl)Cl.CC(=O)C(=O)O>>CC(=O)C(=O)Cl")
print(result["selected_mapping"])

MCS mapper (fast, deterministic, partial mapping)

from agave_chem import MCSReactionMapper

mapper = MCSReactionMapper("my_mcs_mapper")
result = mapper.map_reaction("CC(Cl)(Cl)OC(C)(Cl)Cl.CC(=O)C(=O)O>>CC(=O)C(=O)Cl")
print(result["selected_mapping"])

Expert template mapper (interpretable, mechanistically grounded)

from agave_chem import TemplateReactionMapper

mapper = TemplateReactionMapper("my_template_mapper")
result = mapper.map_reaction("CC(Cl)(Cl)OC(C)(Cl)Cl.CC(=O)C(=O)O>>CC(=O)C(=O)Cl")
print(result["selected_mapping"])

Mapping a batch of reactions through the full pipeline

from agave_chem import map_reactions

reactions = [
    "CC(Cl)(Cl)OC(C)(Cl)Cl.CC(=O)C(=O)O>>CC(=O)C(=O)Cl",
    "OCC(=O)OCCCO.Cl>>ClCC(=O)OCCCO",
]
results = map_reactions(reactions)
for r in results:
    print(r["final_mapping"])
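
After mapping a batch, it can be useful to check how much of each product actually received map numbers. The helper below is a rough sketch that assumes the mapping is available as an atom-mapped reaction SMILES string. `product_coverage` is a hypothetical name, not an AgaveChem function, and the regex-based atom counting is a simplification that a toolkit such as RDKit would do more robustly.

```python
import re

BRACKET_ATOM = re.compile(r"\[[^\]]+\]")                 # e.g. [CH3:1], [Na+]
ORGANIC_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")  # unbracketed organic-subset atoms
MAP_NUMBER = re.compile(r":(\d+)\]$")                    # trailing :N inside a bracket atom

def product_coverage(mapped_reaction: str) -> float:
    """Fraction of product atoms that carry a non-zero atom-map number."""
    product = mapped_reaction.split(">")[-1]
    bracket_atoms = BRACKET_ATOM.findall(product)
    # Blank out bracket atoms so their contents are not counted a second time.
    bare = BRACKET_ATOM.sub(" ", product)
    total = len(bracket_atoms) + len(ORGANIC_ATOM.findall(bare))
    mapped = sum(
        1
        for atom in bracket_atoms
        if (m := MAP_NUMBER.search(atom)) and int(m.group(1)) > 0
    )
    return mapped / total if total else 0.0

partial = "[CH3:1][C:2](=[O:3])[Cl:4]>>[CH3:1][C:2](=[O:3])O"
print(product_coverage(partial))  # 0.75
```

Here three of the four product atoms are mapped; the unmapped hydroxyl oxygen has no counterpart among the listed reactants.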

Scoring a mapping

from agave_chem import MappingScorer

scorer = MappingScorer()
score = scorer.score_mapping("[CH3:1][C:2](=[O:3])[Cl:4]>>[CH3:1][C:2](=[O:3])[OH:5]")
print(score)

Documentation

Full documentation is available at the AgaveChem documentation site.

Contributing

  • Feature ideas and bug reports are welcome on the Issue Tracker.
  • Fork the source code on GitHub, make changes and file a pull request.

License

AgaveChem is licensed under the MIT license.
