Reaction classification and atom-mapping
AgaveChem
An open-source Python library for atom-to-atom mapping (AAM) of chemical reactions. AgaveChem provides four composable mappers—from deterministic graph-based methods to a supervised neural mapper—that can be used individually or combined into a pipeline.
The primary contribution is a supervised ALBERT-based neural mapper trained without any per-reaction manual annotation. Ground-truth atom maps are generated automatically by applying a curated expert template library and an MCS mapper to a filtered subset of the Lowe USPTO dataset, yielding a labeled training corpus orders of magnitude larger than what direct annotation can provide.
Neural mapper
- Supervised ALBERT-based mapper: Trained in two phases—unsupervised masked language model (MLM) pre-training followed by supervised fine-tuning with a direct attention alignment objective against generated ground truth maps
- Template and MCS-derived supervision: Ground truth is generated automatically from ~0.97M filtered USPTO reactions; the deterministic pipeline fully maps ~60% of reactions and covers ~90% of product atoms
- State-of-the-art accuracy: Outperforms existing neural mappers on held-out benchmark reactions
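The idea of supervising attention against generated maps can be illustrated with a small sketch. This is not AgaveChem's training code: the exact objective is not stated here, so the sketch assumes one plausible form, a negative log-likelihood over each product atom's attention row, and it skips atoms that the deterministic pipeline left unlabeled. The names `alignment_loss` and `decode_mapping` are hypothetical.

```python
import math

def alignment_loss(attention, targets, eps=1e-12):
    """Mean negative log-likelihood of the labeled reactant atom per product atom.

    attention[i][j]: weight from product atom i to reactant atom j (each row sums to 1).
    targets[i]: index of the ground-truth reactant atom, or None if atom i is unlabeled.
    This loss form is an assumption made for illustration.
    """
    losses = [
        -math.log(max(row[target], eps))
        for row, target in zip(attention, targets)
        if target is not None  # partial maps: unlabeled atoms contribute nothing
    ]
    return sum(losses) / len(losses) if losses else 0.0

def decode_mapping(attention):
    """Assign each product atom to the reactant atom with the largest weight."""
    return {
        i: max(range(len(row)), key=row.__getitem__)
        for i, row in enumerate(attention)
    }

# Toy attention matrix: 3 product atoms attending over 3 reactant atoms.
attention = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
targets = [0, 1, None]  # the third product atom has no generated label

loss = alignment_loss(attention, targets)
mapping = decode_mapping(attention)
print(loss, mapping)
```

Masking unlabeled atoms is what lets partially mapped reactions still contribute training signal for the atoms they do cover.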
Identical fragment mapper
- Spectator molecule handling: Fragments appearing structurally unchanged on both sides of the reaction (counter-ions, solvents, spectator reagents) are detected and atom-mapped before any other mapper is invoked
- Collision-free numbering: Pre-assigned atoms use a reserved numbering range to avoid conflicts with downstream mappers
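As a rough illustration of the two points above, the sketch below finds fragments that appear verbatim on both sides of a reaction SMILES and hands out map numbers from a reserved block. It is a simplification built on assumptions: fragments are compared as plain strings (so both sides must already be written identically, for example as canonical SMILES), atom counts are supplied by hand rather than computed with a toolkit, and the starting value `900` and both function names are hypothetical.

```python
from collections import Counter

RESERVED_START = 900  # hypothetical start of the reserved map-number range

def find_spectator_fragments(reaction_smiles):
    """Return fragments present, string-identical, on both sides of 'reactants>>products'."""
    reactants, products = reaction_smiles.split(">>")
    reactant_counts = Counter(reactants.split("."))
    product_counts = Counter(products.split("."))
    shared = reactant_counts & product_counts  # multiset intersection
    return sorted(shared.elements())

def allocate_reserved(atom_counts, start=RESERVED_START):
    """Give each fragment a contiguous block of map numbers from the reserved range."""
    blocks, next_free = {}, start
    for fragment, n_atoms in atom_counts.items():
        blocks[fragment] = list(range(next_free, next_free + n_atoms))
        next_free += n_atoms
    return blocks

rxn = "CCO.CC(=O)Cl.[Na+].ClCCl>>CC(=O)OCC.[Na+].ClCCl"
spectators = find_spectator_fragments(rxn)

# Heavy-atom counts are entered by hand here; a toolkit would normally supply them.
blocks = allocate_reserved({"ClCCl": 3, "[Na+]": 1})
print(spectators, blocks)
```

Because downstream mappers would number from 1 upward, keeping spectator atoms in a separate high range means the two sets of numbers cannot collide as long as the reaction has fewer atoms than the reserved offset.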
MCS mapper
- Environment fingerprint matching: Identifies invariant atoms using a bond-radius fingerprinting scheme rather than solving the full NP-hard MCS problem, enabling efficient partial mapping
- Configurable radius: A `min_radius_to_anchor_new_mapping` parameter controls how close to the reactive center mapping extends, yielding conservative partial maps that avoid incorrectly assigning atoms near bond-breaking events
- Tautomer and charge normalization: Molecules are normalized prior to fingerprinting to prevent spurious mismatches from charge variants or tautomers
- Anchor-extend strategy: Alternates between propagating mappings from already-assigned anchor atoms and seeding new anchors, ensuring consistent multi-fragment mapping
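The environment-matching idea can be shown on a toy molecular graph without any chemistry toolkit. The sketch below is not the library's implementation: it builds a string key for each atom from its neighborhood out to a given bond radius, then pairs reactant and product atoms whose key is unique on both sides. The graph encoding and the function names are assumptions made for illustration, and no normalization or anchor extension is attempted.

```python
from collections import Counter

def environment_keys(atoms, bonds, radius):
    """Build one key per atom describing its neighborhood out to `radius` bonds.

    atoms: dict of atom index -> element symbol
    bonds: list of (index_a, index_b, bond_order)
    """
    neighbors = {idx: [] for idx in atoms}
    for a, b, order in bonds:
        neighbors[a].append((b, order))
        neighbors[b].append((a, order))
    keys = dict(atoms)  # radius 0: the element symbol alone
    for _ in range(radius):
        keys = {
            idx: keys[idx] + "("
            + ",".join(sorted(f"{order}{keys[nbr]}" for nbr, order in neighbors[idx]))
            + ")"
            for idx in atoms
        }
    return keys

def match_invariant_atoms(reactant, product, radius):
    """Map reactant atoms to product atoms whose environment key is unique on both sides."""
    r_keys = environment_keys(*reactant, radius)
    p_keys = environment_keys(*product, radius)
    r_counts = Counter(r_keys.values())
    p_counts = Counter(p_keys.values())
    p_index = {key: idx for idx, key in p_keys.items()}
    return {
        r_idx: p_index[key]
        for r_idx, key in r_keys.items()
        if r_counts[key] == 1 and p_counts.get(key) == 1
    }

# Acetyl chloride CC(=O)Cl and acetic acid CC(=O)O as (atoms, bonds).
acetyl_chloride = ({0: "C", 1: "C", 2: "O", 3: "Cl"}, [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
acetic_acid = ({0: "C", 1: "C", 2: "O", 3: "O"}, [(0, 1, 1), (1, 2, 2), (1, 3, 1)])

print(match_invariant_atoms(acetyl_chloride, acetic_acid, radius=1))
print(match_invariant_atoms(acetyl_chloride, acetic_acid, radius=2))
```

With radius 1 the methyl carbon and the carbonyl oxygen are paired, while the carbonyl carbon is left unmapped because its neighborhood contains the Cl/O exchange. With radius 2 every atom's environment reaches the changed bond and nothing is mapped, which illustrates why a larger required radius gives more conservative partial maps.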
Expert template mapper
- Curated SMIRKS library: Reaction SMIRKS templates sourced from ReactionFlash, Rxn-INSIGHT, and manual curation are applied to classify and map reactions
- Hierarchical templates: Templates are organized with parent-child priority relationships for fine-grained reaction class coverage
- Custom template support: User-supplied SMIRKS patterns can supplement or replace the built-in library via `custom_smirks_patterns`
- MCS-guided focus: Uses the MCS mapper internally to identify probable reaction centers, improving template matching efficiency
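One way to read the parent-child priority relationship is that a matching child template should win over any matching ancestor, since it names a narrower reaction class. The sketch below encodes that reading; how AgaveChem actually resolves priority is not described here, and the template names, the `PARENT` table, and both functions are hypothetical.

```python
# Hypothetical hierarchy: template name -> parent template name (None for a root).
PARENT = {
    "acylation": None,
    "amide_formation": "acylation",
    "amide_from_acid_chloride": "amide_formation",
    "esterification": "acylation",
}

def ancestors(name, parent=PARENT):
    """Collect every ancestor of a template by walking up the parent links."""
    found = set()
    current = parent[name]
    while current is not None:
        found.add(current)
        current = parent[current]
    return found

def most_specific(matched, parent=PARENT):
    """Drop any matched template that is an ancestor of another matched template."""
    shadowed = set()
    for name in matched:
        shadowed |= ancestors(name, parent)
    return sorted(set(matched) - shadowed)

# Both the generic class and a child matched: the child is kept.
print(most_specific({"acylation", "amide_formation"}))
```

Templates in unrelated branches do not shadow each other, so a reaction matching two sibling classes would still return both and need a separate tie-break, for example a mapping score.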
Mapping scorer
- Multi-metric evaluation: Scores atom-mapped reactions across bond energy cost, bond change count, fragment changes, stereochemistry changes, and ring opening/closing events
- Configurable weights: Each scoring component carries an adjustable weight for custom ranking strategies
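A weighted combination of the components listed above might look like the following sketch. It does not reproduce `MappingScorer`: the component keys, the default weights, the linear sum, and the convention that a lower score is better are all assumptions chosen to make the idea of adjustable weights concrete.

```python
# Hypothetical component names and default weights (not the library's values).
DEFAULT_WEIGHTS = {
    "bond_energy_cost": 1.0,
    "bond_change_count": 1.0,
    "fragment_changes": 1.0,
    "stereo_changes": 1.0,
    "ring_changes": 1.0,
}

def weighted_score(components, weights=DEFAULT_WEIGHTS):
    """Combine per-metric values into one number; lower is assumed to be better."""
    return sum(weights[name] * value for name, value in components.items())

# Two made-up candidate mappings of the same reaction.
candidates = {
    "a": {"bond_energy_cost": 2.5, "bond_change_count": 2, "fragment_changes": 0,
          "stereo_changes": 0, "ring_changes": 0},
    "b": {"bond_energy_cost": 4.0, "bond_change_count": 4, "fragment_changes": 1,
          "stereo_changes": 0, "ring_changes": 1},
}

# Rank with the defaults, then with a custom weighting that penalizes bond changes more.
best_default = min(candidates, key=lambda k: weighted_score(candidates[k]))
custom = {**DEFAULT_WEIGHTS, "bond_change_count": 2.0}
score_a_custom = weighted_score(candidates["a"], custom)
print(best_default, score_a_custom)
```

Changing a single weight shifts how strongly that metric influences the ranking without touching the other components.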
Requirements
- Python (version >= 3.10)
- RDKit
- rdchiral-plus
- PyTorch
- Transformers (Hugging Face)
Installation
Install AgaveChem with pip directly from this repo:
pip install git+https://github.com/denovochem/agave_chem.git
Or clone and install locally:
git clone https://github.com/denovochem/agave_chem.git
cd agave_chem
pip install .
Basic usage
Neural mapper (recommended for general use)
from agave_chem import NeuralReactionMapper
mapper = NeuralReactionMapper("my_mapper")
result = mapper.map_reaction("CC(Cl)(Cl)OC(C)(Cl)Cl.CC(=O)C(=O)O>>CC(=O)C(=O)Cl")
print(result["selected_mapping"])
MCS mapper (fast, deterministic, partial mapping)
from agave_chem import MCSReactionMapper
mapper = MCSReactionMapper("my_mcs_mapper")
result = mapper.map_reaction("CC(Cl)(Cl)OC(C)(Cl)Cl.CC(=O)C(=O)O>>CC(=O)C(=O)Cl")
print(result["selected_mapping"])
Expert template mapper (interpretable, mechanistically grounded)
from agave_chem import TemplateReactionMapper
mapper = TemplateReactionMapper("my_template_mapper")
result = mapper.map_reaction("CC(Cl)(Cl)OC(C)(Cl)Cl.CC(=O)C(=O)O>>CC(=O)C(=O)Cl")
print(result["selected_mapping"])
Mapping a batch of reactions through the full pipeline
from agave_chem import map_reactions
reactions = [
    "CC(Cl)(Cl)OC(C)(Cl)Cl.CC(=O)C(=O)O>>CC(=O)C(=O)Cl",
    "OCC(=O)OCCCO.Cl>>ClCC(=O)OCCCO",
]
results = map_reactions(reactions)
for r in results:
    print(r["final_mapping"])
Scoring a mapping
from agave_chem import MappingScorer
scorer = MappingScorer()
score = scorer.score_mapping("[CH3:1][C:2](=[O:3])[Cl:4]>>[CH3:1][C:2](=[O:3])[OH:5]")
print(score)
Documentation
Full documentation is available at the AgaveChem documentation site.
Contributing
- Feature ideas and bug reports are welcome on the Issue Tracker.
- Fork the source code on GitHub, make changes and file a pull request.
License
AgaveChem is licensed under the MIT license.
References
- RXNMapper: Schwaller et al., Science Advances, 2021
- LocalMapper: Chen et al., Nat. Commun., 2024
- GraphormerMapper: Nugmanov et al., ChemRxiv, 2022
- Rxn-INSIGHT: Dobbelaere et al., J. Cheminform., 2024
- rdchiral: Coley et al., J. Chem. Inf. Model., 2019
- rdchiral_plus
- Lowe USPTO dataset
- Benchmarking study: Lin et al., ChemRxiv, 2020