Frontier retrosynthesis prediction model

RetroChimera

RetroChimera is a frontier retrosynthesis model built by ensembling two novel components with complementary inductive biases. It outperforms existing models by a large margin and can learn from a very small number of examples per reaction class. In blind tests, industrial organic chemists preferred its predictions over the reactions it was trained on.

Using RetroChimera

To install retrochimera locally, run

conda env create -f environment.yml
conda activate retrochimera

pip install retrochimera

then you can run inference via

from retrochimera import RetroChimeraModel
from syntheseus import Molecule

model = RetroChimeraModel(model_dir="/model/checkpoint/dir/")
mol = Molecule("Oc1ccc(OCc2ccccc2)c(Br)c1")

predictions = model([mol], num_results=3)

for p in predictions[0]:
    print(p, f"({100. * p.metadata['probability']:.2f}%)")
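
For downstream processing, it can help to flatten each prediction into a plain record. The sketch below is illustrative and not part of the retrochimera API: `to_rows` is a hypothetical helper, and it assumes each prediction exposes `reactants` (an iterable of molecules with a `smiles` attribute) and a `metadata` dictionary, as syntheseus reaction objects do. Stand-in objects are used so the snippet runs without a checkpoint.

```python
from types import SimpleNamespace


def to_rows(predictions):
    """Flatten the predictions for one input into rank / reactants / probability records."""
    rows = []
    for rank, p in enumerate(predictions, start=1):
        # Join reactant SMILES with "." (sorted so the string is stable and comparable).
        reactants = ".".join(sorted(m.smiles for m in p.reactants))
        rows.append(
            {
                "rank": rank,
                "reactants": reactants,
                "probability": p.metadata.get("probability"),
            }
        )
    return rows


# Stand-in for a single prediction (hypothetical values, not real model output).
fake_prediction = SimpleNamespace(
    reactants=[
        SimpleNamespace(smiles="Oc1ccc(O)c(Br)c1"),
        SimpleNamespace(smiles="BrCc1ccccc1"),
    ],
    metadata={"probability": 0.5},
)
rows = to_rows([fake_prediction])
```

With a loaded model, the equivalent call would be `to_rows(predictions[0])`.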

Two additional dependency groups are available at install time: dev, for running tests, and graphium, for building the model architecture we used for USPTO-50K. To run the USPTO-50K checkpoint, install via pip install retrochimera[graphium].
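
Before loading the USPTO-50K checkpoint, one way to check that the optional dependency is present is to probe for the module with the standard library. This is a minimal sketch: `has_module` is a hypothetical helper, and it assumes the graphium extra installs an importable top-level module named `graphium`.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if not has_module("graphium"):
    # Quoting the requirement avoids bracket globbing in shells such as zsh.
    print("graphium not found; try: pip install 'retrochimera[graphium]'")
```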

If you want to train your own checkpoint, please follow the instructions in retrochimera/README.md.

Checkpoints for RetroChimera 1

The main (and most powerful) checkpoint we release is trained on Pistachio. For benchmarking, we also provide (weaker) checkpoints trained on USPTO-50K and USPTO-FULL.

If you care about reproducing the USPTO-* results from our paper exactly, make sure to use the inference hyperparameters listed in Extended Data Tables 3 and 4. By default, these parameters are set to values optimal for the Pistachio checkpoint.

Warning: RetroChimera 1 is being released for research and experimentation. We hope you try to break it in any way possible and share the results with us. Like any ML model, it is not free from errors and may hallucinate, particularly on inputs outside the training distribution. We look forward to collaborating with the community so we can improve the model for everyone!

Before any prediction is used in a real-world setting, it must be risk-assessed and verified independently by chemistry experts. In particular, reactions ranked lower in the output list are increasingly likely to be hallucinations; we recommend requesting no more than 5-10 reactions per input unless paired with stringent filtering (see e.g. [1][2]).
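
As a rough illustration of capping the output list, the sketch below keeps only the top few predictions above a probability cutoff. It is not the filtering described in [1][2] and does not replace expert review: `keep_confident` is a hypothetical helper, the default thresholds are arbitrary assumptions, and stand-in objects mimic predictions that carry `metadata['probability']`.

```python
from types import SimpleNamespace


def keep_confident(predictions, max_results=5, min_probability=0.05):
    """Keep at most `max_results` predictions with probability >= `min_probability`."""

    def prob(p):
        # Treat a missing probability as 0.
        return p.metadata.get("probability", 0.0)

    kept = [p for p in predictions if prob(p) >= min_probability]
    kept.sort(key=prob, reverse=True)
    return kept[:max_results]


# Stand-ins for ranked predictions (hypothetical probabilities).
fake = [SimpleNamespace(metadata={"probability": x}) for x in (0.6, 0.3, 0.04, 0.01)]
kept = keep_confident(fake, max_results=5, min_probability=0.05)
```

With a loaded model, this could be applied per input, for example `keep_confident(predictions[0])`.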

If you find that RetroChimera 1 doesn't work on your favourite drug-like molecule, please let us know at retrochimera@microsoft.com, so we can make sure we improve this in the next model version.

Citation

If you use RetroChimera in your work, please consider citing our arXiv preprint (bibtex below).

@article{maziarz2025chemist,
  title={Chemist-aligned retrosynthesis by ensembling diverse inductive bias models},
  author={Maziarz, Krzysztof and Liu, Guoqing and Misztela, Hubert and Tripp, Austin and Li, Junren and Kornev, Aleksei and Gai{\'n}ski, Piotr and Hoefling, Holger and Fortunato, Mike and Gupta, Rishi and Segler, Marwin},
  journal={arXiv preprint arXiv:2412.05269},
  year={2025}
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Download files

Source Distribution

retrochimera-1.1.0.tar.gz (189.3 kB, source)

Built Distribution

retrochimera-1.1.0-py3-none-any.whl (165.3 kB, Python 3)

File details

Details for the file retrochimera-1.1.0.tar.gz.

File metadata

  • Download URL: retrochimera-1.1.0.tar.gz
  • Size: 189.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.7

File hashes

Hashes for retrochimera-1.1.0.tar.gz
Algorithm Hash digest
SHA256 8df69f83f41bc944b653a88a482f1cb17e72ba876a2f73d171580c69a23d0903
MD5 c0d2103b0e60e3f304bc04d4ae32dda0
BLAKE2b-256 fd395332e1b25a55b0e1cca84af493906edff9ab6cc913a3faa4d0980a18c273

File details

Details for the file retrochimera-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: retrochimera-1.1.0-py3-none-any.whl
  • Size: 165.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.7

File hashes

Hashes for retrochimera-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 285a7e03630121f0066410d23e27a6c85885ba89e71f57abb2d317ab834fa6c7
MD5 0daa46cdc819fde3c0ab171e9c45da0e
BLAKE2b-256 c4ff291f5835f97d16a6a9172917a619f4bd56ad86c6fa612defc0c0cd0e5f15
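
A downloaded file can be checked against the SHA256 digests listed above using only the standard library. The sketch below is one possible way to do so: `sha256_of` and `matches_published_hash` are hypothetical helper names, and the local file path in the usage comment is an assumption about where the wheel was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for retrochimera-1.1.0-py3-none-any.whl.
EXPECTED_WHEEL_SHA256 = "285a7e03630121f0066410d23e27a6c85885ba89e71f57abb2d317ab834fa6c7"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (assumes the wheel was downloaded to the current directory):
# matches_published_hash("retrochimera-1.1.0-py3-none-any.whl", EXPECTED_WHEEL_SHA256)
```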
