CReM: chemically reasonable mutations framework

CReM is an open-source Python framework to generate chemical structures using a fragment-based approach.

The idea is similar to matched molecular pairs: fragments that occur in the same context are considered interchangeable. CReM stores such context–fragment relationships in a database and uses them to generate chemically valid structures.
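As a rough illustration of this idea (not CReM's actual storage schema), the context-fragment relationship can be pictured as a mapping from a context to the set of fragments observed in it. The SMILES-like strings, the dictionary layout, and the helper name `interchangeable` are all assumptions made for this toy example.

```python
from collections import defaultdict

# Toy observations: (context, fragment) pairs, as if extracted from known molecules.
# The strings are illustrative placeholders, not real CReM database entries.
observations = [
    ("c[*:1]", "[*:1]OC"),
    ("c[*:1]", "[*:1]C"),
    ("c[*:1]", "[*:1]Cl"),
    ("C[*:1]", "[*:1]N"),
    ("C[*:1]", "[*:1]C"),
]

# Group fragments by the context they were observed in.
context_to_fragments = defaultdict(set)
for context, fragment in observations:
    context_to_fragments[context].add(fragment)


def interchangeable(context, fragment):
    """Return fragments seen in the same context, excluding the given one."""
    return sorted(context_to_fragments.get(context, set()) - {fragment})


replacements = interchangeable("c[*:1]", "[*:1]OC")
print(replacements)
```

A fragment attached to an unseen context has no known replacements, which is why the range of generated structures depends on what the database contains.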

Features

  • Four generation modes: mutate, grow, link, and make_cycle (ring closure / macrocyclization).
  • Fragment databases: build a custom one in one step with cremdb_create, or download a precompiled ChEMBL database.
  • Multiple fragment sets per database: switch between them at generation time with set_names and a frequency threshold (min_freq).
  • Fine control: context radius, fragment-size windows, replaceable/protected atoms, and replace_cycles for partial-ring replacement.
  • Custom selection: bias or restrict fragments with filter_func / sample_func, or with molecular-property columns.
  • Reproducible and parallel: seed for deterministic sampling; ncores and picklable *_mol2 wrappers for multiprocessing.
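
To make the frequency threshold and seeded sampling more concrete, here is a minimal pure-Python sketch. It does not call CReM: the fragment counts are invented, and `pick_fragments` is a hypothetical helper whose parameter names only mirror the options listed above. How CReM applies these options internally is not shown here.

```python
import random

# Hypothetical fragment occurrence counts; real counts live in the CReM database.
fragment_counts = {
    "[*:1]C": 120,
    "[*:1]OC": 45,
    "[*:1]Cl": 30,
    "[*:1]C#N": 3,
    "[*:1]B(O)O": 1,
}


def pick_fragments(counts, min_freq=0, max_replacements=None, seed=None):
    """Keep fragments seen at least min_freq times, then optionally sample a subset."""
    eligible = sorted(f for f, n in counts.items() if n >= min_freq)
    if max_replacements is None or max_replacements >= len(eligible):
        return eligible
    rng = random.Random(seed)  # a fixed seed gives the same sample on every run
    return sorted(rng.sample(eligible, max_replacements))


first = pick_fragments(fragment_counts, min_freq=10, max_replacements=2, seed=42)
second = pick_fragments(fragment_counts, min_freq=10, max_replacements=2, seed=42)
print(first == second)  # same seed, same selection
```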

Installation

pip install crem

From source:

git clone https://github.com/DrrDom/crem
cd crem
pip install .

CReM requires rdkit>=2025.3.5. Optional extras: guacamol (to run the benchmark) and zstandard (to read .zst-compressed input when building databases).

Quick start

All examples assume a fragment database named fragments.db: build one or download a precompiled ChEMBL database.

from rdkit import Chem
from crem.crem import mutate_mol, grow_mol, link_mols, make_cycle

m = Chem.MolFromSmiles('c1cc(OC)ccc1C')          # methoxytoluene

# replace an existing fragment
mutants = list(mutate_mol(m, db_name='fragments.db', max_size=1))

# decorate by replacing a hydrogen
grown = list(grow_mol(m, db_name='fragments.db'))

# link two molecules with a linker
m2 = Chem.MolFromSmiles('NCC(=O)O')              # glycine
linked = list(link_mols(m, m2, db_name='fragments.db'))

# form a new ring
cyclic = list(make_cycle(m, db_name='fragments.db', ring_size=(5, 7)))

All four functions are generators (wrap them in list(...) to collect the results) and share many options: radius, size windows, min_freq / set_names, replace_ids / protected_ids, filter_func / sample_func, max_replacements, seed, and ncores. See Mutate, grow, link, Advanced fragment selection, and the API reference.
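
Because the functions are generators, results can also be consumed lazily instead of being collected into one list. The sketch below shows this with standard-library `itertools.islice`; `fake_mutate` is a hypothetical stand-in for a CReM generator so that the example runs without a fragment database.

```python
from itertools import islice


def fake_mutate(n):
    """Hypothetical stand-in for a CReM generator; yields placeholder strings lazily."""
    for i in range(n):
        yield f"mol_{i}"


# Take only the first three results; the remaining items are never generated.
first_three = list(islice(fake_mutate(1_000_000), 3))
print(first_three)
```

The same pattern should apply to any generator, which helps when a molecule has many possible replacements and only a few are needed.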

Build a fragment database

Build a database directly from a SMILES file in one step:

cremdb_create -i input.smi -o fragments.db -s chembl

This produces the current database format with fragment-set support and ring-closure fragments. For multiple sets, property columns, sharded/parallel builds, conversion of older databases, and the programmatic crem.db API, see Fragment databases.

Benchmarks

GuacaMol goal-directed benchmark (scores marked * are from the original GuacaMol publication):

| Task | SMILES LSTM* | SMILES GA* | Graph GA* | Graph MCTS* | CReM |
|---|---|---|---|---|---|
| Celecoxib rediscovery | 1.000 | 0.732 | 1.000 | 0.355 | 1.000 |
| Troglitazone rediscovery | 1.000 | 0.515 | 1.000 | 0.311 | 1.000 |
| Thiothixene rediscovery | 1.000 | 0.598 | 1.000 | 0.311 | 1.000 |
| Aripiprazole similarity | 1.000 | 0.834 | 1.000 | 0.380 | 1.000 |
| Albuterol similarity | 1.000 | 0.907 | 1.000 | 0.749 | 1.000 |
| Mestranol similarity | 1.000 | 0.79 | 1.000 | 0.402 | 1.000 |
| C11H24 | 0.993 | 0.829 | 0.971 | 0.410 | 0.966 |
| C9H10N2O2PF2Cl | 0.879 | 0.889 | 0.982 | 0.631 | 0.940 |
| Median molecules 1 | 0.438 | 0.334 | 0.406 | 0.225 | 0.371 |
| Median molecules 2 | 0.422 | 0.38 | 0.432 | 0.170 | 0.434 |
| Osimertinib MPO | 0.907 | 0.886 | 0.953 | 0.784 | 0.995 |
| Fexofenadine MPO | 0.959 | 0.931 | 0.998 | 0.695 | 1.000 |
| Ranolazine MPO | 0.855 | 0.881 | 0.92 | 0.616 | 0.969 |
| Perindopril MPO | 0.808 | 0.661 | 0.792 | 0.385 | 0.815 |
| Amlodipine MPO | 0.894 | 0.722 | 0.894 | 0.533 | 0.902 |
| Sitagliptin MPO | 0.545 | 0.689 | 0.891 | 0.458 | 0.763 |
| Zaleplon MPO | 0.669 | 0.413 | 0.754 | 0.488 | 0.770 |
| Valsartan SMARTS | 0.978 | 0.552 | 0.990 | 0.04 | 0.994 |
| Deco Hop | 0.996 | 0.970 | 1.000 | 0.590 | 1.000 |
| Scaffold Hop | 0.998 | 0.885 | 1.000 | 0.478 | 1.000 |
| Total score | 17.341 | 14.398 | 17.983 | 9.011 | 17.919 |
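
The total score is the sum of the 20 per-task scores in a column. As a quick arithmetic check, the sketch below re-adds the CReM and Graph GA columns; the values are copied from the table, while the variable names are only illustrative.

```python
# Per-task scores copied from the table above, in row order.
crem_scores = [
    1.000, 1.000, 1.000, 1.000, 1.000, 1.000,   # rediscovery and similarity tasks
    0.966, 0.940, 0.371, 0.434,                 # isomers and median molecules
    0.995, 1.000, 0.969, 0.815, 0.902, 0.763, 0.770,  # MPO tasks
    0.994, 1.000, 1.000,                        # Valsartan SMARTS, Deco Hop, Scaffold Hop
]
graph_ga_scores = [
    1.000, 1.000, 1.000, 1.000, 1.000, 1.000,
    0.971, 0.982, 0.406, 0.432,
    0.953, 0.998, 0.92, 0.792, 0.894, 0.891, 0.754,
    0.990, 1.000, 1.000,
]

crem_total = round(sum(crem_scores), 3)
graph_ga_total = round(sum(graph_ga_scores), 3)
print(crem_total, graph_ga_total)  # 17.919 17.983
```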

Limitations

  • CReM builds structures only from fragments present in the database, so the ring systems that can appear depend on the database. make_cycle and replace_cycles form or replace rings using fragments observed in the database rather than inventing entirely new ring systems.
  • Very large molecules are skipped in some workflows: a molecule with more than 30 non-ring single bonds is not mutated, and one with more than 100 hydrogen atoms is not grown or linked.
  • Context canonicalization relies on RDKit's SMILES output. A database is best used with the RDKit version it was built with (no incompatibilities observed so far); pin RDKit when sharing databases across machines.

License

BSD-3-Clause. See LICENSE.txt.

Citation

Pavel Polishchuk. CReM: chemically reasonable mutations framework for structure generation. Journal of Cheminformatics 2020, 12 (1), 28. https://doi.org/10.1186/s13321-020-00431-w

Pavel Polishchuk. Control of Synthetic Feasibility of Compounds Generated with CReM. Journal of Chemical Information and Modeling 2020, 60, 6074-6080. https://dx.doi.org/10.1021/acs.jcim.0c00792
