A Python package for building and optimizing cheminformatics workflows using Bayesian optimization and LLM agents
cmxflow 🧪
Composable cheminformatics workflows with Bayesian optimization.
Overview 🔬
cmxflow is a Python framework for building and optimizing cheminformatics pipelines. Chain together molecular operations as blocks, then let Bayesian optimization find the best parameters for your task.
Two Usage Modes ⚗️
cmxflow is designed to work both as:
- An Agentic Tool - via MCP (Model Context Protocol) server, allowing LLM agents to build and optimize workflows conversationally
- A Programmatic API - for direct Python usage in scripts and notebooks
Block Types 🧬
Workflows are built from four types of blocks:
| Block Type | Purpose |
|---|---|
| SourceBlock | Read molecules from files (SDF, SMILES, CSV, Parquet) |
| Block | Transform molecules (1:1 or N:M) |
| SinkBlock | Write molecules to files |
| ScoreBlock | Compute optimization objective |
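As a rough mental model of how these block types fit together, the plain-Python sketch below mimics the source, transform, score flow. It does not use cmxflow: `toy_source`, `toy_filter`, and `toy_score` are hypothetical stand-ins, molecules are plain dicts, and the molecular weights are approximate.

```python
from typing import Callable, Iterable, Iterator

Molecule = dict  # toy stand-in for a molecule: {"smiles": str, "mw": float}


def toy_source(records: Iterable[Molecule]) -> Iterator[Molecule]:
    """Source role: yield molecules one at a time (from memory here, not a file)."""
    yield from records


def toy_filter(max_mw: float) -> Callable[[Iterable[Molecule]], Iterator[Molecule]]:
    """Transform role: an N:M step that keeps molecules at or below max_mw."""
    def step(mols: Iterable[Molecule]) -> Iterator[Molecule]:
        return (m for m in mols if m["mw"] <= max_mw)
    return step


def toy_score(mols: Iterable[Molecule]) -> float:
    """Score role: reduce the molecule stream to one number (mean weight)."""
    weights = [m["mw"] for m in mols]
    return sum(weights) / len(weights) if weights else 0.0


records = [
    {"smiles": "CCO", "mw": 46.07},
    {"smiles": "c1ccccc1", "mw": 78.11},
    {"smiles": "CCCCCCCCCCCCCCCC", "mw": 226.45},
]

# Chain the steps: source -> filter -> score. max_mw plays the part of a
# tunable parameter that an optimizer could vary between runs.
kept = list(toy_filter(max_mw=100.0)(toy_source(records)))
score = toy_score(kept)
```

In this toy version `max_mw` is the kind of value an optimizer would adjust, with the score as the objective it tries to improve.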
Example Operators 💊
| Block | Purpose |
|---|---|
| `MoleculeStandardizeBlock` | Standardize molecules (metals, salts, charges, tautomers) |
| `MoleculeDeduplicateBlock` | Remove duplicate molecules by canonical SMILES |
| `RDKitBlock` | Apply any RDKit method (descriptors, transformations) |
| `SubstructureFilterBlock` | Filter by SMARTS patterns or catalogs (PAINS, BRENK, etc.) |
| `PropertyFilterBlock` | Filter molecules by property conditions |
| `PropertyHeadBlock` | Select top N molecules by property |
| `PropertyTailBlock` | Select bottom N molecules by property |
| `MoleculeSimilarityBlock` | Compute 2D fingerprint similarity |
| `Molecule3DSimilarityBlock` | Compute 3D shape similarity |
| `IonizeMoleculeBlock` | Generate pH-dependent ionization states |
| `EnumerateStereoBlock` | Enumerate all stereoisomers |
| `ConformerGenerationBlock` | Generate 3D conformers (ETKDGv3) |
| `MoleculeAlignBlock` | Align molecules to 3D reference |
| `MoleculeDockBlock` | Dock into protein binding pocket |
| `RepresentativeClusterBlock` | Cluster molecules by fingerprint similarity (leader algorithm) |
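For readers unfamiliar with the leader algorithm mentioned for `RepresentativeClusterBlock`, here is a minimal plain-Python sketch of the general technique. It is not cmxflow's implementation: fingerprints are modeled as sets of "on" bit indices, similarity is Tanimoto, and the threshold value and the first-matching-leader rule are assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(
    fingerprints: Sequence[Set[int]], threshold: float
) -> Tuple[List[int], List[int]]:
    """Single-pass leader clustering.

    Returns (leaders, assignment): leaders holds the index of each cluster
    representative, and assignment[i] is the leader index for item i.
    """
    leaders: List[int] = []
    assignment: List[int] = []
    for i, fp in enumerate(fingerprints):
        for leader in leaders:
            # Join the first existing leader that is similar enough.
            if tanimoto(fp, fingerprints[leader]) >= threshold:
                assignment.append(leader)
                break
        else:
            # No leader is close enough, so this item starts a new cluster.
            leaders.append(i)
            assignment.append(i)
    return leaders, assignment


fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}, {1, 2, 3, 4, 5}]
leaders, assignment = leader_cluster(fps, threshold=0.5)
```

Because the algorithm makes a single pass, the result depends on input order: an item seen earlier becomes the representative for everything similar that follows it.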
Example Score Blocks 📊
| ScoreBlock | Purpose |
|---|---|
| `EnrichmentScoreBlock` | Enrichment AUC for virtual screening |
| `AverageScoreBlock` | Mean of a molecular property |
| `ShapeOverlayScoreBlock` | Average 3D shape similarity |
| `ClusterScoreBlock` | Cluster quality from representative clustering |
Features 🚀
- Composable Pipelines - Chain blocks with `workflow.add()`
- Bayesian Optimization - Find optimal parameters via Optuna
- Parallel Execution - `make_parallel()` for compute-intensive blocks
- Mutable Parameters - Categorical, Integer, and Continuous types
- Serialization - `save_workflow()` and `load_workflow()` for persistence
- MCP Server - Agentic workflow building via `build_workflow`, `run_workflow`, `optimize_workflow`
Environment Variables 🔧
| Variable | Default | Description |
|---|---|---|
| `CMXFLOW_WORKER_TIMEOUT` | `30` | Seconds to wait for a single parallel worker before treating it as failed. Set to `0` to disable the timeout. Applies to all `make_parallel()` and `@parallel` blocks. |
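The variable can also be set from Python rather than the shell. The sketch below uses only the standard library. The source does not say when cmxflow reads the variable, so setting it before cmxflow is imported is an assumption, chosen as the conservative option. The value `120` is an arbitrary example.

```python
import os

# Set the timeout first thing, before cmxflow is imported or any parallel
# block runs (assumption: the read time is not documented in this README).
os.environ["CMXFLOW_WORKER_TIMEOUT"] = "120"  # seconds; "0" disables the timeout

# Read it back the way a consumer might, falling back to the documented
# default of 30. Parsing as int is an illustrative choice, not cmxflow's code.
timeout_seconds = int(os.environ.get("CMXFLOW_WORKER_TIMEOUT", "30"))
```

Environment variables are always strings, so the value must be assigned as `"120"` rather than `120`.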
Getting Started 📖
See examples/basic_usage.ipynb for a complete tutorial covering:
- Building your first workflow
- 2D similarity search
- Mutable parameters and optimization
- Parallel execution
- Analyzing results with Optuna
The tutorial uses the ABL1 kinase benchmark from the wonderful DUD-E database.
Installation 🛠️
pip install cmxflow
MCP Server
To use cmxflow as an agentic tool with Claude Code:
claude mcp add cmxflow -- cmxflow-mcp
Optional Dependencies
PyMOL - Required only for 3D structure visualization (the `view_structures` MCP tool). Install via conda:
conda install -c conda-forge pymol-open-source
All other functionality works without PyMOL.
Contributing 🤝
Contributions are welcome! This is a side project, so reviews may take some time, but PRs are appreciated.
Before Submitting
- Open an issue first for significant changes to discuss the approach
- Fork the repo and create a feature branch from `main`
- Follow the code style - run `mypy`, `black`, and `ruff` before committing (or install the provided pre-commit hooks)
PR Requirements
- Clear description of the bug fixed or feature added
- Minimal reproducible example demonstrating the change
- Tests covering new functionality (`pytest`)
- Type hints for all new code
- Docstrings following Google conventions
Development Setup
conda config --set solver libmamba
conda env create -f conda.yml
conda activate cmxflow
poetry install
pre-commit install # Ensures formatting/linting on commit
Running Tests
pytest tests/
Releases
Releases are published to PyPI automatically when a pull request is merged into main with a version bump tag in the PR title:
| Tag in PR title | Version bump | Example |
|---|---|---|
| `[patch]` | Bug fixes, docs (0.1.0 → 0.1.1) | Fix conformer bug [patch] |
| `[minor]` | New features, backwards-compatible (0.1.0 → 0.2.0) | Add ProtonationBlock [minor] |
| `[major]` | Breaking changes (0.1.0 → 1.0.0) | Redesign block API [major] |
PRs without a tag merge normally without triggering a release.
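The tagging rule above can be sketched in a few lines of Python. This illustrates the rule as described, not the contents of the project's `release.yml`. The function name `next_version`, the case-sensitive match, and using the first tag found when a title contains several are all assumptions.

```python
import re
from typing import Optional

# Matches one of the three documented tags anywhere in the PR title.
BUMP_TAG = re.compile(r"\[(patch|minor|major)\]")


def next_version(current: str, pr_title: str) -> Optional[str]:
    """Return the bumped version for a PR title, or None if it has no tag."""
    match = BUMP_TAG.search(pr_title)
    if match is None:
        return None  # untagged PRs merge without triggering a release
    major, minor, patch = (int(part) for part in current.split("."))
    kind = match.group(1)
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


patch_bump = next_version("0.1.0", "Fix conformer bug [patch]")
minor_bump = next_version("0.1.0", "Add ProtonationBlock [minor]")
major_bump = next_version("0.1.0", "Redesign block API [major]")
no_release = next_version("0.1.0", "Update README")
```

Note that a minor bump resets the patch number and a major bump resets both lower components, matching the examples in the table.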
License 📄
MIT