A Python package for building and optimizing cheminformatics workflows using Bayesian optimization and LLM agents

cmxflow 🧪

Python 3.10+ · Code style: black · License: MIT

Composable cheminformatics workflows.

Overview 🔬

cmxflow is a Python framework for building and optimizing cheminformatics pipelines. Chain together molecular operations as blocks, then let Bayesian optimization find the best parameters for your task.

Read the full documentation →

Two Usage Modes ⚗️

cmxflow is designed to work both as:

  1. An Agentic Tool - via an MCP (Model Context Protocol) server, allowing LLM agents to build and optimize workflows conversationally
  2. A Programmatic API - for direct Python usage in scripts and notebooks

Block Types 🧬

Workflows are built from four types of blocks:

| Block Type | Purpose |
|---|---|
| SourceBlock | Read molecules from files (SDF, SMILES, CSV, Parquet) |
| Block | Transform molecules (1:1 or N:M) |
| SinkBlock | Write molecules to files |
| ScoreBlock | Compute optimization objective |
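
A minimal, dependency-free sketch of how the four roles fit together follows. Every name in it (ToySource, ToyTransform, ToySink, ToyScore, ToyWorkflow, drop_aromatic) is hypothetical and only mimics the roles in the table; it is not cmxflow's API, which is covered in the documentation and in examples/basic_usage.ipynb. Molecules are plain SMILES strings and the sink records to a list rather than a file, so the example runs without RDKit.

```python
from typing import Callable, Iterable, Iterator, List, Union


class ToySource:
    """Hypothetical stand-in for a SourceBlock: yields SMILES from a list, not a file."""

    def __init__(self, items: Iterable[str]) -> None:
        self.items = list(items)

    def run(self) -> Iterator[str]:
        yield from self.items


class ToyTransform:
    """Hypothetical stand-in for a Block: each input maps to zero or more outputs."""

    def __init__(self, fn: Callable[[str], Iterable[str]]) -> None:
        self.fn = fn

    def run(self, stream: Iterable[str]) -> Iterator[str]:
        for item in stream:
            yield from self.fn(item)


class ToySink:
    """Hypothetical stand-in for a SinkBlock: records items in memory and passes them on."""

    def __init__(self) -> None:
        self.written: List[str] = []

    def run(self, stream: Iterable[str]) -> Iterator[str]:
        for item in stream:
            self.written.append(item)
            yield item


class ToyScore:
    """Hypothetical stand-in for a ScoreBlock: reduces the final items to one number."""

    def __init__(self, fn: Callable[[List[str]], float]) -> None:
        self.fn = fn


Stage = Union[ToyTransform, ToySink]


class ToyWorkflow:
    """Chains stages in the order they are added, then scores the output."""

    def __init__(self, source: ToySource, score: ToyScore) -> None:
        self.source = source
        self.score = score
        self.stages: List[Stage] = []

    def add(self, stage: Stage) -> "ToyWorkflow":
        self.stages.append(stage)
        return self  # returning self allows wf.add(a).add(b)

    def run(self) -> float:
        stream: Iterable[str] = self.source.run()
        for stage in self.stages:
            stream = stage.run(stream)  # generators are chained lazily
        return self.score.fn(list(stream))


def drop_aromatic(smiles: str) -> List[str]:
    """Crude toy filter: drop SMILES containing an aromatic carbon written as 'c'."""
    return [] if "c" in smiles else [smiles]


sink = ToySink()
wf = ToyWorkflow(ToySource(["CCO", "c1ccccc1", "CCO"]), ToyScore(lambda xs: float(len(xs))))
wf.add(ToyTransform(drop_aromatic)).add(sink)
print(wf.run(), sink.written)  # 2.0 ['CCO', 'CCO']
```

Because the stages are chained generators, items stream through the transform and sink one at a time until the score stage collects them; whether cmxflow itself streams or batches internally is not stated here.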

Example Operators 💊

| Block | Purpose |
|---|---|
| MoleculeStandardizeBlock | Standardize molecules (metals, salts, charges, tautomers) |
| MoleculeDeduplicateBlock | Remove duplicate molecules by canonical SMILES |
| RDKitBlock | Apply any RDKit method (descriptors, transformations) |
| SubstructureFilterBlock | Filter by SMARTS patterns or catalogs (PAINS, BRENK, etc.) |
| PropertyFilterBlock | Filter molecules by property conditions |
| PropertyHeadBlock | Select top N molecules by property |
| PropertyTailBlock | Select bottom N molecules by property |
| MoleculeSimilarityBlock | Compute 2D fingerprint similarity |
| Molecule3DSimilarityBlock | Compute 3D shape similarity |
| IonizeMoleculeBlock | Generate pH-dependent ionization states |
| EnumerateStereoBlock | Enumerate all stereoisomers |
| ConformerGenerationBlock | Generate 3D conformers (ETKDGv3) |
| MoleculeAlignBlock | Align molecules to a 3D reference |
| MoleculeDockBlock | Dock into a protein binding pocket |
| RepresentativeClusterBlock | Cluster molecules by fingerprint similarity (leader algorithm) |
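
RepresentativeClusterBlock is listed as using the leader algorithm. The sketch below shows that algorithm in plain Python, assuming fingerprints are given as sets of on-bit indices and compared with Tanimoto similarity. The threshold, the toy bit sets, and the function names are illustrative assumptions; the block's actual fingerprint type, similarity metric, and defaults may differ.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two on-bit sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0  # convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fps: Dict[str, FrozenSet[int]], threshold: float = 0.6) -> Dict[str, List[str]]:
    """Greedy leader clustering in a single pass over the input order.

    Each item joins the first existing leader with similarity >= threshold;
    if no leader qualifies, the item becomes the leader of a new cluster.
    """
    clusters: Dict[str, List[str]] = {}
    for name, fp in fps.items():
        for leader in clusters:
            if tanimoto(fp, fps[leader]) >= threshold:
                clusters[leader].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


# Toy fingerprints with hypothetical bit indices (not real Morgan bits).
fps = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 3, 5}),
    "mol_c": frozenset({10, 11, 12}),
}
print(leader_cluster(fps, threshold=0.5))  # {'mol_a': ['mol_a', 'mol_b'], 'mol_c': ['mol_c']}
```

Because leaders are fixed by input order, sorting the input first (for example by a score) changes which molecules end up as cluster representatives.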

Example Score Blocks 📊

| ScoreBlock | Purpose |
|---|---|
| EnrichmentScoreBlock | Enrichment AUC for virtual screening |
| AverageScoreBlock | Mean of a molecular property |
| ShapeOverlayScoreBlock | Average 3D shape similarity |
| ClusterScoreBlock | Cluster quality from representative clustering |
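
EnrichmentScoreBlock reports an enrichment AUC. One common reading of that metric is the ROC AUC of known actives versus decoys when ranked by a score such as similarity to a query. The sketch below computes it directly from pairwise comparisons (equivalent to the Mann-Whitney U statistic divided by the number of pairs). Whether cmxflow uses ROC AUC or a different enrichment curve, and which score direction counts as better, are assumptions here.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """Probability that a random active scores higher than a random decoy.

    Assumes higher scores are better; ties count as half a win.
    """
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for sa in actives:
        for sd in decoys:
            if sa > sd:
                wins += 1.0
            elif sa == sd:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Toy similarity scores: two actives followed by three decoys.
scores = [0.9, 0.4, 0.7, 0.3, 0.1]
labels = [True, True, False, False, False]
print(roc_auc(scores, labels))
```

An AUC of 0.5 means the ranking is no better than random, and 1.0 means every active is ranked above every decoy, which is why it works as a maximization target.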

Features 🚀

  • Composable Pipelines - Chain blocks with workflow.add()
  • Bayesian Optimization - Find optimal parameters via Optuna
  • Parallel Execution - make_parallel() for compute-intensive blocks
  • Mutable Parameters - Categorical, Integer, and Continuous types
  • Serialization - save_workflow() and load_workflow() for persistence
  • MCP Server - Agentic workflow building via the build_workflow, run_workflow, and optimize_workflow tools
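
To show what a Categorical, Integer, and Continuous search space means without requiring cmxflow or Optuna, the sketch below defines three hypothetical parameter classes and maximizes a toy objective. Plain random search is swapped in deliberately to keep it dependency-free; cmxflow uses Bayesian optimization through Optuna, which chooses each new trial based on earlier results instead of sampling independently. The class names, the parameter names ("fingerprint", "radius", "threshold"), and the objective are invented for illustration.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, Sequence, Tuple, Union


@dataclass
class Categorical:
    choices: Sequence[Any]

    def sample(self, rng: random.Random) -> Any:
        return rng.choice(list(self.choices))


@dataclass
class Integer:
    low: int
    high: int  # inclusive

    def sample(self, rng: random.Random) -> int:
        return rng.randint(self.low, self.high)


@dataclass
class Continuous:
    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        return rng.uniform(self.low, self.high)


Param = Union[Categorical, Integer, Continuous]


def random_search(
    space: Dict[str, Param],
    objective: Callable[[Dict[str, Any]], float],
    n_trials: int = 50,
    seed: int = 0,
) -> Tuple[Dict[str, Any], float]:
    """Maximize `objective` by sampling independent points from `space`."""
    rng = random.Random(seed)
    best_params: Dict[str, Any] = {}
    best_value = float("-inf")
    for _ in range(n_trials):
        params = {name: p.sample(rng) for name, p in space.items()}
        value = objective(params)
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value


# Hypothetical search space for a similarity-search workflow.
space: Dict[str, Param] = {
    "fingerprint": Categorical(["morgan", "rdkit"]),
    "radius": Integer(1, 3),
    "threshold": Continuous(0.3, 0.8),
}


def toy_objective(p: Dict[str, Any]) -> float:
    """Stand-in for a ScoreBlock value; prefers morgan, larger radius, threshold near 0.5."""
    bonus = 1.0 if p["fingerprint"] == "morgan" else 0.0
    return bonus + p["radius"] - abs(p["threshold"] - 0.5)


print(random_search(space, toy_objective, n_trials=100))
```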

Environment Variables 🔧

| Variable | Default | Description |
|---|---|---|
| CMXFLOW_WORKER_TIMEOUT | 30 | Seconds to wait for a single parallel worker before treating it as failed. Set to 0 to disable the timeout. Applies to all make_parallel() and @parallel blocks. |
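
The variable can be set in the shell (for example `export CMXFLOW_WORKER_TIMEOUT=120`) or through os.environ; since it is not stated here when cmxflow reads it, setting it before importing or running cmxflow is the safest choice. The sketch below illustrates the documented semantics (30-second default, 0 disables) with a thread pool from concurrent.futures. The helpers worker_timeout and parallel_map are hypothetical and are not cmxflow's make_parallel() implementation, which may use processes and handle failures differently.

```python
import concurrent.futures as cf
import os
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def worker_timeout(default: float = 30.0) -> Optional[float]:
    """Read CMXFLOW_WORKER_TIMEOUT in seconds; 0 means no timeout (None)."""
    raw = os.environ.get("CMXFLOW_WORKER_TIMEOUT")
    seconds = default if raw is None else float(raw)
    return None if seconds <= 0 else seconds


def parallel_map(fn: Callable[[T], R], items: Sequence[T], workers: int = 4) -> List[Optional[R]]:
    """Apply fn to items in a thread pool; a result not ready in time becomes None.

    Caveats of this sketch: each wait starts when result() is called, and a
    thread cannot be killed, so pool shutdown still waits for a hung worker.
    """
    timeout = worker_timeout()
    results: List[Optional[R]] = []
    with cf.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fn, item) for item in items]
        for fut in futures:
            try:
                results.append(fut.result(timeout=timeout))
            except cf.TimeoutError:
                results.append(None)  # treated as a failed worker
    return results


# Equivalent to `export CMXFLOW_WORKER_TIMEOUT=120` in the shell.
os.environ["CMXFLOW_WORKER_TIMEOUT"] = "120"
print(worker_timeout())  # 120.0
print(parallel_map(lambda x: x * x, [1, 2, 3]))  # [1, 4, 9]
```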

Getting Started 📖

See examples/basic_usage.ipynb for a complete tutorial covering:

  • Building your first workflow
  • 2D similarity search
  • Mutable parameters and optimization
  • Parallel execution
  • Analyzing results with Optuna

The tutorial uses the ABL1 kinase benchmark from the wonderful DUD-E database.

Installation 🛠️

pip install cmxflow

MCP Server

To use cmxflow as an agentic tool with Claude Code:

claude mcp add cmxflow -- cmxflow-mcp

Optional Dependencies

PyMOL: required only for 3D structure visualization (the view_structures MCP tool). Install it via conda:

conda install -c conda-forge pymol-open-source

All other functionality works without PyMOL.

Contributing 🤝

Contributions are welcome! This is a side project, so reviews may take some time, but PRs are appreciated.

Before Submitting

  1. Open an issue first for significant changes to discuss the approach
  2. Fork the repo and create a feature branch from main
  3. Follow the code style - run mypy, black, and ruff before committing (or install the provided pre-commit hooks)

PR Requirements

  • Clear description of the bug fixed or feature added
  • Minimal reproducible example demonstrating the change
  • Tests covering new functionality (pytest)
  • Type hints for all new code
  • Docstrings following Google conventions
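
For orientation, here is a small, hypothetical function written to those requirements: full type hints, a Google-style docstring with Args and Returns sections, and a pytest-style test function. The function name and the test file path are illustrative only and are not part of cmxflow.

```python
from collections.abc import Iterable


def dedupe_preserving_order(keys: Iterable[str]) -> list[str]:
    """Remove duplicate keys while keeping first occurrences in order.

    Args:
        keys: Canonical identifiers, for example canonical SMILES strings.

    Returns:
        The unique keys, in the order they were first seen.
    """
    seen: set[str] = set()
    unique: list[str] = []
    for key in keys:
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Would live in a file such as tests/test_dedupe.py (hypothetical path);
# pytest collects functions whose names start with test_.
def test_dedupe_preserving_order() -> None:
    assert dedupe_preserving_order(["CCO", "c1ccccc1", "CCO"]) == ["CCO", "c1ccccc1"]
    assert dedupe_preserving_order([]) == []
```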

Development Setup

conda config --set solver libmamba
conda env create -f conda.yml
conda activate cmxflow
poetry install
pre-commit install  # Ensures formatting/linting on commit

Running Tests

pytest tests/

Releases

Releases are published to PyPI automatically when a pull request is merged into main with a version bump tag in the PR title:

| Tag in PR title | Version bump | Example |
|---|---|---|
| [patch] | Bug fixes, docs (0.1.0 → 0.1.1) | Fix conformer bug [patch] |
| [minor] | New features, backwards-compatible (0.1.0 → 0.2.0) | Add ProtonationBlock [minor] |
| [major] | Breaking changes (0.1.0 → 1.0.0) | Redesign block API [major] |

PRs without a tag merge normally without triggering a release.
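
The tagging rule is simple enough to express as a function. This sketch parses a PR title for [patch], [minor], or [major] and computes the next version, returning None when no tag is present. It is a hypothetical illustration of the documented rule, not the logic in the project's release.yml; case-insensitive matching and letting the most significant tag win when several appear are assumptions.

```python
import re
from typing import Optional

_TAG = re.compile(r"\[(major|minor|patch)\]", re.IGNORECASE)  # assumption: case-insensitive
_ORDER = {"patch": 0, "minor": 1, "major": 2}


def next_version(current: str, pr_title: str) -> Optional[str]:
    """Return the bumped MAJOR.MINOR.PATCH version, or None if the title has no tag."""
    tags = [t.lower() for t in _TAG.findall(pr_title)]
    if not tags:
        return None  # merge normally, no release
    bump = max(tags, key=_ORDER.__getitem__)  # assumption: most significant tag wins
    major, minor, patch = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


print(next_version("0.1.0", "Fix conformer bug [patch]"))     # 0.1.1
print(next_version("0.1.0", "Add ProtonationBlock [minor]"))  # 0.2.0
print(next_version("0.1.0", "Redesign block API [major]"))    # 1.0.0
print(next_version("0.1.0", "Update README"))                 # None
```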

License 📄

MIT

Download files

Source Distribution

cmxflow-0.3.0.tar.gz (74.2 kB)

Built Distribution

cmxflow-0.3.0-py3-none-any.whl (95.6 kB)

File details

Details for the file cmxflow-0.3.0.tar.gz.

File metadata

  • Download URL: cmxflow-0.3.0.tar.gz
  • Upload date:
  • Size: 74.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cmxflow-0.3.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7dff8962fb145b8813d30aa8d2c01a9f534a5edf6325dd0c6d14527c3cfd7212 |
| MD5 | 3daf54618e3e792e3db555da7bb7ba37 |
| BLAKE2b-256 | 1f21bc42099c80b9973b5626517c2b591676acfdd84c6f6ec80eea308b8db911 |
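
To check a downloaded sdist against the SHA256 digest listed above, stream the file through hashlib and compare hex digests. The local file path is an assumption (wherever pip download or a browser saved it); the same approach works for the wheel with its own digest. pip can also enforce this automatically in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` pins in a requirements file).

```python
import hashlib
from pathlib import Path

# SHA256 published above for cmxflow-0.3.0.tar.gz
EXPECTED_SHA256 = "7dff8962fb145b8813d30aa8d2c01a9f534a5edf6325dd0c6d14527c3cfd7212"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's digest matches the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("cmxflow-0.3.0.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH")
```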

Provenance

The following attestation bundles were made for cmxflow-0.3.0.tar.gz:

Publisher: release.yml on b-shields/cmxflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cmxflow-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: cmxflow-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 95.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cmxflow-0.3.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | f841401e0fed9a742b60855312f9b2292b86803f6f2dd05ba4dda7c772e64b6e |
| MD5 | fa5598d195409a7cc38c1562e6d2156b |
| BLAKE2b-256 | 1798c10f7d03d21166e9dbbf321c0fbf3198dec1c40d25c68cb44976e9ec4bc6 |

Provenance

The following attestation bundles were made for cmxflow-0.3.0-py3-none-any.whl:

Publisher: release.yml on b-shields/cmxflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
