
A Python package for building and optimizing cheminformatics workflows using Bayesian optimization and LLM agents


cmxflow 🧪

Python 3.11+ · Code style: black · License: MIT

Composable cheminformatics workflows.

Overview 🔬

cmxflow is a Python framework for building and optimizing cheminformatics pipelines. Chain together molecular operations as blocks, then let Bayesian optimization find the best parameters for your task.

Read the full documentation →

Two Usage Modes ⚗️

cmxflow is designed to work both as:

  1. An Agentic Tool - via an MCP (Model Context Protocol) server, allowing LLM agents to build and optimize workflows conversationally
  2. A Programmatic API - for direct Python usage in scripts and notebooks

Block Types 🧬

Workflows are built from four types of blocks:

| Block Type | Purpose |
|---|---|
| SourceBlock | Read molecules from files (SDF, SMILES, CSV, Parquet) |
| Block | Transform molecules (1:1 or N:M) |
| SinkBlock | Write molecules to files |
| ScoreBlock | Compute optimization objective |
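
To make the four roles concrete, here is a minimal pure-Python sketch. It does not use cmxflow: the function names (`source`, `deduplicate`, `sink`, `score`) and the use of plain SMILES strings as stand-in molecules are assumptions made for illustration only.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical stand-ins for the four block roles; this is not the cmxflow API.

def source(lines: Iterable[str]) -> Iterator[str]:
    """Source role: yield one record (here, a SMILES string) per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def deduplicate(mols: Iterable[str]) -> Iterator[str]:
    """Transform role (N:M): drop records already seen, preserving order.

    Exact string matching is used here; it is not SMILES canonicalization.
    """
    seen: set[str] = set()
    for mol in mols:
        if mol not in seen:
            seen.add(mol)
            yield mol


def sink(mols: Iterable[str], out: list[str]) -> list[str]:
    """Sink role: write records to a destination (a list instead of a file)."""
    out.extend(mols)
    return out


def score(mols: list[str], prop: Callable[[str], float]) -> float:
    """Score role: reduce the output to one number an optimizer can compare."""
    return sum(prop(m) for m in mols) / len(mols) if mols else 0.0


raw = ["CCO", "CCO", "", "c1ccccc1", "CC(=O)O"]
written = sink(deduplicate(source(raw)), [])
mean_length = score(written, len)  # string length as a toy "property"
```

The point of the split is that only the score role needs to return a single number; everything upstream can stay a stream of molecules.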

Example Operators 💊

| Block | Purpose |
|---|---|
| MoleculeStandardizeBlock | Standardize molecules (metals, salts, charges, tautomers) |
| MoleculeDeduplicateBlock | Remove duplicate molecules by canonical SMILES |
| RDKitBlock | Apply any RDKit method (descriptors, transformations) |
| SubstructureFilterBlock | Filter by SMARTS patterns or catalogs (PAINS, BRENK, etc.) |
| PropertyFilterBlock | Filter molecules by property conditions |
| PropertyHeadBlock | Select top N molecules by property |
| PropertyTailBlock | Select bottom N molecules by property |
| MoleculeSimilarityBlock | Compute 2D fingerprint similarity |
| Molecule3DSimilarityBlock | Compute 3D shape similarity |
| IonizeMoleculeBlock | Generate pH-dependent ionization states |
| EnumerateStereoBlock | Enumerate all stereoisomers |
| ConformerGenerationBlock | Generate 3D conformers (ETKDGv3) |
| MoleculeAlignBlock | Align molecules to 3D reference |
| MoleculeDockBlock | Dock into protein binding pocket |
| RepresentativeClusterBlock | Cluster molecules by fingerprint similarity (leader algorithm) |
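
For readers unfamiliar with the leader algorithm mentioned for RepresentativeClusterBlock, the sketch below shows the textbook version in pure Python. It is an assumption that cmxflow follows this exact variant; the fingerprint representation (sets of on-bits), the `threshold` parameter, and both function names are illustrative only.

```python
def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(fps: list[frozenset[int]], threshold: float) -> list[list[int]]:
    """Single-pass leader clustering.

    Each item joins the first cluster whose leader it matches at or above
    the threshold; otherwise it starts a new cluster and becomes its leader.
    Returns clusters as lists of indices, with the leader first.
    """
    clusters: list[list[int]] = []
    for i, fp in enumerate(fps):
        for cluster in clusters:
            if tanimoto(fp, fps[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing leader was similar enough.
            clusters.append([i])
    return clusters


fingerprints = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),   # Tanimoto 3/5 = 0.6 to item 0
    frozenset({10, 11, 12}),   # no overlap with item 0
    frozenset({10, 11, 13}),   # Tanimoto 2/4 = 0.5 to item 2
]
clusters = leader_cluster(fingerprints, threshold=0.5)
representatives = [c[0] for c in clusters]  # one representative per cluster
```

Because the result depends on input order, sorting molecules (for example by a property of interest) before clustering changes which ones become representatives.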

Example Score Blocks 📊

| ScoreBlock | Purpose |
|---|---|
| EnrichmentScoreBlock | Enrichment AUC for virtual screening |
| AverageScoreBlock | Mean of a molecular property |
| ShapeOverlayScoreBlock | Average 3D shape similarity |
| ClusterScoreBlock | Cluster quality from representative clustering |

Features 🚀

  • Composable Pipelines - Chain blocks with workflow.add()
  • Bayesian Optimization - Find optimal parameters via Optuna
  • Parallel Execution - make_parallel() for compute-intensive blocks
  • Mutable Parameters - Categorical, Integer, and Continuous types
  • Serialization - save_workflow() and load_workflow() for persistence
  • MCP Server - Agentic workflow building via build_workflow, run_workflow, optimize_workflow
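
The idea behind mutable parameters can be sketched without cmxflow or Optuna. The example below defines three hypothetical parameter types and tunes them with plain random search, which is a deliberately simpler stand-in for Bayesian optimization. The class names mirror the types listed above, but their fields, the `random_search` helper, the parameter names, and the toy objective are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Categorical:
    choices: tuple

    def sample(self, rng: random.Random):
        return rng.choice(self.choices)


@dataclass(frozen=True)
class Integer:
    low: int
    high: int

    def sample(self, rng: random.Random) -> int:
        return rng.randint(self.low, self.high)  # inclusive on both ends


@dataclass(frozen=True)
class Continuous:
    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        return rng.uniform(self.low, self.high)


def random_search(space, objective, n_trials, seed=0):
    """Sample parameters independently each trial and keep the best (maximize)."""
    rng = random.Random(seed)
    best_params, best_value = None, float("-inf")
    for _ in range(n_trials):
        params = {name: p.sample(rng) for name, p in space.items()}
        value = objective(params)
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value


# Hypothetical search space for a similarity search step.
space = {
    "fingerprint": Categorical(("morgan", "atom_pair")),
    "radius": Integer(1, 4),
    "threshold": Continuous(0.0, 1.0),
}


def toy_objective(params):
    # Made-up score peaking at radius 2 and threshold 0.5; in a real
    # workflow this value would come from running the pipeline and scoring it.
    bonus = 0.1 if params["fingerprint"] == "morgan" else 0.0
    return bonus - abs(params["radius"] - 2) - abs(params["threshold"] - 0.5)


best_params, best_value = random_search(space, toy_objective, n_trials=50)
```

A Bayesian optimizer differs in that it uses earlier trial results to decide where to sample next, rather than drawing each trial independently.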

Environment Variables 🔧

| Variable | Default | Description |
|---|---|---|
| CMXFLOW_WORKER_TIMEOUT | 30 | Seconds to wait for a single parallel worker before treating it as failed. Set to 0 to disable the timeout. Applies to all make_parallel() and @parallel blocks. |

Getting Started 📖

See examples/basic_usage.ipynb for a complete tutorial covering:

  • Building your first workflow
  • 2D similarity search
  • Mutable parameters and optimization
  • Parallel execution
  • Analyzing results with Optuna

The tutorial uses the ABL1 kinase benchmark from the wonderful DUD-E database.

Installation 🛠️

pip install cmxflow

MCP Server

To use cmxflow as an agentic tool with Claude Code:

claude mcp add cmxflow -- cmxflow-mcp

Optional Dependencies

PyMOL: required only for 3D structure visualization (the view_structures MCP tool). Install via conda:

conda install -c conda-forge pymol-open-source

All other functionality works without PyMOL.
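
To check from Python whether the optional dependency is available before relying on visualization, a small standard-library sketch follows. It assumes the package is importable under the module name `pymol`; the variable name `has_pymol` is illustrative.

```python
import importlib.util

# find_spec returns None when a top-level module cannot be found,
# so this check does not import PyMOL or raise if it is missing.
has_pymol = importlib.util.find_spec("pymol") is not None

if has_pymol:
    print("PyMOL found: 3D structure visualization should be available.")
else:
    print("PyMOL not found: everything except visualization still works.")
```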

Contributing & Releases 🤝

See CONTRIBUTING.md for development setup and PR requirements, and RELEASING.md for the PyPI and MCP Registry release flow.

License 📄

MIT


