Community MCP server for the CZ CELLxGENE Discover Census single-cell atlas. Ontology-aware, provenance-tracked, unaffiliated with CZI.

cxg-census-mcp

An MCP server that lets LLM agents query the CZ CELLxGENE Discover Census single-cell atlas without lying about it — ontology-aware filters, cost caps, full provenance + attribution on every response. Drop it into Cursor / Claude Desktop / Claude Code and ask questions like "compare immune cell composition of healthy vs COVID-19 human lung" in plain English.

Independent / unaffiliated. Not affiliated with, endorsed by, or sponsored by the Chan Zuckerberg Initiative (CZI), EMBL-EBI, the U.S. Census Bureau, or anyone else. "CELLxGENE" is a CZI mark; references here are descriptive (nominative) use only.

No warranty. MIT-licensed source, "as is". Research/exploration tool — not a clinical or diagnostic instrument. Always verify results before publication. See LICENSE for the full trademark and content attribution notice, and SECURITY.md for the threat model and known-issues policy.

Status: alpha (v0.1.1). See CHANGELOG.md.

Demos

Healthy vs COVID-19 lung, side-by-side. Two queries run in parallel; the disease_multi_value_v7 schema-drift rewrite kicks in for the COVID cohort, and attribution for both sets of contributing datasets surfaces in the same chat turn.

https://github.com/user-attachments/assets/c836f225-5075-4643-87aa-70d311bc5fd2

Cell-type composition of human lung in one query. Free-text "lung" is resolved to UBERON:0002048 and routed through tissue_general, with every CURIE labeled, all in a single Tier-0 call.

https://github.com/user-attachments/assets/b0e10ca7-e46b-4e5f-ae63-11949d328c4d

(Videos render on GitHub. On PyPI they appear as bare URLs — head to the GitHub README to watch.)

More prompts in docs/example-questions.md.

Architecture at a glance

                 ┌──────────────────────────────────────────────┐
   MCP client    │   tools/        thin MCP wrappers, no logic  │
   (Claude,  ─►  │     │                                        │
    Cursor,      │     ▼                                        │
    Code, …)     │   planner/      FilterSpec → QueryPlan,      │
                 │     │           cost estimate, tier routing  │
                 │     ▼                                        │
                 │   ontology/     OLS4 + hint overlay,         │
                 │     │           CL/UBERON/MONDO expansion    │
                 │     ▼                                        │
                 │   execution/    Tier 0  facet counts         │
                 │     │           Tier 1  chunked obs scan     │
                 │     │           Tier 2  expression aggregate │
                 │     │           Tier 9  refuse → snippet     │
                 │     ▼                                        │
                 │   clients/      OLS4 (HTTPS) + Census/SOMA   │
                 │                                              │
                 │   caches/       OLS, facet, plan, filter LRU │
                 │   models/       Response envelope w/         │
                 │                 attribution + provenance     │
                 └──────────────────────────────────────────────┘
                                    │
                                    ▼
                       ┌────────────────────────┐
                       │ EBI OLS4 (ontology)    │
                       │ CZ CELLxGENE Census    │
                       │ (CC BY 4.0 data)       │
                       └────────────────────────┘

Full architecture notes: docs/architecture.md. Tool reference: docs/tool-reference.md. Example questions: docs/example-questions.md.
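To make the planner step in the diagram more concrete, here is a minimal sketch of how a FilterSpec could be routed to one of the four tiers. The tier numbers and names come from the diagram; everything else (the fields on FilterSpec and QueryPlan, the MAX_CELLS cap, and the routing rules) is a hypothetical simplification and not the project's actual planner.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    FACET_COUNTS = 0   # Tier 0: facet counts
    OBS_SCAN = 1       # Tier 1: chunked obs scan
    EXPRESSION = 2     # Tier 2: expression aggregate
    REFUSE = 9         # Tier 9: refuse and hand back a snippet instead


@dataclass(frozen=True)
class FilterSpec:
    """Hypothetical, simplified stand-in for the planner's input."""
    tissue: str | None = None
    needs_obs_rows: bool = False
    genes: tuple[str, ...] = ()


@dataclass(frozen=True)
class QueryPlan:
    """Hypothetical plan: the chosen tier plus the cost estimate behind it."""
    tier: Tier
    estimated_cells: int


# Hypothetical cost cap; the real limits are not stated in this README.
MAX_CELLS = 2_000_000


def plan(spec: FilterSpec, estimated_cells: int) -> QueryPlan:
    """Pick the cheapest tier that can answer the query, or refuse."""
    heavy = bool(spec.genes) or spec.needs_obs_rows
    if heavy and estimated_cells > MAX_CELLS:
        return QueryPlan(Tier.REFUSE, estimated_cells)
    if spec.genes:
        return QueryPlan(Tier.EXPRESSION, estimated_cells)
    if spec.needs_obs_rows:
        return QueryPlan(Tier.OBS_SCAN, estimated_cells)
    return QueryPlan(Tier.FACET_COUNTS, estimated_cells)
```

In this sketch a plain composition question such as plan(FilterSpec(tissue="lung"), 500_000) stays in Tier 0, while an expression query over more cells than the cap is refused rather than executed.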

Install

From PyPI (recommended):

uv tool install "cxg-census-mcp[census]"
cxg-census-mcp                       # speaks MCP over stdio

Or with pip:

pip install "cxg-census-mcp[census]"

Without the [census] extra you get mock mode (deterministic fixtures) — handy for offline demos and verifying your MCP client config without pulling tiledbsoma's ~1 GB of native deps.
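As an illustration of the idea, one way a fallback like this could be detected is to check whether the optional dependency is importable. This is only a sketch: the function names are hypothetical, and the server's real mode selection may work differently (for example via the CXG_CENSUS_MCP_MOCK_MODE variable described under Configuration).

```python
import importlib.util


def census_backend_available(module_name: str = "tiledbsoma") -> bool:
    """Return True if the optional Census dependency can be imported."""
    # find_spec returns None for a top-level module that is not installed.
    return importlib.util.find_spec(module_name) is not None


def pick_mode(module_name: str = "tiledbsoma") -> str:
    """Hypothetical helper: 'live' when the backend is present, else 'mock'."""
    return "live" if census_backend_available(module_name) else "mock"
```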

From source (for development):

git clone https://github.com/MaxMLang/cxg-census-mcp
cd cxg-census-mcp
uv sync --extra dev --extra census
uv run cxg-census-mcp

MCP client config

Cursor (~/.cursor/mcp.json) and Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS) both expect the same shape. The cleanest option is to launch the PyPI package through uvx:

{
  "mcpServers": {
    "cxg-census": {
      "command": "/absolute/path/to/uvx",
      "args": ["--from", "cxg-census-mcp[census]", "cxg-census-mcp"]
    }
  }
}

Use the absolute path to uvx (run which uvx in your shell to find it). MCP clients spawn the server in a non-interactive subprocess that doesn't source your shell rc, so a bare "uvx" typically fails with No such file or directory.
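If you would rather generate the config than edit it by hand, the following sketch resolves the absolute uvx path with the standard library and renders the same JSON shape shown above. The helper name render_config is hypothetical; where you write the output (Cursor or Claude Desktop config file) is up to you.

```python
from __future__ import annotations

import json
import shutil


def render_config(uvx_path: str | None = None) -> str:
    """Return the mcpServers JSON with an absolute uvx path filled in."""
    # Fall back to a PATH lookup only when no explicit path is given.
    resolved = uvx_path or shutil.which("uvx")
    if resolved is None:
        raise FileNotFoundError("uvx not found on PATH; install uv first")
    config = {
        "mcpServers": {
            "cxg-census": {
                "command": resolved,
                "args": ["--from", "cxg-census-mcp[census]", "cxg-census-mcp"],
            }
        }
    }
    return json.dumps(config, indent=2)
```

Calling print(render_config()) from an interactive shell, where uvx is on PATH, prints a block you can paste into the client config.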

If you cloned from source instead, point at the checkout:

{
  "mcpServers": {
    "cxg-census": {
      "command": "/absolute/path/to/uv",
      "args": ["--directory", "/path/to/cxg-census-mcp", "run", "cxg-census-mcp"]
    }
  }
}

Claude Code:

claude mcp add cxg-census -- /absolute/path/to/uvx --from "cxg-census-mcp[census]" cxg-census-mcp

Quit and relaunch your client (⌘Q on macOS; closing the window isn't enough), and the server should show up in the MCP panel with 13 tools.

Tools (13 total)

Workflow: census_summary, get_census_versions, count_cells, list_datasets, gene_coverage, aggregate_expression, preview_obs, export_snippet, get_server_limits.

Inspection: resolve_term, expand_term, term_definition, list_available_values.

Plus MCP resources (markdown docs at cxg-census-mcp://docs/{slug}), prompts (census_workflow, disambiguation), and cooperative progress / cancellation notifications. Details in docs/tool-reference.md.
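For orientation, this is roughly what a client sends when it invokes one of these tools. The sketch builds an MCP tools/call JSON-RPC request and rejects names outside the 13 listed above. The method name and the name/arguments parameters follow the MCP specification; the argument payload in the usage note is purely hypothetical, since the real input schemas live in docs/tool-reference.md.

```python
import json
from itertools import count

WORKFLOW_TOOLS = (
    "census_summary", "get_census_versions", "count_cells", "list_datasets",
    "gene_coverage", "aggregate_expression", "preview_obs", "export_snippet",
    "get_server_limits",
)
INSPECTION_TOOLS = (
    "resolve_term", "expand_term", "term_definition", "list_available_values",
)
ALL_TOOLS = WORKFLOW_TOOLS + INSPECTION_TOOLS

_request_ids = count(1)  # JSON-RPC ids only need to be unique per session


def tool_call_request(name: str, arguments: dict) -> str:
    """Serialize a tools/call request for one of the listed tools."""
    if name not in ALL_TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)
```

For example, tool_call_request("resolve_term", {"text": "lung"}) yields a well-formed request, with the caveat that the "text" argument name is an assumption made for illustration.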

Configuration

All env vars use the CXG_CENSUS_MCP_ prefix. Most useful:

Variable                        Default               Purpose
CXG_CENSUS_MCP_CENSUS_VERSION   stable                Census release to pin
CXG_CENSUS_MCP_CACHE_DIR        platformdirs default  Disk cache root
CXG_CENSUS_MCP_MOCK_MODE        0                     If 1, never opens a real Census handle
CXG_CENSUS_MCP_LOG_LEVEL        WARNING               stdlib log level

Full list and validation: src/cxg_census_mcp/config.py.
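The four variables above can be read with nothing but the standard library. The sketch below is not the project's config.py; the Settings class and load_settings function are hypothetical, and only the variable names and defaults are taken from the table.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

PREFIX = "CXG_CENSUS_MCP_"


@dataclass(frozen=True)
class Settings:
    census_version: str = "stable"
    cache_dir: str | None = None  # None means: use the platformdirs default
    mock_mode: bool = False
    log_level: str = "WARNING"


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Build Settings from prefixed env vars, falling back to the defaults."""
    env = os.environ if env is None else env
    return Settings(
        census_version=env.get(PREFIX + "CENSUS_VERSION", "stable"),
        cache_dir=env.get(PREFIX + "CACHE_DIR"),
        mock_mode=env.get(PREFIX + "MOCK_MODE", "0") == "1",
        log_level=env.get(PREFIX + "LOG_LEVEL", "WARNING").upper(),
    )
```

Passing a plain dict instead of os.environ, as in load_settings({"CXG_CENSUS_MCP_MOCK_MODE": "1"}), makes the loader easy to exercise in tests without touching the process environment.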

Development & operations

Quick loop:

make install-all                 # uv sync --extra dev --extra census
make lint typecheck test         # ruff + mypy + pytest (mock mode)
make cov                         # tests + coverage HTML in ./htmlcov
make audit                       # pip-audit on locked production deps

Operational tasks (cache pre-warm, schema diff, container build, metrics dump, plan-cache vacuum, weekly hint/facet refresh) live in the Makefile and are documented in docs/operational-playbook.md.

Documentation index

Topic                     Where
System architecture       docs/architecture.md
Tool reference            docs/tool-reference.md
Example agent questions   docs/example-questions.md
Ontology resolution       docs/ontology-resolution.md
Schema-drift handling     docs/schema-drift-format.md
Census version pinning    docs/version-pinning.md
Progress / cancellation   docs/progress-and-cancellation.md
Error model               docs/error-model.md
Known limitations         docs/limitations.md
Ops runbook               docs/operational-playbook.md
Changelog                 CHANGELOG.md

License & attribution

Source code: MIT. The MIT license covers only the code in this repository, not the upstream data, ontologies, or third-party trademarks.

  • Data. Tool responses are derived (filtered/aggregated) from the CZ CELLxGENE Discover Census, distributed by the Chan Zuckerberg Initiative under CC BY 4.0. Every response carries an attribution field; downstream users must preserve attribution and indicate that changes were made.
  • Ontologies are fetched via EBI Ontology Lookup Service (OLS4) from CL, UBERON, MONDO, EFO, HANCESTRO, and others; each carries its own license.
  • Trademarks ("CELLxGENE", "Cursor", "Claude", "Anthropic", "Model Context Protocol", …) belong to their respective owners. Use here is descriptive only and does not imply affiliation.

This project is a client of the CZ CELLxGENE Discover Census; it does not host, mirror, or redistribute Census data.

Full notice in LICENSE.
