Retrieve information about an AnnData object via MCP

anndata-mcp

Provides retrieval of, and lazy access to, information from AnnData objects via MCP (Model Context Protocol).

Features

This MCP server provides lazy, memory-efficient access to AnnData files for biomedical analysis agents. Key features include:

  • Lazy Loading: Only loads requested data into memory using anndata.experimental.read_lazy
  • Comprehensive Exploration: Get summaries, metadata, unique values, and statistics
  • Flexible Data Access: Retrieve specific slices of data matrices, embeddings, and metadata
  • Dataset2D Support: Handles both pandas DataFrames and AnnData's Dataset2D objects
  • Sparse Matrix Support: Efficiently handles sparse expression matrices

Tools Available

Summary Tools

  • get_anndata_summary - Get high-level overview of AnnData structure

Exploration Tools

  • get_attribute_info - Get detailed info about specific attributes
  • get_dataframe_info - Get detailed info about dataframe-like attributes (obs, var)
  • get_unique_values - Get unique values from a column
  • get_column_stats - Get statistics for a column
  • get_value_counts - Count occurrences of each unique value (e.g., cells per cluster)
  • get_grouped_stats - Calculate statistics grouped by a categorical column
  • list_available_keys - List keys in mapping attributes (obsm, varm, layers, etc.)
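To make the two aggregation tools concrete, here is a minimal sketch of what "value counts" and "grouped statistics" compute. It runs on a few hand-written rows, not on the server itself: `obs_rows`, `value_counts`, and `grouped_mean` are hypothetical names chosen for illustration and say nothing about how get_value_counts or get_grouped_stats are implemented.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical stand-in for a few rows of cell metadata (adata.obs).
obs_rows = [
    {"cluster": "0", "n_genes": 800},
    {"cluster": "0", "n_genes": 1000},
    {"cluster": "1", "n_genes": 600},
]


def value_counts(rows, column):
    """Count how often each unique value occurs in one column."""
    return dict(Counter(row[column] for row in rows))


def grouped_mean(rows, group_column, value_column):
    """Average a numeric column within each group of a categorical column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_column]].append(row[value_column])
    return {key: mean(values) for key, values in groups.items()}


print(value_counts(obs_rows, "cluster"))             # cells per cluster
print(grouped_mean(obs_rows, "cluster", "n_genes"))  # mean genes per cluster
```
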

Data Access Tools

  • get_obs_data - Get cell/observation metadata slices
  • get_var_data - Get gene/variable metadata slices
  • get_X_data - Get expression matrix slices
  • get_layer_data - Get data from specific layers
  • get_obsm_data - Get multi-dimensional annotations (embeddings like PCA, UMAP)
  • get_varm_data - Get variable annotations
  • get_uns_data - Get unstructured data

Installation

Python 3.13 or newer is required. If you don't have Python installed, we recommend installing uv, which can install and manage Python versions for you.

Option 1: Use with uvx (Recommended)

uvx anndata_mcp

Option 2: Install via pip

pip install anndata_mcp

Option 3: Install for development

git clone https://github.com/dschaub95/anndata-mcp.git
cd anndata-mcp

# Create virtual environment with Python 3.13
uv venv --python 3.13

# Install dependencies
uv sync

Option 4: Install with examples

To run the examples, install with the examples extra:

pip install "anndata_mcp[examples]"
# or with uv:
uv sync --extra examples

Configuration

Include in your MCP client configuration (e.g., Claude Desktop, Continue, etc.):

{
  "mcpServers": {
    "anndata-mcp": {
      "command": "uvx",
      "args": ["anndata_mcp"],
      "env": {
        "UV_PYTHON": "3.13"
      }
    }
  }
}

Or if installed via pip:

{
  "mcpServers": {
    "anndata-mcp": {
      "command": "python",
      "args": ["-m", "anndata_mcp"],
      "env": {}
    }
  }
}
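If you manage the client configuration from a script, the two entries above can be generated and merged into an existing configuration without touching other servers. This is a sketch only: `server_entry` and `merge_into_config` are hypothetical helper names, and where your MCP client stores its configuration file is not covered here.

```python
import json


def server_entry(use_uvx=True):
    """Return the server entry for the uvx or the pip-based setup shown above."""
    if use_uvx:
        return {"command": "uvx", "args": ["anndata_mcp"], "env": {"UV_PYTHON": "3.13"}}
    return {"command": "python", "args": ["-m", "anndata_mcp"], "env": {}}


def merge_into_config(config, name="anndata-mcp", use_uvx=True):
    """Return a copy of `config` with the server added under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep servers already configured
    servers[name] = server_entry(use_uvx)
    merged["mcpServers"] = servers
    return merged


# Example: add the server to a configuration that already lists another one.
existing = {"mcpServers": {"other-server": {"command": "other"}}}
print(json.dumps(merge_into_config(existing), indent=2))
```
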

Usage

For LLM Agents

Once configured, LLM agents can use the tools to explore AnnData files:

Agent: "What's in the pbmc3k.h5ad file?"
→ Calls: get_anndata_summary("pbmc3k.h5ad")

Agent: "Show me unique cell types"
→ Calls: get_unique_values("pbmc3k.h5ad", "obs", "cell_type")

Agent: "Get UMAP coordinates for the first 100 cells"
→ Calls: get_obsm_data("pbmc3k.h5ad", "X_umap", row_slice="0:100")

Direct Python Usage

For testing or direct use in Python:

from anndata_mcp.tools import get_anndata_summary

# Call the tool directly
summary = get_anndata_summary("path/to/data.h5ad")
print(f"Dataset has {summary.n_obs} cells and {summary.n_vars} genes")

Examples

See the examples/ directory for detailed usage examples:

  • example_script.py: Comprehensive Python script showing all tools including:
    • Dataset exploration and metadata inspection
    • Cluster analysis with value counts and grouped statistics
    • Expression matrix access and embedding retrieval
    • Complete analysis workflows

To run the examples:

# Install with examples extra (includes scanpy for data download)
uv sync --extra examples

# Run the example script (downloads pbmc3k sample data automatically)
uv run examples/example_script.py

The example script demonstrates:

  1. Dataset summary and structure exploration
  2. Cell and gene metadata inspection
  3. Cluster information and statistics
  4. NEW: Value counts (cells per cluster)
  5. NEW: Grouped statistics (average genes per cluster)
  6. Expression matrix sampling
  7. UMAP/PCA embedding retrieval
  8. Complete analysis workflows

Use Cases

This MCP server is designed for biomedical analysis agents that need to:

  1. Explore large single-cell datasets without loading everything into memory
  2. Query specific metadata (cell types, gene names, QC metrics)
  3. Extract embeddings for visualization (UMAP, PCA, t-SNE)
  4. Access expression data for specific genes or cells
  5. Work with multiple data layers (raw counts, normalized, scaled)
  6. Handle datasets larger than RAM through lazy loading

Technical Details

Lazy Reading

The server uses anndata.experimental.read_lazy to open files without loading data into memory. Data is only loaded when specifically requested through tools.
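For orientation, this is roughly what lazy access looks like when calling anndata directly, outside the server. It is a sketch under assumptions: `open_lazily` and `preview_matrix` are hypothetical helpers, and it assumes that a lazily opened matrix exposes a `.compute()` method (as dask-backed arrays do) while an in-memory one does not.

```python
def open_lazily(path):
    """Open an AnnData file without reading its arrays into memory."""
    # Imported inside the function so this module can be imported
    # even where anndata is not installed.
    from anndata.experimental import read_lazy

    return read_lazy(str(path))


def preview_matrix(adata, n_rows=5, n_cols=5):
    """Materialize only the top-left corner of the expression matrix."""
    block = adata.X[:n_rows, :n_cols]
    # Assumption: lazy arrays expose .compute(); in-memory arrays do not.
    return block.compute() if hasattr(block, "compute") else block


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:  # e.g. python preview.py path/to/data.h5ad
        adata = open_lazily(sys.argv[1])
        print(adata.shape)
        print(preview_matrix(adata))
```

Only the requested corner of the matrix is read from disk; the rest of the file stays untouched until another slice is requested.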

Dataset2D Support

AnnData's lazy reading often returns Dataset2D objects instead of pandas DataFrames. This server handles both transparently.

Sparse Matrix Handling

Expression matrices are often sparse. The server can return sparse data in sparse format or automatically densify small slices for easier consumption.

Slicing Syntax

Rows and columns are selected with string-based slices written as start:stop, for example "0:10", "100:200", or ":50" (an omitted start means "from the beginning").
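A slice string of this form maps naturally onto Python's built-in slice object. The parser below is a minimal sketch of that mapping; `parse_slice` is a hypothetical helper, not the server's own parsing code, and support for an optional step ("0:10:2") is an assumption.

```python
def parse_slice(spec: str) -> slice:
    """Turn a string such as "0:10" or ":50" into a slice object."""
    parts = spec.strip().split(":")
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"expected 'start:stop' or 'start:stop:step', got {spec!r}")
    # Empty fields mean "unbounded", exactly as in ordinary Python slicing.
    return slice(*(int(part) if part.strip() else None for part in parts))


rows = list(range(1000))
print(rows[parse_slice("0:10")])       # first ten rows
print(len(rows[parse_slice(":50")]))   # number of rows selected by ":50"
```
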

Getting started

Please refer to the documentation, in particular the API documentation.

You can also find the project on BioContextAI, the community hub for biomedical MCP servers: anndata-mcp on BioContextAI.

Contact

If you find a bug, please use the issue tracker.

Citation

t.b.a
