A Python module for cell type annotation using various LLMs.

These details have not been verified by PyPI

Project links

Project description

mLLMCelltype

Overview

mLLMCelltype is a Python package for cell type annotation in single-cell RNA sequencing data using an iterative multi-LLM consensus approach. It combines predictions from multiple large language models and provides uncertainty quantification. The package is compatible with the scverse ecosystem, including AnnData objects and Scanpy workflows.

Installation

pip install mllmcelltype

For development:

git clone https://github.com/cafferychen777/mLLMCelltype.git
cd mLLMCelltype/python
pip install -e .

Requirements: Python >= 3.9, internet connection for LLM API access.

Quick Start

import pandas as pd
from mllmcelltype import annotate_clusters

marker_genes_df = pd.read_csv('marker_genes.csv')

import os
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

annotations = annotate_clusters(
    marker_genes=marker_genes_df,
    species='human',
    provider='openai',
    model='gpt-5.5',
    tissue='brain'
)

for cluster, annotation in annotations.items():
    print(f"Cluster {cluster}: {annotation}")

Supported Providers and Models

Provider	Models	API Key Variable
OpenAI	GPT-5.5, GPT-5.4, GPT-5.4-mini	`OPENAI_API_KEY`
Anthropic	Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4.5	`ANTHROPIC_API_KEY`
Google	Gemini 3.1 Pro Preview, Gemini 3 Flash Preview, Gemini 3.1 Flash-Lite	`GEMINI_API_KEY` (also supports `GOOGLE_API_KEY`)
Alibaba	Qwen3.6 Plus, Qwen3.6 Flash, Qwen3.6 Max Preview	`QWEN_API_KEY`
DeepSeek	DeepSeek V4 Flash, DeepSeek V4 Pro	`DEEPSEEK_API_KEY`
StepFun	Step 3.5 Flash, Step 3	`STEPFUN_API_KEY`
Zhipu/Z.AI	GLM-5.1, GLM-5, GLM-5-Turbo	`ZHIPU_API_KEY`
MiniMax	MiniMax-M2.7, MiniMax-M2.7-highspeed	`MINIMAX_API_KEY`
X.AI	Grok-4.3	`GROK_API_KEY`
OpenRouter	Access to multiple models via single API	`OPENROUTER_API_KEY`

API keys can be set via environment variables, passed directly as parameters, or loaded from a .env file.

Annotation Features

Iterative Consensus: Multiple rounds of comparison between LLM outputs to resolve disagreements
Uncertainty Quantification: Consensus Proportion (CP) and Shannon Entropy (H) metrics
Cross-model Comparison: Helps identify inconsistent predictions across models
Hierarchical Annotation: Optional multi-resolution analysis with parent-child consistency
Caching: Avoids redundant API calls to reduce costs
Custom Base URLs: Configure custom API endpoints for proxy servers or enterprise gateways

Multi-LLM Consensus Annotation

from mllmcelltype import format_discussion_report, interactive_consensus_annotation

result = interactive_consensus_annotation(
    marker_genes=marker_genes,
    species='human',
    tissue='peripheral blood',
    models=['gpt-5.5', 'claude-sonnet-4-6', 'gemini-3.1-pro-preview', 'qwen3.6-plus'],
    consensus_threshold=0.7,
    max_discussion_rounds=3,
    verbose=True
)

print(result["consensus"])
print(format_discussion_report(result))

Consensus Model Selection

The consensus_model parameter specifies which LLM evaluates semantic similarity, calculates consensus metrics, and moderates discussions. Recommended models for consensus checking:

Anthropic: claude-sonnet-4-6, claude-opus-4-7
OpenAI: o1, gpt-5.5, gpt-4.1
Google: gemini-3.1-pro-preview, gemini-3-flash-preview
Other: deepseek-v4-pro, qwen3.6-plus, grok-4.3

result = interactive_consensus_annotation(
    marker_genes=marker_genes,
    species="human",
    tissue="brain",
    models=["gpt-5.5", "claude-sonnet-4-6", "gemini-3.1-pro-preview"],
    consensus_model="claude-sonnet-4-6",
    consensus_threshold=0.7,
    entropy_threshold=1.0
)

If not specified, the consensus checker is selected from the providers available in api_keys. Pass consensus_model explicitly for reproducible consensus checks.

Targeted Analysis

Analyze Specific Clusters

result = interactive_consensus_annotation(
    marker_genes=all_marker_genes,
    species="human",
    models=["gpt-5.5", "claude-sonnet-4-6", "gemini-3.1-pro-preview"],
    clusters_to_analyze=["cluster_0", "cluster_1", "cluster_2"],
    tissue="peripheral blood"
)

Force Fresh Analysis (Bypass Cache)

result = interactive_consensus_annotation(
    marker_genes=marker_genes,
    species="human",
    models=["gpt-5.5", "claude-sonnet-4-6"],
    tissue="peripheral blood",
    additional_context="Patient with autoimmune disease",
    force_rerun=True
)

OpenRouter Integration

OpenRouter provides a unified API for accessing models from multiple providers.

from mllmcelltype import annotate_clusters

annotations = annotate_clusters(
    marker_genes=marker_genes,
    species='human',
    tissue='peripheral blood',
    provider='openrouter',
    model='openai/gpt-5.5'
)

Free models are available with the :free suffix (e.g., meta-llama/llama-4-maverick:free).

Custom Base URL Configuration

base_urls = {
    'openai': 'https://openai-proxy.com/v1/chat/completions',
    'anthropic': 'https://anthropic-proxy.com/v1/messages',
    'qwen': 'https://qwen-proxy.com/compatible-mode/v1/chat/completions'
}

result = interactive_consensus_annotation(
    marker_genes=marker_genes,
    species='human',
    models=['gpt-5.5', 'claude-sonnet-4-6', 'qwen3.6-plus'],
    api_keys=your_api_keys,
    base_urls=base_urls
)

For Qwen models, endpoint selection between international and domestic endpoints is handled automatically.

Scanpy/AnnData Integration

import scanpy as sc
import mllmcelltype as mct

# After standard preprocessing and clustering...
sc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon')
marker_genes = {}
for cluster in adata.obs['leiden'].unique():
    genes = sc.get.rank_genes_groups_df(adata, group=cluster)['names'].tolist()[:20]
    marker_genes[cluster] = genes

# Single model annotation
annotations = mct.annotate_clusters(
    marker_genes=marker_genes,
    species='human',
    provider='openai',
    model='gpt-5.5'
)
adata.obs['cell_type'] = adata.obs['leiden'].astype(str).map(annotations)

# Or multi-model consensus
consensus_results = mct.interactive_consensus_annotation(
    marker_genes=marker_genes,
    species='human',
    models=['gpt-5.5', 'claude-sonnet-4-6', 'gemini-3.1-pro-preview'],
    consensus_threshold=0.7
)
adata.obs['consensus_cell_type'] = adata.obs['leiden'].astype(str).map(consensus_results["consensus"])
adata.obs['consensus_proportion'] = adata.obs['leiden'].astype(str).map(consensus_results["consensus_proportion"])
adata.obs['entropy'] = adata.obs['leiden'].astype(str).map(consensus_results["entropy"])

See the examples directory for complete workflow examples.

Contributing

We welcome contributions. Please submit issues or pull requests on our GitHub repository.

Running tests

Default tests are offline and do not call external APIs:

python -m pytest

Real API integration tests are opt-in:

python -m pytest --run-integration

License

MIT License

Citation

If you use mLLMCelltype in your research, please cite:

@article{Yang2025.04.10.647852,
  author = {Yang, Chen and Zhang, Xianyang and Chen, Jun},
  title = {Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data},
  elocation-id = {2025.04.10.647852},
  year = {2025},
  doi = {10.1101/2025.04.10.647852},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2025/04/17/2025.04.10.647852},
  journal = {bioRxiv}
}

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

2.0.5

May 11, 2026

2.0.4

Mar 25, 2026

2.0.3

Feb 9, 2026

2.0.2

Feb 5, 2026

2.0.1

Jan 26, 2026

2.0.0

Jan 26, 2026

1.3.4

Aug 27, 2025

1.3.3

Aug 27, 2025

1.3.2

Aug 8, 2025

1.2.9

Jun 28, 2025

1.2.3

Jun 20, 2025

1.2.2

May 14, 2025

1.2.0

Apr 23, 2025

1.1.0

Apr 21, 2025

1.0.2

Apr 13, 2025

1.0.1

Apr 13, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mllmcelltype-2.0.5.tar.gz (87.9 kB view details)

Uploaded May 11, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

mllmcelltype-2.0.5-py3-none-any.whl (63.7 kB view details)

Uploaded May 11, 2026 Python 3

File details

Details for the file mllmcelltype-2.0.5.tar.gz.

File metadata

Download URL: mllmcelltype-2.0.5.tar.gz
Upload date: May 11, 2026
Size: 87.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for mllmcelltype-2.0.5.tar.gz
Algorithm	Hash digest
SHA256	`d9c85b08095f5f0ebba25d6921a21f203db0a87e59b8254b08477ffd313d6232`
MD5	`6a85b14c80b6c0626212f17f24bae51e`
BLAKE2b-256	`e042d35b00df8f7254cc406b3e454b7343ca40ae0bbf54f92adba39208b16b37`

See more details on using hashes here.

File details

Details for the file mllmcelltype-2.0.5-py3-none-any.whl.

File metadata

Download URL: mllmcelltype-2.0.5-py3-none-any.whl
Upload date: May 11, 2026
Size: 63.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for mllmcelltype-2.0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7cc434bdc97ba9fc5d50f4a8700cf15cb8e60abcb3a979d0b2ce483f7851ba5c`
MD5	`34f65aa86631e8ddb04cdd5fb215ff2a`
BLAKE2b-256	`909ed67a8ccfaeb4fc6badeb445f5915752d9d1e21591c8a4b589c554ddd4095`

See more details on using hashes here.

mllmcelltype 2.0.5

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

mLLMCelltype

Overview

Installation

Quick Start

Supported Providers and Models

Annotation Features

Multi-LLM Consensus Annotation

Consensus Model Selection

Targeted Analysis

Analyze Specific Clusters

Force Fresh Analysis (Bypass Cache)

OpenRouter Integration

Custom Base URL Configuration

Scanpy/AnnData Integration

Contributing

Running tests

License

Citation

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes