Python client for characterization of clusters from single-cell RNA-seq data.
CyteType
CyteType is a Python package for deep characterization of cell clusters from single-cell RNA-seq data. It works with AnnData objects and calls the CyteType API.
Quick Start
import anndata
import scanpy as sc
import cytetype
# Load and preprocess your data
adata = anndata.read_h5ad("path/to/your/data.h5ad")
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata, key_added = "clusters")
sc.tl.rank_genes_groups(adata, groupby='clusters', method='t-test')
# Initialize CyteType (performs data preparation)
annotator = cytetype.CyteType(adata, group_key='clusters')
# Run annotation
adata = annotator.run(
    study_context="Human brain tissue from Alzheimer's disease patients"
)
# View results
print(adata.obs.cytetype_annotation_clusters)
print(adata.obs.cytetype_cellOntologyTerm_clusters)
Installation
pip install cytetype
Usage
Required Preprocessing
Your AnnData object must have:
- Log-normalized expression data in adata.X
- Cluster labels in adata.obs
- Differential expression results from sc.tl.rank_genes_groups
import scanpy as sc
# Standard preprocessing
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
# Clustering
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata, key_added='clusters')
# Differential expression (required)
sc.tl.rank_genes_groups(adata, groupby='clusters', method='t-test')
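The three requirements above can be checked before a job is submitted. The following is a minimal sketch, not part of the cytetype package: `check_prerequisites` is a hypothetical helper, and the `log1p` test is only a heuristic that assumes a recent scanpy version, which records a `log1p` entry in `adata.uns` when `sc.pp.log1p` runs.

```python
from types import SimpleNamespace


def check_prerequisites(adata, group_key="clusters", rank_key="rank_genes_groups"):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    # Cluster labels must be a column in adata.obs.
    if group_key not in adata.obs:
        problems.append(f"adata.obs has no '{group_key}' column")
    # sc.tl.rank_genes_groups stores its output in adata.uns
    # (under 'rank_genes_groups' unless key_added is set).
    if rank_key not in adata.uns:
        problems.append(f"adata.uns has no '{rank_key}' entry; run sc.tl.rank_genes_groups")
    # Heuristic only: a missing 'log1p' entry does not prove the data is unnormalized.
    if "log1p" not in adata.uns:
        problems.append("no 'log1p' entry in adata.uns; confirm adata.X is log-normalized")
    return problems


# Stand-in object for demonstration; with real data, pass your AnnData object instead.
demo = SimpleNamespace(obs={"clusters": ["0", "1"]}, uns={"log1p": {"base": None}})
print(check_prerequisites(demo))
```

The stand-in object lacks differential expression results, so the helper reports that one problem.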
Annotation
from cytetype import CyteType
# Initialize (data preparation happens here)
annotator = CyteType(adata, group_key='clusters')
# Run annotation
adata = annotator.run(
    study_context="Adult human brain tissue samples from healthy controls and Alzheimer's disease patients, analyzed using 10X Genomics single-cell RNA-seq. Samples include cortical and hippocampal regions."
)
# Or with custom metadata for tracking
adata = annotator.run(
    study_context="Adult human brain tissue samples from healthy controls and Alzheimer's disease patients, analyzed using 10X Genomics single-cell RNA-seq. Samples include cortical and hippocampal regions.",
    metadata={
        'experiment_name': 'Brain_AD_Study',
        'run_label': 'initial_analysis'
    }
)
# Results are stored in:
# - adata.obs.cytetype_annotation_clusters (cell type annotations)
# - adata.obs.cytetype_cellOntologyTerm_clusters (cell ontology terms)
# - adata.uns['cytetype_results'] (full API response)
The study_context should include comprehensive biological information about your experimental setup:
- Organisms: Species being studied (e.g., "human", "mouse")
- Tissues: Tissue types and anatomical regions
- Diseases: Disease conditions or states
- Developmental stages: Age, developmental timepoints
- Single-cell methods: Sequencing platform (e.g., "10X Genomics", "Smart-seq2")
- Experimental conditions: Treatments, time courses, perturbations
Example: "Adult human brain tissue samples from healthy controls and Alzheimer's disease patients, analyzed using 10X Genomics single-cell RNA-seq. Samples include cortical and hippocampal regions."
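If the experimental details live in structured form (for example a sample sheet), the checklist above can be assembled into a study_context string programmatically. This is a sketch only: `build_study_context` and its field names are hypothetical, and the labeled-sentence layout is an assumption, since study_context is free text.

```python
# Field names and labels mirror the checklist above; both are illustrative choices.
CONTEXT_FIELDS = [
    ("organism", "Organism"),
    ("tissue", "Tissue"),
    ("disease", "Disease"),
    ("stage", "Developmental stage"),
    ("method", "Single-cell method"),
    ("conditions", "Experimental conditions"),
]


def build_study_context(**fields):
    """Join the supplied fields into one free-text description, skipping empty ones."""
    known = {key for key, _ in CONTEXT_FIELDS}
    unknown = set(fields) - known
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    sentences = [f"{label}: {fields[key]}." for key, label in CONTEXT_FIELDS if fields.get(key)]
    return " ".join(sentences)


context = build_study_context(
    organism="human",
    tissue="cortex and hippocampus",
    disease="Alzheimer's disease and healthy controls",
    stage="adult",
    method="10X Genomics single-cell RNA-seq",
)
print(context)
```

The resulting string can then be passed as the study_context argument of annotator.run.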
Configuration Options
Initialization Parameters
annotator = CyteType(
    adata,
    group_key='leiden',                  # Required: cluster column name
    rank_key='rank_genes_groups',        # DE results key (default)
    gene_symbols_column='gene_symbols',  # Gene symbols column (default)
    n_top_genes=50,                      # Top marker genes per cluster
    aggregate_metadata=True,             # Aggregate metadata (default)
    min_percentage=10,                   # Min percentage for cluster context
    pcent_batch_size=2000,               # Batch size for calculations
)
Submitting an Annotation Job
The run method accepts several configuration parameters to control the annotation process:
Custom LLM Configuration
By default, the CyteType API uses a preselected set of LLM providers. You can instead supply your own models and model providers. Several models can be supplied at once; they are then used iteratively for each of the clusters.
adata = annotator.run(
    study_context="Human PBMC from COVID-19 patients",
    model_config=[{
        'provider': 'openai',
        'name': 'gpt-4o-mini',
        'apiKey': 'your-api-key',
        'baseUrl': 'https://api.openai.com/v1',  # Optional
        'modelSettings': {                       # Optional
            'temperature': 0.0,
            'max_tokens': 4096
        }
    }],
)
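To keep API keys out of scripts and notebooks, a model_config entry can be built from an environment variable. The sketch below is an assumption-laden illustration: `make_model_entry` is a hypothetical helper, the environment variable name is your choice, and the provider set is copied from the "Supported providers" line in this README.

```python
import os

# Copied from the "Supported providers" line in this README.
SUPPORTED_PROVIDERS = {"openai", "anthropic", "google", "xai", "groq", "mistral", "openrouter"}


def make_model_entry(provider, name, api_key_env):
    """Build one model_config entry, reading the API key from an environment variable."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    api_key = os.environ.get(api_key_env)
    if not api_key:
        raise RuntimeError(f"environment variable {api_key_env} is not set")
    return {"provider": provider, "name": name, "apiKey": api_key}


# Usage with an annotator created as shown earlier (OPENAI_API_KEY is an assumed name):
# model_config = [make_model_entry("openai", "gpt-4o-mini", "OPENAI_API_KEY")]
# adata = annotator.run(
#     study_context="Human PBMC from COVID-19 patients",
#     model_config=model_config,
# )
```

Appending further entries to the list supplies several models at once.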
Rate Limiting
If you do not provide your own model providers, the CyteType API applies rate limits for fair usage:
- Annotation submissions: 5 requests per hour per IP
- Result retrieval: 20 requests per minute per IP
If you exceed a rate limit, the API returns an error message that includes retry timing information.
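A script that may hit these limits can wrap the submission in a retry loop. The following is a generic sketch, not cytetype functionality: `call_with_retry` is a hypothetical helper, and because this README does not name the exception raised on a rate-limit error, the exception class is left for the caller to supply. Where the error message gives retry timing, prefer that over the fixed backoff used here.

```python
import time


def call_with_retry(submit, retry_on=Exception, max_attempts=3, base_delay=60.0, sleep=time.sleep):
    """Call submit(); on a retry_on error, wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(max_attempts):
        try:
            return submit()
        except retry_on:
            # Re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage with an annotator created as shown earlier:
# adata = call_with_retry(
#     lambda: annotator.run(study_context="Human PBMC from COVID-19 patients")
# )
```

With a limit of 5 submissions per hour, each retry is likely to count as a submission, so keep max_attempts small and the delays long.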
Supported providers: openai, anthropic, google, xai, groq, mistral, openrouter
Advanced parameters
adata = annotator.run(
    ...
    run_config={
        'concurrentClusters': 3,      # Default: 3, Range: 2-10
        'maxAnnotationRevisions': 2,  # Default: 2, Range: 1-5
        'maxLLMRequests': 500         # Default: 500, Range: 50-2000
    },
    # Custom metadata for tracking
    metadata={
        'experiment_name': 'PBMC_COVID_Study',
        'run_label': 'baseline_analysis',
        'researcher': 'Dr. Smith',
        'batch': 'batch_001'
    },
    # API polling and timeout settings
    poll_interval_seconds=10,  # How often to check for results
    timeout_seconds=1200,      # Max wait time (20 minutes)
    # API configuration
    api_url="https://custom-api.com",  # Custom API endpoint
    auth_token="your-auth-token",      # Authentication token
    save_query=True                    # Save query to query.json
)
Run configuration
- concurrentClusters (int, default=5, range=2-30): Maximum number of clusters to process simultaneously. Higher values may speed up processing but can cause rate limit errors from LLM API providers.
- maxAnnotationRevisions (int, default=2, range=1-5): Maximum number of refinement iterations based on reviewer feedback. More revisions may improve annotation quality but increase processing time.
- maxLLMRequests (int, default=500, range=50-2000): Maximum total number of LLM API calls allowed for the entire job. The job is terminated if this limit is reached. This helps control costs and prevents runaway processes.
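Out-of-range values can be caught locally before a job is submitted. This is a minimal sketch, not part of the cytetype package: `validate_run_config` is a hypothetical helper, and the ranges are the ones given in the Run configuration descriptions above; whether the API enforces exactly these bounds is an assumption.

```python
# Allowed ranges as stated in the Run configuration descriptions.
RUN_CONFIG_RANGES = {
    "concurrentClusters": (2, 30),
    "maxAnnotationRevisions": (1, 5),
    "maxLLMRequests": (50, 2000),
}


def validate_run_config(run_config):
    """Return a list of error strings; an empty list means every option is in range."""
    errors = []
    for key, value in run_config.items():
        if key not in RUN_CONFIG_RANGES:
            errors.append(f"unknown option: {key}")
            continue
        low, high = RUN_CONFIG_RANGES[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not low <= value <= high:
            errors.append(f"{key}={value!r} must be an integer in {low}-{high}")
    return errors


print(validate_run_config({"concurrentClusters": 3, "maxLLMRequests": 10}))
```

Here concurrentClusters passes and maxLLMRequests is reported, because 10 is below the documented minimum of 50.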
Additional Run Parameters
- metadata (dict, optional): Custom metadata to send with the API request for tracking purposes. Can include experiment names, run labels, researcher information, or any other user-defined data. This metadata is sent to the API but not stored locally with results.
- poll_interval_seconds (int, default=30): How frequently to check the API for job completion.
- timeout_seconds (int, default=3600): Maximum time to wait for results before timing out.
- api_url (str): Custom API endpoint URL for self-hosted deployments.
- auth_token (str, optional): Bearer token for API authentication.
- save_query (bool, default=True): Whether to save the API query to query.json for debugging.
Annotation Process
CyteType performs comprehensive cell type annotation through an automated pipeline:
Core Functionality
- Automated Annotation: Identifies likely cell types for each cluster based on marker genes
- Ontology Mapping: Maps identified cell types to Cell Ontology terms (e.g., CL_0000127)
- Review & Justification: Analyzes supporting/conflicting markers and assesses confidence
- Alternative Suggestions: Provides potential alternative annotations when applicable
- Real-time Progress: Updates results incrementally as clusters are processed
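Cell Ontology identifiers written with an underscore, such as CL_0000127, map directly to the CURIE form (CL:0000127) and to an OBO PURL that resolves to the term. The helpers below are a small sketch with hypothetical names; they assume the underscore form shown in the example above.

```python
def cl_curie(term_id):
    """Convert an ID such as 'CL_0000127' to the CURIE form 'CL:0000127'."""
    prefix, sep, local = term_id.partition("_")
    if prefix != "CL" or not sep or not local.isdigit():
        raise ValueError(f"not a Cell Ontology ID: {term_id!r}")
    return f"{prefix}:{local}"


def cl_purl(term_id):
    """Return the OBO PURL for a Cell Ontology ID written with an underscore."""
    cl_curie(term_id)  # Validates the ID; raises ValueError if it is malformed.
    return f"http://purl.obolibrary.org/obo/{term_id}"


print(cl_curie("CL_0000127"))
print(cl_purl("CL_0000127"))
```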
Job Status and Progress
When you submit an annotation job, it progresses through several stages:
- PENDING: Job is queued and waiting to start
- PROCESSING: Job is actively running with incremental results available
- COMPLETED: All clusters have been annotated successfully
- FAILED: Processing encountered an error
During PROCESSING, you can:
- Monitor progress through the report URL
- View partial results for completed clusters
- See biological context summaries
- Track completion status
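The way poll_interval_seconds and timeout_seconds interact with these job states can be pictured with a generic polling loop. This sketch is not the library's implementation: `wait_for_job` and `fetch_status` are hypothetical names, and annotator.run already performs its own polling.

```python
import time

# PENDING and PROCESSING mean "keep waiting"; these two end the loop.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_job(fetch_status, poll_interval_seconds=30, timeout_seconds=3600,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the job reaches a terminal state or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_seconds} s")
        sleep(poll_interval_seconds)


# Simulated job that moves through the documented states.
states = iter(["PENDING", "PROCESSING", "PROCESSING", "COMPLETED"])
final = wait_for_job(lambda: next(states), poll_interval_seconds=0)
print(final)
```

A shorter poll interval surfaces results sooner but counts against the result-retrieval rate limit more quickly.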
Result Format
Results include detailed annotations for each cluster:
# Access results after annotation using the helper method
results = annotator.get_results()
# Or access directly from the stored JSON string
import json
results = json.loads(adata.uns['cytetype_results']['result'])
# Each annotation includes:
for annotation in results['annotations']:
    print(f"Cluster: {annotation['clusterId']}")
    print(f"Cell Type: {annotation['annotation']}")
    print(f"Confidence: {annotation['confidence']}")
    print(f"Ontology Term: {annotation['ontologyTerm']}")
    print(f"Supporting Markers: {annotation['supportingMarkers']}")
    print(f"Justification: {annotation['justification']}")
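For sharing or downstream analysis, the per-cluster annotations can be flattened into CSV. The following is a sketch under stated assumptions: `annotations_to_csv` is a hypothetical helper, the field names are those printed in the loop above, supportingMarkers is assumed to be a list, and the sample record contains made-up placeholder values rather than real output.

```python
import csv
import io

# Fields taken from the loop above; justification is omitted to keep rows short.
FIELDS = ["clusterId", "annotation", "confidence", "ontologyTerm", "supportingMarkers"]


def annotations_to_csv(results):
    """Serialize results['annotations'] to CSV text, one row per cluster."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for annotation in results["annotations"]:
        row = {field: annotation.get(field, "") for field in FIELDS}
        markers = row["supportingMarkers"]
        # Join list-valued markers into one cell; other types are written as-is.
        if isinstance(markers, (list, tuple)):
            row["supportingMarkers"] = ";".join(map(str, markers))
        writer.writerow(row)
    return buffer.getvalue()


# Placeholder record for illustration only.
sample = {"annotations": [{
    "clusterId": "0",
    "annotation": "example cell type",
    "confidence": "high",
    "ontologyTerm": "CL_0000127",
    "supportingMarkers": ["GENE1", "GENE2"],
    "justification": "placeholder",
}]}
print(annotations_to_csv(sample))
```

With real data, pass the dictionary returned by annotator.get_results() in place of the sample.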
Example Report
View a sample annotation report: CyteType Report
Development
Setup
git clone https://github.com/NygenAnalytics/cytetype.git
cd cytetype
uv sync --all-extras
uv run pip install -e .
Testing
uv run pytest # Run tests
uv run ruff check . # Linting
uv run ruff format . # Formatting
uv run mypy . # Type checking
License
Licensed under CC BY-NC-SA 4.0 - see LICENSE for details.