Python client for characterization of clusters from single-cell RNA-seq data.

Maintainers

parashardhapola

Project description

CyteType

CyteType is a Python package for deep characterization of cell clusters from single-cell RNA-seq data. It interfaces with AnnData objects to call the CyteType API.

Quick Start

import anndata
import scanpy as sc
import cytetype

# Load and preprocess your data
adata = anndata.read_h5ad("path/to/your/data.h5ad")
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata, key_added="clusters")
sc.tl.rank_genes_groups(adata, groupby='clusters', method='t-test')

# Initialize CyteType (performs data preparation)
annotator = cytetype.CyteType(adata, group_key='clusters')

# Run annotation
adata = annotator.run(
    study_context="Human brain tissue from Alzheimer's disease patients"
)

# View results
print(adata.obs.cytetype_annotation_clusters)
print(adata.obs.cytetype_cellOntologyTerm_clusters)

Installation

pip install cytetype

Usage

Required Preprocessing

Your AnnData object must have:

Log-normalized expression data in adata.X
Cluster labels in adata.obs
Differential expression results from sc.tl.rank_genes_groups

import scanpy as sc

# Standard preprocessing
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Clustering
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata, key_added='clusters')

# Differential expression (required)
sc.tl.rank_genes_groups(adata, groupby='clusters', method='t-test')
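Before initializing CyteType, it can help to confirm that the cluster labels and differential expression results are actually present. The helper below is a minimal sketch and not part of the cytetype package: `check_prerequisites` is a hypothetical name, and it only assumes that `adata.obs` supports `in` checks on column names and that `adata.uns` behaves like a dict, which holds for AnnData objects. It does not check whether `adata.X` is log-normalized.

```python
from types import SimpleNamespace


def check_prerequisites(adata, group_key="clusters", rank_key="rank_genes_groups"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if group_key not in adata.obs:
        problems.append(f"adata.obs has no '{group_key}' column; run clustering first")
    if rank_key not in adata.uns:
        problems.append(f"adata.uns has no '{rank_key}' entry; run sc.tl.rank_genes_groups first")
    return problems


# Stand-in object for illustration only; pass a real AnnData object in practice.
demo = SimpleNamespace(obs={"clusters": ["0", "1"]}, uns={})
for problem in check_prerequisites(demo):
    print(problem)
```

Returning a list instead of raising on the first failure lets you report every missing prerequisite at once.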

Annotation

from cytetype import CyteType

# Initialize (data preparation happens here)
annotator = CyteType(adata, group_key='clusters')

# Run annotation
adata = annotator.run(
    study_context="Adult human brain tissue samples from healthy controls and Alzheimer's disease patients, analyzed using 10X Genomics single-cell RNA-seq. Samples include cortical and hippocampal regions."
)

# Or with custom metadata for tracking
adata = annotator.run(
    study_context="Adult human brain tissue samples from healthy controls and Alzheimer's disease patients, analyzed using 10X Genomics single-cell RNA-seq. Samples include cortical and hippocampal regions.",
    metadata={
        'experiment_name': 'Brain_AD_Study',
        'run_label': 'initial_analysis'
    }
)

# Results are stored in:
# - adata.obs.cytetype_annotation_clusters (cell type annotations)
# - adata.obs.cytetype_cellOntologyTerm_clusters (cell ontology terms)
# - adata.uns['cytetype_results'] (full API response)

The study_context should include comprehensive biological information about your experimental setup:

Organisms: Species being studied (e.g., "human", "mouse")
Tissues: Tissue types and anatomical regions
Diseases: Disease conditions or states
Developmental stages: Age, developmental timepoints
Single-cell methods: Sequencing platform (e.g., "10X Genomics", "Smart-seq2")
Experimental conditions: Treatments, time courses, perturbations

Example: "Adult human brain tissue samples from healthy controls and Alzheimer's disease patients, analyzed using 10X Genomics single-cell RNA-seq. Samples include cortical and hippocampal regions."
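If the experimental details already live in structured form (for example a sample sheet), the context string can be assembled from them. This is a small sketch: `build_study_context` is a hypothetical helper, the field names are illustrative, and it is an assumption that labeled fragments work as well as the free-text sentence shown in the example.

```python
def build_study_context(**fields):
    """Combine non-empty fields into a single 'Label: value.' string."""
    sentences = []
    for label, value in fields.items():
        if value:  # skip None and empty strings
            readable = label.replace("_", " ").capitalize()
            sentences.append(f"{readable}: {value}.")
    return " ".join(sentences)


context = build_study_context(
    organism="human",
    tissue="brain (cortex and hippocampus)",
    disease="Alzheimer's disease and healthy controls",
    developmental_stage="adult",
    method="10X Genomics single-cell RNA-seq",
    conditions=None,  # omitted from the output because it is empty
)
print(context)
```

The resulting string can then be passed as `study_context` to `annotator.run`.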

Configuration Options

Initialization Parameters

annotator = CyteType(
    adata,
    group_key='leiden',                    # Required: cluster column name
    rank_key='rank_genes_groups',          # DE results key (default)
    gene_symbols_column='gene_symbols',    # Gene symbols column (default)
    n_top_genes=50,                        # Top marker genes per cluster
    aggregate_metadata=True,               # Aggregate metadata (default)
    min_percentage=10,                     # Min percentage for cluster context
    pcent_batch_size=2000,                 # Batch size for calculations
)
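The inline comments for `aggregate_metadata` and `min_percentage` are terse. One plausible reading, which is an assumption here and not confirmed by the source, is that categorical metadata is summarized per cluster as percentages and that values below `min_percentage` are dropped. The sketch below illustrates that idea in plain Python; `cluster_context` is a hypothetical function and the package's actual computation may differ.

```python
from collections import Counter, defaultdict


def cluster_context(cluster_labels, metadata_values, min_percentage=10):
    """For each cluster, keep metadata values covering at least min_percentage of its cells."""
    counts = defaultdict(Counter)
    for cluster, value in zip(cluster_labels, metadata_values):
        counts[cluster][value] += 1

    context = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        context[cluster] = {
            value: round(100 * n / total)
            for value, n in counter.items()
            if 100 * n / total >= min_percentage
        }
    return context


# Made-up data: 20 cells in cluster "0", 10 cells in cluster "1".
clusters = ["0"] * 20 + ["1"] * 10
tissue = ["cortex"] * 19 + ["hippocampus"] * 1 + ["hippocampus"] * 10
print(cluster_context(clusters, tissue))
```

In this made-up data the single hippocampal cell in cluster "0" accounts for 5% of the cluster, so it falls below the 10% threshold and is left out.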

Submitting an Annotation Job

The run method accepts several configuration parameters to control the annotation process:

Custom LLM Configuration

By default, the CyteType API provides access to a selection of LLM providers. You can instead supply your own models and model providers through model_config. If you supply several models at once, they are used iteratively across the clusters.

adata = annotator.run(
    study_context="Human PBMC from COVID-19 patients",
    model_config=[{
        'provider': 'openai',
        'name': 'gpt-4o-mini',
        'apiKey': 'your-api-key',
        'baseUrl': 'https://api.openai.com/v1',  # Optional
        'modelSettings': {                       # Optional
            'temperature': 0.0,
            'max_tokens': 4096
        }  
    }],
)
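The example above hardcodes the API key, which is easy to leak into notebooks or version control. A minimal sketch of building `model_config` entries from environment variables follows. `model_entry` is a hypothetical helper, the environment variable name is an assumption, and the provider list is the one given under "Supported providers" in this document; only the `provider`, `name` and `apiKey` keys from the example are used.

```python
import os

SUPPORTED_PROVIDERS = {"openai", "anthropic", "google", "xai", "groq", "mistral", "openrouter"}


def model_entry(provider, name, env_var, environ=os.environ):
    """Build one model_config entry, reading the API key from an environment variable."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    api_key = environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return {"provider": provider, "name": name, "apiKey": api_key}


# Demo with a fake environment; omit `environ` in practice to read os.environ.
fake_env = {"OPENAI_API_KEY": "sk-demo"}
model_config = [model_entry("openai", "gpt-4o-mini", "OPENAI_API_KEY", environ=fake_env)]
print(model_config[0]["provider"], model_config[0]["name"])
```

Calling `model_entry` once per model and collecting the results in a list gives the multi-model form of `model_config`.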

Rate Limiting

If you do not supply your own model providers, the CyteType API applies rate limits to ensure fair usage:

Annotation submissions: 5 requests per hour per IP
Result retrieval: 20 requests per minute per IP

If you exceed a rate limit, the API returns an error message that includes retry timing information.
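When submitting many jobs from a script, a client-side counter can avoid hitting the submission limit in the first place. The class below is a sketch of a sliding-window limiter in plain Python; `SlidingWindowLimiter` is a hypothetical name and nothing here is part of cytetype. Because the limits are enforced per IP on the server, a local counter cannot see requests from other processes or machines sharing that IP.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds span."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self.calls = deque()

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window_seconds:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window_seconds - (now - self.calls[0])

    def record(self):
        """Register that a call was made now."""
        self.calls.append(self.clock())


# Simulated clock so the demo runs instantly.
fake_now = [0.0]
submissions = SlidingWindowLimiter(max_calls=5, window_seconds=3600, clock=lambda: fake_now[0])
for _ in range(5):
    submissions.record()
fake_now[0] = 600.0
print(submissions.wait_time())  # seconds until the oldest submission leaves the window
```

In real use, call `wait_time()` before each submission, sleep for the returned duration if it is positive, then call `record()` once the request has been sent.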

Supported providers: openai, anthropic, google, xai, groq, mistral, openrouter

Advanced parameters

adata = annotator.run(
    ...
    run_config={
        'concurrentClusters': 3,        # Default: 5, Range: 2-30
        'maxAnnotationRevisions': 2,    # Default: 2, Range: 1-5
        'maxLLMRequests': 500           # Default: 500, Range: 50-2000
    },
    
    # Custom metadata for tracking
    metadata={
        'experiment_name': 'PBMC_COVID_Study',
        'run_label': 'baseline_analysis',
        'researcher': 'Dr. Smith',
        'batch': 'batch_001'
    },
    
    # API polling and timeout settings
    poll_interval_seconds=10,           # How often to check for results
    timeout_seconds=1200,               # Max wait time (20 minutes)
    
    # API configuration
    api_url="https://custom-api.com",   # Custom API endpoint
    auth_token="your-auth-token",       # Authentication token
    save_query=True                     # Save query to query.json
)

Run configuration

concurrentClusters (int, default=5, range=2-30): Maximum number of clusters to process simultaneously. Higher values may speed up processing but can cause rate limit errors from LLM API providers.
maxAnnotationRevisions (int, default=2, range=1-5): Maximum number of refinement iterations based on reviewer feedback. More revisions may improve annotation quality but increase processing time.
maxLLMRequests (int, default=500, range=50-2000): Maximum total number of LLM API calls allowed for the entire job. The job will be terminated if this limit is reached. Helps control costs and prevents runaway processes.
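Out-of-range values can be caught locally before a job is submitted. The following sketch checks a `run_config` dict against the ranges listed in the parameter descriptions above; `validate_run_config` is a hypothetical helper, and how the API itself reacts to out-of-range values is not stated in the source.

```python
RUN_CONFIG_LIMITS = {
    # key: (minimum, maximum), copied from the parameter descriptions
    "concurrentClusters": (2, 30),
    "maxAnnotationRevisions": (1, 5),
    "maxLLMRequests": (50, 2000),
}


def validate_run_config(run_config):
    """Raise ValueError for unknown keys or values outside the documented ranges."""
    for key, value in run_config.items():
        if key not in RUN_CONFIG_LIMITS:
            raise ValueError(f"unknown run_config key: {key}")
        low, high = RUN_CONFIG_LIMITS[key]
        if not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{key} must be an integer between {low} and {high}, got {value!r}")
    return run_config


checked = validate_run_config({"concurrentClusters": 5, "maxLLMRequests": 500})
print(checked)
```

Because the function returns its input unchanged, it can wrap the dict inline, as in `run_config=validate_run_config({...})`.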

Additional Run Parameters

metadata (dict, optional): Custom metadata to send with the API request for tracking purposes. Can include experiment names, run labels, researcher information, or any other user-defined data. This metadata is sent to the API but not stored locally with results.
poll_interval_seconds (int, default=30): How frequently to check the API for job completion.
timeout_seconds (int, default=3600): Maximum time to wait for results before timing out.
api_url (str): Custom API endpoint URL for self-hosted deployments.
auth_token (str, optional): Bearer token for API authentication.
save_query (bool, default=True): Whether to save the API query to query.json for debugging.
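The polling settings interact with the result-retrieval rate limit of 20 requests per minute. The arithmetic below is a rough sketch that assumes each poll costs exactly one retrieval request, which the source does not confirm; `polling_budget` is a hypothetical helper.

```python
import math


def polling_budget(poll_interval_seconds=30, timeout_seconds=3600, limit_per_minute=20):
    """Summarize how a polling schedule relates to the result-retrieval rate limit."""
    polls_per_minute = 60 / poll_interval_seconds
    return {
        "max_polls": math.ceil(timeout_seconds / poll_interval_seconds),
        "polls_per_minute": polls_per_minute,
        "within_limit": polls_per_minute <= limit_per_minute,
    }


print(polling_budget())                         # defaults: 30 s interval, 3600 s timeout
print(polling_budget(poll_interval_seconds=2))  # an aggressive interval
```

Under that assumption the defaults produce 2 polls per minute and at most 120 polls in total, while a 2-second interval would produce 30 polls per minute and exceed the limit.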

Annotation Process

CyteType performs comprehensive cell type annotation through an automated pipeline:

Core Functionality

Automated Annotation: Identifies likely cell types for each cluster based on marker genes
Ontology Mapping: Maps identified cell types to Cell Ontology terms (e.g., CL_0000127)
Review & Justification: Analyzes supporting/conflicting markers and assesses confidence
Alternative Suggestions: Provides potential alternative annotations when applicable
Real-time Progress: Updates results incrementally as clusters are processed
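Downstream ontology tools often expect the colon-separated CURIE form (`CL:0000127`) in place of the underscore form shown above. The converter below is a small sketch; `to_curie` is a hypothetical helper, and it assumes identifiers consist of the `CL` prefix followed by seven digits, as in the example.

```python
import re

_CL_PATTERN = re.compile(r"^CL[_:](\d{7})$")


def to_curie(term_id):
    """Convert 'CL_0000127' or 'CL:0000127' to the CURIE form 'CL:0000127'."""
    match = _CL_PATTERN.match(term_id.strip())
    if match is None:
        raise ValueError(f"not a Cell Ontology identifier: {term_id!r}")
    return f"CL:{match.group(1)}"


print(to_curie("CL_0000127"))
```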

Job Status and Progress

When you submit an annotation job, it progresses through several stages:

PENDING: Job is queued and waiting to start
PROCESSING: Job is actively running with incremental results available
COMPLETED: All clusters have been annotated successfully
FAILED: Processing encountered an error

During PROCESSING, you can:

Monitor progress through the report URL
View partial results for completed clusters
See biological context summaries
Track completion status
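The status lifecycle above amounts to polling until a terminal state is reached. `annotator.run` takes `poll_interval_seconds` and `timeout_seconds`, so it appears to handle this internally; the loop below is only an illustrative sketch of the pattern. `wait_for_job` and the `fetch_status` callable are hypothetical and stand in for whatever actually queries the job.

```python
import time

TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_job(fetch_status, poll_interval_seconds=30, timeout_seconds=3600,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until a terminal state is reached or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        # Give up if the next poll would land after the deadline.
        if clock() + poll_interval_seconds > deadline:
            raise TimeoutError(f"job still {status} after {timeout_seconds} seconds")
        sleep(poll_interval_seconds)


# Demo with canned statuses and a no-op sleep so it finishes immediately.
statuses = iter(["PENDING", "PROCESSING", "PROCESSING", "COMPLETED"])
final = wait_for_job(lambda: next(statuses), sleep=lambda seconds: None)
print(final)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.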

Result Format

Results include detailed annotations for each cluster:

# Access results after annotation using the helper method
results = annotator.get_results()

# Or access directly from the stored JSON string
import json
results = json.loads(adata.uns['cytetype_results']['result'])

# Each annotation includes:
for annotation in results['annotations']:
    print(f"Cluster: {annotation['clusterId']}")
    print(f"Cell Type: {annotation['annotation']}")
    print(f"Confidence: {annotation['confidence']}")
    print(f"Ontology Term: {annotation['ontologyTerm']}")
    print(f"Supporting Markers: {annotation['supportingMarkers']}")
    print(f"Justification: {annotation['justification']}")
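For downstream use it is often handier to index the annotations by cluster than to loop over them. The sketch below uses only the `clusterId`, `annotation` and `ontologyTerm` fields shown above. `annotations_by_cluster` is a hypothetical helper, and `sample_results` is made-up data in the same shape, not real CyteType output.

```python
def annotations_by_cluster(results):
    """Map each clusterId to a (cell type, ontology term) pair."""
    return {
        entry["clusterId"]: (entry["annotation"], entry["ontologyTerm"])
        for entry in results["annotations"]
    }


# Made-up example in the shape described above.
sample_results = {
    "annotations": [
        {"clusterId": "0", "annotation": "Astrocyte", "ontologyTerm": "CL_0000127"},
        {"clusterId": "1", "annotation": "Microglial cell", "ontologyTerm": "CL_0000129"},
    ]
}
lookup = annotations_by_cluster(sample_results)
print(lookup["0"])
```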

Example Report

View a sample annotation report: CyteType Report

Development

Setup

git clone https://github.com/NygenAnalytics/cytetype.git
cd cytetype
uv sync --all-extras
uv run pip install -e .

Testing

uv run pytest              # Run tests
uv run ruff check .        # Linting
uv run ruff format .       # Formatting
uv run mypy .              # Type checking

License

Licensed under CC BY-NC-SA 4.0 - see LICENSE for details.

Project details

Release history

0.19.4: Apr 2, 2026
0.19.3: Mar 8, 2026
0.19.2: Mar 8, 2026
0.19.1: Mar 7, 2026
0.19.0: Mar 7, 2026
0.18.1: Mar 3, 2026
0.18.0: Mar 3, 2026
0.17.0: Feb 23, 2026
0.16.1: Feb 20, 2026
0.16.0: Feb 19, 2026
0.15.0: Feb 19, 2026
0.14.1: Feb 18, 2026
0.13.0: Jan 10, 2026
0.12.0: Nov 24, 2025
0.11.0: Nov 20, 2025
0.10.0: Sep 17, 2025
0.9.2: Aug 28, 2025
0.9.1: Aug 25, 2025
0.9.0: Aug 3, 2025
0.8.3: Jul 30, 2025
0.8.1: Jul 22, 2025
0.8.0: Jul 17, 2025
0.7.0: Jul 6, 2025
0.6.2: Jun 27, 2025
0.6.1: Jun 27, 2025
0.6.0: Jun 26, 2025
0.5.2: Jun 17, 2025
0.5.1: Jun 11, 2025
0.5.0: Jun 10, 2025 (this version)
0.4.0: Jun 9, 2025
0.3.5: May 30, 2025
0.3.4: May 30, 2025
0.3.3: May 30, 2025
0.3.2: May 30, 2025
0.3.1: May 28, 2025
0.3.0: May 28, 2025
0.2.1: May 26, 2025
0.2.0: May 25, 2025
0.1.4: May 6, 2025
0.1.3: Apr 28, 2025
0.1.2: Apr 28, 2025
0.1.1: Apr 28, 2025

Download files

Source Distribution

cytetype-0.5.0.tar.gz (24.5 kB), uploaded Jun 10, 2025, Source

Built Distribution

cytetype-0.5.0-py3-none-any.whl (24.3 kB), uploaded Jun 10, 2025, Python 3

File details

Details for the file cytetype-0.5.0.tar.gz.

File metadata

Download URL: cytetype-0.5.0.tar.gz
Upload date: Jun 10, 2025
Size: 24.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for cytetype-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`fe6dddc7757badaad5c95efd29d5b45312029210494c50e800ae222a093fd84e`
MD5	`f8e7792bbd88c93d609bb8b308ad267b`
BLAKE2b-256	`5cc1b4d931690b6c0ce17016d070438897327a4b517c40a9db1729b10ddf0d4c`
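A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a generic sketch using `hashlib`; the function names are hypothetical and the local file path is whatever you saved the download as.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("cytetype-0.5.0.tar.gz", "fe6dddc7757badaad5c95efd29d5b45312029210494c50e800ae222a093fd84e")` should return True for an intact copy of the source distribution.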

Provenance

The following attestation bundles were made for cytetype-0.5.0.tar.gz:

Publisher: publish.yml on NygenAnalytics/CyteType

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cytetype-0.5.0.tar.gz
- Subject digest: fe6dddc7757badaad5c95efd29d5b45312029210494c50e800ae222a093fd84e
- Sigstore transparency entry: 234204079
- Sigstore integration time: Jun 10, 2025
Source repository:
- Permalink: NygenAnalytics/CyteType@379091462eceb8fdc62c64d597f9ef15ff188b82
- Branch / Tag: refs/tags/0.5.0
- Owner: https://github.com/NygenAnalytics
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@379091462eceb8fdc62c64d597f9ef15ff188b82
- Trigger Event: release

File details

Details for the file cytetype-0.5.0-py3-none-any.whl.

File metadata

Download URL: cytetype-0.5.0-py3-none-any.whl
Upload date: Jun 10, 2025
Size: 24.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for cytetype-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5965d22057c1f0321d566928a76357c087ed61ec4133b5b8e39f3ec6fa9967c3`
MD5	`fd764d6233b1ad2dbecbe0942762f6e1`
BLAKE2b-256	`7a6df8e288ee35be92db1680e5cf3e29f1ccdb46be7124051b3a01e2606d0a73`

Provenance

The following attestation bundles were made for cytetype-0.5.0-py3-none-any.whl:

Publisher: publish.yml on NygenAnalytics/CyteType

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cytetype-0.5.0-py3-none-any.whl
- Subject digest: 5965d22057c1f0321d566928a76357c087ed61ec4133b5b8e39f3ec6fa9967c3
- Sigstore transparency entry: 234204084
- Sigstore integration time: Jun 10, 2025
Source repository:
- Permalink: NygenAnalytics/CyteType@379091462eceb8fdc62c64d597f9ef15ff188b82
- Branch / Tag: refs/tags/0.5.0
- Owner: https://github.com/NygenAnalytics
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@379091462eceb8fdc62c64d597f9ef15ff188b82
- Trigger Event: release
