Single-cell RNA-seq multicohort analysis pipeline with Scanpy (10x-only, CAR-T aware, structured outputs)

Oncocyrix Multicohort

Oncocyrix Multicohort is a production-grade, Scanpy-based single-cell RNA-seq analysis pipeline designed for 10x Genomics data.

It supports single-sample and multi-cohort studies, with optional CAR-T–aware analysis, cell-type annotation, pseudobulk DESeq2, and multi-database pathway enrichment.

The pipeline is built for research reproducibility, structured outputs, and large-scale cohort integration, including pre/post, multi-patient, and multi-condition study designs.


Overview

This repository provides an end-to-end single-cell RNA-seq workflow that combines:

  • Standard scRNA-seq best practices (QC, clustering, DE)
  • Cohort-aware integration and comparison
  • Bulk-grade statistics via pseudobulk aggregation
  • Immuno-oncology–focused CAR-T state modeling

It is suitable for both exploratory single-cell analysis and translational / clinical research pipelines.


Key Features

Core scRNA-seq Analysis

  • 10x Genomics loader (raw count matrices: .mtx plus barcode and gene/feature .tsv files, plain or gzipped)
  • Robust QC & filtering (mitochondrial %, gene counts)
  • Normalization, log1p, HVG selection
  • Dimensionality reduction (PCA, UMAP, optional t-SNE)
  • Clustering & trajectory inference (Leiden, DPT)
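The QC and normalization bullets above can be sketched in plain Python. This only illustrates the logic that Scanpy calls such as sc.pp.calculate_qc_metrics, sc.pp.normalize_total (target_sum=1e4) and sc.pp.log1p apply to sparse matrices; it is not this pipeline's code. The thresholds (min_genes, max_pct_mito), the toy counts, and the assumption that mitochondrial genes carry the human "MT-" prefix are all hypothetical.

```python
import math

# Toy raw counts: one dict per cell mapping gene -> UMI count (hypothetical data).
CELLS = {
    "cell_1": {"MT-CO1": 2, "CD3E": 10, "GZMB": 8},
    "cell_2": {"MT-CO1": 60, "CD3E": 20, "GZMB": 20},
    "cell_3": {"CD3E": 1},
}

def qc_metrics(counts):
    """Return (total counts, genes detected, percent mitochondrial counts) for one cell."""
    total = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
    pct_mito = 100.0 * mito / total if total else 0.0
    return total, n_genes, pct_mito

def passes_qc(counts, min_genes=2, max_pct_mito=20.0):
    """Keep cells with enough detected genes and a low mitochondrial fraction."""
    _, n_genes, pct_mito = qc_metrics(counts)
    return n_genes >= min_genes and pct_mito <= max_pct_mito

def normalize_log1p(counts, target_sum=1e4):
    """Scale a cell to target_sum total counts, then apply the natural log1p."""
    total = sum(counts.values())
    return {gene: math.log1p(c / total * target_sum) for gene, c in counts.items()}

kept = {cid: c for cid, c in CELLS.items() if passes_qc(c)}
normalized = {cid: normalize_log1p(c) for cid, c in kept.items()}
print(sorted(kept))  # ['cell_1']
```

Here cell_2 is dropped for 60% mitochondrial counts and cell_3 for detecting a single gene. Real thresholds should be tuned per dataset rather than copied from this sketch.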

Multi-Cohort & Integration

  • Single-sample and multi-cohort modes
  • Batch correction & integration (BBKNN)
  • Group-wise comparisons (e.g., Pre vs Post, Tumor vs Normal)
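BBKNN (batch balanced k-nearest neighbours) addresses batch effects by building the neighbour graph so that every cell receives neighbours from each batch, not only from its own. The brute-force sketch below shows just that neighbour-selection idea on toy PCA coordinates. The real method (bbknn.bbknn, or sc.external.pp.bbknn in Scanpy) also uses approximate search and turns distances into UMAP-style connectivities, which is omitted here; the coordinates, batch labels and k_per_batch value are hypothetical.

```python
import math

def batch_balanced_knn(coords, batches, k_per_batch=1):
    """For each cell, pick its k nearest neighbours separately within every batch.

    coords: list of PCA coordinate tuples; batches: batch label per cell.
    Returns {cell_index: sorted list of neighbour indices}.
    """
    neighbours = {}
    for i, xi in enumerate(coords):
        chosen = []
        for batch in sorted(set(batches)):
            # Candidates from this batch only, excluding the query cell itself.
            candidates = [j for j, b in enumerate(batches) if b == batch and j != i]
            candidates.sort(key=lambda j: math.dist(xi, coords[j]))
            chosen.extend(candidates[:k_per_batch])
        neighbours[i] = sorted(chosen)
    return neighbours

# Two batches that sit far apart in PCA space (hypothetical values).
coords = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
batches = ["A", "A", "B", "B"]
graph = batch_balanced_knn(coords, batches, k_per_batch=1)
print(graph)
```

With ordinary kNN (k=1), cell 0 would link only to cell 1 in its own batch; the batch-balanced graph also links it to cell 2 in batch B, which is what lets clustering and UMAP mix cohorts.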

Cell-Type Annotation

  • Metadata-driven annotation (if provided)
  • ML-based annotation via CellTypist (optional)
  • Cell-type–specific and cluster-specific marker discovery
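In Scanpy, cluster- or cell-type-specific markers are usually ranked with sc.tl.rank_genes_groups(adata, groupby, method="wilcoxon"). The sketch below shows the statistic behind that ranking: the Wilcoxon rank-sum (Mann-Whitney U) statistic for one group versus all other cells, scaled to an AUC between 0 and 1, with tied values given average ranks. It omits p-values and multiple-testing correction, and the toy expression values and labels are hypothetical.

```python
def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def rank_sum_auc(group, rest):
    """Mann-Whitney U of group vs rest, divided by n1*n2 (0.5 = no difference)."""
    ranks = average_ranks(list(group) + list(rest))
    n1, n2 = len(group), len(rest)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return u1 / (n1 * n2)

def rank_markers(expr, labels, group):
    """Rank genes by AUC of `group` vs all other cells (higher = more specific)."""
    inside = [i for i, lab in enumerate(labels) if lab == group]
    outside = [i for i, lab in enumerate(labels) if lab != group]
    scores = {
        gene: rank_sum_auc([v[i] for i in inside], [v[i] for i in outside])
        for gene, v in expr.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

EXPR = {"CD8A": [5, 6, 0, 1], "CD4": [0, 1, 4, 5]}  # gene -> value per cell
LABELS = ["c0", "c0", "c1", "c1"]                    # cluster or cell type per cell
markers = rank_markers(EXPR, LABELS, "c0")
print(markers)
```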

Differential Expression

  • Single-cell DE (Scanpy, Wilcoxon)
  • Group-wise DE within clusters or cell types
  • Pseudobulk aggregation
  • DESeq2 via rpy2 for bulk-grade inference
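Pseudobulk aggregation sums raw counts over all cells of one cell type within one sample, so that each sample becomes a replicate and DESeq2 receives the unnormalized integer counts it expects; this avoids treating thousands of cells from the same patient as independent observations. A minimal sketch, assuming per-cell records of (sample, cell type, raw counts); the record layout, sample names and the min_cells cutoff are hypothetical, not the pipeline's internals.

```python
from collections import defaultdict

def pseudobulk(cells, min_cells=1):
    """Sum raw counts per (sample, cell type).

    cells: iterable of (sample_id, cell_type, {gene: raw_count}).
    Groups with fewer than min_cells cells are dropped.
    Returns {(sample_id, cell_type): {gene: summed_count}}.
    """
    sums = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for sample, cell_type, counts in cells:
        key = (sample, cell_type)
        n_cells[key] += 1
        for gene, c in counts.items():
            sums[key][gene] += c
    return {key: dict(genes) for key, genes in sums.items() if n_cells[key] >= min_cells}

CELLS_EX = [
    ("P1_pre", "CD8_T", {"GZMB": 3, "TCF7": 1}),
    ("P1_pre", "CD8_T", {"GZMB": 2}),
    ("P1_post", "CD8_T", {"GZMB": 7}),
]
print(pseudobulk(CELLS_EX))
```

Each resulting (sample, cell type) profile would become one column of the count matrix passed to DESeq2.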

Pathway Enrichment

  • Multi-database enrichment:
    • GO BP / MF / CC
    • KEGG
    • Reactome
    • WikiPathways
  • Publication-ready plots
  • Semantic pathway deduplication (MiniLM + FAISS, optional)

CAR-T Analysis (Optional)

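When enabled, the pipeline becomes CAR-T aware: each cell is scored against CAR-T functional-state gene signatures and assigned a refined state from the list below, giving biologically interpretable state models.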

Enabling CAR-T

CAR-T analysis is controlled in code, not via CLI:

DO_CART_SCORING = True

Supported CAR-T States

  • TStemCM_like
  • TPEX_like
  • TEX_terminal
  • Effector_TEFF
  • Proliferating_T
  • Terminal_diff
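Signature-based state assignment can be sketched as: score every cell for each state, then keep the best-scoring state. This simplified score subtracts the mean of all measured genes, whereas Scanpy's sc.tl.score_genes subtracts the mean of control genes sampled from matched expression bins. Only three of the six states are included, and their marker genes are common literature markers chosen purely for illustration; they are NOT the pipeline's CAR-T signatures, and the "Unassigned" fallback and min_score cutoff are hypothetical.

```python
from statistics import mean

# Illustrative marker sets only; NOT the pipeline's CAR-T signatures.
SIGNATURES = {
    "TStemCM_like": ["TCF7", "SELL", "CCR7"],
    "TEX_terminal": ["TOX", "HAVCR2", "ENTPD1"],
    "Proliferating_T": ["MKI67", "TOP2A"],
}

def signature_scores(expr, signatures=SIGNATURES):
    """Mean expression of each signature minus the mean of all measured genes.

    Genes missing from expr are treated as 0.
    """
    background = mean(expr.values())
    return {
        state: mean(expr.get(g, 0.0) for g in genes) - background
        for state, genes in signatures.items()
    }

def classify(expr, min_score=0.0):
    """Best-scoring state, or 'Unassigned' if no score exceeds min_score."""
    scores = signature_scores(expr)
    best = max(scores, key=scores.get)
    return best if scores[best] > min_score else "Unassigned"

# Log-normalized expression for one hypothetical cell.
EXAMPLE_CELL = {"TCF7": 3.0, "SELL": 2.0, "CCR7": 1.0, "TOX": 0.0, "MKI67": 0.0, "CD3E": 2.0}
print(classify(EXAMPLE_CELL))  # TStemCM_like
```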

CAR-T Outputs

  • CAR-T gene signature scoring per cell
  • Refined CAR-T state classification
  • CAR-T UMAPs and score visualizations
  • Patient × phase × state × gene summaries
  • Pre/Post delta tables
  • CAR-T state marker genes
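A Pre/Post delta table compares, per patient, the fraction of cells in each state after treatment against the fraction before it. A minimal sketch, assuming per-cell records of (patient, phase, state) with phase labels "Pre" and "Post"; the record layout and labels are hypothetical, and patients seen in only one phase are skipped because no delta can be formed.

```python
from collections import Counter

def state_fractions(states):
    """Fraction of cells assigned to each CAR-T state."""
    counts = Counter(states)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

def pre_post_delta(cells):
    """Return {patient: {state: post_fraction - pre_fraction}}."""
    by_group = {}
    for patient, phase, state in cells:
        by_group.setdefault((patient, phase), []).append(state)
    deltas = {}
    for patient in sorted({p for p, _ in by_group}):
        if (patient, "Pre") not in by_group or (patient, "Post") not in by_group:
            continue  # a delta needs both time points
        pre = state_fractions(by_group[(patient, "Pre")])
        post = state_fractions(by_group[(patient, "Post")])
        states = sorted(set(pre) | set(post))
        deltas[patient] = {s: post.get(s, 0.0) - pre.get(s, 0.0) for s in states}
    return deltas

CELLS = (
    [("P1", "Pre", "TPEX_like")] * 2
    + [("P1", "Pre", "TEX_terminal")] * 2
    + [("P1", "Post", "TEX_terminal")] * 3
    + [("P1", "Post", "TPEX_like"), ("P2", "Pre", "TPEX_like")]
)
print(pre_post_delta(CELLS))  # {'P1': {'TEX_terminal': 0.25, 'TPEX_like': -0.25}}
```

A positive value means the state expanded after treatment for that patient; P2 is absent because it has no Post cells.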

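Quick start

pip install oncocyrix_multicohort
oncocyrix-multicohort --mode multi --multi-base-dir "/path/to/folder/with/10x/samples"

The --mode and --multi-base-dir options belong to the oncocyrix-multicohort command, not to pip install.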

Installation

Core installation (run from the repository root)

pip install .

Full installation (recommended)

pip install ".[all]"

R dependencies (for DESeq2)

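Requires R ≥ 4.0 with the packages below. DESeq2 is distributed through Bioconductor rather than CRAN, so install it with BiocManager:

install.packages(c("BiocManager", "ggplot2", "pheatmap"))
BiocManager::install("DESeq2")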

Package Structure

oncocyrix-multicohort/
├── pyproject.toml
├── README.md
└── oncocyrix_multicohort/
    ├── __init__.py
    └── pipeline.py

CLI entry point

oncocyrix-multicohort → oncocyrix_multicohort.pipeline:main
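The arrow above describes a standard Python console-script entry point (typically declared under [project.scripts] in pyproject.toml): installing the package creates an oncocyrix-multicohort executable that imports oncocyrix_multicohort.pipeline and calls its main function. The sketch below mimics how such a "module:attribute" target is resolved, using only importlib; it is a generic illustration, not part of this package.

```python
import importlib

def load_entry_point(target):
    """Resolve a console-script target such as 'package.module:attr' to the object it names."""
    module_name, _, attr_path = target.partition(":")
    obj = importlib.import_module(module_name)
    if attr_path:
        # Attribute paths may be dotted, e.g. 'module:Class.method'.
        for part in attr_path.split("."):
            obj = getattr(obj, part)
    return obj

print(load_entry_point("importlib:import_module"))
```

After installation, load_entry_point("oncocyrix_multicohort.pipeline:main") should return the same function the command runs; how main reads its arguments is not documented here.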

Usage

Single-sample mode

oncocyrix-multicohort \
  --mode single \
  --single-10x-dir /path/to/10x/sample \
  --out-name SC_ANALYSIS_RESULTS

Multi-cohort mode

oncocyrix-multicohort \
  --mode multi \
  --multi-base-dir /path/to/GSE208653_RAW \
  --out-name SC_ANALYSIS_RESULTS
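To script several runs (for example, one single-sample run per directory), the commands above can be assembled as argument lists. This sketch uses only the flags documented above; the helper names are hypothetical. Passing a list to subprocess instead of a shell string keeps paths with spaces intact.

```python
import shlex
import subprocess

def build_command(mode, data_dir, out_name="SC_ANALYSIS_RESULTS"):
    """Assemble an oncocyrix-multicohort argument list from the documented flags."""
    flags = {"single": "--single-10x-dir", "multi": "--multi-base-dir"}
    if mode not in flags:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["oncocyrix-multicohort", "--mode", mode, flags[mode], str(data_dir),
            "--out-name", out_name]

def run(cmd):
    """Print the equivalent shell command, then run it without a shell."""
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure

print(shlex.join(build_command("multi", "/path/to/GSE208653_RAW")))
```

For a loop over sample folders, something like run(build_command("single", d, out_name=f"SC_{d.name}")) would give each run its own output folder; that naming scheme is only an example.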

See cli.md for full CLI usage and metadata requirements.


Output Structure

SC_ANALYSIS_RESULTS/
├── 00_analysis_summary
├── 01_qc_and_filtering
├── 02_highly_variable_genes
├── 03_dimensionality_reduction_and_embeddings
├── 04_clustering_and_cell_states
├── 05_celltype_analysis
├── 05_CART_analysis
├── 06_groupwise_deg
├── 07_pathway_enrichment
├── 08_pseudobulk
├── 09_reference_summary
└── *.h5ad
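After a run, a quick check of the results folder can confirm which stage folders were written and which .h5ad files exist; those files can then be loaded with anndata.read_h5ad. This is a sketch with a hypothetical helper, and it assumes (not confirmed here) that optional stages such as 05_CART_analysis may be legitimately absent when the corresponding step is disabled.

```python
from pathlib import Path

# Stage folders listed in the output structure above.
EXPECTED_DIRS = [
    "00_analysis_summary",
    "01_qc_and_filtering",
    "02_highly_variable_genes",
    "03_dimensionality_reduction_and_embeddings",
    "04_clustering_and_cell_states",
    "05_celltype_analysis",
    "05_CART_analysis",
    "06_groupwise_deg",
    "07_pathway_enrichment",
    "08_pseudobulk",
    "09_reference_summary",
]

def summarize_results(root):
    """Return (missing stage folders, names of top-level .h5ad files) for a results dir."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    h5ads = sorted(p.name for p in root.glob("*.h5ad")) if root.is_dir() else []
    return missing, h5ads

if __name__ == "__main__":
    missing, h5ads = summarize_results("SC_ANALYSIS_RESULTS")
    print("missing stages:", missing or "none")
    print("AnnData files:", h5ads)
```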

Requirements

  • Python ≥ 3.9 (tested up to 3.12)
  • R ≥ 4.0 (for DESeq2)
  • Recommended RAM ≥ 32 GB
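The requirements above can be checked before a long run. This preflight sketch is hypothetical and checks only what is cheap to test: the Python version, whether Rscript is on PATH, and physical memory via POSIX sysconf (reported as None where that is unavailable, e.g. on Windows). It does not verify the R version or that DESeq2 is installed; running Rscript -e 'packageVersion("DESeq2")' is one way to check that.

```python
import os
import shutil
import sys

def total_ram_gb():
    """Physical memory in GiB via POSIX sysconf, or None where unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None

def preflight(min_python=(3, 9), min_ram_gb=32):
    """Compare this machine against the documented minimums."""
    ram = total_ram_gb()
    return {
        "python_ok": sys.version_info >= min_python,
        "rscript_on_path": shutil.which("Rscript") is not None,
        "ram_gb": ram,
        "ram_ok": None if ram is None else ram >= min_ram_gb,
    }

if __name__ == "__main__":
    print(preflight())
```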

Citation

Malik S.
Oncocyrix Multicohort: A CAR-T–aware single-cell RNA-seq analysis framework.


Author

Sheryar Malik
Bioinformatics Scientist


License

MIT License


Download files

Source Distribution

oncocyrix_multicohort-0.1.0.tar.gz (42.1 kB)

Built Distribution

oncocyrix_multicohort-0.1.0-py3-none-any.whl (39.8 kB)

File details

Details for the file oncocyrix_multicohort-0.1.0.tar.gz.

File metadata

  • Download URL: oncocyrix_multicohort-0.1.0.tar.gz
  • Size: 42.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.0

File hashes

Hashes for oncocyrix_multicohort-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5388eea7ce959b10a27cfb5df8a234651bb18e933a4c5abba539be051022d9de
MD5 c28e2c4ac183ac5e277e27d4932a7c2f
BLAKE2b-256 e625fef74ec455c208c14eb5bdf075a28179f61e2f4315ffda453c44b211deaa
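A downloaded file can be checked against the SHA256 digests listed here. This sketch streams the file through hashlib; the helper names are hypothetical, and the default expected digest is the source-distribution value above. pip can enforce the same check with hash-checking mode (a requirement line such as oncocyrix_multicohort==0.1.0 --hash=sha256:<digest> plus pip install --require-hashes), which then requires hashes for every dependency as well.

```python
import hashlib

# SHA256 listed above for oncocyrix_multicohort-0.1.0.tar.gz
SDIST_SHA256 = "5388eea7ce959b10a27cfb5df8a234651bb18e933a4c5abba539be051022d9de"

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=SDIST_SHA256):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

Usage: verify("oncocyrix_multicohort-0.1.0.tar.gz") returns True only for an unmodified copy of the source distribution.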

File details

Details for the file oncocyrix_multicohort-0.1.0-py3-none-any.whl.

File hashes

Hashes for oncocyrix_multicohort-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 378c6d4500b415316beb48d9f14b1623cc285f65c6c0b0d5396284d7a6973e2e
MD5 98d4a188823ae057b03c1fe5e234b942
BLAKE2b-256 19a962a1e4e5e5593dbfc900204251a80f7a9500ea8fcf2b1b265af69db53c42
