OncoCyrix: a modular Scanpy-based pipeline for single-sample 10x scRNA-seq cancer analysis

OncoCyrix

OncoCyrix is a modular, production-ready Scanpy pipeline for processing and analyzing a single 10x Genomics single-cell RNA-seq sample.

The pipeline is optimized for human cancer datasets, but works for any standard 10x scRNA-seq run.


Key Capabilities

  • 10x Genomics matrix ingestion (MTX + barcodes + features)
  • Gene ID normalization (Ensembl → HGNC symbols)
  • Quality control filtering
    • Mitochondrial percentage
    • UMI counts
    • Genes per cell
  • Normalization and log1p transformation
  • Highly variable gene (HVG) selection
  • PCA, UMAP, and t-SNE embeddings
  • Leiden clustering
  • Cell type annotation using CellTypist
  • Cell-type-specific marker discovery
  • Pathway enrichment analysis
    • GO (BP, MF, CC)
    • KEGG
    • Reactome
    • WikiPathways

Final biological summary: Cell Types → DEGs → Marker Genes → Pathways


Project Structure

singlecell_pipeline/
├── config_cli.py          # CLI and global configuration
├── loader_10x.py          # 10x data loading
├── gene_names.py          # Gene ID normalization
├── group_de.py            # Differential expression analysis
├── markers.py             # Marker gene detection
├── pathway_enrichment.py  # Enrichment analysis and deduplication
├── summary_ct_deg.py      # Integrated summaries
├── pipeline.py            # Scanpy orchestration
└── main_single.py         # Pipeline entry point

Features in Detail

1. 10x Data Loading

  • Automatically detects:
    • matrix.mtx / matrix.mtx.gz
    • barcodes.tsv / barcodes.tsv.gz
    • features.tsv or genes.tsv
  • Efficient sparse matrix handling
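The file detection described above can be sketched as follows. This is a minimal illustration, not the code in loader_10x.py: the function name `find_10x_files` and the `CANDIDATES` table are hypothetical, and accepting gzipped features/genes files is an assumption.

```python
from pathlib import Path
from typing import Dict, Sequence

# Candidate file names per component, taken from the list above.
# Accepting .gz variants of features.tsv / genes.tsv is an assumption of this sketch.
CANDIDATES: Dict[str, Sequence[str]] = {
    "matrix": ("matrix.mtx", "matrix.mtx.gz"),
    "barcodes": ("barcodes.tsv", "barcodes.tsv.gz"),
    "features": ("features.tsv", "features.tsv.gz", "genes.tsv", "genes.tsv.gz"),
}


def find_10x_files(folder: Path) -> Dict[str, Path]:
    """Return the first existing candidate file for each 10x component."""
    found: Dict[str, Path] = {}
    for component, names in CANDIDATES.items():
        match = next(
            (folder / name for name in names if (folder / name).is_file()),
            None,
        )
        if match is None:
            raise FileNotFoundError(f"No {component} file found in {folder}")
        found[component] = match
    return found
```

Failing early with a clear error when a component is missing is usually preferable to letting the matrix reader fail later with a less specific message.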

2. Gene Name Normalization

  • Detects Ensembl gene IDs
  • Maps to HGNC gene symbols using mygene.info
  • Ensures unique and consistent gene names
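A rough sketch of the detection and uniqueness steps is shown below. It is not the code in gene_names.py: the helper names, the regular expression, and the 50% detection threshold are assumptions, and the mygene.info lookup itself is left out because it needs network access.

```python
import re
from typing import Dict, List, Set

# Human Ensembl gene IDs look like ENSG00000141510, optionally followed by
# a version suffix such as ".18".
ENSEMBL_GENE_RE = re.compile(r"^ENSG\d{11}(\.\d+)?$")


def looks_like_ensembl(ids: List[str], min_fraction: float = 0.5) -> bool:
    """Guess whether a feature list uses Ensembl IDs (threshold is an assumption)."""
    if not ids:
        return False
    hits = sum(1 for gene_id in ids if ENSEMBL_GENE_RE.match(gene_id))
    return hits / len(ids) >= min_fraction


def make_unique(names: List[str]) -> List[str]:
    """Append -1, -2, ... to repeated names so that every entry is distinct."""
    used: Set[str] = set()
    counters: Dict[str, int] = {}
    unique: List[str] = []
    for name in names:
        candidate = name
        while candidate in used:
            counters[name] = counters.get(name, 0) + 1
            candidate = f"{name}-{counters[name]}"
        used.add(candidate)
        unique.append(candidate)
    return unique
```

Duplicate symbols can arise when several Ensembl IDs map to the same HGNC symbol, which is why a uniqueness pass after mapping is useful.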

3. Quality Control & Filtering

  • Computes:
    • pct_counts_mt
    • n_genes_by_counts
    • total_counts
  • Filters out:
    • Cells with <200 or >6000 genes
    • Cells with >15% mitochondrial reads
    • Genes expressed in fewer than 3 cells
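The thresholds above can be expressed as simple predicates. The sketch below is illustrative only and does not reproduce the pipeline's Scanpy code: the `CellQC` class and function names are hypothetical, and treating the boundary values (exactly 200, 6000, or 15%) as passing is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC metrics, named after the fields listed above."""
    n_genes_by_counts: int
    total_counts: float
    pct_counts_mt: float


MIN_GENES = 200      # cells with fewer detected genes are removed
MAX_GENES = 6000     # cells with more detected genes are removed
MAX_PCT_MT = 15.0    # cells above this mitochondrial percentage are removed
MIN_CELLS = 3        # genes seen in fewer cells are removed


def passes_qc(cell: CellQC) -> bool:
    """True if the cell survives the gene-count and mitochondrial filters."""
    return (
        MIN_GENES <= cell.n_genes_by_counts <= MAX_GENES
        and cell.pct_counts_mt <= MAX_PCT_MT
    )


def keep_gene(n_cells_expressing: int) -> bool:
    """True if the gene is expressed in at least MIN_CELLS cells."""
    return n_cells_expressing >= MIN_CELLS
```

For example, a cell with 1500 genes and 4.2% mitochondrial reads passes, while a cell with 3000 genes and 22% mitochondrial reads is dropped.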

4. Normalization & HVG Selection

  • Library size normalization
  • Log1p transformation
  • HVG selection (Seurat v3 flavor)
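To make the first two steps concrete, here is a plain-Python sketch of library size normalization followed by log1p for a single cell. The function name is hypothetical and the target sum of 10,000 counts per cell is an assumption (a common choice, but not confirmed for this pipeline). HVG selection is not covered.

```python
import math
from typing import List


def normalize_log1p(counts: List[float], target_sum: float = 1e4) -> List[float]:
    """Scale one cell's counts to target_sum, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        # An empty cell stays all zeros instead of dividing by zero.
        return [0.0 for _ in counts]
    scale = target_sum / total
    return [math.log1p(count * scale) for count in counts]
```

After scaling, every cell has the same total count, so differences in sequencing depth no longer dominate comparisons between cells; the log transform then compresses the range of highly expressed genes.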

5. Dimensionality Reduction

  • PCA (50 components)
  • UMAP
  • t-SNE (enabled for datasets with fewer than 50k cells)

6. Clustering

  • Leiden clustering (default resolution = 0.5)
  • Cluster-level visualizations

7. Cell Type Annotation

  • Metadata-based annotation or
  • Machine-learning-based prediction using CellTypist
  • PCA / UMAP / t-SNE plots colored by cell type

8. Marker Gene Detection

  • Global marker genes
  • Cell-type-specific markers
  • Rank plots, heatmaps, and dotplots

9. Pathway Enrichment

  • Enrichment via gseapy / Enrichr
  • Supported databases:
    • GO Biological Process
    • GO Molecular Function
    • GO Cellular Component
    • KEGG
    • Reactome
    • WikiPathways
  • Semantic deduplication using MiniLM + FAISS
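The idea behind semantic deduplication can be illustrated with a greedy filter over term embeddings. This sketch is a stand-in, not the pipeline's implementation: it uses plain-Python cosine similarity instead of MiniLM embeddings and a FAISS index, and the function names and the 0.9 threshold are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Embedded = Tuple[str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate(terms: Sequence[Embedded], threshold: float = 0.9) -> List[str]:
    """Keep a term only if it is not too similar to any term kept before it.

    `terms` should be ordered from most to least significant, so that the
    best-scoring member of each group of near-duplicates is the one retained.
    """
    kept: List[Embedded] = []
    for name, vector in terms:
        if all(cosine(vector, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((name, vector))
    return [name for name, _ in kept]
```

With toy two-dimensional vectors, `deduplicate([("T cell activation", [1.0, 0.0]), ("Activation of T cells", [0.99, 0.05]), ("Cell cycle", [0.0, 1.0])])` keeps the first and third terms. This matters because the supported databases often report overlapping pathways under different names.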

10. Integrated Biological Summary

Automatically links:

  • Cell types
  • DEGs
  • Marker genes
  • Enriched pathways

Usage

Run the pipeline on a single 10x dataset:

scpipeline \
  --single-10x-dir "/path/to/10x_folder" \
  --single-sample-label TumorA \
  --single-group-label LUNG_CANCER

All results are saved to:

<10x_folder>/SC_RESULTS/
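When the pipeline is launched from a Python script, the command and the results location can be assembled as in the sketch below. The helper names `build_command` and `results_dir` are hypothetical; only the `scpipeline` flags and the `SC_RESULTS` folder name come from the usage shown above.

```python
from pathlib import Path
from typing import List


def build_command(tenx_dir: str, sample_label: str, group_label: str) -> List[str]:
    """Assemble the documented scpipeline invocation as an argument list."""
    return [
        "scpipeline",
        "--single-10x-dir", tenx_dir,
        "--single-sample-label", sample_label,
        "--single-group-label", group_label,
    ]


def results_dir(tenx_dir: str) -> Path:
    """Results are written to SC_RESULTS inside the input folder."""
    return Path(tenx_dir) / "SC_RESULTS"
```

The list returned by `build_command` can be passed to `subprocess.run(cmd, check=True)`; using a list rather than a single string avoids quoting problems with paths that contain spaces.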

Outputs Generated

  • Quality control plots
  • Highly variable gene tables
  • PCA / UMAP / t-SNE embeddings
  • Clustering results
  • Cell type annotations
  • Marker gene tables
  • Pathway enrichment results
  • Integrated summary tables

Docker Usage

Docker image available on Docker Hub:

docker pull sheryar09/scpipeline:latest

Run:

docker run --rm \
  -v /path/to/10x_folder:/data \
  sheryar09/scpipeline:latest \
  --single-10x-dir /data \
  --single-sample-label TumorA \
  --single-group-label LUNG_CANCER

Intended Use Cases

  • Cancer single-cell RNA-seq analysis
  • Tumor microenvironment profiling
  • Biomarker discovery
  • Translational and preclinical studies
  • ML-based cell type prediction

Version: 1.0
Author: Sheryar Malik
Project Name: OncoCyrix

