OncoCyrix: a modular Scanpy-based pipeline for single-sample 10x scRNA-seq cancer analysis

Single-Sample 10x scRNA-seq Pipeline (scpipeline)

A modular, production-ready Scanpy pipeline for processing and analyzing a single 10x Genomics single-cell RNA-seq sample. This project is optimized for human cancer datasets, but works for any 10x scRNA-seq run.

Key Capabilities

  • 10x matrix ingestion (MTX + barcodes + features)
  • Gene ID normalization (Ensembl → Symbol)
  • QC filtering (mitochondrial %, UMI counts, genes/cell)
  • Normalization, log1p, HVG selection
  • PCA, UMAP, t-SNE embeddings
  • Leiden clustering
  • Cell type annotation (CellTypist)
  • Cell-type marker discovery
  • Multi-database enrichment (GO, KEGG, Reactome, WikiPathways)

🔗 Final biological summaries: Cell Types → DEGs → Markers → Pathways

  1. Project Structure

singlecell_pipeline/
│
├── config_cli.py          # CLI + global configuration
├── loader_10x.py          # 10x feature–barcode loading
├── gene_names.py          # Gene ID normalization logic
├── group_de.py            # DE tests, UMAP per group, compositions
├── markers.py             # Cell-type-specific marker detection
├── pathway_enrichment.py  # Enrichr/gseapy enrichment + semantic dedup
├── summary_ct_deg.py      # Summaries (DEGs → markers → pathways)
├── pipeline.py            # High-level Scanpy orchestration
└── main_single.py         # Entry point: single-sample pipeline run

Version: v1.0

A clean, modular codebase designed for clinical/translational scRNA-seq workflows.

  2. Features in Detail

➤ 10x Data Loading

  • Auto-detects matrix.mtx[.gz], barcodes.tsv[.gz], features.tsv/genes.tsv
  • Handles sparse matrices efficiently
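The auto-detection step can be pictured with a small standard-library sketch. This is not the code in loader_10x.py: the function names `find_10x_files` and `open_text` are hypothetical, and accepting gzip-compressed features.tsv/genes.tsv is an assumption beyond what is listed above.

```python
import gzip
from pathlib import Path

# Candidate file names per role, in order of preference.
# Assumption: the ".gz" variants of features.tsv/genes.tsv are also accepted.
CANDIDATES = {
    "matrix": ["matrix.mtx.gz", "matrix.mtx"],
    "barcodes": ["barcodes.tsv.gz", "barcodes.tsv"],
    "features": ["features.tsv.gz", "features.tsv", "genes.tsv.gz", "genes.tsv"],
}


def find_10x_files(folder):
    """Return {role: Path} with the first existing candidate for each role."""
    folder = Path(folder)
    found = {}
    for role, names in CANDIDATES.items():
        match = next((folder / n for n in names if (folder / n).is_file()), None)
        if match is None:
            raise FileNotFoundError(f"No {role} file found in {folder}")
        found[role] = match
    return found


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")
```

A folder that contains matrix.mtx.gz, barcodes.tsv and genes.tsv would resolve to exactly those three paths; a folder missing any role raises FileNotFoundError.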

➤ Gene Name Normalization

  • Detects Ensembl IDs
  • Maps to HGNC gene symbols via mygene.info
  • Ensures uniqueness and consistency of adata.var_names
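As a rough illustration of the detection and uniqueness steps, the sketch below uses only the standard library. The mygene.info lookup is deliberately left out: `id_to_symbol` is a hypothetical, pre-fetched dictionary, and the function names, the 50% detection threshold, and the "-1", "-2" suffix scheme are all assumptions rather than the behavior of gene_names.py.

```python
import re

# Human Ensembl gene IDs: "ENSG" + 11 digits, optionally followed by ".version".
ENSEMBL_GENE_RE = re.compile(r"^ENSG\d{11}(\.\d+)?$")


def looks_like_ensembl(names, threshold=0.5):
    """True if more than `threshold` of the names match the Ensembl pattern."""
    if not names:
        return False
    hits = sum(1 for n in names if ENSEMBL_GENE_RE.match(n))
    return hits / len(names) > threshold


def to_unique_symbols(ids, id_to_symbol):
    """Map IDs to symbols, fall back to the ID when unmapped, and de-duplicate."""
    used = set()
    out = []
    for gene_id in ids:
        # Strip the version suffix only from real Ensembl IDs.
        base = gene_id.split(".")[0] if ENSEMBL_GENE_RE.match(gene_id) else gene_id
        name = id_to_symbol.get(base) or base
        candidate, k = name, 0
        while candidate in used:  # append -1, -2, ... until the name is unused
            k += 1
            candidate = f"{name}-{k}"
        used.add(candidate)
        out.append(candidate)
    return out
```

Unmapped IDs are kept as they are rather than dropped, so the number of genes never changes during renaming.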

➤ Quality Control & Filtering

Calculates:

  • pct_counts_mt
  • n_genes_by_counts
  • total_counts

Filters out:

  • Cells with <200 or >6000 genes
  • Cells with >15% mitochondrial reads
  • Genes expressed in <3 cells
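The thresholds above can be written out as plain arithmetic. The following is a minimal pure-Python sketch, not the pipeline's Scanpy code: `qc_filter` is a hypothetical name, mitochondrial genes are assumed to carry the human "MT-" prefix, and applying the gene filter after the cell filter is an assumption about ordering.

```python
def qc_filter(counts, gene_names, min_genes=200, max_genes=6000,
              max_pct_mt=15.0, min_cells=3):
    """Compute per-cell QC metrics and return indices of cells/genes to keep.

    counts: list of per-cell lists of raw counts (cells x genes).
    """
    # Assumption: mitochondrial genes are identified by the "MT-" prefix.
    is_mt = [g.upper().startswith("MT-") for g in gene_names]
    metrics, keep_cells = [], []
    for i, row in enumerate(counts):
        total = sum(row)
        n_genes = sum(1 for c in row if c > 0)
        mt = sum(c for c, m in zip(row, is_mt) if m)
        pct_mt = 100.0 * mt / total if total else 0.0
        metrics.append({"total_counts": total,
                        "n_genes_by_counts": n_genes,
                        "pct_counts_mt": pct_mt})
        if min_genes <= n_genes <= max_genes and pct_mt <= max_pct_mt:
            keep_cells.append(i)
    # Keep genes detected in at least `min_cells` of the retained cells.
    keep_genes = [j for j in range(len(gene_names))
                  if sum(1 for i in keep_cells if counts[i][j] > 0) >= min_cells]
    return metrics, keep_cells, keep_genes
```

The default arguments mirror the cutoffs listed above; on a toy matrix they would need to be lowered, since no toy cell reaches 200 detected genes.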

➤ Normalization & HVG Selection

  • normalize_total
  • log1p
  • HVG selection (Seurat v3 flavor)
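To make the first two steps concrete, here is the arithmetic behind total-count normalization followed by log1p, as a pure-Python sketch. The function name is hypothetical and the target of 10,000 counts per cell is an assumption: the text above does not state the target, and Scanpy's normalize_total uses the median of per-cell totals when none is given.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell to `target_sum` total counts, then apply log(1 + x).

    counts: list of per-cell lists of raw counts (cells x genes).
    """
    out = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total else 0.0  # empty cells stay all-zero
        out.append([math.log1p(c * scale) for c in row])
    return out
```

After this transform, applying expm1 to a cell's values and summing them recovers `target_sum`, which is a quick sanity check on the normalization.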

➤ Dimensionality Reduction

  • PCA (50 components)
  • UMAP
  • t-SNE (for n_cells < 50k)

➤ Clustering

  • Leiden clustering (resolution 0.5)
  • Cluster-level visualizations included

➤ Cell Type Annotation

  • Auto-detection from metadata, with the CellTypist ML classifier as a fallback
  • Generates UMAP, t-SNE, and PCA plots colored by cell type

➤ Marker Gene Detection

  • Global markers
  • Per-cell-type markers
  • Rank plots, heatmaps, dotplots

➤ Pathway Enrichment

Databases supported via gseapy/Enrichr:

  • GO Biological Process
  • GO Molecular Function
  • GO Cellular Component
  • KEGG
  • Reactome
  • WikiPathways

Includes:

  • Semantic deduplication (MiniLM + FAISS)
  • Top pathway barplots
  • Combined enrichment tables
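The idea behind semantic deduplication can be sketched without the heavy dependencies. In the sketch below, MiniLM embeddings are replaced by hand-written toy vectors and the FAISS index by a brute-force cosine comparison, so it shows the principle only. The greedy keep-first strategy, the 0.9 threshold, and the function names are assumptions, not the logic of pathway_enrichment.py.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def dedup_terms(terms, embeddings, threshold=0.9):
    """Walk terms in ranked order and skip any term whose embedding is
    at least `threshold`-similar to a term that was already kept."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return [terms[i] for i in kept]


# Toy usage: the second term is a near-duplicate of the first.
terms = ["T cell activation", "T cell activation (GO)", "DNA repair"]
vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
unique_terms = dedup_terms(terms, vectors)
```

Because terms are visited in order, passing them sorted by adjusted p-value would keep the most significant member of each group of near-duplicates.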

➤ Integrated Summary

Creates a comprehensive biological table linking: Cell Type → DEGs → Marker Genes → Pathways

  3. Usage

Run the pipeline:

    python main_single.py \
        --single-10x-dir "/path/to/10x_folder" \
        --single-sample-label TumorA \
        --single-group-label LUNG_CANCER

All results are saved to: <10x_folder>/SC_RESULTS/
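For readers who want to see how such a command line maps to values in Python, here is a minimal argparse sketch. Only the three flag names and the SC_RESULTS folder come from the text above; the helper names, the defaults, and the choice to make the 10x directory required are assumptions and may differ from config_cli.py.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser exposing the three flags shown in the usage example."""
    parser = argparse.ArgumentParser(
        description="Single-sample 10x scRNA-seq pipeline (illustrative sketch)"
    )
    parser.add_argument("--single-10x-dir", required=True, type=Path,
                        help="Folder with the 10x matrix, barcodes and features files")
    parser.add_argument("--single-sample-label", default="sample",
                        help="Label for this sample (default is an assumption)")
    parser.add_argument("--single-group-label", default="group",
                        help="Label for the sample's group (default is an assumption)")
    return parser


def results_dir(tenx_dir):
    """Results are written to <10x_folder>/SC_RESULTS/."""
    return Path(tenx_dir) / "SC_RESULTS"
```

argparse turns the dashes into underscores, so the parsed values are available as `args.single_10x_dir`, `args.single_sample_label` and `args.single_group_label`.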

This includes:

  • QC plots
  • HVG tables
  • Embeddings (UMAP/t-SNE)
  • Clusters
  • Cell types
  • Marker gene tables
  • Enrichment results
  • Summary spreadsheets and text files

  4. Intended Use Cases

  • Cancer single-cell analysis
  • Tumor microenvironment decomposition
  • Biomarker discovery
  • Translational/preclinical studies
  • ML-based cell type prediction

Download files

Source Distribution

oncocyrix-1.0.0.tar.gz (21.9 kB view details)

Uploaded Source

Built Distribution

oncocyrix-1.0.0-py3-none-any.whl (26.4 kB view details)

Uploaded Python 3

File details

Details for the file oncocyrix-1.0.0.tar.gz.

File metadata

  • Download URL: oncocyrix-1.0.0.tar.gz
  • Upload date:
  • Size: 21.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for oncocyrix-1.0.0.tar.gz

Algorithm     Hash digest
SHA256        dd44d7788ca5fe154bc4b64b73df4a4befed4c721c57fd0012687b2546a872bd
MD5           fadcb903f82876df0127fc1aa8258ac0
BLAKE2b-256   53d60c2d60817fdc36df26780e2a3df844e211ec37f24d9913b452e60047561c

File details

Details for the file oncocyrix-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: oncocyrix-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 26.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for oncocyrix-1.0.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        1556de51b8bacca792cc5c6c0e313ad5dd21feb226bf88e0e3b5f30c406d59a4
MD5           0a2629fd74a4713446b897505a726310
BLAKE2b-256   b96f9822564590df3f682d7e0357c2b2190ae089e773378182bdb257583fba6e
