OncoCyrix: a modular Scanpy-based pipeline for single-sample 10x scRNA-seq cancer analysis
OncoCyrix
OncoCyrix is a modular, production-ready Scanpy pipeline for processing and analyzing a single 10x Genomics single-cell RNA-seq sample.
The pipeline is optimized for human cancer datasets, but works for any standard 10x scRNA-seq run.
Key Capabilities
- 10x Genomics matrix ingestion (MTX + barcodes + features)
- Gene ID normalization (Ensembl → HGNC symbols)
- Quality control filtering
  - Mitochondrial percentage
  - UMI counts
  - Genes per cell
- Normalization and log1p transformation
- Highly variable gene (HVG) selection
- PCA, UMAP, and t-SNE embeddings
- Leiden clustering
- Cell type annotation using CellTypist
- Cell-type-specific marker discovery
- Pathway enrichment analysis
  - GO (BP, MF, CC)
  - KEGG
  - Reactome
  - WikiPathways
- Final biological summary: Cell Types → DEGs → Marker Genes → Pathways
Project Structure
singlecell_pipeline/
├── config_cli.py # CLI and global configuration
├── loader_10x.py # 10x data loading
├── gene_names.py # Gene ID normalization
├── group_de.py # Differential expression analysis
├── markers.py # Marker gene detection
├── pathway_enrichment.py # Enrichment analysis and deduplication
├── summary_ct_deg.py # Integrated summaries
├── pipeline.py # Scanpy orchestration
└── main_single.py # Pipeline entry point
Features in Detail
1. 10x Data Loading
- Automatically detects:
  - matrix.mtx / matrix.mtx.gz
  - barcodes.tsv / barcodes.tsv.gz
  - features.tsv or genes.tsv
- Efficient sparse matrix handling
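The detection step above can be sketched roughly as follows. This is a minimal illustration, not the code in loader_10x.py: the function name `find_10x_files` and the `CANDIDATES` table are hypothetical, and only the file names listed above are tried.

```python
from pathlib import Path

# Candidate file names per role, taken from the list above.
# CANDIDATES and find_10x_files are hypothetical names for this sketch.
CANDIDATES = {
    "matrix": ("matrix.mtx", "matrix.mtx.gz"),
    "barcodes": ("barcodes.tsv", "barcodes.tsv.gz"),
    "features": ("features.tsv", "genes.tsv"),
}


def find_10x_files(folder):
    """Return a dict mapping each role to the first matching file in folder."""
    folder = Path(folder)
    found = {}
    for role, names in CANDIDATES.items():
        match = next(
            (folder / name for name in names if (folder / name).is_file()),
            None,
        )
        if match is None:
            tried = ", ".join(names)
            raise FileNotFoundError(f"No {role} file in {folder}; tried {tried}")
        found[role] = match
    return found
```

Failing early with the list of names that were tried makes a mislabeled or incomplete 10x folder easier to diagnose than a parser error later on.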
2. Gene Name Normalization
- Detects Ensembl gene IDs
- Maps to HGNC gene symbols using mygene.info
- Ensures unique and consistent gene names
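As a rough illustration of the detection and uniqueness steps, the sketch below checks whether identifiers look like human Ensembl gene IDs and then applies a symbol mapping. It is an assumption-laden example, not the contents of gene_names.py: the function names, the 50% detection cutoff, and the `-1`, `-2` suffix scheme are all hypothetical, and the mapping is passed in as a plain dict (in the pipeline it would come from mygene.info) so the example runs offline.

```python
import re
from collections import Counter

# Human Ensembl gene IDs: "ENSG" + 11 digits, optionally with a ".version" suffix.
ENSEMBL_GENE_RE = re.compile(r"^ENSG\d{11}(\.\d+)?$")


def looks_like_ensembl(gene_ids, min_fraction=0.5):
    """True if at least min_fraction of the IDs match the Ensembl pattern.

    The 0.5 cutoff is an assumption made for this sketch.
    """
    ids = list(gene_ids)
    if not ids:
        return False
    hits = sum(bool(ENSEMBL_GENE_RE.match(g)) for g in ids)
    return hits / len(ids) >= min_fraction


def to_unique_symbols(gene_ids, id_to_symbol):
    """Map IDs to symbols, keep unmapped IDs as-is, and suffix duplicates."""
    seen = Counter()
    names = []
    for gene_id in gene_ids:
        key = gene_id.split(".")[0]  # drop the version suffix before lookup
        name = id_to_symbol.get(key) or gene_id
        seen[name] += 1
        names.append(name if seen[name] == 1 else f"{name}-{seen[name] - 1}")
    return names
```

Keeping the original ID when no symbol is found avoids silently dropping genes that the lookup service does not know about.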
3. Quality Control & Filtering
- Computes:
  - pct_counts_mt
  - n_genes_by_counts
  - total_counts
- Filters:
  - Cells with <200 or >6000 genes
  - Cells with >15% mitochondrial reads
  - Genes expressed in fewer than 3 cells
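The filtering rules can be expressed as simple predicates over per-cell and per-gene metrics. The sketch below is illustrative only: the `CellQC` class and function names are hypothetical, the boundaries are assumed to be inclusive (a cell with exactly 200 genes is kept), and 15% is assumed to be the upper limit for mitochondrial reads. In a Scanpy workflow the three metrics would typically come from `sc.pp.calculate_qc_metrics`; here they are supplied directly so the example has no dependencies.

```python
from dataclasses import dataclass

# Thresholds from the list above; inclusive boundaries are an assumption.
MIN_GENES = 200
MAX_GENES = 6000
MAX_PCT_MT = 15.0
MIN_CELLS_PER_GENE = 3


@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical container for this sketch)."""
    barcode: str
    n_genes_by_counts: int
    total_counts: float
    pct_counts_mt: float


def passes_qc(cell):
    """Keep cells with 200-6000 detected genes and at most 15% mito reads."""
    genes_ok = MIN_GENES <= cell.n_genes_by_counts <= MAX_GENES
    mito_ok = cell.pct_counts_mt <= MAX_PCT_MT
    return genes_ok and mito_ok


def keep_gene(n_cells_expressing):
    """Keep genes detected in at least 3 cells."""
    return n_cells_expressing >= MIN_CELLS_PER_GENE
```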
4. Normalization & HVG Selection
- Library size normalization
- Log1p transformation
- HVG selection (Seurat v3 flavor)
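For a single cell, library size normalization followed by log1p amounts to the arithmetic below. This is a sketch rather than the pipeline's code: the function name is hypothetical and the target of 10,000 counts per cell is an assumption, since the target sum is not stated here.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale one cell's counts to target_sum, then apply log(1 + x).

    target_sum=1e4 is an assumed default for this sketch.
    """
    total = sum(counts)
    if total == 0:
        # An empty cell has nothing to scale; return zeros.
        return [0.0 for _ in counts]
    scale = target_sum / total
    return [math.log1p(c * scale) for c in counts]
```

One point worth checking when reading the pipeline code: Scanpy documents the `seurat_v3` flavor of `highly_variable_genes` as expecting raw counts rather than log-transformed values, so HVG selection with that flavor is usually run on the count matrix. Whether OncoCyrix keeps a separate counts layer for this is not described here.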
5. Dimensionality Reduction
- PCA (50 components)
- UMAP
- t-SNE (enabled for datasets with fewer than 50k cells)
6. Clustering
- Leiden clustering (default resolution = 0.5)
- Cluster-level visualizations
7. Cell Type Annotation
- Metadata-based annotation or
- Machine-learning-based prediction using CellTypist
- PCA / UMAP / t-SNE plots colored by cell type
8. Marker Gene Detection
- Global marker genes
- Cell-type-specific markers
- Rank plots, heatmaps, and dotplots
9. Pathway Enrichment
- Enrichment via gseapy / Enrichr
- Supported databases:
  - GO Biological Process
  - GO Molecular Function
  - GO Cellular Component
  - KEGG
  - Reactome
  - WikiPathways
- Semantic deduplication using MiniLM + FAISS
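The idea behind semantic deduplication is to drop enriched terms whose text embeddings are nearly identical to a term already kept. The sketch below illustrates that idea only. It swaps in hand-written toy vectors and a brute-force cosine comparison in place of MiniLM embeddings and a FAISS index, so it runs without those libraries; the function names and the 0.9 similarity threshold are assumptions, and the input terms are assumed to be sorted from most to least significant.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def deduplicate_terms(terms, vectors, threshold=0.9):
    """Greedily keep a term only if it is not too similar to any kept term.

    terms are assumed to be ordered by significance, so the most significant
    member of each group of near-duplicates survives.
    """
    kept = []
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return [terms[i] for i in kept]
```

With real sentence embeddings, a nearest-neighbor index such as FAISS replaces the inner loop, which matters once the number of terms grows beyond what pairwise comparison handles comfortably.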
10. Integrated Biological Summary
Automatically links:
- Cell types
- DEGs
- Marker genes
- Enriched pathways
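One way to picture the linked summary is as one row per cell type that pulls together the three result sets. The sketch below is a guess at the shape of such a table, not the output format of summary_ct_deg.py: the function name, the column names, and the `top_n` cutoff are all hypothetical, and the inputs are plain dicts keyed by cell type.

```python
def build_summary(degs, markers, pathways, top_n=3):
    """Combine per-cell-type results into one row per cell type.

    degs, markers, and pathways each map a cell type to a list of names,
    assumed to be ordered from most to least significant.
    """
    cell_types = sorted(set(degs) | set(markers) | set(pathways))
    rows = []
    for cell_type in cell_types:
        rows.append(
            {
                "cell_type": cell_type,
                "n_degs": len(degs.get(cell_type, [])),
                "top_markers": markers.get(cell_type, [])[:top_n],
                "top_pathways": pathways.get(cell_type, [])[:top_n],
            }
        )
    return rows
```

Taking the union of keys means a cell type with markers but no enriched pathways still appears in the table, with an empty pathway list.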
Usage
Run the pipeline on a single 10x dataset:
scpipeline --single-10x-dir "/path/to/10x_folder" --single-sample-label TumorA --single-group-label LUNG_CANCER
All results are saved to:
<10x_folder>/SC_RESULTS/
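For readers who want to script around the command, the sketch below shows how the three flags and the output location fit together. It is not the contents of config_cli.py: the flag names and the SC_RESULTS folder come from the usage above, while the function names and the choice to mark every flag as required are assumptions.

```python
import argparse
from pathlib import Path


def build_parser():
    """Parser mirroring the three flags shown in the usage example."""
    parser = argparse.ArgumentParser(prog="scpipeline")
    # Marking all three flags as required is an assumption of this sketch.
    parser.add_argument("--single-10x-dir", type=Path, required=True)
    parser.add_argument("--single-sample-label", required=True)
    parser.add_argument("--single-group-label", required=True)
    return parser


def results_dir(tenx_dir):
    """Results are written to an SC_RESULTS folder inside the 10x folder."""
    return Path(tenx_dir) / "SC_RESULTS"
```

For example, `build_parser().parse_args(["--single-10x-dir", "/data/run1", "--single-sample-label", "TumorA", "--single-group-label", "LUNG_CANCER"])` yields a namespace whose `single_10x_dir` attribute can be passed straight to `results_dir`.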
Outputs Generated
- Quality control plots
- Highly variable gene tables
- PCA / UMAP / t-SNE embeddings
- Clustering results
- Cell type annotations
- Marker gene tables
- Pathway enrichment results
- Integrated summary tables
Docker Usage
Docker image available on Docker Hub:
docker pull sheryar09/scpipeline:latest
Run:
docker run --rm -v /path/to/10x_folder:/data sheryar09/scpipeline:latest --single-10x-dir /data --single-sample-label TumorA --single-group-label LUNG_CANCER
Intended Use Cases
- Cancer single-cell RNA-seq analysis
- Tumor microenvironment profiling
- Biomarker discovery
- Translational and preclinical studies
- ML-based cell type prediction
Version: 1.0
Author: Sheryar Malik
Project Name: OncoCyrix