OncoCyrix: a modular Scanpy-based pipeline for single-sample 10x scRNA-seq cancer analysis
Single-Sample 10x scRNA-seq Pipeline (scpipeline)
A modular, production-ready Scanpy pipeline for processing and analyzing a single 10x Genomics single-cell RNA-seq sample. This project is optimized for human cancer datasets, but works for any 10x scRNA-seq run.
Key Capabilities
- 10x matrix ingestion (MTX + barcodes + features)
- Gene ID normalization (Ensembl → Symbol)
- QC filtering (mitochondrial %, UMI counts, genes/cell)
- Normalization, log1p, HVG selection
- PCA, UMAP, t-SNE embeddings
- Leiden clustering
- Cell type annotation (CellTypist)
- Cell-type marker discovery
- Multi-database enrichment (GO, KEGG, Reactome, WikiPathways)
- 🔗 Final biological summaries: Cell Types → DEGs → Markers → Pathways
- Project Structure

    singlecell_pipeline/
    │
    ├── config_cli.py          # CLI + global configuration
    ├── loader_10x.py          # 10x feature–barcode loading
    ├── gene_names.py          # Gene ID normalization logic
    ├── group_de.py            # DE tests, UMAP per group, compositions
    ├── markers.py             # Cell-type-specific marker detection
    ├── pathway_enrichment.py  # Enrichr/gseapy enrichment + semantic dedup
    ├── summary_ct_deg.py      # Summaries (DEGs → markers → pathways)
    ├── pipeline.py            # High-level Scanpy orchestration
    └── main_single.py         # Entry point: single-sample pipeline run
Version: v1.0

A clean, modular codebase designed for clinical/translational scRNA-seq workflows.
- Features in Detail

➤ 10x Data Loading
- Auto-detects matrix.mtx[.gz], barcodes.tsv[.gz], features.tsv/genes.tsv
- Handles sparse matrices efficiently
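
The auto-detection step can be pictured with a small standard-library sketch. This is not the code in loader_10x.py: the function name `find_10x_files`, the `CANDIDATES` table, and the gzipped variants of features.tsv/genes.tsv are assumptions made for illustration.

```python
from pathlib import Path

# Candidate file names per role, in order of preference.
# Assumption: the .gz variants of features.tsv / genes.tsv are also accepted.
CANDIDATES = {
    "matrix": ["matrix.mtx.gz", "matrix.mtx"],
    "barcodes": ["barcodes.tsv.gz", "barcodes.tsv"],
    "features": ["features.tsv.gz", "features.tsv", "genes.tsv.gz", "genes.tsv"],
}


def find_10x_files(folder):
    """Return a dict mapping each role to the first matching file in `folder`."""
    folder = Path(folder)
    found = {}
    for role, names in CANDIDATES.items():
        match = next((folder / n for n in names if (folder / n).is_file()), None)
        if match is None:
            raise FileNotFoundError(f"no {role} file in {folder} (tried {names})")
        found[role] = match
    return found
```

Once the three files are located, a reader such as Scanpy's `sc.read_10x_mtx` can load them into a sparse AnnData matrix.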

➤ Gene Name Normalization
- Detects Ensembl IDs
- Maps to HGNC gene symbols via mygene.info
- Ensures uniqueness and consistency of adata.var_names
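
A rough sketch of the detection and uniqueness steps follows. The helper names, the regular expression, and the 50% detection cutoff are illustrative assumptions, not taken from gene_names.py, and the mygene.info lookup itself is left out because it needs network access.

```python
import re
from collections import Counter

# Human Ensembl gene IDs look like ENSG00000141510, optionally with a
# ".version" suffix such as ENSG00000146648.17.
ENSEMBL_GENE = re.compile(r"^ENSG\d{11}(\.\d+)?$")


def looks_like_ensembl(names, min_fraction=0.5):
    """True if at least `min_fraction` of names match the Ensembl pattern.

    The 0.5 cutoff is an assumption chosen for this sketch.
    """
    names = list(names)
    if not names:
        return False
    hits = sum(bool(ENSEMBL_GENE.match(n)) for n in names)
    return hits / len(names) >= min_fraction


def strip_version(gene_id):
    """Drop the version suffix so the ID can be used as a lookup key."""
    return gene_id.split(".", 1)[0]


def make_unique(names):
    """Append -1, -2, ... to repeated names so every entry is distinct."""
    seen = Counter()
    out = []
    for n in names:
        out.append(n if seen[n] == 0 else f"{n}-{seen[n]}")
        seen[n] += 1
    return out
```

In practice AnnData already offers `adata.var_names_make_unique()` for the last step; the hand-written version here only shows the idea.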

➤ Quality Control & Filtering
Calculates:
- pct_counts_mt
- n_genes_by_counts
- total_counts

Filters out:
- Cells with <200 or >6000 genes
- Cells with >15% mitochondrial reads
- Genes expressed in <3 cells
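
The thresholds above can be written down as a pair of predicates. This sketch assumes the 15% mitochondrial figure is an upper bound and that the stated limits are inclusive on the "keep" side; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QCThresholds:
    """QC cutoffs as listed in this README (names are illustrative)."""
    min_genes: int = 200          # cells with fewer detected genes are dropped
    max_genes: int = 6000         # cells with more detected genes are dropped
    max_pct_mt: float = 15.0      # cells above this mitochondrial % are dropped
    min_cells_per_gene: int = 3   # genes seen in fewer cells are dropped


def keep_cell(n_genes_by_counts, pct_counts_mt, t=QCThresholds()):
    """True if a cell passes the gene-count and mitochondrial filters."""
    return (
        t.min_genes <= n_genes_by_counts <= t.max_genes
        and pct_counts_mt <= t.max_pct_mt
    )


def keep_gene(n_cells_expressing, t=QCThresholds()):
    """True if a gene is expressed in enough cells to be retained."""
    return n_cells_expressing >= t.min_cells_per_gene
```

On a real AnnData object the same metrics come from `sc.pp.calculate_qc_metrics` with a mitochondrial gene flag passed through `qc_vars`, and the predicates become boolean masks over `adata.obs`.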

➤ Normalization & HVG Selection
- normalize_total
- log1p
- HVG selection (Seurat v3 flavor)

➤ Dimensionality Reduction
- PCA (50 components)
- UMAP
- t-SNE (for n_cells < 50k)

➤ Clustering
- Leiden clustering (resolution 0.5)
- Cluster-level visualizations included
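
The normalization, embedding, and clustering steps map onto standard Scanpy calls roughly as sketched below. This is an illustration rather than the contents of pipeline.py: `target_sum=1e4`, `n_top_genes=2000`, the `counts` layer, and both function names are assumptions. Scanpy's `seurat_v3` flavor expects raw counts and needs the scikit-misc package, which is why the counts are stashed in a layer before normalization.

```python
TSNE_MAX_CELLS = 50_000  # t-SNE is only run below this cell count


def should_run_tsne(n_cells, limit=TSNE_MAX_CELLS):
    """t-SNE is skipped for large datasets (n_cells >= 50k)."""
    return n_cells < limit


def embed_and_cluster(adata):
    """Normalize, select HVGs, embed, and cluster an AnnData object in place."""
    import scanpy as sc  # imported lazily so the helper above works without Scanpy

    # Keep raw counts: the seurat_v3 HVG flavor expects count data.
    adata.layers["counts"] = adata.X.copy()

    sc.pp.normalize_total(adata, target_sum=1e4)  # target_sum is an assumption
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(
        adata, flavor="seurat_v3", n_top_genes=2000, layer="counts"
    )

    sc.tl.pca(adata, n_comps=50)
    sc.pp.neighbors(adata)
    sc.tl.umap(adata)
    if should_run_tsne(adata.n_obs):
        sc.tl.tsne(adata)

    sc.tl.leiden(adata, resolution=0.5, key_added="leiden")
    return adata
```

Calling `embed_and_cluster(adata)` after QC filtering would leave `X_pca`, `X_umap`, and (for smaller datasets) `X_tsne` in `adata.obsm`, with cluster labels in `adata.obs["leiden"]`.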

➤ Cell Type Annotation
- Auto-detection from metadata, or CellTypist ML classifier as a fallback
- Generates UMAP, t-SNE, and PCA plots colored by cell type

➤ Marker Gene Detection
- Global markers
- Per-cell-type markers
- Rank plots, heatmaps, dotplots

➤ Pathway Enrichment
Databases supported via gseapy/Enrichr:
- GO Biological Process
- GO Molecular Function
- GO Cellular Component
- KEGG
- Reactome
- WikiPathways

Includes:
- Semantic deduplication (MiniLM + FAISS)
- Top pathway barplots
- Combined enrichment tables
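
The idea behind semantic deduplication can be shown without the heavy dependencies. The sketch below is a stand-in, not the pipeline's implementation: it replaces MiniLM embeddings with caller-supplied vectors and FAISS with a brute-force cosine comparison, and the 0.9 threshold and function names are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dedup_terms(terms, vectors, threshold=0.9):
    """Greedily keep a term only if it is not too similar to any kept term.

    `terms` is assumed to be sorted by significance (best first), so the
    most significant member of each near-duplicate group survives.
    """
    kept = []  # indices of retained terms
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return [terms[i] for i in kept]
```

With toy two-dimensional vectors, `dedup_terms(["cell cycle", "mitotic cell cycle", "immune response"], [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])` keeps the first and third terms, since the second is nearly parallel to the first.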

➤ Integrated Summary
Creates a comprehensive biological table linking:
Cell Type → DEGs → Marker Genes → Pathways
- Usage
Run the pipeline:

    python main_single.py \
      --single-10x-dir "/path/to/10x_folder" \
      --single-sample-label TumorA \
      --single-group-label LUNG_CANCER

All results are saved to: <10x_folder>/SC_RESULTS/

This includes:
- QC plots
- HVG tables
- Embeddings (UMAP/t-SNE)
- Clusters
- Cell types
- Marker gene tables
- Enrichment results
- Summary spreadsheets and text files
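
For readers who want to see how the three flags and the output location fit together, here is a minimal argparse sketch. Only the flag names and the SC_RESULTS folder come from this README; the defaults, the `required` setting, the help strings, and the function names are assumptions, and config_cli.py may define more options.

```python
import argparse
from pathlib import Path


def build_parser():
    """Parser exposing the three flags shown in the usage command."""
    p = argparse.ArgumentParser(
        prog="main_single.py",
        description="Single-sample 10x scRNA-seq pipeline (illustrative CLI sketch)",
    )
    p.add_argument("--single-10x-dir", required=True, type=Path,
                   help="folder holding the 10x matrix, barcodes, and features files")
    p.add_argument("--single-sample-label", default="sample",
                   help="label for the sample (default is an assumption)")
    p.add_argument("--single-group-label", default=None,
                   help="label for the group or condition, e.g. LUNG_CANCER")
    return p


def results_dir(tenx_dir):
    """Results are written to <10x_folder>/SC_RESULTS/."""
    return Path(tenx_dir) / "SC_RESULTS"
```

argparse turns `--single-10x-dir` into the attribute `args.single_10x_dir`, so `results_dir(args.single_10x_dir)` yields the output folder for a given run.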

- Intended Use Cases

- Cancer single-cell analysis
- Tumor microenvironment decomposition
- Biomarker discovery
- Translational/preclinical studies
- ML-based cell type prediction
File details
Details for the file oncocyrix-1.0.0.tar.gz.
File metadata
- Download URL: oncocyrix-1.0.0.tar.gz
- Upload date:
- Size: 21.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dd44d7788ca5fe154bc4b64b73df4a4befed4c721c57fd0012687b2546a872bd |
| MD5 | fadcb903f82876df0127fc1aa8258ac0 |
| BLAKE2b-256 | 53d60c2d60817fdc36df26780e2a3df844e211ec37f24d9913b452e60047561c |
File details
Details for the file oncocyrix-1.0.0-py3-none-any.whl.
File metadata
- Download URL: oncocyrix-1.0.0-py3-none-any.whl
- Upload date:
- Size: 26.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1556de51b8bacca792cc5c6c0e313ad5dd21feb226bf88e0e3b5f30c406d59a4 |
| MD5 | 0a2629fd74a4713446b897505a726310 |
| BLAKE2b-256 | b96f9822564590df3f682d7e0357c2b2190ae089e773378182bdb257583fba6e |