Similarity Metrics at High Dimensionality - testing for rare cell types
Docs
Documentation and reproducibility are available at:
https://ebony-watson.github.io/scProximitE
Install
pip install scproximite
Note: scproximite was developed using Python 3.8. If you have any issues, we recommend using conda to create a new environment before installing:
conda create --name scproximite python=3.8
conda activate scproximite
pip install scproximite
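To confirm that the install landed in the environment you expect, a minimal check along these lines may help. It only uses the Python standard library; the helper name `installed_version` is made up for this sketch and is not part of scproximite.

```python
import sys
from importlib import metadata  # in the standard library since Python 3.8


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the interpreter version and whether scproximite is visible to it.
print("Python:", ".".join(str(part) for part in sys.version_info[:3]))
print("scproximite:", installed_version("scproximite") or "not installed")
```

If the second line reports "not installed", the active environment is probably not the one where `pip install scproximite` was run.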
Run tutorials
- Get the tutorial data from Zenodo: https://zenodo.org/record/6443267 (DOI: 10.5281/zenodo.6443266)
- Add it to the data/framework folder
- Run jupyter notebook in the tutorials folder
You should now be able to run the tutorial notebooks. Note if you don't have R installed you won't be able to
run the notebook that uses R metrics: Proximity_Metrics_R.ipynb.
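A quick way to see whether R is reachable before opening Proximity_Metrics_R.ipynb is to look for the `Rscript` executable on the PATH. This is only a sketch: it assumes the notebook relies on a system R installation, and the notebook itself may locate R in a different way.

```python
import shutil


def r_available():
    """Return True if an Rscript executable is found on the PATH."""
    return shutil.which("Rscript") is not None


print("R found on PATH:", r_available())
```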
Datasets
Cellsius
A benchmark dataset of ~ 12,000 single-cell transcriptomes from eight human cell lines. The eight human cell lines were individually profiled by bulk RNA-seq, and mixed in four batches containing mixtures of two or three cell lines each for scRNA-seq profiling.
Batch1: IMR90 and HCT116 (50/50)
- IMR90 is a fibroblast cell line, isolated from fetal lung. Female.
- HCT116 is from human colon carcinoma with epithelium-like morphology. Male.
Batch2: A549 and Ramos (50/50)
- A549 is from human lung carcinoma, cell type is epithelial. Male.
- Ramos cells are from Burkitt’s lymphoma. They are lymphoblasts with B-cell characteristics. Ramos cells are very small (7-10 µm), so we usually find that they have fewer detected features and lower total counts than other cell lines. Male.
Batch3: HEK293 and H1437 (50/50)
- HEK293 is a cell line from human embryonic kidney cells. Female.
- H1437 is from lung adenocarcinoma (i.e. origin is epithelial / glandular). Male.
DA234 (Batch 4): Jurkat, K562, Ramos (40% Jurkat, 55% K562 and 5% Ramos)
- Jurkat is a T-cell lymphoblast cell line. Male.
- K562 is a lymphoblast cell line with granulocyte/erythrocyte/monocyte characteristics (fairly undifferentiated). Female.
Cell-type annotation:
Correlation of the single-cell to bulk expression profiles was used for cell type assignment: each single cell was assigned to the cell type whose bulk profile correlated most strongly with its expression profile. Cells were excluded if their z-score correlation was < 0.2, or if they correlated strongly with more than one bulk expression profile (likely doublets).
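The assignment rule described above can be sketched roughly as follows. This is an illustration, not the original authors' code: it uses plain Pearson correlation rather than a z-scored correlation, the `strong_corr` cutoff for "correlates strongly" is a hypothetical value, and the expression vectors are invented toy numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def assign_cell_type(cell, bulk_profiles, min_corr=0.2, strong_corr=0.5):
    """Return the best-correlated cell line, or None if the cell is excluded.

    min_corr mirrors the 0.2 threshold in the text (applied here to the raw
    correlation, which is an assumption); strong_corr is hypothetical.
    """
    corrs = {name: pearson(cell, profile) for name, profile in bulk_profiles.items()}
    best = max(corrs, key=corrs.get)
    if corrs[best] < min_corr:
        return None  # no convincing match to any bulk profile
    if sum(r >= strong_corr for r in corrs.values()) > 1:
        return None  # matches several profiles: likely doublet
    return best


# Toy bulk profiles over five genes (invented values, for illustration only).
bulk = {
    "Jurkat": [10, 0, 0, 1, 1],
    "K562": [0, 10, 0, 1, 1],
    "Ramos": [0, 0, 10, 1, 1],
}

print(assign_cell_type([9, 1, 0, 1, 0], bulk))  # resembles a single profile
print(assign_cell_type([5, 5, 0, 1, 1], bulk))  # even mix of two profiles
print(assign_cell_type([1, 1, 1, 5, 5], bulk))  # resembles none of them
```

The second and third calls return None: the mixed cell correlates above `strong_corr` with two profiles, and the last cell falls below `min_corr` for all of them.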
Subsets
| Cell-type | Complete | Subset 1 | Subset 2 |
|---|---|---|---|
| HCT116 | 1743 | 1400 | 1600 |
| HEK293 | 2002 | 1600 | 2000 |
| IMR90 | 1039 | 500 | 100 |
| A549 | 1320 | 400 | 80 |
| Ramos | 1892 | 350 | 125 |
| H1437 | 1116 | 270 | 3 |
| K562 | 1606 | 380 | 70 |
| Jurkat | 962 | 100 | 6 |
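To see how rare each cell type is in a given subset, the counts in the table can be turned into percentages. The sketch below copies the counts from the table and uses only the standard library; the variable names are chosen for illustration.

```python
# Cell counts copied from the table above: (Complete, Subset 1, Subset 2).
counts = {
    "HCT116": (1743, 1400, 1600),
    "HEK293": (2002, 1600, 2000),
    "IMR90": (1039, 500, 100),
    "A549": (1320, 400, 80),
    "Ramos": (1892, 350, 125),
    "H1437": (1116, 270, 3),
    "K562": (1606, 380, 70),
    "Jurkat": (962, 100, 6),
}
names = ("Complete", "Subset 1", "Subset 2")

# Total number of cells in each dataset (column sums).
totals = [sum(row[i] for row in counts.values()) for i in range(len(names))]

# Percentage of each dataset made up by each cell type.
percentages = {
    cell_type: [100 * n / t for n, t in zip(row, totals)]
    for cell_type, row in counts.items()
}

print(f"{'Cell type':<10}" + "".join(f"{name:>10}" for name in names))
for cell_type, shares in percentages.items():
    print(f"{cell_type:<10}" + "".join(f"{share:>9.2f}%" for share in shares))
print("Total cells:", dict(zip(names, totals)))
```

The column totals come to 11,680, 5,000, and 3,984 cells. In Subset 2 the rarest types, H1437 and Jurkat, make up roughly 0.08% and 0.15% of cells respectively.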
Datasets are pre-annotated with cell_idx, Batch, cell_line, cell_cycle_phase, gene names, etc., and a range of QC metrics (which should not necessarily be trusted). The data were downloaded as an R data object and subsequently processed in R, then converted from a Seurat object to an AnnData object using sceasy.
Final datasets are located in RDM under code/DimensionalityReduction_Aim2/data/Cellsius/:
- Cellsius_Complete_Raw(sceasy).h5ad (Full dataset of all 8 cell lines, precursor filtering only)
- Cellsius_Subset1_Raw(sceasy).h5ad (Subset 1 dataset of all 8 cell lines, precursor filtering only)
- Cellsius_Subset2_Raw(sceasy).h5ad (Subset 2 dataset of all 8 cell lines, precursor filtering only)
- subset1_sce_cleaned(SCEeasy).h5ad (Subset 1 dataset of all 8 cell lines, precursor + some additional filtering)
- subset2_sce_cleaned(sceasy).h5ad (Subset 2 dataset of all 8 cell lines, precursor + some additional filtering)
None of the datasets have been normalised/transformed/scaled
Filtering:
Precursor (done by the authors prior to uploading the data publicly):
- ≥ 10.5 genes per cell [log2]
- ≥ 12.0 total UMIs / cell [log2]
- ≥ 10% mitochondrial genes
Additional:
- Outliers
- ≥ 3 counts in at least 1 cell
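For orientation, the log2 thresholds above can be converted back to raw counts. The sketch below assumes that "[log2]" means the threshold is applied to the log2 of the raw value, and it covers only the gene, UMI, and minimum-count rules; the mitochondrial and outlier filters are left out because the text does not give enough detail to reproduce them. Function names are illustrative.

```python
import math

LOG2_MIN_GENES = 10.5  # threshold on log2(detected genes per cell)
LOG2_MIN_UMIS = 12.0   # threshold on log2(total UMIs per cell)

# Equivalent thresholds on the raw scale.
min_genes = 2 ** LOG2_MIN_GENES  # about 1448 genes
min_umis = 2 ** LOG2_MIN_UMIS    # 4096 UMIs


def cell_passes(n_genes, n_umis):
    """True if a cell meets both precursor thresholds (assumed log2 scale)."""
    return math.log2(n_genes) >= LOG2_MIN_GENES and math.log2(n_umis) >= LOG2_MIN_UMIS


def gene_passes(counts_per_cell, min_count=3):
    """True if the gene has at least `min_count` counts in at least one cell."""
    return max(counts_per_cell) >= min_count


print(f"min genes per cell: {min_genes:.0f}, min UMIs per cell: {min_umis:.0f}")
print(cell_passes(2000, 5000), cell_passes(1000, 5000))
print(gene_passes([0, 1, 3]), gene_passes([2, 2, 0]))
```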
Sourced from: https://zenodo.org/record/3238275#.YWYVKBx_VhE
Paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1739-7