Skip to main content

No project description provided

Project description

Similarity Metrics at High Dimensionality - testing for rare cell types

Docs

Documentation and reproducibility are available at:

https://ebony-watson.github.io/scProximitE

Install

pip install scproximite

Note: scproximite was developed using Python 3.8, of you have any issues we recommend using conda and creating a new environment before installing:

conda create --name scproximite python=3.8
conda activate scproximite
pip install scproximite

Run tutorials

  1. Get tutorial data from zeonodo: https://zenodo.org/record/6443267 (DOI: 10.5281/zenodo.6443266)
  2. Add to the data/framework folder
  3. Run jupyter notebook in the tutorials folder

You should now be able to run the tutorial notebooks. Note if you don't have R installed you won't be able to run the notebook that uses R metrics: Proximity_Metrics_R.ipynb.

Datasets

Cellsius

A benchmark dataset of ~ 12,000 single-cell transcriptomes from eight human cell lines. The eight human cell lines were individually profiled by bulk RNA-seq, and mixed in four batches containing mixtures of two or three cell lines each for scRNA-seq profiling.

Batch1: IMR90 and HCT116 (50/50)
  • IMR90 is a fibroblast cell line, isolated from fetal lung. Female.
  • HCT116 is from human colon carcinoma with epithelium-like morphology. Male.
Batch2: A549 and Ramos (50/50)
  • A549 is from human lung carcinoma, cell type is epithelial. Male.
  • Ramos cells are from Burkitt’s lymphoma. They are lymphoblasts with B-cell characteristics. Ramos cells are very small (7-10um), so we usually find that they have fewer detected features and lower total count than other cell lines. Male.
Batch3: HEK293 and H1437 (50/50)
  • HEK293 is a cell line form human embryonic kidney cells. Female.
  • H1437 is from lung adenocarcinoma (i.e. origin is epithelial / glandular). Male.
DA234 (Batch 4): Jurkat, K562, Ramos (40% Jurkat, 55% K562 and 5% Ramos)
  • Jurkat is a T-cell lymphoblast cell line. Male.
  • K562 is a lymphoblast cell line wih granulocyte/erythrocyte/monocyte characteristics (fairly undifferentiated). Female.

Cell-type annotation:

Correlation of the single-cell to bulk expression profiles was used for cell type assignment, & Single cells were assigned to the cell type correlating most with their expression profile. Cells were excluded if their z-score correlation < 0.2, or if they correlated strongly with more than one bulk expression profile (likely doublets).

Subsets

Cell-type Complete Subset 1 Subset 2
HCT116 1743 1400 1600
HEK293 2002 1600 2000
IMR90 1039 500 100
A549 1320 400 80
Ramos 1892 350 125
H1437 1116 270 3
K562 1606 380 70
Jurkat 962 100 6

Datasets are pre-annotated with cell_idx, Batch, cell_line, cell_cycle_phase, gene names etc. and a range of QC metrics (would not necessarily trust). Data is downloaded as an R data object, and were subsequently processed in R. Then convereted from seurat to anndata object using SCEasy.

Final datasets are located in RDM under code/DimensionalityReduction_Aim2/data/Cellsius/:
  • Cellsius_Complete_Raw(sceasy).h5ad (Full dataset of all 8 cell lines, only pre-cursor filtering)
  • Cellsius_Subset1_Raw(sceasy).h5ad (Subset 1 dataset of all 8 cell lines, only pre-cursor filtering)
  • Cellsius_Subset2_Raw(sceasy).h5ad (Subset 2 dataset of all 8 cell lines, only pre-cursor filtering)
  • subset1_sce_cleaned(SCEeasy).h5ad (Subset 1 dataset of all 8 cell lines, pre-cursor + some additional filtering)
  • subset2_sce_cleaned(sceasy).h5ad (Subset 2 dataset of all 8 cell lines, pre-cursor + some additional filtering)

None of the datasets have been normalised/transformed/scaled

Filtering:

Precursor (done by authors prior to uploading data publically):

  • ≥ 10.5 genes per cell [log2]
  • ≥ 12.0 total UMIs / cell [log2]
  • ≥ 10% mitochondrial genes

Additional:

  • Outliers
  • ≥ 3 counts in at least 1 cell

Sourced from:https://zenodo.org/record/3238275#.YWYVKBx_VhE

Paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1739-7

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scproximite-0.0.1.tar.gz (25.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

scproximite-0.0.1-py3-none-any.whl (36.4 kB view details)

Uploaded Python 3

File details

Details for the file scproximite-0.0.1.tar.gz.

File metadata

  • Download URL: scproximite-0.0.1.tar.gz
  • Upload date:
  • Size: 25.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/61.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for scproximite-0.0.1.tar.gz
Algorithm Hash digest
SHA256 6e1d2d2c24b4462b6283d27d6f27cb300c928523bcc5633ede02c27123ea19b0
MD5 57318429b64f6b80e52ded263e6642d1
BLAKE2b-256 3139f43141e3c619ac8082554a3b66abb239b2bb8f9b5ffc70782e39b64848e0

See more details on using hashes here.

File details

Details for the file scproximite-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: scproximite-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 36.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/61.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for scproximite-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 aa0af2b6f21afb5e3c56d402cddbdb783fa3f54e2c6179c6615d32121b2a4712
MD5 68b113f688a2ec7dd7ed357076649187
BLAKE2b-256 cc7085fb1c48cbe9455109f0a90cd733524e78c18414a4c5e5edb26675597e8e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page