Similarity Metrics at High Dimensionality - testing for rare cell types

Docs

Documentation and reproducibility are available at:

https://ebony-watson.github.io/scProximitE

Install

pip install scproximite

Note: scproximite was developed using Python 3.8. If you have any issues, we recommend using conda to create a new environment before installing:

conda create --name scproximite python=3.8

conda activate scproximite

pip install scproximite

Run tutorials

Get tutorial data from Zenodo: https://zenodo.org/record/6443267 (DOI: 10.5281/zenodo.6443266)
Add to the data/framework folder
Run jupyter notebook in the tutorials folder

You should now be able to run the tutorial notebooks. Note that without R installed you won't be able to run the notebook that uses R metrics: Proximity_Metrics_R.ipynb.
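Before opening the R-based notebook, it can help to confirm that R is visible from the Python environment. The following is a minimal sketch, not part of scproximite: the helper name `r_available` is hypothetical, and it only checks whether an `Rscript` or `R` executable is on the PATH, not whether the R packages used by the notebook are installed.

```python
import shutil


def r_available() -> bool:
    """Return True if an R executable can be found on the PATH.

    Hypothetical helper for illustration; it does not check R packages.
    """
    return shutil.which("Rscript") is not None or shutil.which("R") is not None


if r_available():
    print("R found: Proximity_Metrics_R.ipynb should be runnable.")
else:
    print("R not found: skip Proximity_Metrics_R.ipynb or install R first.")
```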

Datasets

Cellsius

A benchmark dataset of ~12,000 single-cell transcriptomes from eight human cell lines. The eight human cell lines were individually profiled by bulk RNA-seq, and mixed in four batches containing mixtures of two or three cell lines each for scRNA-seq profiling.

Batch1: IMR90 and HCT116 (50/50)

IMR90 is a fibroblast cell line, isolated from fetal lung. Female.
HCT116 is from human colon carcinoma with epithelium-like morphology. Male.

Batch2: A549 and Ramos (50/50)

A549 is from human lung carcinoma, cell type is epithelial. Male.
Ramos cells are from Burkitt's lymphoma. They are lymphoblasts with B-cell characteristics. Ramos cells are very small (7-10 µm), so we usually find that they have fewer detected features and lower total count than other cell lines. Male.

Batch3: HEK293 and H1437 (50/50)

HEK293 is a cell line from human embryonic kidney cells. Female.
H1437 is from lung adenocarcinoma (i.e. origin is epithelial / glandular). Male.

DA234 (Batch 4): Jurkat, K562, Ramos (40% Jurkat, 55% K562 and 5% Ramos)

Jurkat is a T-cell lymphoblast cell line. Male.
K562 is a lymphoblast cell line with granulocyte/erythrocyte/monocyte characteristics (fairly undifferentiated). Female.

Cell-type annotation:

Correlation of the single-cell to bulk expression profiles was used for cell type assignment: each single cell was assigned to the cell type whose bulk profile correlated most strongly with its expression profile. Cells were excluded if their z-score correlation was < 0.2, or if they correlated strongly with more than one bulk expression profile (likely doublets).
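The assignment rule above can be sketched as follows. This is an illustration under stated assumptions, not the original authors' code: it uses a plain Pearson correlation as a stand-in for the z-score correlation mentioned in the text, and the function names and the `doublet_margin` parameter (how close the top two correlations must be to count as "more than one" match) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def assign_cell(cell, bulk_profiles, min_score=0.2, doublet_margin=0.05):
    """Return the best-matching cell line name, or None if the cell is excluded.

    min_score mirrors the 0.2 cutoff in the text (applied here to raw Pearson r,
    which is an assumption); doublet_margin is a hypothetical tie threshold.
    """
    scores = {name: pearson(cell, profile) for name, profile in bulk_profiles.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_name, best_score = ranked[0]
    if best_score < min_score:
        return None  # correlates too weakly with every bulk profile
    if len(ranked) > 1 and best_score - ranked[1][1] < doublet_margin:
        return None  # matches two profiles about equally well: likely doublet
    return best_name


# Toy expression vectors over four genes (made-up values for illustration).
bulk = {"A": [10, 0, 0, 5], "B": [0, 10, 5, 0]}
print(assign_cell([8, 1, 0, 4], bulk))        # resembles profile A
print(assign_cell([5, 5, 2.5, 2.5], bulk))    # equal mix of A and B
```

In this toy example the first cell is assigned to "A", while the second is excluded because it correlates equally with both profiles.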

Subsets

Cell-type	Complete	Subset 1	Subset 2
HCT116	1743	1400	1600
HEK293	2002	1600	2000
IMR90	1039	500	100
A549	1320	400	80
Ramos	1892	350	125
H1437	1116	270	3
K562	1606	380	70
Jurkat	962	100	6
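To see how rare each cell type becomes in the subsets, the counts in the table can be turned into proportions. The sketch below simply re-enters the table values by hand; the variable and function names are illustrative only.

```python
# Cell counts copied from the table: (Complete, Subset 1, Subset 2).
COUNTS = {
    "HCT116": (1743, 1400, 1600),
    "HEK293": (2002, 1600, 2000),
    "IMR90": (1039, 500, 100),
    "A549": (1320, 400, 80),
    "Ramos": (1892, 350, 125),
    "H1437": (1116, 270, 3),
    "K562": (1606, 380, 70),
    "Jurkat": (962, 100, 6),
}
DATASETS = ("Complete", "Subset 1", "Subset 2")


def total_cells(column):
    """Total number of cells in one dataset column (0, 1 or 2)."""
    return sum(row[column] for row in COUNTS.values())


def proportions(column):
    """Fraction of the dataset made up by each cell type."""
    total = total_cells(column)
    return {cell_type: row[column] / total for cell_type, row in COUNTS.items()}


for column, name in enumerate(DATASETS):
    print(f"{name}: {total_cells(column)} cells")
    for cell_type, fraction in sorted(proportions(column).items(), key=lambda kv: kv[1]):
        print(f"  {cell_type:<7} {fraction:7.2%}")
```

The totals come to 11,680 cells for the complete dataset, 5,000 for Subset 1 and 3,984 for Subset 2, so the three H1437 cells in Subset 2 make up less than 0.1% of that dataset.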

Datasets are pre-annotated with cell_idx, Batch, cell_line, cell_cycle_phase, gene names etc. and a range of QC metrics (which should not necessarily be trusted). The data was downloaded as an R data object and subsequently processed in R, then converted from a Seurat object to an AnnData object using sceasy.

Final datasets are located in RDM under code/DimensionalityReduction_Aim2/data/Cellsius/:

Cellsius_Complete_Raw(sceasy).h5ad (Full dataset of all 8 cell lines, only precursor filtering)
Cellsius_Subset1_Raw(sceasy).h5ad (Subset 1 dataset of all 8 cell lines, only precursor filtering)
Cellsius_Subset2_Raw(sceasy).h5ad (Subset 2 dataset of all 8 cell lines, only precursor filtering)
subset1_sce_cleaned(SCEeasy).h5ad (Subset 1 dataset of all 8 cell lines, precursor + some additional filtering)
subset2_sce_cleaned(sceasy).h5ad (Subset 2 dataset of all 8 cell lines, precursor + some additional filtering)

None of the datasets have been normalised/transformed/scaled

Filtering:

Precursor (done by authors prior to uploading data publicly):

≥ 10.5 genes per cell [log2]
≥ 12.0 total UMIs / cell [log2]
≥ 10% mitochondrial genes

Additional:

Outliers
≥ 3 counts in at least 1 cell
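As a rough illustration of how the count-based thresholds above could be applied, the sketch below treats them as criteria for keeping a cell or gene, which is an assumption about how the list should be read. The function names are hypothetical, and the mitochondrial and outlier criteria are left out because the text does not give enough detail to implement them.

```python
from math import log2


def passes_cell_qc(n_genes, n_umis, min_log2_genes=10.5, min_log2_umis=12.0):
    """Keep a cell if log2(detected genes) and log2(total UMIs) reach the thresholds.

    On the linear scale these correspond to roughly 1,448 genes and 4,096 UMIs.
    """
    if n_genes <= 0 or n_umis <= 0:
        return False
    return log2(n_genes) >= min_log2_genes and log2(n_umis) >= min_log2_umis


def keep_gene(counts_across_cells, min_count=3):
    """Keep a gene if it has at least min_count counts in at least one cell."""
    return any(count >= min_count for count in counts_across_cells)


# Toy values for illustration only.
print(passes_cell_qc(n_genes=1500, n_umis=5000))  # both thresholds met
print(passes_cell_qc(n_genes=1000, n_umis=5000))  # too few detected genes
print(keep_gene([0, 1, 3]))                       # one cell reaches 3 counts
print(keep_gene([2, 2, 2]))                       # no cell reaches 3 counts
```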

Sourced from: https://zenodo.org/record/3238275#.YWYVKBx_VhE

Paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1739-7

Version 0.0.1 (released Apr 14, 2022)
