Single-cell RNA-seq data visualization
SCelVis: Easy Single-Cell Visualization
Installation
The only prerequisite is Python 3, everything else will be installed together with the scelvis package.
You can install SCelVis and its dependencies using pip or through conda:
$ pip install scelvis
# OR
$ conda install scelvis
A Docker container is also available via Quay.io/Biocontainers.
$ docker run quay.io/biocontainers/scelvis:TAG scelvis --help
$ docker run -p 8050:8050 -v data:/data quay.io/biocontainers/scelvis:TAG scelvis run --data-source /data
Look up the latest TAG to use on quay.io/biocontainers, e.g.,
$ docker run quay.io/biocontainers/scelvis:0.7.0--py_0 scelvis --help
$ docker run -p 8050:8050 -v data:/data quay.io/biocontainers/scelvis:0.7.0--py_0 scelvis run --data-source /data
Tutorial
Explore 1000 cells from a 1:1 mixture of fresh frozen human (HEK293T) and mouse (NIH3T3) cells (10X v3 chemistry) or a published dataset of ~14000 IFN-beta treated and control PBMCs from 8 donors (GSE96583; see Kang et al.):
$ scelvis run --data-source /path/to/scelvis/examples/hgmm_1k.h5ad
$ scelvis run --data-source https://files.figshare.com/18037739/pbmc.h5ad
and then point your browser to http://0.0.0.0:8050/ or http://localhost:8050/
Preparing Your Data
Data sets are provided as HDF5 files (anndata objects) that store gene expression (as a sparse CSR matrix) and metadata, with very fast read access.
For the input you can either specify one HDF5 file or a directory containing multiple such files.
You can use scanpy to create this HDF5 file directly or use the scelvis convert command for converting your single-cell pipeline output.
HDF5 Input
For HDF5 input, run your analysis with scanpy to create an anndata object ad. SCelVis will use embedding coordinates from ad.obsm, cell annotation from ad.obs, and expression data directly from ad.X (which should contain normalized and log-transformed expression values for all genes). If present, information about the dataset will be extracted from strings stored in ad.uns['about_title'], ad.uns['about_short_title'], and ad.uns['about_readme'] (assumed to be Markdown). Information about marker genes will be taken either from the rank_genes_groups slot in ad.uns or from entries in ad.uns whose names start with marker_: entries called marker_gene (required!), marker_cluster, marker_padj, and marker_LFC will create a table with the columns gene, cluster, padj, and LFC.
If you prepared your data with Seurat (v2), you can use Convert(from = sobj, to = "anndata", filename = "data.h5ad") to get an HDF5 file.
Text Input
For “raw” text input, you need to prepare at least three files in the input directory:
expression.tsv.gz, a tab-separated file with normalized expression values for each gene (rows) in each cell (columns), e.g., like this:
        cell_1  cell_2  cell_3  ...
gene_1  0.13    0.0     1.5     ...
gene_2  0.0     3.1     0.3     ...
gene_3  0.0     0.0     0.0     ...
annotation.tsv, a tab-separated file with annotations for each cell, e.g., like this:
        cluster    genotype  ...
cell_1  cluster_1  WT        ...
cell_2  cluster_2  KO        ...
coords.tsv, a tab-separated file with embedding coordinates for each cell, e.g., like this:
        tSNE_1  tSNE_2  UMAP_1  UMAP_2  ...
cell_1  20.53   -10.05  3.9     2.4     ...
cell_2  -5.34   13.94   -1.3    3.4     ...
markers.tsv, an optional tab-separated file with marker genes; it needs to have a column named gene, e.g., like this:
gene    cluster    log2FC  adj_pval  ...
gene_1  cluster_1  3.4     1.5e-6    ...
gene_2  cluster_1  1.3     0.00004   ...
gene_3  cluster_2  2.1     5.3e-9    ...
a markdown file (e.g., text_input.md) with information about this dataset:
----
title: An Optional Long Data Set Title
short_title: optional short title
----
A verbose description of the data in Markdown format.
$ scelvis convert --input-dir text_input --output data/text_input.h5ad --about-md text_input.md
In examples/dummy_raw.zip and examples/dummy_about.md we provide raw data for a simulated dummy dataset.
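The three required files (plus the optional marker table) can be generated with pandas; a minimal sketch with made-up values following the layout above:

```python
# Sketch: write the text input files that `scelvis convert` expects.
import os
import pandas as pd

os.makedirs("text_input", exist_ok=True)

# expression.tsv.gz: genes in rows, cells in columns, normalized values.
expression = pd.DataFrame(
    [[0.13, 0.0, 1.5], [0.0, 3.1, 0.3], [0.0, 0.0, 0.0]],
    index=["gene_1", "gene_2", "gene_3"],
    columns=["cell_1", "cell_2", "cell_3"],
)
expression.to_csv("text_input/expression.tsv.gz", sep="\t", compression="gzip")

# annotation.tsv: one row of annotations per cell.
annotation = pd.DataFrame(
    {"cluster": ["cluster_1", "cluster_2", "cluster_1"],
     "genotype": ["WT", "KO", "WT"]},
    index=["cell_1", "cell_2", "cell_3"],
)
annotation.to_csv("text_input/annotation.tsv", sep="\t")

# coords.tsv: embedding coordinates per cell.
coords = pd.DataFrame(
    {"tSNE_1": [20.53, -5.34, 1.2], "tSNE_2": [-10.05, 13.94, 0.7]},
    index=["cell_1", "cell_2", "cell_3"],
)
coords.to_csv("text_input/coords.tsv", sep="\t")

# markers.tsv (optional): must contain a "gene" column.
markers = pd.DataFrame(
    {"gene": ["gene_1", "gene_2"], "cluster": ["cluster_1", "cluster_2"],
     "log2FC": [3.4, 2.1], "adj_pval": [1.5e-6, 5.3e-9]}
)
markers.to_csv("text_input/markers.tsv", sep="\t", index=False)
```

Afterwards, run the scelvis convert command shown above on the text_input directory.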
Loom Input
For loompy or loomR input, you can convert your data like this:
$ scelvis convert -i input.loom -m markers.tsv -a about.md -o loom_input.h5ad
If you prepared your data with Seurat (v3), you can use as.loom(sobj, filename = "output.loom") to get a .loom file and then convert it to .h5ad with the above command.
CellRanger Input
Alternatively, a CellRanger output directory can be used. This is the directory called outs containing either a file called filtered_gene_bc_matrices_h5.h5 (version 2) or a file called filtered_feature_bc_matrix.h5 (version 3), and a folder analysis with clustering, embedding, and differential expression results. SCelVis will not do any further processing except log-normalization. Additionally, a markdown file can provide meta information about the dataset (see above):
$ mkdir -p data
$ cat <<EOF > data/cellranger.md
----
title: My Project
short_title: my_project
----
This is my project data.
EOF
$ scelvis convert --input-dir cellranger-out --output data/cellranger_input.h5ad --about-md cellranger.md
In examples/hgmm_1k_raw we provide CellRanger output for the 1k 1:1 human mouse mix. Specifically, from the outs folder we selected
filtered_feature_bc_matrix.h5
tSNE and PCA projections from analysis/tsne and analysis/pca
clustering from analysis/clustering/graphclust and
markers from analysis/diffexp/graphclust
examples/hgmm_1k_about.md contains information about this dataset.
Visualizing Your Data
$ tree data
data
├── text_input.h5ad
└── cellranger_input.h5ad
$ scelvis run --data-source data/cellranger_input.h5ad
# OR
$ scelvis run --data-source data
Data Sources
Data sources can be:
paths, e.g., relative/paths or /absolute/paths or file://url/paths
SFTP URLs, e.g., sftp://user:password@host/path/to/data
FTP URLs, e.g., ftp://user:password@host/path/to/data (sadly, encryption is not supported by the underlying library, PyFilesystem2)
iRODS URLs, e.g., irods://user:password@host/zoneName/path/to/data
Enable SSL via irods+ssl
Switch to PAM authentication with irods+pam (you can combine this with +ssl in any order)
Enable ticket access by appending ?ticket=TICKET.
HTTP(S) URLs, e.g., https://user:password@host/path/to/data.
S3 URLs, e.g., s3://bucket/path, optionally s3://key:token@bucket/path.
Data sources can either point to HDF5 files directly or to directories containing multiple HDF5 files. The only exception is iRODS with ticket-based access. Because of technical restrictions, you have to assign a unique ticket for each data set and specify the data sets individually.
Environment Variables
You can use the following environment variables to configure the server.
SCELVIS_DATA_SOURCES – semicolon-separated list of data sources
SCELVIS_HOST – host specification for web server to listen on
SCELVIS_PORT – port for web server to listen on
SCELVIS_CACHE_DIR – directory to use for the cache (default is to create a temporary directory)
SCELVIS_CACHE_REDIS_URL – enable caching with REDIS and provide connection URL
SCELVIS_CACHE_DEFAULT_TIMEOUT – lifetime of cache entries
SCELVIS_CACHE_PRELOAD_DATA – will preload all data at startup
SCELVIS_UPLOAD_DIR – the directory to store uploaded data sets in (default is to create a temporary directory)
SCELVIS_UPLOAD_DISABLED – set to “0” to disable upload feature
SCELVIS_CONVERSION_DISABLED – set to “0” to disable the conversion feature
SCELVIS_URL_PREFIX – set if you want to run scelvis below a non-root path (e.g., behind a reverse proxy)
Developer Setup
The prerequisites are:
- Python 3, either
system-wide installation with virtualenv, or
installed with Conda.
Git LFS must be installed for obtaining the example data files.
For virtualenv, first create a virtual environment and activate it.
$ virtualenv -p python3 venv
$ source venv/bin/activate
For a Conda-based setup create a new environment and activate it.
$ conda create -y -n scelvis 'python>=3.6'
$ conda activate scelvis
Next, clone the repository and install the software as editable (-e). Also install the development requirements to get helpers such as black. (Note that you must have Git LFS installed to actually obtain the data files).
$ git clone git@github.com:bihealth/scelvis.git
$ cd scelvis
$ pip install -e .
$ pip install -r requirements/develop.txt
Afterwards, you can run the visualization web server as follows:
$ scelvis run --data-source path/to/data/dir
To explore the datasets provided in the Git repository, use git lfs fetch to download them.
Releasing Packages
For the PyPi package:
$ python setup.py sdist
$ twine upload --repository-url https://test.pypi.org/legacy/ dist/scelvis-*.tar.gz
$ twine upload dist/scelvis-*.tar.gz
For the Bioconda package, see the Bioconda documentation. The Docker image will automatically be created as a BioContainer when the Bioconda package is built.
History
v0.7.3
fix cache timeout error
v0.7.2
fix ReferenceError
updated README and tutorial movie
v0.7.1
improved cache handling
improved user feedback for filtering & differential expression
v0.7.0
added conversion from .loom files
cell filtering also supports downsampling
added PBMC dataset hosted on figshare
added demo movie
v0.6.0
cell filtering
differential expression
v0.5.0
upgrades to Dash v1
fixes to UI, upload and conversion
avoid creation of dense matrices
v0.4.1
Fixing bug with specifying single .h5ad file as data source.
Adding Dockerfile for building Docker images from intermediate versions.
v0.4.0
Adding support for HTTP(S) data sources.
Embedding about.md information in Anndata file.
Adding support for passing
v0.3.0
Adding example data set.
Adding nice introduction to start page.
Adding functionality for creating simple fake data set.
Making import of ruamel_yaml more robust still.
Adding tests.
Adding Travis CI–based continuous integration tests.
v0.2.1
Fixing SFTP support.
Fixing import of ruamel_yaml.
v0.2.0
More refactorization.
Fixing dependency on ruamel-yaml to ruamel.yaml.
Adding conversion feature.
Adding upload feature.
Adding support to load from SSHFS, FTP through pyfilesystem (no FTPS support).
Adding support to load from iRODS, also works via tickets (pass ?ticket=TICKET to the query parameters).
v0.1.0
Initial release.
Everything is new!