Single-cell RNA-seq data visualization
SCelVis: Easy Single-Cell Visualization
The demo is linked at the top right of the GitHub repository page.
Installation
The only prerequisite is Python 3; everything else is installed together with the scelvis package.
You can install SCelVis and its dependencies using pip or through conda:
$ pip install scelvis
# OR
$ conda install scelvis
A Docker container is also available via Quay.io/Biocontainers.
$ docker run quay.io/biocontainers/scelvis:TAG scelvis --help
$ docker run -p 8050:8050 -v data:/data quay.io/biocontainers/scelvis:TAG scelvis run --data-source /data
Look up the latest TAG to use in the tag list of the quay.io/biocontainers/scelvis repository.
Tutorial
Explore a simulated dummy dataset or 1000 cells from a 1:1 mixture of fresh frozen human (HEK293T) and mouse (NIH3T3) cells (10X v3 chemistry):
$ scelvis run --data-source /path/to/scelvis/examples/dummy.h5ad
$ scelvis run --data-source /path/to/scelvis/examples/hgmm_1k.h5ad
Then point your browser to http://0.0.0.0:8050/.
Preparing Your Data
Data sets are provided as HDF5 files (anndata objects) that store gene expression (sparse CSR matrix) and meta data with very fast read access.
For the input you can either specify one HDF5 file or a directory containing multiple such files.
You can use scanpy to create this HDF5 file directly or use the scelvis convert command for converting your single-cell pipeline output.
HDF5 Input
For HDF5 input, you can do your analysis with scanpy to create an anndata object ad. SCelVis uses embedding coordinates from ad.obsm, cell annotation from ad.obs, and expression data directly from ad.X (this should contain normalized and log-transformed expression values for all genes). Information about the dataset is extracted from strings stored in ad.uns['about_title'], ad.uns['about_short_title'] and ad.uns['about_readme'] (assumed to be Markdown). Information about marker genes is taken from entries in ad.uns whose names start with marker_: the entries marker_gene (required!), marker_cluster, marker_padj, and marker_LFC create a table with the columns gene, cluster, padj, and LFC.
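The naming rule for marker entries can be illustrated with a small sketch. It uses a plain dictionary as a stand-in for ad.uns, and the helper name marker_table is hypothetical: it is not part of SCelVis and only mirrors the rule described above. The check that all marker entries have the same length is an assumption made for this illustration.

```python
def marker_table(uns):
    """Collect entries named marker_<column> into a list of row dicts."""
    prefix = "marker_"
    columns = {
        key[len(prefix):]: list(values)
        for key, values in uns.items()
        if key.startswith(prefix)
    }
    if "gene" not in columns:
        raise KeyError("marker_gene is required")
    n_rows = len(columns["gene"])
    # Assumption: every marker_ entry holds one value per marker gene.
    for name, values in columns.items():
        if len(values) != n_rows:
            raise ValueError(
                f"marker_{name} has {len(values)} entries, expected {n_rows}"
            )
    return [
        {name: values[i] for name, values in columns.items()}
        for i in range(n_rows)
    ]


# Stand-in for ad.uns; all values are made up for illustration.
uns = {
    "about_title": "Example data set",
    "marker_gene": ["gene_1", "gene_2"],
    "marker_cluster": ["cluster_1", "cluster_2"],
    "marker_padj": [1.5e-6, 5.3e-9],
    "marker_LFC": [3.4, 2.1],
}
rows = marker_table(uns)
print(rows[0])
```

Entries that do not start with marker_, such as about_title, are ignored by this helper, and a missing marker_gene entry raises an error.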
Text Input
For “raw” text input, you need to prepare at least three files in the input directory:
expression.tsv.gz, a tab-separated file with normalized expression values for each gene (rows) in each cell (columns), e.g., like this:
.       cell_1  cell_2  cell_3  ...
gene_1  0.13    0.0     1.5     ...
gene_2  0.0     3.1     0.3     ...
gene_3  0.0     0.0     0.0     ...
annotation.tsv, a tab-separated file with annotations for each cell, e.g., like this:
.       cluster    genotype  ...
cell_1  cluster_1  WT        ...
cell_2  cluster_2  KO        ...
coords.tsv, a tab-separated file with embedding coordinates for each cell, e.g., like this:
.       tSNE_1  tSNE_2  UMAP_1  UMAP_2  ...
cell_1  20.53   -10.05  3.9     2.4     ...
cell_2  -5.34   13.94   -1.3    3.4     ...
markers.tsv, an optional tab-separated file with marker genes; it must have a column named gene, e.g., like this:
gene    cluster    log2FC  adj_pval  ...
gene_1  cluster_1  3.4     1.5e-6    ...
gene_2  cluster_1  1.3     0.00004   ...
gene_3  cluster_2  2.1     5.3e-9    ...
a markdown file (e.g., text_input.md) with information about this dataset:
----
title: An Optional Long Data Set Title
short_title: optional short title
----
A verbose description of the data in Markdown format.
$ scelvis convert --input-dir text_input --output data/text_input.h5ad --about-md text_input.md
In examples/dummy_raw.zip and examples/dummy_about.md, we provide raw data for the dummy dataset.
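As a rough illustration of the text input layout, the following sketch writes a tiny input directory with the Python standard library. The file names follow the description above; all values, the two-cell size, and the title are made up, and whether scelvis convert accepts such a tiny data set is not checked here.

```python
import csv
import gzip
from pathlib import Path

out_dir = Path("text_input")  # same name as --input-dir in the convert command
out_dir.mkdir(exist_ok=True)


def write_tsv(handle, header, rows):
    """Write one header line and the data rows as tab-separated text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# expression.tsv.gz: genes in rows, cells in columns (toy values)
with gzip.open(out_dir / "expression.tsv.gz", "wt", newline="") as fh:
    write_tsv(fh, [".", "cell_1", "cell_2"],
              [["gene_1", 0.13, 0.0], ["gene_2", 0.0, 3.1]])

# annotation.tsv: one row of annotations per cell
with open(out_dir / "annotation.tsv", "w", newline="") as fh:
    write_tsv(fh, [".", "cluster", "genotype"],
              [["cell_1", "cluster_1", "WT"], ["cell_2", "cluster_2", "KO"]])

# coords.tsv: embedding coordinates per cell
with open(out_dir / "coords.tsv", "w", newline="") as fh:
    write_tsv(fh, [".", "tSNE_1", "tSNE_2"],
              [["cell_1", 20.53, -10.05], ["cell_2", -5.34, 13.94]])

# markers.tsv (optional): must contain a column named "gene"
with open(out_dir / "markers.tsv", "w", newline="") as fh:
    write_tsv(fh, ["gene", "cluster", "log2FC", "adj_pval"],
              [["gene_1", "cluster_1", 3.4, 1.5e-6]])

# Markdown description that is passed via --about-md
Path("text_input.md").write_text(
    "----\ntitle: Toy Data Set\nshort_title: toy\n----\nA toy data set.\n"
)
```

The first header cell is written as "." to match the example tables above; markers.tsv has no such index column and starts directly with gene.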
CellRanger Input
Alternatively, the output directory of CellRanger can be used. This is the directory called outs containing either a file called filtered_gene_bc_matrices_h5.h5 (version 2) or a file called filtered_feature_bc_matrix.h5 (version 3), and a folder analysis with clustering, embedding and differential expression results. The conversion does not do any further processing except log-normalization. Additionally, a Markdown file provides meta information about the dataset (see above).
$ mkdir -p data
$ cat <<EOF > data/cellranger.md
----
title: My Project
short_title: my_project
----
This is my project data.
EOF
$ scelvis convert --input-dir cellranger-out --output data/cellranger_input.h5ad --about-md data/cellranger.md
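The directory layout described above can be checked before conversion with a sketch like the following. The helper check_cellranger_outs is hypothetical and not part of SCelVis; it only tests for the two matrix file names and the analysis folder named in the text, and the demo runs against a temporary directory.

```python
import tempfile
from pathlib import Path

# Matrix file names for CellRanger version 2 and version 3, as given above.
MATRIX_FILES = ("filtered_gene_bc_matrices_h5.h5", "filtered_feature_bc_matrix.h5")


def check_cellranger_outs(outs_dir):
    """Return a list of problems; an empty list means the layout looks as described."""
    outs = Path(outs_dir)
    problems = []
    if not any((outs / name).is_file() for name in MATRIX_FILES):
        problems.append("no filtered matrix file (" + " or ".join(MATRIX_FILES) + ")")
    if not (outs / "analysis").is_dir():
        problems.append("no analysis folder")
    return problems


# Demo on a temporary directory: first without, then with a matrix file.
with tempfile.TemporaryDirectory() as tmp:
    outs = Path(tmp) / "outs"
    (outs / "analysis").mkdir(parents=True)
    before = check_cellranger_outs(outs)
    (outs / "filtered_feature_bc_matrix.h5").touch()
    after = check_cellranger_outs(outs)

print(before)
print(after)
```

The check does not open the HDF5 file or inspect the contents of analysis; it only confirms that the expected names are present.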
In examples/hgmm_1k_raw.zip, we provide CellRanger output for the 1k 1:1 human/mouse mix. Specifically, from the outs folder we selected:
filtered_feature_bc_matrix.h5
tSNE and PCA projections from analysis/tsne and analysis/pca
clustering from analysis/clustering/graphclust and
markers from analysis/diffexp/graphclust
examples/hgmm_1k_about.md contains information about this dataset
Visualizing Your Data
$ tree data
data
├── text_input.h5ad
└── cellranger_input.h5ad
$ scelvis run --data-source data/cellranger_input.h5ad
# OR
$ scelvis run --data-source data
Data Sources
Data sources can be:
paths, e.g., relative/paths or /absolute/paths or file://url/paths
SFTP URLs, e.g., sftp://user:password@host/path/to/data
FTP URLs, e.g., ftp://user:password@host/path/to/data (sadly, encryption is not supported by the underlying library PyFilesystem2).
iRODS URLs, e.g., irods://user:password@host/zoneName/path/to/data
Enable SSL via irods+ssl
Switch to PAM authentication with irods+pam (you can combine this with +ssl in any order)
Enable ticket access by appending ?ticket=TICKET.
HTTP(S) URLs, e.g., https://user:password@host/path/to/data.
S3 URLs, e.g., s3://bucket/path, optionally s3://key:token@bucket/path.
Data sources can either point to HDF5 files directly or to directories containing multiple HDF5 files. The only exception is iRODS with ticket-based access. Because of technical restrictions, you have to assign a unique ticket for each data set and specify the data sets individually.
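To make the URL conventions above concrete, here is a minimal sketch that splits a data source string into its kind, its +ssl / +pam options, and an optional ticket. The function describe_source is hypothetical and is not how SCelVis itself parses data sources; it relies only on urllib.parse from the standard library.

```python
from urllib.parse import parse_qs, urlsplit


def describe_source(source):
    """Split a data source string into kind, scheme options, and ticket."""
    parts = urlsplit(source)
    if not parts.scheme:
        # No URL scheme: treat it as a relative or absolute path.
        return {"kind": "path", "options": [], "ticket": None}
    # "irods+pam+ssl" -> kind "irods", options ["pam", "ssl"]
    kind, *options = parts.scheme.split("+")
    ticket = parse_qs(parts.query).get("ticket", [None])[0]
    return {"kind": kind, "options": sorted(options), "ticket": ticket}


for example in (
    "relative/paths",
    "sftp://user:password@host/path/to/data",
    "irods+ssl+pam://user:password@host/zoneName/path/to/data?ticket=TICKET",
    "s3://bucket/path",
):
    print(example, "->", describe_source(example))
```

Sorting the options reflects the statement that +pam and +ssl can be combined in any order.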
Environment Variables
You can use the following environment variables to configure the server.
SCELVIS_DATA_SOURCES – semicolon-separated list of data sources
SCELVIS_HOST – host specification for web server to listen on
SCELVIS_PORT – port for web server to listen on
SCELVIS_CACHE_DIR – directory to use for the cache (default is to create a temporary directory)
SCELVIS_CACHE_REDIS_URL – enable caching with REDIS and provide connection URL
SCELVIS_CACHE_DEFAULT_TIMEOUT – default cache lifetime
SCELVIS_UPLOAD_DIR – the directory to store uploaded data sets in (default is to create a temporary directory)
SCELVIS_UPLOAD_DISABLED – set to “0” to disable upload feature
SCELVIS_CONVERSION_DISABLED – set to “0” to disable the conversion feature
SCELVIS_URL_PREFIX – set if you want to run scelvis below a non-root path (e.g., behind a reverse proxy)
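As a sketch of how these variables might be assembled from a script, the following helper builds an environment dictionary. The helper name scelvis_env and the example values are hypothetical; only the variable names and the semicolon separator come from the list above.

```python
import os


def scelvis_env(data_sources, host=None, port=None, base_env=None):
    """Return a copy of the environment with SCelVis settings added."""
    env = dict(os.environ if base_env is None else base_env)
    # Multiple data sources are joined with semicolons.
    env["SCELVIS_DATA_SOURCES"] = ";".join(data_sources)
    if host is not None:
        env["SCELVIS_HOST"] = host
    if port is not None:
        env["SCELVIS_PORT"] = str(port)  # environment values must be strings
    return env


env = scelvis_env(
    ["data/text_input.h5ad", "sftp://user:password@host/path/to/data"],
    host="0.0.0.0",
    port=8050,
    base_env={},  # start from an empty environment for this demo
)
print(env)
```

Such a dictionary could be passed as env= to subprocess.run when launching the server; which command-line arguments are still needed in that case is not covered here.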
Developer Setup
The prerequisites are:
Python 3, either
system-wide installation with virtualenv, or
installed with Conda.
For virtualenv, first create a virtual environment and activate it.
$ virtualenv -p python3 venv
$ source venv/bin/activate
For a Conda-based setup create a new environment and activate it.
$ conda create -y -n scelvis 'python>=3.6'
$ conda activate scelvis
Next, clone the repository and install the software as editable (-e). Also install the development requirements to get helpers such as black.
$ git clone git@github.com:bihealth/scelvis.git
$ cd scelvis
$ pip install -e .
$ pip install -r requirements/develop.txt
Afterwards, you can run the visualization web server as follows:
$ scelvis run --data-source path/to/data/dir
Releasing Packages
For the PyPI package:
$ python setup.py sdist
$ twine upload --repository-url https://test.pypi.org/legacy/ dist/scelvis-*.tar.gz
$ twine upload dist/scelvis-*.tar.gz
For the Bioconda package, see the great Bioconda documentation. The Docker image will automatically be created as a BioContainer when the Bioconda package is built.
History
v0.5.0
upgrades to Dash v1
fixes to UI, upload and conversion
avoid creation of dense matrices
v0.4.1
Fixing bug with specifying single .h5ad file as data source.
Adding Dockerfile for building Docker images from intermediate versions.
v0.4.0
Adding support for HTTP(S) data sources.
Embedding about.md information in Anndata file.
Adding support for passing
v0.3.0
Adding example data set.
Adding nice introduction to start page.
Adding functionality for creating simple fake data set.
Making import of ruamel_yaml more robust still.
Adding tests.
Adding Travis CI–based continuous integration tests.
v0.2.1
Fixing SFTP support.
Fixing import of ruamel_yaml.
v0.2.0
More refactorization.
Fixing dependency on ruamel-yaml to ruamel.yaml.
Adding conversion feature.
Adding upload feature.
Adding support to load from SSHFS, FTP through pyfilesystem (no FTPS support).
Adding support to load from iRODS, also works via tickets (pass ?ticket=TICKET to the query parameters).
v0.1.0
Initial release.
Everything is new!