Skip to main content

A tool for single cell classification and characterization.

Project description

Pollock

Image of Pollock

Pollock is a tool for single cell classification. Pollock is available in both Python, R, and as a command line tool

In Development

Installation

Requirements

  • OS:

    • macOS 10.12.6 (Sierra) or later
    • Ubuntu 16.04 or later
    • Windows 7 or later (not tested)
  • Python3.6 or later

  • Working installation of conda and bioconda. If you are new to conda and bioconda, we recommend following the getting started page here

To install

pollock is available through the conda package manager.

In addition to the default conda channels, pollock requires bioconda. In particular to ensure proper installation you must have your conda channels set up in the correct order by running the following:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

To install

conda install -c epstorrs pollock==0.0.8

Usage

Pollock uses deep learning to make cell type predictions. At it's core, pollock is build upon a deep learning technique called a Beta Variational Autoencoder (BVAE).

With pollock, there are a selection of cell type classification modules that have been trained on a variety of single cell RNA-seq datasets. Any of these modules can be used to classify your single cell data.

Additionally, if you have annotated single cell data, pollock can also be used to train a new module based on the given cell types.

Modules

There are a variety of modules available for cell type classification. They can be found on the dinglab cluster at /diskmnt/Projects/Users/estorrs/pollock/modules.

To list available modules run

ls /diskmnt/Projects/Users/estorrs/pollock/modules

You can also create new modules with pollock (see below)

Python

module training tutorial on pbmc dataset

prediction with an existing module

module examination

R

There is an R library rpollock that comes installed with pollock that allows you to train a module and make predictions directly from R.

Note: rpollock is dependent on the R library reticulate, which will sometimes prompt for a python install location. If this occurs, run the below code to find out the location of your python installation. It will output <path/to/python/executable>

which python3

When running R you will need to have this line at the very start of your script (before your library imports)

reticulate::use_python("<path/to/python/executable>")

example usage of rpollock on pbmc3k

Command line tool

usage: pollock [-h] [--seurat-rds-filepath SEURAT_RDS_FILEPATH]
               [--scanpy-h5ad-filepath SCANPY_H5AD_FILEPATH]
               [--counts-10x-filepath COUNTS_10X_FILEPATH]
               [--min-genes-per-cell MIN_GENES_PER_CELL]
               [--output-type OUTPUT_TYPE] [--output-prefix OUTPUT_PREFIX]
               source_type module_filepath
Arguments

source_type

  • Input source type. Possible values are: from_seurat, from_10x, from_scanpy.

module_filepath

  • Filepath to module to use for classification. The location of the tumor/tissue module to use for classification. For beta, available modules are stored in katmai at /diskmnt/Projects/Users/estorrs/pollock/modules.
optional arguments

--seurat-rds-filepath SEURAT_RDS_FILEPATH

  • A saved Seurat RDS object to use for classification. Seurat experiment matrix must be raw expression counts (i.e. not normalized)

--scanpy-h5ad-filepath SCANPY_H5AD_FILEPATH

  • A saved .h5ad file to use for classification. scanpy data matrix (.X attribute in the anndata object) must be raw expression counts (i.e. not normalized)

--counts-10x-filepath COUNTS_10X_FILEPATH

  • Results of 10X cellranger run to be used for classification. There are two options for inputs: 1) the mtx count directory (typically at outs/raw_feature_bc_matrix), and 2) the .h5 file (typically at outs/raw_feature_bc_matrix.h5).

--min-genes-per-cell MIN_GENES_PER_CELL

  • The minimun number of genes expressed in a cell in order for it to be classified. Only used in 10x mode

--output-type OUTPUT_TYPE

  • What output type to write. Valid arguments are scanpy and txt

--output-prefix OUTPUT_PREFIX

  • Filepath prefix to write output file.
example basic usage
from 10x output

An example of running the single-cell cesc module with 10x .mtx.gz output folder

pollock from_10x /diskmnt/Projects/Users/estorrs/pollock/modules/sc_cesc --counts-10x-filepath </filepath/to/cellranger/outs/raw_feature_bc_matrix> --output-prefix output --output-type txt

An example of running the single-cell cesc module with 10x .h5 output

pollock from_10x /diskmnt/Projects/Users/estorrs/pollock/modules/sc_cesc --counts-10x-filepath </filepath/to/cellranger/outs/raw_feature_bc_matrix.h5> --output-prefix output --output-type txt
from seurat rds object

An example of running the single-cell myeloma module with an rds object

pollock from_seurat /diskmnt/Projects/Users/estorrs/pollock/modules/sc_myeloma --seurat-rds-filepath </filepath/to/seurat/rds> --output-prefix output --output-type txt
from scanpy h5ad file

An example of running the single-cell myeloma module with an scanpy .h5ad file

pollock from_scanpy /diskmnt/Projects/Users/estorrs/pollock/modules/sc_myeloma --scanpy-h5ad-filepath </filepath/to/scanpy/h5ad> --output-prefix output --output-type txt

example basic usage within a docker container

Docker images are available at dockerhub under the image name estorrs/pollock-cpu. To pull the latest image run the following:

docker pull estorrs/pollock-cpu:latest

When using docker, input and ouput file directories need to be mounted as a volume using the docker -v argument.

An example of running the single-cell cesc module from within a docker container. Sections outlined by <> need to be replaced. Note filepaths in the -v flag must be absolute.

ding lab only: the </path/to/modules/directory/> would be /diskmnt/Projects/Users/estorrs/pollock/modules on katmai

docker run -v </path/to/directory/with/seurat/rds>:/inputs -v </path/to/output/directory>:/outputs -v </path/to/modules/directory/>:/modules -t estorrs/pollock-cpu pollock from_seurat /modules/sc_myeloma --seurat-rds-filepath /inputs/<filename.rds> --output-prefix /outputs/output --output-type txt

Outputs

There are two possible output types:

  • txt : tab seperated text file
  • scanpy: a .h5ad file that can be loaded with scanpy

The following fields will be included in the output: predicted cell type, predicted cell type probability, and probabilities for each potential cell type in the module

docker

Dockerfiles for running pollock can be found in the docker/ directory. They can also be pulled from estorrs/pollock-cpu on dockerhub. To pull the latest pollock docker image run the following:

docker pull estorrs/pollock-cpu

To see usage with a docker container see the Usage - command line tool - docker section

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dinglab-pollock-0.0.9.tar.gz (24.1 MB view hashes)

Uploaded Source

Built Distribution

dinglab_pollock-0.0.9-py3-none-any.whl (19.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page