running single cell analysis on Nvidia GPUs

rapids-singlecell

Background

This repository offers tools that speed up the analysis of single-cell datasets by running them on the GPU. The functions are GPU counterparts of functions found in scanpy from the Theis lab or in rapids-single-cell-examples from the Nvidia RAPIDS team. Most functions are kept close to the original code to ensure compatibility. My aim with this repository is to combine the speedup that GPU computing offers with the ease of use of scanpy.

Installation

Conda

The easiest way to install rapids-singlecell is to use one of the yaml files provided in the conda folder. These yaml files install everything needed to run the example notebooks and get you started.

conda env create -f conda/rsc_rapids_22.12.yml
# or
mamba env create -f conda/rsc_rapids_23.02.yml

PyPI

As of version 0.4.0, rapids-singlecell is available on PyPI.

pip install rapids-singlecell

The default installation does not include RAPIDS or cupy. Information on how to install RAPIDS and cupy can be found here.

If you want to use the new RAPIDS PyPI packages, the whole library with all dependencies can be installed with:

pip install 'rapids-singlecell[rapids]' --extra-index-url=https://pypi.nvidia.com

Please note that the RAPIDS PyPI packages are still considered experimental. It is important to ensure that the CUDA environment is set up correctly so that RAPIDS and cupy can locate the necessary libraries.

For a full guide on setting up a fully functional, GPU-accelerated single-cell conda environment, visit GPU_SingleCell_Setup.
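A quick way to sanity-check the setup is to ask Python which packages it can find and whether cupy sees a CUDA device. The sketch below is only an illustration: the helper name `check_gpu_stack` and the default package list are assumptions made for this example, not part of rapids-singlecell.

```python
import importlib.util


def check_gpu_stack(packages=("cupy", "cudf", "cuml", "cugraph", "rapids_singlecell")):
    """Report which packages are importable and how many CUDA devices cupy sees.

    The default package list is an assumption for illustration; adjust it to
    whatever your environment is supposed to contain.
    """
    # find_spec only locates the package; it does not import it
    report = {name: importlib.util.find_spec(name) is not None for name in packages}

    device_count = 0
    if report.get("cupy"):
        try:
            import cupy

            # Raises if the CUDA runtime or driver cannot be found
            device_count = cupy.cuda.runtime.getDeviceCount()
        except Exception:
            device_count = 0
    report["cuda_devices"] = device_count
    return report


print(check_gpu_stack())
```

If `cuda_devices` is reported as 0 even though cupy is installed, the CUDA libraries are probably not visible to cupy in the current environment.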

Documentation

Please have a look through the documentation.

Citation

If you use this code, please cite: DOI

Please also consider citing: rapids-single-cell-examples and scanpy

In addition, please cite the original research articles for the methods you use, as listed in the scanpy documentation.

If you use the accelerated decoupler functions, please cite decoupler.

Notebooks

To show what these functions can do, I created two example notebooks that evaluate the same workflow on the CPU and on the GPU. These notebooks should run in the environment described in Requirements. First, run the data_downloader notebook to create the AnnData object for the analysis. If you run both demo_cpu and demo_gpu, you should see a large speedup when running the analyses on the GPU.
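For orientation, the CPU side of such a workflow can be sketched with standard scanpy calls. This is not the content of demo_cpu: the function name `run_cpu_workflow`, the parameter defaults, and the choice of steps are assumptions for illustration, and the notebooks themselves may use different settings and additional steps.

```python
def run_cpu_workflow(adata, n_top_genes=2000, n_comps=50):
    """Minimal scanpy pipeline on an AnnData object holding raw counts.

    Hypothetical helper for illustration; parameter values are assumptions.
    """
    # Imported lazily so the function can be defined without scanpy installed
    import scanpy as sc

    # Preprocessing
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes)
    adata = adata[:, adata.var["highly_variable"]].copy()
    sc.pp.scale(adata, max_value=10)

    # Dimensionality reduction, clustering, and visualization
    sc.pp.pca(adata, n_comps=n_comps)
    sc.pp.neighbors(adata)
    sc.tl.umap(adata)
    sc.tl.leiden(adata)  # requires the leidenalg package
    return adata
```

The GPU notebook is described as running the same workflow through this package's analogous functions; their exact names are documented in the package documentation and are not reproduced here.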

Benchmarks

Here are some benchmarks. Where possible, I ran the CPU notebook with as many cores as were available.

| Step | CPU (Ryzen 5950x, 32 Cores, 64 GB RAM) | GPU (RTX 3090) | CPU (AMD Epyc Rome, 30 Cores, 500 GB RAM) | GPU (Quadro RTX 6000) | GPU (A100 80 GB) |
|---|---|---|---|---|---|
| Whole notebook | 728 s | 43 s | 917 s | 67 s | 57 s |
| Preprocessing | 75 s | 21 s | 40 s | 34 s | 30 s |
| Clustering and Visualization | 423 s | 18 s | 524 s | 27 s | 21 s |
| Normalize_total | 252 ms | > 1 ms | 425 ms | 1 ms | 1 ms |
| Highly Variable Genes | 3.2 s | 2.6 s | 4.1 s | 2.7 s | 3.7 s |
| Regress_out | 63 s | 2 s | 24 s | 2 s | 2 s |
| Scale | 1.3 s | 299 ms | 2 s | 2 s | 359 ms |
| PCA | 26 s | 1.8 s | 23 s | 3.6 s | 2.6 s |
| Neighbors | 10 s | 5 s | 16.8 s | 8.1 s | 6 s |
| UMAP | 30 s | 659 ms | 66 s | 1 s | 783 ms |
| Louvain | 16 s | 121 ms | 20 s | 214 ms | 201 ms |
| Leiden | 11 s | 102 ms | 20 s | 175 ms | 152 ms |
| TSNE | 240 s | 1.4 s | 319 s | 1.8 s | 1.4 s |
| Logistic_Regression | 74 s | 4 s | 45 s | 5 s | 3.4 s |
| Diffusion Map | 715 ms | 259 ms | 747 ms | 431 ms | 826 ms |
| Force Atlas 2 | 207 s | 236 ms | 300 s | 298 ms | 353 ms |
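To put the table in terms of speedup factors, the timings can be divided directly. The sketch below copies a handful of rows from the Ryzen 5950x and RTX 3090 columns (converted to seconds); the helper name `speedup` is just for this example.

```python
# (CPU seconds, GPU seconds) copied from the benchmark table:
# Ryzen 5950x column vs. RTX 3090 column
timings = {
    "whole Notebook": (728.0, 43.0),
    "Preprocessing": (75.0, 21.0),
    "PCA": (26.0, 1.8),
    "UMAP": (30.0, 0.659),
    "Leiden": (11.0, 0.102),
    "TSNE": (240.0, 1.4),
}


def speedup(cpu_seconds, gpu_seconds):
    """Return how many times faster the GPU run was than the CPU run."""
    return cpu_seconds / gpu_seconds


speedups = {step: speedup(cpu, gpu) for step, (cpu, gpu) in timings.items()}

for step, factor in speedups.items():
    print(f"{step:<16} {factor:7.1f}x")
```

On these numbers the whole notebook comes out at roughly 17x, while individual steps such as TSNE or Leiden gain far more than preprocessing does.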

I also observed that the first GPU run in a new environment is slower than later runs, including runs after a kernel restart (observed on the Quadro RTX 6000).
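When reproducing timings like these, it can help to record the first call separately from later ones. The following is a minimal sketch using only the standard library; `time_runs`, `summarize`, and `placeholder_step` are hypothetical names, and the placeholder workload stands in for whatever analysis step you want to measure.

```python
import time


def time_runs(func, repeats=3):
    """Call func() `repeats` times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Separate the first (possibly slower) run from the mean of the later runs."""
    first, rest = durations[0], durations[1:]
    later_mean = sum(rest) / len(rest) if rest else float("nan")
    return {"first": first, "later_mean": later_mean}


def placeholder_step():
    # Stand-in workload; replace with the analysis step you want to time
    sum(i * i for i in range(100_000))


print(summarize(time_runs(placeholder_step)))
```

This measures repeated calls within one process. To compare against the observation above, which spans kernel restarts, the same script would need to be run in a fresh process each time. GPU libraries may also launch work asynchronously, so the timed function should only return once its result is actually available.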
