pySpade: Single cell Perturbations - Analysis of Differential gene Expression

Overview

pySpade is a user-friendly tool for whole-transcriptome analysis of single-cell perturbation datasets. Starting from the direct output of Cellranger, pySpade uses a hypergeometric test to analyze whole-transcriptome differential expression and generates a hits table as a csv file. Users can use the table for downstream processing such as generating Manhattan plots (a tutorial is included). Currently only the human genome is supported.
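
To make the hypergeometric test concrete, here is a minimal sketch using only the Python standard library. It assumes each cell is classified as either expressing or not expressing a gene, and asks how likely it is to see at least `k` expressing cells among the perturbed cells by chance. The function name, the binary classification, and the example numbers are illustrative assumptions, not pySpade's actual implementation.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """Return P(X >= k) for a hypergeometric variable X.

    M: total number of cells
    n: cells (out of M) that express the gene
    N: number of perturbed cells (the "draws")
    k: perturbed cells that express the gene
    """
    lower = max(k, 0)
    upper = min(n, N)
    total = comb(M, N)
    # comb() returns 0 when the second argument exceeds the first,
    # so impossible combinations contribute nothing to the sum.
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(lower, upper + 1))
    return tail / total

# Hypothetical example: 10 cells, 4 express the gene, 3 are perturbed,
# and 2 of the perturbed cells express the gene.
p_up = hypergeom_sf(2, 10, 4, 3)
```

A small value of `p_up` would suggest the gene is expressed in more perturbed cells than expected under random assignment.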

Requirements

Python (3.7+)
NumPy (1.21+)
pandas (1.3.5+)
SciPy (1.6.2+)

Installation

pySpade can be installed with pip:

pip install pySpade

Usage

$ pySpade
usage: pySpade [-h]  ...

pySpade 
Version: 0.0.4.2

optional arguments:
  -h, --help  show this help message and exit

functions:
  
    process   process mapping output and reformat for downstream analysis.
    explevel  check the average expression level of query genes in single cell matrix
    fc        check the fold change of sgrna
    DEobs     perform differential expression analysis of observed cells
    DErand    perform differential expression analysis of random selection background
    local     perform local hit analysis with observation data and random background
    global    perform global hit analysis with observation data and random background

process : Process the transcriptome output and sgrna output to remove experimental doublets and sgrna outlier cells.
- Input 1: Transcriptome matrix from the Cellranger output (outs folder).
- Input 2: sgrna matrix. Columns: cell barcodes consistent with the transcriptome matrix; rows: sgrna sequences. The sgrna matrix should already be filtered to remove potential noise sgrnas. Accepted formats: pkl and csv.
- Output: an h5 file. The output can be compressed to save disk space, but writing it may then take longer.
explevel : Check the average expression level of query genes in the single cell matrix.
- Input 1: Processed transcriptome matrix from the process output.
- Input 2: Query gene list. It must be a txt file with one gene per line.
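
As a rough illustration of the explevel inputs, the sketch below reads a newline-separated gene list and averages per-cell values for each query gene. The function names and the dict-of-lists stand-in for the expression matrix are hypothetical; pySpade itself works on the h5 output of process.

```python
def read_gene_list(text):
    """Return gene names from newline-separated text, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def average_expression(matrix, genes):
    """Average the per-cell values of each query gene.

    matrix: dict mapping gene name -> list of per-cell expression values
            (a toy stand-in for the processed transcriptome matrix).
    Genes missing from the matrix are reported as None.
    """
    result = {}
    for gene in genes:
        values = matrix.get(gene)
        result[gene] = sum(values) / len(values) if values else None
    return result

# Hypothetical query file content and toy matrix.
genes = read_gene_list("GENE1\n\nGENE3\n")
averages = average_expression({"GENE1": [0, 2, 4], "GENE2": [1, 1, 1]}, genes)
```
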
fc : Check the fold change of a perturbed region and of individual sgRNAs for a query region and gene. Useful for testing whether positive controls work. P-values are calculated with Student's t-test.
- Input 1: Processed transcriptome and sgrna matrix from the process output.
- Input 2: sgrna dict file (perturbation region hg38 coordinates and the names of the sgrnas targeting that region. Region and sgrnas separated by tab, and sgrnas separated by comma. The sgrna names must match the index of the sgrna matrix.)
  - Example:
    - chr1:1234567-1235067 sg1;sg2;sg3;sg4;sg5
    - chr2:1234567-1235067 sg6;sg7;sg8;sg9;sg10
- Input 3: Query file listing the query region and the query gene to test, separated by tab.
  - Example:
    - chr1:1234567-1235067 GENE1
    - chr2:1234567-1235067 GENE2
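
The two file layouts above can be read with a few lines of standard-library Python. This is only a sketch: the function names are hypothetical, and because the description mentions commas while the example shows semicolons, the parser assumes either separator may appear between sgrna names.

```python
import re

def parse_sgrna_dict(text):
    """Map each region string to the list of sgrna names targeting it."""
    regions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Region and sgrna list are separated by whitespace (a tab in the file).
        region, sgrna_field = line.split(None, 1)
        # Assumption: accept both ',' and ';' as sgrna separators.
        names = [s.strip() for s in re.split(r"[;,]", sgrna_field) if s.strip()]
        regions[region] = names
    return regions

def parse_query(text):
    """Return (region, gene) pairs from a two-column query file."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            region, gene = line.split()
            pairs.append((region, gene))
    return pairs

sgrna_dict = parse_sgrna_dict("chr1:1234567-1235067\tsg1;sg2;sg3\n")
queries = parse_query("chr1:1234567-1235067\tGENE1\n")
```

Checking that every sgrna name returned here also appears in the index of the sgrna matrix is a cheap way to catch naming mismatches before running fc.
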
DEobs : Perform the genome-wide differential expression analysis of all the perturbation regions.
- Input 1: Processed transcriptome and sgrna matrix from the process output.
- Input 2: sgrna dict file (perturbation region hg38 coordinates and the sgrna sequence targeting that region. Region and sgrnas separated by tab, and sgrnas separated by comma. The sgrna name must match the index of the sgrna matrix).
- Output files: upregulation p-value, downregulation p-value, fold change (compared with all the other cells), and average cpm.
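
For readers unfamiliar with the fold change and cpm outputs, the following sketch shows one common way to compute them. The function names and toy data are hypothetical, and pySpade's exact normalization may differ; the only detail taken from the description is that perturbed cells are compared with all the other cells.

```python
def cpm(counts):
    """Convert one cell's raw gene counts to counts per million."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c * 1e6 / total for c in counts]

def fold_change(values, perturbed):
    """Mean expression in perturbed cells divided by the mean in all other cells.

    values: per-cell normalized expression of a single gene
    perturbed: set of cell indices carrying the perturbation
    """
    inside = [v for i, v in enumerate(values) if i in perturbed]
    outside = [v for i, v in enumerate(values) if i not in perturbed]
    if not inside or not outside:
        raise ValueError("both cell groups must be non-empty")
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    # Assumption: report NaN when the gene is undetected in the other cells.
    return mean_in / mean_out if mean_out else float("nan")

# Toy example: cells 0 and 1 are perturbed.
fc = fold_change([2.0, 4.0, 1.0, 1.0], {0, 1})
```
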
DErand : Perform the genome-wide differential expression analysis of 1000 random selections of cells.
- There are two options for random selection: all cells with equal probability, or probability based on the number of sgrnas in each cell. The user should specify the number of cells to select randomly. It is recommended to use either the exact cell number or bins (for experiments with a large number of perturbations, in order to reduce computational overhead).
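
The two selection modes can be sketched with the standard `random` module. The function names are hypothetical, and the weighted variant uses the Efraimidis-Spirakis key trick for sampling without replacement, which is one reasonable choice rather than a statement of how pySpade draws its background.

```python
import random

def sample_equal(cells, k, rng):
    """Pick k distinct cells, each with equal probability."""
    return rng.sample(cells, k)

def sample_weighted(cells, weights, k, rng):
    """Pick k distinct cells with probability related to their weights.

    weights: e.g. the number of sgrnas detected in each cell.
    Each cell with positive weight w gets the key u ** (1 / w), with u
    uniform in [0, 1); the k largest keys are selected.
    """
    keyed = [(rng.random() ** (1.0 / w), cell)
             for cell, w in zip(cells, weights) if w > 0]
    if k > len(keyed):
        raise ValueError("not enough cells with positive weight")
    keyed.sort(reverse=True)
    return [cell for _, cell in keyed[:k]]

rng = random.Random(0)  # fixed seed so the background is reproducible
background = sample_weighted(["a", "b", "c"], [1, 0, 3], 2, rng)
```

Repeating the draw many times (the description mentions 1000 selections) yields the background distribution that local and global compare against.
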
local : Use the observation p-value and the randomization background p-value to calculate the adjusted p-value based on a Gaussian distribution approximation. The local hits calculation includes the genes within plus or minus 2 Mb of the perturbation region. The output is a csv file with all hits information.
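
The 2 Mb window can be illustrated with a small helper. This sketch assumes each gene is represented by a single coordinate (for example its transcription start site) and that the window is measured from the edges of the perturbation region; both are assumptions, and the function names are hypothetical.

```python
WINDOW = 2_000_000  # 2 Mb on each side of the perturbation region

def parse_region(region):
    """Split 'chr1:1234567-1235067' into (chrom, start, end)."""
    chrom, span = region.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)

def is_local(region, gene_chrom, gene_pos, window=WINDOW):
    """True if the gene lies on the same chromosome within the window."""
    chrom, start, end = parse_region(region)
    return gene_chrom == chrom and start - window <= gene_pos <= end + window

near = is_local("chr1:5000000-5000500", "chr1", 6500000)
far = is_local("chr1:5000000-5000500", "chr1", 9000000)
```
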
global : Use the observation p-value and the randomization background p-value to calculate the adjusted p-value based on a Gaussian distribution approximation. The output is a csv file with all hits information.
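
One way to read the Gaussian approximation used by local and global is sketched below: fit a normal distribution to the background p-values and ask how extreme the observed p-value is under that fit. Working on the -log10 scale, using the sample standard deviation, and the function name `adjusted_p` are all assumptions made for illustration; the source does not spell out these details.

```python
from math import erfc, log10, sqrt
from statistics import mean, stdev

def adjusted_p(obs_p, background_ps):
    """Upper-tail probability of the observed p-value under a normal fit.

    Assumption: the fit is done on -log10(p), so a smaller observed
    p-value gives a larger score and a smaller adjusted p-value.
    """
    obs = -log10(obs_p)
    bg = [-log10(p) for p in background_ps]
    mu, sigma = mean(bg), stdev(bg)
    if sigma == 0:
        raise ValueError("background p-values have zero variance")
    z = (obs - mu) / sigma
    # Standard normal survival function: P(Z >= z).
    return 0.5 * erfc(z / sqrt(2))

# Toy background of three random selections (real runs use many more).
background = [1e-1, 1e-2, 1e-3]
typical = adjusted_p(1e-2, background)
extreme = adjusted_p(1e-4, background)
```

In this toy example the observation equal to the background mean gets an adjusted p-value near 0.5, while the more extreme observation gets a much smaller one.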

Contacts

Yihan Wang Yihan.Wang@UTSouthwestern.edu
Gary Hon Gary.Hon@UTSouthwestern.edu
