
EEISP

EEISP identifies gene pairs that are codependent and mutually exclusive from single-cell RNA-seq data.

0. Changelog

See Changelog

1. Installation

pip3 install -U eeisp

2. Usage

EEISP takes a read-count matrix as input, with genes as rows and cells as columns. The matrix is comma-delimited by default (use --tsv for tab-delimited input), and gzipped files (.gz) are also accepted.
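As a minimal sketch of the expected input layout, the Python below writes a toy genes × cells count matrix as a gzipped, comma-delimited file. It mirrors the layout produced by the write.table command in step 1 (a header row of cell names, then one row per gene starting with its ID). The helper name write_toy_matrix, the gene IDs, the cell names, and the counts are all hypothetical.

```python
import csv
import gzip
import random


def write_toy_matrix(path, n_genes=5, n_cells=4, seed=0):
    """Write a toy genes x cells count matrix as gzipped, comma-delimited text.

    Layout mirrors R's write.table(..., sep=',', col.names=T) output:
    a header row of cell names, then one row per gene whose first field
    is the gene ID. All IDs and names here are made up.
    """
    rng = random.Random(seed)
    cells = [f"cell{j + 1}" for j in range(n_cells)]
    with gzip.open(path, "wt", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerow(cells)  # header: cell names only
        for i in range(n_genes):
            # Mostly-zero counts, mimicking sparse scRNA-seq data.
            counts = [rng.choice([0, 0, 0, 1, 3]) for _ in cells]
            writer.writerow([f"GENE{i + 1:04d}"] + counts)
    return path
```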

  1. (Optional) Convert CellRanger output to an input matrix (requires R and the Seurat library):

      datadir="outs/filtered_feature_bc_matrix/"
      matrix="matrix.txt"
      R -e "library(Seurat); counts <- Read10X('$datadir'); write.table(as.matrix(counts), '$matrix', quote=F, sep=',', col.names=T)"
    
  2. eeisp calculates the CDI and EEI scores for all gene pairs. It outputs the gene pairs whose CDI or EEI score exceeds the specified threshold, along with tables of the degree distribution.

      usage: eeisp [-h] [--threCDI THRECDI] [--threEEI THREEEI] [--tsv] [--gpu] [-p THREADS] [-v] matrix output
    
      positional arguments:
        matrix                Input matrix
        output                Output prefix
    
      optional arguments:
        -h, --help            show this help message and exit
        --threCDI THRECDI     Threshold for CDI (default: 20.0)
        --threEEI THREEEI     Threshold for EEI (default: 10.0)
        --tsv                 Specify when the input file is tab-delimited (.tsv)
        --gpu                 GPU mode
        -p THREADS, --threads THREADS  number of threads (default: 2)
        -v, --version         show program's version number and exit
    
  3. eeisp_add_genename_from_geneid adds gene names (symbols) to the output files of eeisp.

     usage: eeisp_add_genename_from_geneid [-h] [--i_id I_ID] [--i_name I_NAME] input output genelist
    
     positional arguments:
       input            Input matrix
       output           Output prefix
       genelist         Gene list
    
     optional arguments:
       -h, --help       show this help message and exit
       --i_id I_ID      column number of gene id (default: 0)
       --i_name I_NAME  column number of gene name (default: 1)
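The eeisp options above can also be assembled programmatically, for example when trying several thresholds. The sketch below uses only the documented flags; the helper name build_eeisp_command is hypothetical, and adding --tsv based on a .tsv/.tsv.gz file name is a heuristic of this sketch, since eeisp expects the flag to be passed explicitly.

```python
from typing import List


def build_eeisp_command(matrix: str, output: str,
                        thre_cdi: float = 20.0, thre_eei: float = 10.0,
                        threads: int = 2, gpu: bool = False) -> List[str]:
    """Return an argv list for eeisp using only the documented options.

    Defaults match the help text. Adding --tsv for .tsv/.tsv.gz file
    names is a heuristic of this sketch, not eeisp behavior.
    """
    cmd = ["eeisp", matrix, output,
           "--threCDI", str(thre_cdi),
           "--threEEI", str(thre_eei),
           "-p", str(threads)]
    base = matrix[:-3] if matrix.endswith(".gz") else matrix
    if base.endswith(".tsv"):
        cmd.append("--tsv")  # input is tab-delimited
    if gpu:
        cmd.append("--gpu")  # GPU mode
    return cmd
```

For example, subprocess.run(build_eeisp_command("data.txt", "Sample", 0.5, 0.5, 8), check=True) would run the same command as in the tutorial.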
    

3. Tutorial

Sample data are included in the sample directory:

  • data.txt: the input matrix of scRNA-seq data.
  • geneidlist.txt: the gene list for eeisp_add_genename_from_geneid.
eeisp data.txt Sample --threCDI 0.5 --threEEI 0.5 -p 8

This command writes the gene pairs with CDI > 0.5 and those with EEI > 0.5 to separate files. -p 8 runs eeisp with 8 threads.

(Note: because only part of the eeisp computation runs on the GPU, it is still better to use multiple CPU threads in --gpu mode for faster computation.)

Output files are:

   Sample_CDI_score_data_thre0.5.txt            # A list of gene pairs with their CDI scores.
   Sample_CDI_degree_distribution_thre0.5.csv   # A table of CDI degrees and the corresponding numbers of genes.
   Sample_EEI_score_data_thre0.5.txt            # A list of gene pairs with their EEI scores.
   Sample_EEI_degree_distribution_thre0.5.csv   # A table of EEI degrees and the corresponding numbers of genes.
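To illustrate what a degree-distribution table summarizes, here is a minimal sketch assuming that a gene's degree is the number of reported (above-threshold) pairs it belongs to, and that the table counts genes per degree. The exact definition and CSV layout used by eeisp are not specified here, so treat this as an approximation; the helper name degree_distribution and the gene names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def degree_distribution(pairs: Iterable[Tuple[str, str]]) -> Dict[int, int]:
    """Map each degree to the number of genes having that degree.

    Assumption: a gene's degree is the number of reported pairs it
    appears in; genes that appear in no reported pair are not counted.
    """
    degree = Counter()
    for g1, g2 in pairs:
        degree[g1] += 1
        degree[g2] += 1
    # Count how many genes share each degree, sorted by degree.
    return dict(sorted(Counter(degree.values()).items()))
```

For the pairs A-B, A-C, B-C and C-D, genes A and B have degree 2, C has degree 3 and D has degree 1, so the result is {1: 1, 2: 2, 3: 1}.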

If the input matrix uses gene IDs, the output files contain gene IDs only:

   $ head Sample_CDI_score_data_thre0.5.txt
   2       7       ESG000003       ESG000008       0.96384320244841
   0       1       ESG000001       ESG000002       0.6852891560232545
   0       6       ESG000001       ESG000007       0.6852891560232545
   7       8       ESG000008       ESG000009       0.6852891560232545
   3       9       ESG000004       ESG000010       0.6469554204484568
   4       6       ESG100005       ESG000007       0.5258703930217091
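Each line of a score file has five whitespace-separated columns: two integers, two gene IDs, and the score. The sketch below parses such lines and keeps pairs above a cutoff. The helper names are hypothetical, and reading the first two columns as the genes' row indices in the input matrix is an assumption drawn from the sample, not documented behavior.

```python
from typing import Iterable, List, NamedTuple


class ScoredPair(NamedTuple):
    index1: int   # first column (assumed: row index of gene1)
    index2: int   # second column (assumed: row index of gene2)
    gene1: str
    gene2: str
    score: float


def read_scores(lines: Iterable[str], min_score: float = 0.0) -> List[ScoredPair]:
    """Parse 5-column eeisp score lines and keep pairs with score > min_score.

    Lines without exactly five whitespace-separated fields (e.g. blank
    lines) are skipped. Results are sorted by descending score.
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue
        i, j, g1, g2, s = fields
        pair = ScoredPair(int(i), int(j), g1, g2, float(s))
        if pair.score > min_score:
            pairs.append(pair)
    pairs.sort(key=lambda p: p.score, reverse=True)
    return pairs
```

Because the file already holds every pair above the threshold given to eeisp, a stricter cutoff can be applied afterwards without rerunning it, e.g. read_scores(open("Sample_CDI_score_data_thre0.5.txt"), min_score=0.6).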

To add gene names (symbols), run eeisp_add_genename_from_geneid with geneidlist.txt, which maps gene IDs to gene names:

 eeisp_add_genename_from_geneid \
     Sample_CDI_score_data_thre0.5.txt \
     Sample_CDI_score_data_thre0.5.addgenename.txt \
     geneidlist.txt
 eeisp_add_genename_from_geneid \
     Sample_EEI_score_data_thre0.5.txt \
     Sample_EEI_score_data_thre0.5.addgenename.txt \
     geneidlist.txt

The resulting files include the gene names:

   $ head Sample_CDI_score_data_thre0.5.addgenename.txt
   2       7       ESG000003       ESG000008       OR4F5   FO538757.3      0.96384320244841
   0       1       ESG000001       ESG000002       RP11-34P13.3    FAM138A 0.6852891560232545
   0       6       ESG000001       ESG000007       RP11-34P13.3    RP11-34P13.9    0.6852891560232545
   7       8       ESG000008       ESG000009       FO538757.3      FO538757.2      0.6852891560232545
   3       9       ESG000004       ESG000010       RP11-34P13.7    AP006222.2      0.6469554204484568
   4       6       ESG100005       ESG000007       RP11-34P13.8    RP11-34P13.9    0.5258703930217091
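The effect of eeisp_add_genename_from_geneid on a score file can be approximated as a lookup join: read an ID-to-name map from the gene list, then insert the two names after the two gene-ID columns, giving the 7-column layout shown in the sample output. This is a sketch, not the tool's implementation. The i_id/i_name parameters mirror the --i_id/--i_name defaults; the whitespace-delimited gene list, the fallback of keeping the ID when no name is known, and the helper names are assumptions.

```python
from typing import Dict, Iterable, Iterator


def load_gene_names(lines: Iterable[str], i_id: int = 0, i_name: int = 1) -> Dict[str, str]:
    """Build a gene ID -> gene name map from a whitespace-delimited gene list."""
    names = {}
    for line in lines:
        fields = line.split()
        if len(fields) > max(i_id, i_name):
            names[fields[i_id]] = fields[i_name]
    return names


def add_gene_names(score_lines: Iterable[str], names: Dict[str, str]) -> Iterator[str]:
    """Insert gene names after the two gene-ID columns of 5-column score lines.

    IDs missing from the gene list are kept as-is in the name columns
    (a choice of this sketch).
    """
    for line in score_lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        i, j, g1, g2, score = fields
        yield "\t".join([i, j, g1, g2, names.get(g1, g1), names.get(g2, g2), score])
```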

4. Reference

Nakajima N., Hayashi T., Fujiki K., Shirahige K., Akiyama T., Akutsu T. and Nakato R., Codependency and mutual exclusivity for gene community detection from sparse single-cell transcriptome data, Nucleic Acids Research, 2021.


Download files

Source distribution: eeisp-0.6.2.tar.gz (28.0 kB)

Built distribution: eeisp-0.6.2-py3-none-any.whl (29.1 kB, Python 3)

Both files were uploaded via twine/6.2.0 CPython/3.10.13, without Trusted Publishing.

File hashes

Hashes for eeisp-0.6.2.tar.gz:

   SHA256       f2c6cac55136f0025942132023bd7781ebf38fee1cb1b852ecf9f93a0dbaac5a
   MD5          cfee3ac060ab27b26bfdf87e09777c95
   BLAKE2b-256  94e89e157c27ea8d888f43d52006d65b47e3fcd91e43785d4cbcfecdb38d2f45

Hashes for eeisp-0.6.2-py3-none-any.whl:

   SHA256       6120bf5d0b4df03a020ccc7bacd7967cf6f7cc96e9ff625291914aa7f5561eb0
   MD5          bc5b6bb9c63f1638dfd69ecffc17e2e8
   BLAKE2b-256  a47884512747f8f3e551cf9a90893cdbbe831b9bf65427d3d5422fc4cbd67d18
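A downloaded release file can be checked against the SHA256 digests listed above. The sketch below hashes a file in chunks with Python's standard hashlib module; the helper names are hypothetical and the digest table is copied from the listing above.

```python
import hashlib
import os

# SHA256 digests of the eeisp 0.6.2 release files, as listed above.
EXPECTED_SHA256 = {
    "eeisp-0.6.2.tar.gz": "f2c6cac55136f0025942132023bd7781ebf38fee1cb1b852ecf9f93a0dbaac5a",
    "eeisp-0.6.2-py3-none-any.whl": "6120bf5d0b4df03a020ccc7bacd7967cf6f7cc96e9ff625291914aa7f5561eb0",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str) -> bool:
    """True if the file's SHA256 matches the listed digest for its file name."""
    expected = EXPECTED_SHA256.get(os.path.basename(path))
    return expected is not None and sha256_of(path) == expected
```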
