
PROFILE methodology for the binarisation and normalisation of RNA-seq data


profile_binr

The PROFILE methodology for the binarisation and normalisation of RNA-seq data.

This is a Python interface to a set of normalisation and binarisation functions for RNA-seq data originally written in R.

This software package is based on the methodology developed by Béal, Jonas; Montagud, Arnau; Traynard, Pauline; Barillot, Emmanuel; and Calzone, Laurence of the Computational Systems Biology of Cancer team at Institut Curie (contact-sysbio@curie.fr). It generalizes the original implementation, available as Rmarkdown notebooks at https://github.com/sysbio-curie/PROFILE, and offers a Python interface to it.

Installation

Using conda

The tool can be installed with conda from the profile_binr package in the colomoto channel. Note that some of its dependencies require the conda-forge channel.

conda install -c conda-forge colomoto::profile_binr

Using pip

Requirements

  • R (≥4.0)
  • R packages:
    • mclust
    • diptest
    • moments
    • magrittr
    • tidyr
    • dplyr
    • tibble
    • bigmemory
    • doSNOW
    • foreach
    • glue

With R and these packages available, install the Python package:

pip install profile_binr
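
Since pip installs only the Python side, it can help to confirm that R and the R packages listed above are visible before the first run. The following is a minimal sketch and not part of profile_binr: the helper names `r_check_expression` and `missing_r_packages` are hypothetical, and the check assumes that `Rscript` is on the PATH.

```python
import shutil
import subprocess

# R packages listed in the requirements above
R_PACKAGES = [
    "mclust", "diptest", "moments", "magrittr", "tidyr", "dplyr",
    "tibble", "bigmemory", "doSNOW", "foreach", "glue",
]


def r_check_expression(packages):
    """Build an R expression that prints the packages that are not installed."""
    quoted = ", ".join(f'"{name}"' for name in packages)
    return (
        f"pkgs <- c({quoted}); "
        "cat(setdiff(pkgs, rownames(installed.packages())), sep='\\n')"
    )


def missing_r_packages(packages=R_PACKAGES):
    """Return the missing R packages, or None if Rscript cannot be found."""
    if shutil.which("Rscript") is None:
        return None
    result = subprocess.run(
        ["Rscript", "-e", r_check_expression(packages)],
        capture_output=True, text=True, check=True,
    )
    # One package name per output line; ignore empty lines
    return [line for line in result.stdout.splitlines() if line.strip()]
```

Calling `missing_r_packages()` returns an empty list when every listed package is installed, the names of the missing ones otherwise, and `None` when `Rscript` is not found.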

Usage

Here is a minimal example:

from profile_binr import ProfileBin
import pandas as pd

# your data is assumed to contain observations as
# rows and genes as columns; the first CSV column is
# assumed here to hold the cell identifiers (index_col=0)
data = pd.read_csv("path/to/your/data.csv", index_col=0)
data.head()
            Clec1b     Kdm3a    Coro2b  8430408G22Rik  Clec9a      Phf6     Usp14  Tmem167b
cell_id
HSPC_025       0.0  4.891604  1.426148            0.0     0.0  2.599758  2.954035  6.357369
HSPC_031       0.0  6.877725  0.000000            0.0     0.0  2.423483  1.804914  0.000000
HSPC_037       0.0  0.000000  6.913384            0.0     0.0  2.051659  8.265465  0.000000
LT-HSC_001     0.0  0.000000  8.178374            0.0     0.0  6.419817  3.453502  2.579528
HSPC_001       0.0  0.000000  9.475577            0.0     0.0  7.733370  1.478900  0.000000
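
Expression matrices are often stored the other way round, with genes as rows and cells as columns. A minimal sketch of bringing such a table into the layout expected here, using only pandas; the toy frame `genes_by_cells` is hypothetical and reuses a few values from the preview above purely for illustration.

```python
import pandas as pd

# Toy matrix with genes as rows and cells as columns
genes_by_cells = pd.DataFrame(
    {"HSPC_025": [4.891604, 2.599758], "HSPC_031": [6.877725, 2.423483]},
    index=["Kdm3a", "Phf6"],
)

# Transpose so that observations (cells) are rows and genes are columns
data = genes_by_cells.T
data.index.name = "cell_id"

print(list(data.index))    # ['HSPC_025', 'HSPC_031']
print(list(data.columns))  # ['Kdm3a', 'Phf6']
```
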
# create the binarisation instance using the dataframe
# with the index containing the cell identifier
# and the columns being the gene names
probin = ProfileBin(data)

# compute the criteria used to binarise/normalise the data.
# This method uses a parallel implementation; you can specify
# the number of workers with an integer
probin.fit(8) # fit using 8 workers

# Look at the computed criteria
probin.criteria.head(8)
                    Dip        BI    Kurtosis  DropOutRate    MeanNZ    DenPeak  Amplitude   Category
Clec1b         0.358107  1.635698   54.017736     0.876208  1.520978  -0.007249   8.852181    ZeroInf
Kdm3a          0.000000  2.407548   -0.784019     0.326087  3.847940   0.209239  10.126676    Bimodal
Coro2b         0.000000  2.320060    7.061604     0.658213  2.383819   0.004597   9.475577    ZeroInf
8430408G22Rik  0.684454  3.121069   21.729044     0.884058  2.983472   0.005663   9.067857    ZeroInf
Clec9a         1.000000  2.081717  140.089285     0.965580  2.280293  -0.009361   9.614233  Discarded
Phf6           0.000000  1.988667   -1.389024     0.035628  5.025501   2.017547  10.135226    Bimodal
Usp14          0.000000  2.208080   -1.224987     0.007850  6.109964   8.245570  11.088750    Bimodal
Tmem167b       0.000000  2.430813    0.093023     0.393720  3.448331   0.072982   9.486826    Bimodal
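
To see how many genes fall into each category, the `Category` column can be tallied with pandas. The sketch below rebuilds that column by hand from the eight rows printed above, so it runs without R or profile_binr. Assuming `probin.criteria` is a pandas DataFrame, as the `.head(8)` call suggests, the same `value_counts()` call should apply to `probin.criteria["Category"]`.

```python
import pandas as pd

# Category column copied from the criteria preview above
category = pd.Series(
    ["ZeroInf", "Bimodal", "ZeroInf", "ZeroInf",
     "Discarded", "Bimodal", "Bimodal", "Bimodal"],
    index=["Clec1b", "Kdm3a", "Coro2b", "8430408G22Rik",
           "Clec9a", "Phf6", "Usp14", "Tmem167b"],
    name="Category",
)

# Number of genes per category, most frequent first
counts = category.value_counts()
print(counts.to_dict())  # {'Bimodal': 4, 'ZeroInf': 3, 'Discarded': 1}
```
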
# get binarised data (alternatively .binarise()):
my_bin = probin.binarize()
my_bin.head()
            Clec1b  Kdm3a  Coro2b  8430408G22Rik  Clec9a  Phf6  Usp14  Tmem167b
HSPC_025       NaN    1.0     NaN            NaN     NaN   0.0    0.0       1.0
HSPC_031       NaN    1.0     NaN            NaN     NaN   0.0    0.0       0.0
HSPC_037       NaN    0.0     1.0            NaN     NaN   0.0    1.0       0.0
LT-HSC_001     NaN    0.0     1.0            NaN     NaN   1.0    0.0       0.0
HSPC_001       NaN    0.0     1.0            NaN     NaN   1.0    0.0       0.0
# likewise for normalised data:
my_norm = probin.normalize()
my_norm.head()
            Clec1b         Kdm3a    Coro2b  8430408G22Rik  Clec9a      Phf6         Usp14      Tmem167b
HSPC_025       0.0  9.786196e-01  0.184102            0.0     NaN  0.000801  8.318176e-05  9.999970e-01
HSPC_031       0.0  9.999981e-01  0.000000            0.0     NaN  0.000462  8.084114e-07  6.874397e-11
HSPC_037       0.0  4.408417e-09  0.892449            0.0     NaN  0.000145  9.999940e-01  6.874397e-11
LT-HSC_001     0.0  4.408417e-09  1.000000            0.0     NaN  0.991865  6.230178e-04  1.599753e-04
HSPC_001       0.0  4.408417e-09  1.000000            0.0     NaN  0.999865  2.171153e-07  6.874397e-11
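
The binarised frame shown earlier contains NaN wherever no 0/1 value was produced, so downstream code usually has to decide how to treat those entries. The sketch below is one possible approach using plain pandas, not a feature of profile_binr: it copies four columns of the binarised preview by hand, counts the NaN entries per gene, and drops genes that have no binarised value in any of the listed cells.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Four columns copied from the binarised preview above
my_bin = pd.DataFrame(
    {
        "Clec1b": [nan, nan, nan, nan, nan],
        "Kdm3a": [1.0, 1.0, 0.0, 0.0, 0.0],
        "Coro2b": [nan, nan, 1.0, 1.0, 1.0],
        "Phf6": [0.0, 0.0, 0.0, 1.0, 1.0],
    },
    index=["HSPC_025", "HSPC_031", "HSPC_037", "LT-HSC_001", "HSPC_001"],
)

# Number of cells per gene without a binarised value
undetermined = my_bin.isna().sum()

# Keep only genes with at least one binarised value
usable = my_bin.dropna(axis="columns", how="all")

print(undetermined)
print(list(usable.columns))
```

Whether to drop, keep, or impute such entries depends on the downstream model; the choice made here is only for illustration.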

References

  • Béal J, Montagud A, Traynard P, Barillot E and Calzone L (2019) Personalization of Logical Models With Multi-Omics Data Allows Clinical Stratification of Patients. Front. Physiol. 9:1965. doi:10.3389/fphys.2018.01965

