Scalable gene regulatory network inference using tree-based ensemble regressors

Project description

arboreto

The most satisfactory definition of man from the scientific point of view is probably Man the Tool-maker.

Inferring a gene regulatory network (GRN) from gene expression data is a computationally expensive task, exacerbated by increasing data sizes due to advances in high-throughput gene profiling technology.

The arboreto software library addresses this issue by providing a computational strategy for running the class of GRN inference algorithms exemplified by GENIE3 [1] on hardware ranging from a single computer to a multi-node compute cluster. Algorithms in this class perform one step per target gene in the dataset: a regression model is trained to predict the target gene's expression profile from the expression of a set of candidate regulators, and the most important regulators are then read off the trained model.
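The per-target structure of this class of algorithms can be sketched in a few lines of plain Python. This is an illustration only, not arboreto's implementation: the toy `expression` data, the function names, and in particular the scoring step are assumptions. Where GENIE3 and GRNBoost2 train a tree-based regression model and read regulator importances from it, the sketch substitutes absolute Pearson correlation so that it runs with the standard library alone.

```python
from math import sqrt


def abs_pearson(xs, ys):
    """Absolute Pearson correlation between two equally long profiles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return abs(cov / (norm_x * norm_y)) if norm_x and norm_y else 0.0


def rank_regulators(target_profile, regulator_profiles):
    """Stand-in for the regression step (assumption: correlation, not trees).

    Returns a dict mapping each candidate regulator to an importance score.
    """
    return {
        name: abs_pearson(profile, target_profile)
        for name, profile in regulator_profiles.items()
    }


def infer_network(expression, regulators):
    """Run one independent step per target gene and collect all links."""
    links = []
    for target, target_profile in expression.items():
        # A gene is never used as a regulator of itself.
        candidates = {r: expression[r] for r in regulators if r != target}
        scores = rank_regulators(target_profile, candidates)
        links.extend((reg, target, score) for reg, score in scores.items())
    # Strongest putative regulatory links first.
    links.sort(key=lambda link: link[2], reverse=True)
    return links


# Hypothetical toy data: gene name -> expression across five samples.
expression = {
    "TF1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "TF2": [2.0, 1.0, 2.0, 1.0, 2.0],
    "G1": [2.0, 4.0, 6.0, 8.0, 10.0],
}
links = infer_network(expression, regulators=["TF1", "TF2"])
```

Each pass of the outer loop touches only one target gene and shares no state with the other passes, which is why the steps can be run independently of one another.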

Members of the above class of GRN inference algorithms are attractive from a computational point of view because they are parallelizable by nature. In arboreto, we specify the parallelizable computation as a dask graph [2], a data structure that represents the task schedule of a computation. A dask scheduler assigns the tasks in a dask graph to the available computational resources. Arboreto uses the dask distributed scheduler to spread out the computational tasks over multiple processes running on one or multiple machines.
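To make the idea of a task graph concrete, the sketch below writes a small graph as a plain dictionary in the style dask uses (keys mapped to tuples of a callable followed by its arguments) and evaluates it with a toy single-threaded resolver. The resolver `run`, the task functions, and the key names are hypothetical stand-ins. A real dask scheduler could execute independent tasks such as `links-G1` and `links-G2` in parallel, and this is not necessarily how arboreto builds its own graph.

```python
def run(graph, key, cache=None):
    """Toy stand-in for a scheduler: resolve `key` depth-first, one task at a time."""
    cache = {} if cache is None else cache
    if key in cache:
        return cache[key]
    value = graph[key]
    # A task is a tuple whose first element is callable; anything else is data.
    if isinstance(value, tuple) and value and callable(value[0]):
        func, *args = value
        value = func(*(_resolve(graph, arg, cache) for arg in args))
    cache[key] = value
    return value


def _resolve(graph, arg, cache):
    """Replace graph keys (also inside lists) by their computed values."""
    if isinstance(arg, list):
        return [_resolve(graph, item, cache) for item in arg]
    if isinstance(arg, str) and arg in graph:
        return run(graph, arg, cache)
    return arg  # a literal argument, passed through unchanged


def load_expression():
    # Hypothetical toy data: gene name -> expression across three samples.
    return {"TF1": [1.0, 2.0, 3.0], "G1": [2.0, 4.0, 6.0], "G2": [3.0, 2.0, 1.0]}


def regress_target(expression, target):
    # Placeholder for the per-target regression: a dot product as a dummy score.
    regulator = "TF1"
    score = sum(a * b for a, b in zip(expression[regulator], expression[target]))
    return [(regulator, target, score)]


def concat(parts):
    return [link for part in parts for link in part]


graph = {
    "expression": (load_expression,),
    "links-G1": (regress_target, "expression", "G1"),  # one task per target gene
    "links-G2": (regress_target, "expression", "G2"),
    "network": (concat, ["links-G1", "links-G2"]),
}
network = run(graph, "network")
```

The two `links-*` tasks depend only on `expression` and not on each other, so a scheduler is free to assign them to different processes or machines before the final `network` task combines their results.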

Arboreto currently supports two GRN inference algorithms:

  1. GRNBoost2: a novel and fast GRN inference algorithm using Stochastic Gradient Boosting Machine (SGBM) [3] regression with early-stopping regularization.

  2. GENIE3: the classic GRN inference algorithm using Random Forest (RF) or ExtraTrees (ET) regression.

Get Started

Arboreto was conceived with the working bioinformatician or data scientist in mind. We provide extensive documentation and examples to help you get up to speed with the library.

License

BSD 3-Clause License

pySCENIC

Arboreto is a component of pySCENIC, a lightning-fast Python implementation of the SCENIC pipeline [5] (Single-Cell rEgulatory Network Inference and Clustering), which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.

References

  1. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P (2010) Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. PLoS ONE.

  2. Rocklin, M. (2015). Dask: parallel computation with blocked algorithms and task scheduling. In Proceedings of the 14th Python in Science Conference (pp. 130-136).

  3. Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367-378.

  4. Marbach, D., Costello, J. C., Küffner, R., Vega, N. M., Prill, R. J., Camacho, D. M., … & DREAM5 Consortium. (2012). Wisdom of crowds for robust gene network inference. Nature Methods, 9(8), 796-804.

  5. Aibar S, Bravo Gonzalez-Blas C, Moerman T, Wouters J, Huynh-Thu VA, Imrichova H, Kalender Atak Z, Hulselmans G, Dewaele M, Rambow F, Geurts P, Aerts J, Marine C, van den Oord J, Aerts S. SCENIC: Single-cell regulatory network inference and clustering. Nature Methods 14, 1083–1086 (2017). doi: 10.1038/nmeth.4463

Download files

Download the file for your platform.

Source Distribution

arboreto-0.1.6.tar.gz (14.6 kB, Source)

Built Distribution

arboreto-0.1.6-py2.py3-none-any.whl (15.5 kB, Python 2, Python 3)

File details

Details for the file arboreto-0.1.6.tar.gz.

File metadata

  • Download URL: arboreto-0.1.6.tar.gz
  • Size: 14.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.8

File hashes

Hashes for arboreto-0.1.6.tar.gz

  • SHA256: 32fdac5e8a3e0ef2e196b5827f067d815ac4e689d2fca0dc437f42abdeeb89ab
  • MD5: 2dc1577ddbb8cf6fc5416b9fa6d4eca6
  • BLAKE2b-256: d8b21942195d3848abf64b8115e219c4a530b05798f7332938dfd0e80b93c464


File details

Details for the file arboreto-0.1.6-py2.py3-none-any.whl.

File metadata

  • Download URL: arboreto-0.1.6-py2.py3-none-any.whl
  • Size: 15.5 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.8

File hashes

Hashes for arboreto-0.1.6-py2.py3-none-any.whl

  • SHA256: 6c70074b9d7273efaed0f89dd508c886b83c22ef81ae07ca923b7d21e7bbd057
  • MD5: eb58a93dc468b7145743a477ff5bc9a6
  • BLAKE2b-256: 91268c4a9191c2d31c4f30aecd4382bcc209b67629b827955fb164ce03c09e08

