arboreto

Scalable gene regulatory network inference using tree-based ensemble regressors

.. image:: img/arboreto.png
    :alt: arboreto
    :scale: 100%
    :align: left

.. image:: https://travis-ci.com/aertslab/arboreto.svg?branch=master
    :alt: Build Status
    :target: https://travis-ci.com/aertslab/arboreto

.. image:: https://readthedocs.org/projects/arboreto/badge/?version=latest
    :alt: Documentation Status
    :target: http://arboreto.readthedocs.io/en/latest/?badge=latest

.. image:: https://anaconda.org/bioconda/arboreto/badges/version.svg
    :alt: Bioconda package
    :target: https://anaconda.org/bioconda/arboreto

.. image:: https://img.shields.io/pypi/v/arboreto
    :alt: PyPI package
    :target: https://pypi.org/project/arboreto/

----

.. epigraph::

    *The most satisfactory definition of man from the scientific point of view is probably Man the Tool-maker.*

.. _arboreto: https://arboreto.readthedocs.io
.. _`arboreto documentation`: https://arboreto.readthedocs.io
.. _notebooks: https://github.com/tmoerman/arboreto/tree/master/notebooks
.. _issue: https://github.com/tmoerman/arboreto/issues/new

.. _dask: https://dask.pydata.org/en/latest/
.. _`dask distributed`: https://distributed.readthedocs.io/en/latest/

.. _GENIE3: http://www.montefiore.ulg.ac.be/~huynh-thu/GENIE3.html
.. _`Random Forest`: https://en.wikipedia.org/wiki/Random_forest
.. _ExtraTrees: https://en.wikipedia.org/wiki/Random_forest#ExtraTrees
.. _`Stochastic Gradient Boosting Machine`: https://en.wikipedia.org/wiki/Gradient_boosting#Stochastic_gradient_boosting
.. _`early-stopping`: https://en.wikipedia.org/wiki/Early_stopping

Inferring a gene regulatory network (GRN) from gene expression data is a computationally expensive task, exacerbated by increasing data sizes due to advances
in high-throughput gene profiling technology.

The arboreto_ software library addresses this issue by providing a computational strategy for running the class of GRN inference algorithms
exemplified by GENIE3_ [1] on hardware ranging from a single computer to a multi-node compute cluster. Algorithms in this class perform one step
per target gene in the dataset: a regression model is trained to predict the target gene's expression profile from the expression of a set of
candidate regulators, and the most important regulators are then determined from that model.
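
The per-target structure described above can be sketched in plain Python. This is a minimal illustration and not arboreto's implementation: the function names and the toy data are hypothetical, and the scorer is a deliberately simple stand-in (absolute Pearson correlation) for the feature importances that a tree-based regression model would provide.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def score_regulators(expression, regulators, target):
    """One step: score every candidate regulator against a single target gene.

    Stand-in scorer (assumption): |Pearson r|. The algorithms discussed here
    take importances from a fitted tree-based regression model instead.
    """
    return [
        (reg, target, abs(pearson(expression[reg], expression[target])))
        for reg in regulators
        if reg != target  # a gene is not a candidate regulator of itself
    ]


def infer_network(expression, regulators):
    """Run one scoring step per target gene and pool the resulting links."""
    links = []
    for target in expression:
        links.extend(score_regulators(expression, regulators, target))
    # Strongest putative regulatory links first.
    return sorted(links, key=lambda link: link[2], reverse=True)


# Hypothetical toy data: gene name -> expression values across five samples.
toy_expression = {
    "TF1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "TF2": [5.0, 1.0, 4.0, 2.0, 3.0],
    "G1": [2.0, 4.0, 6.0, 8.0, 10.0],
}
network = infer_network(toy_expression, regulators=["TF1", "TF2"])
```

Each call to ``score_regulators`` depends only on the expression data and its own target, so the loop in ``infer_network`` contains no cross-iteration state.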

Algorithms in this class are attractive from a computational point of view because the per-target steps are independent of one another and can therefore
run in parallel. In arboreto, we specify this parallelizable computation as a dask_ graph [2], a data structure that represents the task schedule of a
computation. A dask scheduler assigns the tasks in a dask graph to the available computational resources. Arboreto uses the `dask distributed`_ scheduler to
spread the computational tasks over multiple processes running on one or more machines.
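
To make the idea of a task graph concrete, the sketch below builds a small graph in the dictionary style that dask uses (a key maps to a tuple of a function and its arguments) and runs it with a toy scheduler. Everything here is an assumption made for illustration: the task functions and gene names are hypothetical, and the scheduler is a standard-library thread pool standing in for the dask distributed scheduler, which runs tasks in separate processes and possibly on several machines.

```python
from concurrent.futures import ThreadPoolExecutor


def infer_links(target):
    """Hypothetical per-target task; a real one would fit a regression model."""
    return [("TF1", target)]


def merge(*link_lists):
    """Final task: concatenate the links produced by every per-target task."""
    return [link for links in link_lists for link in links]


targets = ["G1", "G2", "G3"]

# Task graph: key -> (function, *arguments).
# The arguments of the "network" task are keys of other tasks, i.e. dependencies.
graph = {("links", t): (infer_links, t) for t in targets}
graph["network"] = (merge, *[("links", t) for t in targets])


def execute(graph, final_key, max_workers=2):
    """Toy scheduler for a two-level graph.

    Assumes every argument of the final task is the key of a task that has
    no dependencies of its own. Those tasks run concurrently in the pool;
    the final task runs once all of their results are available.
    """
    final_func, *dep_keys = graph[final_key]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(graph[k][0], *graph[k][1:]) for k in dep_keys]
        results = [f.result() for f in futures]
    return final_func(*results)


network = execute(graph, "network")
```

Separating the description of the work (the graph) from its execution (the scheduler) is what lets the same computation run unchanged on different resources.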

Arboreto currently supports two GRN inference algorithms:

1. **GRNBoost2**: a novel and fast GRN inference algorithm using `Stochastic Gradient Boosting Machine`_ (SGBM) [3] regression with `early-stopping`_ regularization.
2. **GENIE3**: the classic GRN inference algorithm using `Random Forest`_ (RF) or ExtraTrees_ (ET) regression.

Get Started
***********

Arboreto was conceived with the working bioinformatician or data scientist in mind. We provide extensive documentation and examples to help you get up to speed with the library.

* Read the `arboreto documentation`_.
* Browse example notebooks_.
* Report an issue_.

License
*******

BSD 3-Clause License

pySCENIC
********

.. _pySCENIC: https://github.com/aertslab/pySCENIC
.. _SCENIC: https://aertslab.org/#scenic

Arboreto is a component of pySCENIC_: a lightning-fast Python implementation of
the SCENIC_ pipeline [5] (Single-Cell rEgulatory Network Inference and Clustering),
which enables biologists to infer transcription factors, gene regulatory networks
and cell types from single-cell RNA-seq data.

References
**********

1. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P (2010) Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. PLoS ONE.
2. Rocklin, M. (2015). Dask: parallel computation with blocked algorithms and task scheduling. In Proceedings of the 14th Python in Science Conference (pp. 130-136).
3. Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367-378.
4. Marbach, D., Costello, J. C., Kuffner, R., Vega, N. M., Prill, R. J., Camacho, D. M., ... & Dream5 Consortium. (2012). Wisdom of crowds for robust gene network inference. Nature methods, 9(8), 796-804.
5. Aibar S, Bravo Gonzalez-Blas C, Moerman T, Wouters J, Huynh-Thu VA, Imrichova H, Kalender Atak Z, Hulselmans G, Dewaele M, Rambow F, Geurts P, Aerts J, Marine C, van den Oord J, Aerts S. SCENIC: Single-cell regulatory network inference and clustering. Nature Methods 14, 1083–1086 (2017). doi: 10.1038/nmeth.4463
