
ABC random forests for model choice and parameter estimation, python wrapper

Project description


Random forests methodologies for :

  • ABC model choice (Pudlo et al. 2015)
  • ABC Bayesian parameter inference (Raynal et al. 2018)

Libraries we use :

  • Ranger (Wright and Ziegler 2015) : we use our own fork of its C++ core[2], tuned for "online" computation of the forests[1]
  • Eigen3 (Guennebaud, Jacob, and others 2010)

Of note, we use our own implementation of LDA and PLS from (Friedman, Hastie, and Tibshirani 2001, 1:81, 114).

There is one set of binaries, with a macOS, Linux, and Windows (x64 only) binary for each platform. They are available within the “Releases” tab, under the “Assets” section (unfold it to see the list).

These are pure command-line binaries, and there are no prerequisites or library dependencies in order to run them. Just download and launch them from your terminal software of choice. The usual caveats with command-line executables apply here: if you’re not proficient with the command-line interface of your platform, please learn some basics or ask someone who might help you in those matters.

The standalone is part of a specialized population genetics graphical interface, DIYABC-RF, with a submission (currently under review) to MER (Molecular Ecology Resources) (Francois David Collin et al. 2020).

Python

Installation

pip install pyabcranger

Notebook examples

Usage

 - ABC Random Forest - Model choice or parameter estimation command line options
Usage:
  ../build/abcranger [OPTION...]

  -h, --header arg        Header file (default: headerRF.txt)
  -r, --reftable arg      Reftable file (default: reftableRF.bin)
  -b, --statobs arg       Statobs file (default: statobsRF.txt)
  -o, --output arg        Prefix output (modelchoice_out or estimparam_out by
                          default)
  -n, --nref arg          Number of samples, 0 means all (default: 0)
  -m, --minnodesize arg   Minimal node size. 0 means 1 for classification or
                          5 for regression (default: 0)
  -t, --ntree arg         Number of trees (default: 500)
  -j, --threads arg       Number of threads, 0 means all (default: 0)
  -s, --seed arg          Seed, generated by default (default: 0)
  -c, --noisecolumns arg  Number of noise columns (default: 5)
      --nolinear          Disable LDA for model choice or PLS for parameter
                          estimation
      --plsmaxvar arg     Percentage of maximum explained Y-variance for
                          retaining pls axis (default: 0.9)
      --chosenscen arg    Chosen scenario (mandatory for parameter
                          estimation)
      --noob arg          number of oob testing samples (mandatory for
                          parameter estimation)
      --parameter arg     name of the parameter of interest (mandatory for
                          parameter estimation)
  -g, --groups arg        Groups of models
      --help              Print help
  • If you provide --chosenscen, --parameter and --noob, parameter estimation mode is selected.
  • Otherwise, model choice mode is selected by default.
  • Linear additions are LDA for model choice and PLS for parameter estimation; the --nolinear option disables them in both cases.

Model Choice

[Screenshot: terminal model choice run]

Example


abcranger -t 10000 -j 8

Header, reftable and statobs files should be in the current directory.

Groups

With the option -g (or --groups), you may “group” your models into several groups. For example, if you have six models, labeled from 1 to 6: `-g "1,2,3;4,5,6"`
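As an illustration of the expected format, the group specification can be parsed like this (hypothetical helper, not part of abcranger):

```python
def parse_groups(spec):
    """Parse an abcranger -g specification like "1,2,3;4,5,6" into
    a list of model-label groups (';' separates groups, ',' labels)."""
    return [[int(label) for label in group.split(",")]
            for group in spec.split(";")]

print(parse_groups("1,2,3;4,5,6"))  # [[1, 2, 3], [4, 5, 6]]
```

Model choice then operates on the groups rather than on the individual models.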

Generated files

Four files are created :

  • modelchoice_out.ooberror : OOB Error rate vs number of trees (line number is the number of trees)
  • modelchoice_out.importance : variables importance (sorted)
  • modelchoice_out.predictions : votes, prediction and posterior error rate
  • modelchoice_out.confusion : OOB Confusion matrix of the classifier
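Since modelchoice_out.ooberror stores the OOB error rate per number of trees (one line per tree count, as noted above), a quick convergence check can be sketched; the exact column layout is an assumption:

```python
def parse_oob_errors(lines):
    """Parse OOB error values, one per line; line i corresponds to a
    forest of i trees. Assumes the error is the last field on each line."""
    return [float(line.split()[-1]) for line in lines if line.strip()]

# Typical use:
# with open("modelchoice_out.ooberror") as f:
#     errors = parse_oob_errors(f)
#     print("final OOB error:", errors[-1])
```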

Parameter Estimation

[Screenshot: terminal parameter estimation run]

Composite parameters

When specifying the parameter (option --parameter), one may specify simple composite parameters as division, addition or multiplication of two existing parameters, like t/N or T1+T2.
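A composite parameter is just elementwise arithmetic over the sampled values of the two parameters; for illustration (values are made up):

```python
# Hypothetical samples of two parameters t and N from the reference table;
# the composite parameter t/N is computed elementwise.
t = [120.0, 300.0, 45.0]
N = [1000.0, 1500.0, 900.0]
t_over_N = [ti / Ni for ti, Ni in zip(t, N)]
print(t_over_N)  # [0.12, 0.2, 0.05]
```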

A note about PLS heuristic

The --plsmaxvar option (defaulting to 0.90) fixes the number of selected PLS axes so that we get at least the specified percentage of the maximum explained variance of the output. The explained variance of the output for the m first axes is defined by the R-squared of the output:

Yvar^m = \frac{\sum_{i=1}^{N}{(\hat{y}^{m}_{i}-\bar{y})^2}}{\sum_{i=1}^{N}{(y_{i}-\bar{y})^2}}

where \hat{y}^{m} is the output Y scored by the PLS for the mth component. So, only the first n_{comp} axes are kept, with:

n_{comp} = \underset{m,\; Yvar^{m} \leq 0.90\,Yvar^{M}}{\operatorname{argmax}}\; m
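In code, the rule “keep the smallest number of axes reaching the requested fraction of the maximum explained variance” can be sketched as follows (indexing conventions are an assumption):

```python
def n_components(yvar, plsmaxvar=0.90):
    """Smallest m such that Yvar^m >= plsmaxvar * Yvar^M, where yvar is
    the non-decreasing sequence Yvar^1, ..., Yvar^M."""
    target = plsmaxvar * yvar[-1]
    for m, v in enumerate(yvar, start=1):
        if v >= target:
            return m
    return len(yvar)

print(n_components([0.40, 0.70, 0.85, 0.92, 0.95]))  # 4
```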

Note that if you specify 0 as --plsmaxvar, an “elbow” heuristic is activated, where the following condition is tested for every computed axis:

\frac{Yvar^{k+1}+Yvar^{k}}{2} \geq 0.99(N-k)\left(Yvar^{k+1}-Yvar^{k}\right)

If this condition holds over a window of previous axes, sized to 10% of the total possible axes, then we stop the PLS axis computation.
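A sketch of this elbow rule, assuming yvar holds the non-decreasing explained variances Yvar^1, ..., Yvar^N (the exact indexing and window handling in abcranger may differ):

```python
def elbow_stop(yvar, window_frac=0.10):
    """Return the number of PLS axes kept by the elbow heuristic: stop as
    soon as the flatness condition holds for a whole window of previous
    axes, the window being sized to 10% of the total possible axes."""
    n = len(yvar)
    w = max(1, int(window_frac * n))
    for k in range(w, n - 1):
        flat = all(
            (yvar[j + 1] + yvar[j]) / 2 >= 0.99 * (n - j) * (yvar[j + 1] - yvar[j])
            for j in range(k - w, k)
        )
        if flat:
            return k
    return n
```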

In practice, we find this n_{heur} close enough to the n_{comp} obtained for 99% explained variance, but it isn’t guaranteed.

The meaning of the noob parameter

The median global/local statistics and confidence interval (global) measures for parameter estimation need a number of OOB samples (--noob) to be reliable (typically 30% of the size of the dataset is sufficient). Be aware that computing the whole set (i.e. assigning --noob the same value as --nref) for weight predictions (Raynal et al. 2018) could be very costly, memory- and cpu-wise, if your dataset has a large number of samples, so it is advisable to compute them for only a subset of size noob.

Example (parameter estimation)

Example (working with the dataset in test/data) :

abcranger -t 1000 -j 8 --parameter ra --chosenscen 1 --noob 50

Header, reftable and statobs files should be in the current directory.

Generated files (parameter estimation)

Five files (or seven if PLS is activated) are created :

  • estimparam_out.ooberror : OOB MSE rate vs number of trees (line number is the number of trees)
  • estimparam_out.importance : variables importance (sorted)
  • estimparam_out.predictions : expectation, variance and 0.05, 0.5, 0.95 quantile for prediction
  • estimparam_out.predweights : csv of the value/weights pairs of the prediction (for density plot)
  • estimparam_out.oobstats : various statistics on oob (MSE, NMSE, NMAE etc.)

If PLS is enabled :

  • estimparam_out.plsvar : variance explained by number of components
  • estimparam_out.plsweights : variable weight in the first component (sorted by absolute value)
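The value/weight pairs in estimparam_out.predweights can be turned into weighted quantiles like the 0.05/0.5/0.95 ones reported in .predictions; a minimal sketch (CSV parsing omitted, a two-column value,weight format is assumed):

```python
def weighted_quantile(values, weights, q):
    """q-th weighted quantile of a discrete distribution given as
    value/weight pairs: smallest value whose cumulative weight
    reaches q times the total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= q * total:
            return v
    return pairs[-1][0]

print(weighted_quantile([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1], 0.5))  # 2.0
```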

TODO

Input/Output

  • Integrate hdf5 (or exdir? msgpack?) routines to save/load reftables/observed stats with associated metadata
  • Provide R code to save/load the data
  • Provide Python code to save/load the data

C++ standalone

  • Merge the two methodologies in a single executable with (almost) the same options
  • (Optional) Possibly move to another options parser (CLI?)

External interfaces

  • R package
  • Python package

Documentation

  • Code documentation
  • Document the build

Continuous integration

  • Fix travis build. Currently the vcpkg download of eigen3 head is broken.
  • osX travis build
  • Appveyor win32 build

Long/Mid term TODO

  • Auto-tuning of methodology parameters
    • auto-discovering the optimal number of trees by monitoring OOB error
    • auto-limiting the number of threads by available memory
  • Streamline the two methodologies (model choice and then parameter estimation)
  • Write our own tree/RF implementation with better storage efficiency than ranger
  • Write functional tests for the two methodologies
  • Possibly use Mondrian forests for online batches? See (Lakshminarayanan, Roy, and Teh 2014)

References

This work has been the subject of a proceedings paper at JOBIM 2020, with PDF and video (in French) (François-David Collin et al. 2020).

Collin, Francois David, Ghislain Durif, Louis Raynal, Eric Lombaert, Mathieu Gautier, Renaud Vitalis, Jean Michel Marin, and Arnaud Estoup. 2020. “Extending Approximate Bayesian Computation with Supervised Machine Learning to Infer Demographic History from Genetic Polymorphisms Using DIYABC Random Forest,” July. https://doi.org/10.22541/au.159480722.26357192.

Collin, François-David, Arnaud Estoup, Jean-Michel Marin, and Louis Raynal. 2020. “Bringing ABC inference to the machine learning realm : AbcRanger, an optimized random forests library for ABC.” In JOBIM 2020, 2020:66. JOBIM. Montpellier, France. https://hal.archives-ouvertes.fr/hal-02910067.

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2001. The Elements of Statistical Learning. Vol. 1. Springer Series in Statistics. New York, NY: Springer.

Guennebaud, Gaël, Benoît Jacob, and others. 2010. “Eigen V3.” http://eigen.tuxfamily.org.

Lakshminarayanan, Balaji, Daniel M Roy, and Yee Whye Teh. 2014. “Mondrian Forests: Efficient Online Random Forests.” In Advances in Neural Information Processing Systems, 3140–48.

Pudlo, Pierre, Jean-Michel Marin, Arnaud Estoup, Jean-Marie Cornuet, Mathieu Gautier, and Christian P Robert. 2015. “Reliable ABC Model Choice via Random Forests.” Bioinformatics 32 (6): 859–66.

Raynal, Louis, Jean-Michel Marin, Pierre Pudlo, Mathieu Ribatet, Christian P Robert, and Arnaud Estoup. 2018. “ABC random forests for Bayesian parameter inference.” Bioinformatics 35 (10): 1720–28. https://doi.org/10.1093/bioinformatics/bty867.

Wright, Marvin N, and Andreas Ziegler. 2015. “Ranger: A Fast Implementation of Random Forests for High Dimensional Data in c++ and r.” arXiv Preprint arXiv:1508.04409.

[1] The term “online” here and in the code does not have the usual meaning it has as coined in “online machine learning.” We still need the entire training data set at once. Our implementation is “online” not in the sequential order of the input data, but in the sequential order of computation of the trees in random forests, which are computed and then discarded one by one.

[2] We only use the C++ Core of ranger, which is under MIT License, same as ours.
