
ABC random forests for model choice and parameter estimation, Python wrapper

Project description


Random forests methodologies for:

  • ABC model choice (Pudlo et al. 2015)
  • ABC Bayesian parameter inference (Raynal et al. 2018)

Libraries we use:

  • Ranger (Wright and Ziegler 2015): we use only its C++ core [2], adapted for “online” [1] computation of the trees
  • Eigen3 (Guennebaud, Jacob, and others 2010)

Of note, we use our own implementations of LDA and PLS from (Friedman, Hastie, and Tibshirani 2001, 1:81, 114).

There is one set of binaries, containing a macOS/Linux/Windows (x64 only) binary for each platform. They are available in the “Releases” tab, under the “Assets” section (unfold it to see the list).

These are pure command line binaries, with no prerequisites or library dependencies required to run them. Just download one and launch it from your terminal software of choice. The usual caveats with command line executables apply: if you’re not proficient with the command line interface of your platform, please learn some basics or ask someone who can help you with such matters.

The standalone binary is part of DIYABC-RF, a specialized population genetics graphical interface, with a submission (currently under review) to MER (Molecular Ecology Resources) (Collin, Durif, et al. 2020).

Python

Installation

pip install pyabcranger

Notebook examples

Usage

 - ABC Random Forest - Model choice or parameter estimation command line options
Usage:
  ../build/abcranger [OPTION...]

  -h, --header arg        Header file (default: headerRF.txt)
  -r, --reftable arg      Reftable file (default: reftableRF.bin)
  -b, --statobs arg       Statobs file (default: statobsRF.txt)
  -o, --output arg        Prefix output (modelchoice_out or estimparam_out by
                          default)
  -n, --nref arg          Number of samples, 0 means all (default: 0)
  -m, --minnodesize arg   Minimal node size. 0 means 1 for classification or
                          5 for regression (default: 0)
  -t, --ntree arg         Number of trees (default: 500)
  -j, --threads arg       Number of threads, 0 means all (default: 0)
  -s, --seed arg          Seed, generated by default (default: 0)
  -c, --noisecolumns arg  Number of noise columns (default: 5)
      --nolinear          Disable LDA for model choice or PLS for parameter
                          estimation
      --plsmaxvar arg     Percentage of maximum explained Y-variance for
                          retaining pls axis (default: 0.9)
      --chosenscen arg    Chosen scenario (mandatory for parameter
                          estimation)
      --noob arg          number of oob testing samples (mandatory for
                          parameter estimation)
      --parameter arg     name of the parameter of interest (mandatory for
                          parameter estimation)
  -g, --groups arg        Groups of models
      --help              Print help
  • If you provide --chosenscen, --parameter and --noob, parameter estimation mode is selected.
  • Otherwise by default it’s model choice mode.
  • Linear additions are LDA for model choice and PLS for parameter estimation; the --nolinear option disables them in both cases.

Model Choice


Example


abcranger -t 10000 -j 8

Header, reftable and statobs files should be in the current directory.

Groups

With the -g (or --groups) option, you may “group” your models into several groups. For example, if you have six models, labeled from 1 to 6, -g "1,2,3;4,5,6" splits them into two groups of three.

Generated files

Four files are created:

  • modelchoice_out.ooberror : OOB Error rate vs number of trees (line number is the number of trees)
  • modelchoice_out.importance : variables importance (sorted)
  • modelchoice_out.predictions : votes, prediction and posterior error rate
  • modelchoice_out.confusion : OOB Confusion matrix of the classifier
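
To check that the number of trees is sufficient, one may plot the OOB error against the number of trees. A minimal Python sketch, assuming modelchoice_out.ooberror holds one OOB error value per line (the line number being the number of trees, as described above):

import matplotlib.pyplot as plt

# One OOB error value per line; line i corresponds to a forest of i trees
# (an assumption based on the file description above).
with open("modelchoice_out.ooberror") as f:
    oob_errors = [float(line) for line in f if line.strip()]

plt.plot(range(1, len(oob_errors) + 1), oob_errors)
plt.xlabel("Number of trees")
plt.ylabel("OOB error rate")
plt.show()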

Parameter Estimation


Composite parameters

When specifying the parameter (option --parameter), one may specify simple composite parameters, such as the division, addition, or multiplication of two existing parameters, like t/N or T1+T2.
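
For instance, to estimate the composite parameter t/N (assuming t and N are parameters defined in your header file):

abcranger -t 1000 -j 8 --parameter t/N --chosenscen 1 --noob 50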

A note about PLS heuristic

The --plsmaxvar option (defaulting to 0.90) sets the number of selected PLS axes so that we get at least the specified percentage of the maximum explained variance of the output. The explained variance of the output for the first m axes is defined by the R-squared of the output:

Yvar^m = \frac{\sum_{i=1}^{N}{(\hat{y}^{m}_{i}-\bar{y})^2}}{\sum_{i=1}^{N}{(y_{i}-\bar{y})^2}}

where \hat{y}^{m} is the output Y predicted by the PLS with the first m components. So, only the first n_{comp} axes are kept, and:

n_{comp} = \max\left\{\, m : Yvar^{m} \leq 0.90 \cdot Yvar^{M} \right\}

where M is the total number of computed axes.

Note that if you specify 0 as --plsmaxvar, an “elbow” heuristic is activated, where the following condition is tested for every computed axis:

\frac{Yvar^{k+1}+Yvar^{k}}{2} \geq 0.99(N-k)\left(Yvar^{k+1}-Yvar^{k}\right)

If this condition holds over a window of previous axes, sized to 10% of the total possible number of axes, then we stop the PLS axis computation.

In practice, we find this n_{heur} close enough to the n_{comp} obtained with a 99% threshold, but this isn’t guaranteed.
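
A minimal Python sketch of both selection rules, assuming yvar is a list of cumulative explained-variance ratios Yvar^1, ..., Yvar^M (the names, the indexing conventions, and the reading of N as the total number of axes are illustrative assumptions, not the actual C++ implementation):

def select_n_comp(yvar, plsmaxvar=0.90):
    # Keep the first axes up to plsmaxvar of the maximum explained variance:
    # n_comp = max{ m : Yvar^m <= plsmaxvar * Yvar^M }
    threshold = plsmaxvar * yvar[-1]
    kept = [m + 1 for m, v in enumerate(yvar) if v <= threshold]
    return max(kept) if kept else 1

def select_n_elbow(yvar, window_frac=0.10):
    # "Elbow" heuristic (--plsmaxvar 0): stop once the flatness condition
    # has held over a window sized to 10% of the total possible axes.
    n_total = len(yvar)                  # N taken as total axes (assumption)
    window = max(1, int(window_frac * n_total))
    consecutive = 0
    for k in range(1, n_total):          # compare axis k+1 to axis k
        lhs = (yvar[k] + yvar[k - 1]) / 2.0
        rhs = 0.99 * (n_total - k) * (yvar[k] - yvar[k - 1])
        consecutive = consecutive + 1 if lhs >= rhs else 0
        if consecutive >= window:
            return k                     # keep the first k axes
    return n_total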

The significance of the --noob parameter

The median global/local statistics and confidence interval (global) measures for parameter estimation need a number of OOB samples (--noob) to be reliable (typically, 30% of the dataset size is sufficient). Be aware that computing the whole set of weight predictions (Raynal et al. 2018), i.e. setting --noob to the same value as --nref, can be very costly, memory- and CPU-wise, if your dataset has a large number of samples, so it may be advisable to compute them only on a subset of size noob.

Example (parameter estimation)

Example (working with the dataset in test/data):

abcranger -t 1000 -j 8 --parameter ra --chosenscen 1 --noob 50

Header, reftable and statobs files should be in the current directory.

Generated files (parameter estimation)

Five files (or seven if PLS is activated) are created:

  • estimparam_out.ooberror : OOB MSE rate vs number of trees (line number is the number of trees)
  • estimparam_out.importance : variables importance (sorted)
  • estimparam_out.predictions : expectation, variance and 0.05, 0.5, 0.95 quantiles for the prediction
  • estimparam_out.predweights : CSV of the value/weight pairs of the prediction (for a density plot; see the sketch after the next list)
  • estimparam_out.oobstats : various statistics on OOB samples (MSE, NMSE, NMAE, etc.)

If PLS is enabled:

  • estimparam_out.plsvar : variance explained by number of components
  • estimparam_out.plsweights : variable weight in the first component (sorted by absolute value)
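
A minimal sketch for the density plot mentioned above, assuming estimparam_out.predweights is a headerless two-column CSV of value,weight pairs (the exact file layout is an assumption):

import csv
import matplotlib.pyplot as plt

values, weights = [], []
with open("estimparam_out.predweights") as f:
    for row in csv.reader(f):
        if len(row) == 2:                # skip blank or malformed lines
            values.append(float(row[0]))
            weights.append(float(row[1]))

# Weighted histogram as a simple posterior density estimate
plt.hist(values, bins=50, weights=weights, density=True)
plt.xlabel("Parameter value")
plt.ylabel("Posterior density")
plt.show()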

TODO

Input/Output

  • Integrate hdf5 (or exdir? msgpack?) routines to save/load reftables/observed stats with associated metadata
  • Provide R code to save/load the data
  • Provide Python code to save/load the data

C++ standalone

  • Merge the two methodologies in a single executable with (almost) the same options
  • (Optional) Possibly move to another options parser (CLI?)

External interfaces

  • R package
  • Python package

Documentation

  • Code documentation
  • Document the build

Continuous integration

  • Fix travis build. Currently the vcpkg download of eigen3 head is broken.
  • osX travis build
  • Appveyor win32 build

Long/Mid term TODO

  • auto-tuning of the methodologies’ parameters
    • auto-discovering the optimal number of trees by monitoring OOB error
    • auto-limiting number of threads by available memory
  • Streamline the two methodologies (model choice and then parameter estimation)
  • Write our own tree/rf implementation with better storage efficiency than ranger
  • Make functional tests for the two methodologies
  • Possibly use Mondrian forests for online batches? See (Lakshminarayanan, Roy, and Teh 2014)

References

This work was the subject of a proceedings paper at JOBIM 2020, with PDF and video available (in French) (Collin, Estoup, et al. 2020).

Collin, François-David, Ghislain Durif, Louis Raynal, Eric Lombaert, Mathieu Gautier, Renaud Vitalis, Jean Michel Marin, and Arnaud Estoup. 2020. “Extending Approximate Bayesian Computation with Supervised Machine Learning to Infer Demographic History from Genetic Polymorphisms Using DIYABC Random Forest,” July. https://doi.org/10.22541/au.159480722.26357192.

Collin, François-David, Arnaud Estoup, Jean-Michel Marin, and Louis Raynal. 2020. “Bringing ABC inference to the machine learning realm: AbcRanger, an optimized random forests library for ABC.” In JOBIM 2020, 2020:66. JOBIM. Montpellier, France. https://hal.archives-ouvertes.fr/hal-02910067.

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2001. The Elements of Statistical Learning. Vol. 1. Springer Series in Statistics. New York, NY, USA: Springer.

Guennebaud, Gaël, Benoît Jacob, and others. 2010. “Eigen V3.” http://eigen.tuxfamily.org.

Lakshminarayanan, Balaji, Daniel M Roy, and Yee Whye Teh. 2014. “Mondrian Forests: Efficient Online Random Forests.” In Advances in Neural Information Processing Systems, 3140–48.

Pudlo, Pierre, Jean-Michel Marin, Arnaud Estoup, Jean-Marie Cornuet, Mathieu Gautier, and Christian P Robert. 2015. “Reliable ABC Model Choice via Random Forests.” Bioinformatics 32 (6): 859–66.

Raynal, Louis, Jean-Michel Marin, Pierre Pudlo, Mathieu Ribatet, Christian P Robert, and Arnaud Estoup. 2018. “ABC random forests for Bayesian parameter inference.” Bioinformatics 35 (10): 1720–28. https://doi.org/10.1093/bioinformatics/bty867.

Wright, Marvin N, and Andreas Ziegler. 2015. “Ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.” arXiv Preprint arXiv:1508.04409.

[1] The term “online” here and in the code does not have the usual meaning coined in “online machine learning”: we still need the entire training data set at once. Our implementation is “online” not in the sequential order of the input data, but in the sequential order of computation of the trees in the random forests, which are computed and then discarded one after another.

[2] We only use the C++ Core of ranger, which is under MIT License, same as ours.
