Skip to main content

A Python 3 Library for State-of-the-Art Statistical Dimension Reduction Methods

Project description

direpack: a Python 3 library for state-of-the-art statistical dimension reduction techniques

This package delivers a scikit-learn compatible Python 3 package for sundry state-of-the art multivariate statistical methods, with a focus on dimension reduction.

The categories of methods delivered in this package, are:

  • Projection pursuit dimension reduction (ppdire)
  • Sufficient dimension reduction (sudire)
  • Robust M-estimators for dimension reduction (sprm) each of which are presented as scikit-learn compatible objects in the corresponding folders.

We hope that this package leads to scientific success. If it does so, we kindly ask to cite the direpack vignette [0], as well as the original publication of the corresponding method.

The package also contains a set of tools for pre- and postprocessing:

  • The preprocessing folder provides classical and robust centring and scaling, as well as spatial sign transforms [4]
  • The dicomo folder contains a versatile class to access a wide variety of moment and co-moment statistics, and statistics derived from those. Check out the dicomo Documentation file and the dicomo Examples Notebook.
  • Plotting utilities in the plot folder
  • Cross-validation utilities in the cross-validation folder

AIG sprm score space

Methods in the sprm folder

  • The estimator (sprm.py) [1]
  • The Sparse NIPALS (SNIPLS) estimator [3](snipls.py)
  • Robust M regression estimator (rm.py)
  • Ancillary functions for M-estimation (_m_support_functions.py)

Methods in the ppdire folder

The ppdire class will give access to a wide range of projection pursuit dimension reduction techniques. These include slower approximate estimates for well-established methods such as PCA, PLS and continuum regression. However, the class provides unique access to a set of robust options, such as robust continuum regression (RCR) [5], through its native grid optimization algorithm, first published for RCR as well [6]. Moreover, ppdire is also a great gateway to calculate generalized betas, using the CAPI projection index [7].

The code is orghanized in

  • ppdire.py - the main PP dimension reduction class
  • capi.py - the co-moment analysis projection index.

Methods in the sudire folder

The sudire folder gives access to an extensive set of methods that resort under the umbrella of sufficient dimension reduction. These range from meanwhile long-standing, well-accepted approaches, such as sliced inverse regression (SIR) and the closely related SAVE [8,9], through methods such as directional regression [10] and principal Hessian directions [11], and more. However, the package also contains some of the most recently developed, state-of-the-art sufficient dimension reduction techniques, that require no distributional assumptions. The options provided in this category are based on energy statistics (distance covariance [12] or martingale difference divergence [13]) and ball statistics (ball covariance) [14]. All of these options can be called by setting the corresponding parameters in the sudire class, cf. the docs. Note: the ball covariance option will require some lines to be uncommented as indicated. We decided not to make that option generally available, since it depends on the Ball package that seems to be difficult to install on certain architectures.

How to install

The package is distributed through PyPI, so install through:

    pip install direpack

Documentation

  • Detailed documentation can be found in the ReadTheDocs page.
  • A more extensive description on the background is presented in the direpack vignette.
  • Examples on how to use each of the dicomo, ppdire, sprm and sudire classes are presented as Jupyter notebooks in the examples folder
  • Furthemore, the docs folder contains a few markdown files on usage of the classes.

References

  1. direpack: A Python 3 package for state-of-the-art statistical dimension reduction methods
  2. Sparse partial robust M regression, Irene Hoffmann, Sven Serneels, Peter Filzmoser, Christophe Croux, Chemometrics and Intelligent Laboratory Systems, 149 (2015), 50-59.
  3. Partial robust M regression, Sven Serneels, Christophe Croux, Peter Filzmoser, Pierre J. Van Espen, Chemometrics and Intelligent Laboratory Systems, 79 (2005), 55-64.
  4. Sparse and robust PLS for binary classification, I. Hoffmann, P. Filzmoser, S. Serneels, K. Varmuza, Journal of Chemometrics, 30 (2016), 153-162.
  5. Spatial Sign Preprocessing:  A Simple Way To Impart Moderate Robustness to Multivariate Estimators, Sven Serneels, Evert De Nolf, Pierre J. Van Espen, Journal of Chemical Information and Modeling, 46 (2006), 1402-1409.
  6. Robust Continuum Regression, Sven Serneels, Peter Filzmoser, Christophe Croux, Pierre J. Van Espen, Chemometrics and Intelligent Laboratory Systems, 76 (2005), 197-204.
  7. Robust Multivariate Methods: The Projection Pursuit Approach, Peter Filzmoser, Sven Serneels, Christophe Croux and Pierre J. Van Espen, in: From Data and Information Analysis to Knowledge Engineering, Spiliopoulou, M., Kruse, R., Borgelt, C., Nuernberger, A. and Gaul, W., eds., Springer Verlag, Berlin, Germany, 2006, pages 270--277.
  8. Projection pursuit based generalized betas accounting for higher order co-moment effects in financial market analysis, Sven Serneels, in: JSM Proceedings, Business and Economic Statistics Section. Alexandria, VA: American Statistical Association, 2019, 3009-3035.
  9. Sliced Inverse Regression for Dimension Reduction Li K-C, Journal of the American Statistical Association (1991), 86, 316-327.
  10. Sliced Inverse Regression for Dimension Reduction: Comment, R.D. Cook, and Sanford Weisberg, Journal of the American Statistical Association (1991), 86, 328-332.
  11. On directional regression for dimension reduction , B. Li and S.Wang, Journal of the American Statistical Association (2007), 102:997–1008.
  12. On principal hessian directions for data visualization and dimension reduction:Another application of stein’s lemma, K.-C. Li. , Journal of the American Statistical Association(1992)., 87,1025–1039.
  13. Sufficient Dimension Reduction via Distance Covariance, Wenhui Sheng and Xiangrong Yin in: Journal of Computational and Graphical Statistics (2016), 25, issue 1, pages 91-104.
  14. A martingale-difference-divergence-based estimation of central mean subspace, Yu Zhang, Jicai Liu, Yuesong Wu and Xiangzhong Fang, in: Statistics and Its Interface (2019), 12, number 3, pages 489-501.
  15. Robust Sufficient Dimension Reduction Via Ball Covariance Jia Zhang and Xin Chen, Computational Statistics and Data Analysis 140 (2019) 144–154

Release Notes can be checked out in the repository.

A list of possible topics for further development is provided as well. Additions and comments are welcome!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

direpack-1.0.11.tar.gz (63.0 kB view details)

Uploaded Source

Built Distribution

direpack-1.0.11-py3-none-any.whl (78.1 kB view details)

Uploaded Python 3

File details

Details for the file direpack-1.0.11.tar.gz.

File metadata

  • Download URL: direpack-1.0.11.tar.gz
  • Upload date:
  • Size: 63.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.2

File hashes

Hashes for direpack-1.0.11.tar.gz
Algorithm Hash digest
SHA256 e245031f1184c1dd2e04fa6d4a886e13d01ec361b76f705307cd547255b595dc
MD5 1b051376b15f59c94882dbda1d7fd5ac
BLAKE2b-256 d94c8394a4a91a9bb717a23b996aac55f765a192ff9a4f18c95ece5907344049

See more details on using hashes here.

Provenance

File details

Details for the file direpack-1.0.11-py3-none-any.whl.

File metadata

  • Download URL: direpack-1.0.11-py3-none-any.whl
  • Upload date:
  • Size: 78.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.2

File hashes

Hashes for direpack-1.0.11-py3-none-any.whl
Algorithm Hash digest
SHA256 235965b57cbe688d1966585b9d7402219396b9f753f081492d66008d0a337565
MD5 45c928abaa3e5feee9ea009a972f3876
BLAKE2b-256 b66cb6eeb402bca1a4cb473cd4cc4683e4a61aef4cfce0f9489313491e2fd88a

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page