Python micro-package for enhanced statistical analysis

# enhancesa

Enhancesa is a collection of tools for simpler statistical analysis in Python. It is mainly meant to support manual analysis and prediction workflows built on packages such as Statsmodels and scikit-learn.

For example, Enhancesa helps answer questions such as: Which subset of features gives the lowest error rate in an ordinary least squares model? What are the bootstrap estimates of a population mean and its standard error?
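The first question can be made concrete with a minimal sketch of exhaustive best subset selection for ordinary least squares, written in plain NumPy. This is not Enhancesa's `SubsetSelect` implementation. The names `best_subset_ols` and `_design`, and the use of a held-out validation set to compare subset sizes, are assumptions made for illustration. For each subset size the sketch keeps the model with the lowest training RSS. Training RSS never increases as features are added, so it then compares the sizes on validation MSE.

```python
from itertools import combinations

import numpy as np


def _design(X, cols):
    # Hypothetical helper: prepend an intercept column to the selected features
    return np.column_stack([np.ones(X.shape[0]), X[:, list(cols)]])


def best_subset_ols(X_train, y_train, X_val, y_val):
    """Hypothetical helper: exhaustive best subset selection for OLS."""
    p = X_train.shape[1]
    per_size = []  # (best columns, validation MSE) for each subset size k
    for k in range(1, p + 1):
        best_cols, best_rss, best_coef = None, np.inf, None
        for cols in combinations(range(p), k):
            A = _design(X_train, cols)
            coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
            rss = float(np.sum((y_train - A @ coef) ** 2))
            # Within a fixed size, keep the subset with the lowest training RSS
            if rss < best_rss:
                best_cols, best_rss, best_coef = cols, rss, coef
        # Compare sizes on held-out data, since training RSS never increases with k
        val_mse = float(np.mean((y_val - _design(X_val, best_cols) @ best_coef) ** 2))
        per_size.append((best_cols, val_mse))
    best = min(per_size, key=lambda item: item[1])
    return best, per_size


# Usage with dummy data: only features 0 and 2 carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=200)
best, per_size = best_subset_ols(X[:150], y[:150], X[150:], y[150:])
print("Chosen features:", best[0], "validation MSE:", round(best[1], 3))
```

Exhaustive search fits 2^p - 1 models, so this approach is only practical for a modest number of features.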

### Upcoming features

- Partial least squares (PLS) regression
- Principal components regression (PCR)
- Subset selection plots
- Additional test statistics in bootstrap resampling

### Motivation

Enhancesa grew out of my solutions to the exercises in the book *An Introduction to Statistical Learning* by James, Witten, Hastie, and Tibshirani. While working through the exercises, I found that Python, unlike R, lacked convenient functionality for several of the tasks. At this stage, the package is simply a collection of the functions I used in those solutions.

### Installation

Enhancesa can be installed from the PyPI package repository.

```
$ pip install enhancesa
```

### Quick glimpse

```python
>>> import numpy as np
>>> import enhancesa as esa
>>> # Create some dummy data
>>> x = np.random.normal(size=100)
>>> # Compute test statistics with bootstrap resampling
>>> esa.bootstrap(x, iters=1000)
Estimated mean: -0.025309
Estimated SE: 0.095531
dtype: float64
```
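The bootstrap idea behind this output can be sketched in a few lines of NumPy. This is a hypothetical helper: `bootstrap_mean_se` is not part of Enhancesa, and Enhancesa's exact estimator may differ. The sketch draws `iters` resamples of `x` with replacement and computes the mean of each. It reports the average of those means as the point estimate and their standard deviation as the estimated standard error.

```python
import numpy as np


def bootstrap_mean_se(x, iters=1000, seed=None):
    """Hypothetical helper: bootstrap estimates of the mean and its standard error."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    # Each row of `idx` indexes one resample of size n drawn with replacement
    idx = rng.integers(0, n, size=(iters, n))
    boot_means = x[idx].mean(axis=1)
    # Point estimate: average of the bootstrap means
    # Standard error: sample standard deviation of the bootstrap means
    return boot_means.mean(), boot_means.std(ddof=1)


# Usage with dummy data, mirroring the example above
rng = np.random.default_rng(0)
x = rng.normal(size=100)
mean, se = bootstrap_mean_se(x, iters=1000, seed=1)
print(f"Estimated mean: {mean:.6f}")
print(f"Estimated SE: {se:.6f}")
```

With 100 standard-normal draws, the estimated SE should be close to 1/√100 = 0.1. The exact values depend on the seeds.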

Find out more about the full set of features in the documentation.

### Issues & improvements

- Dependencies could be reduced further.
- The `bootstrap` function could be improved by adding estimates of more test statistics of interest.
- Adopt Poetry for packaging and dependency management; Poetry uses the `pyproject.toml` file specified in PEP 518.
- `enhancesa.SubsetSelect` raises a `NotImplementedError` if the input `X` is a NumPy array.