
A library for statistics and causal inference


STATINF

1. Installation


You can get statinf from PyPI with:

pip install statinf

statinf is a library for statistics and causal inference. It provides the main statistical models, ranging from traditional OLS to Neural Networks.

The library is supported on Windows, Linux and macOS.

2. Documentation

You can find the full documentation at https://www.florianfelice.com/statinf.

You can also find an FAQ and the latest news about the library in the documentation.

3. Available modules

Here is a non-exhaustive list of the modules available in statinf:

  1. MLP implements the Multi-Layer Perceptron (see MLP for more details and examples).

  2. OLS fits linear regressions with Ordinary Least Squares (see OLS for more details and examples).

  3. GLM implements Generalized Linear Models (see GLM for more details and examples).

  4. stats provides descriptive statistics and statistical tests.

  5. data is a module to process data, covering data generation, One Hot Encoding and more (see the data processing and data generation modules for more details, and the illustrative sketch below this list).
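
For illustration, here is what One Hot Encoding does, sketched in plain pandas rather than through statinf's own helpers (the toy dataset and column names are made up; see the data processing documentation for the actual API):

import pandas as pd

# Hypothetical toy dataset with one categorical column
df = pd.DataFrame({"city": ["Paris", "Lyon", "Paris"], "Y": [1.0, 0.5, 0.8]})

# One Hot Encoding: one binary indicator column per category
print(pd.get_dummies(df, columns=["city"]))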

You can find the examples below, and many more, at https://www.florianfelice.com/statinf. Stay tuned for future releases.

3.1. OLS

statinf comes with the OLS regression implemented with the analytical formula:

\hat{\beta} = (X'X)^{-1}X'Y
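
As a sanity check, this closed-form estimator can be reproduced in plain NumPy (a minimal sketch on made-up data, independent of statinf):

import numpy as np

# Made-up design matrix: 100 observations, intercept column + 2 regressors
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
beta_true = np.array([1.0, -0.5, 2.0])
Y = X @ beta_true + rng.normal(scale=0.1, size=100)

# Analytical solution: solve (X'X) beta = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # close to beta_true

With statinf, the same estimator is used through the OLS class: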

from statinf.regressions import OLS
from statinf.data import generate_dataset

# Generate a synthetic dataset
data = generate_dataset(coeffs=[1.2556, -0.465, 1.665414, 2.5444, -7.56445], n=1000, std_dev=1.6)

# We set the OLS formula
formula = "Y ~ X0 + X1 + X2 + X3 + X4 + X1*X2 + exp(X2)"
# We fit the OLS with the data, the formula and an intercept
ols = OLS(formula, data, fit_intercept=True)

ols.summary()

The output will be:

==================================================================================
|                                  OLS summary                                   |
==================================================================================
| R²              =            0.98475 | Adj. R²        =                0.98464 |
| n               =                999 | p              =                      7 |
| Fisher value    =          10676.727 |                                         |
==================================================================================
| Variables         | Coefficients   | Std. Errors  | t-values   | Probabilities |
==================================================================================
| X0                |         1.3015 |      0.03079 |     42.273 |     0.0   *** |
| X1                |       -0.48712 |      0.03123 |    -15.597 |     0.0   *** |
| X2                |        1.62079 |      0.04223 |     38.377 |     0.0   *** |
| X3                |        2.55237 |       0.0326 |     78.284 |     0.0   *** |
| X4                |       -7.54776 |      0.03247 |   -232.435 |     0.0   *** |
| X1*X2             |        0.03626 |      0.02866 |      1.265 |   0.206       |
| exp(X2)           |       -0.00929 |      0.01551 |     -0.599 |   0.549       |
==================================================================================
| Significance codes: 0. < *** < 0.001 < ** < 0.01 < * < 0.05 < . < 0.1 < '' < 1 |

3.2. GLM

Logistic regression can be used for binary classification, where Y follows a Bernoulli distribution. With X the matrix of regressors, we have:

p=\mathbb{P}(Y=1)=\dfrac{1}{1+e^{-X\beta}}
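
Numerically, this probability is just the sigmoid of the linear index X\beta. A minimal NumPy sketch with made-up values:

import numpy as np

# Made-up regressors and coefficients
x = np.array([1.0, 0.5, -2.0])
beta = np.array([0.8, -1.5, 0.3])

# P(Y=1) = 1 / (1 + exp(-x'beta))
p = 1.0 / (1.0 + np.exp(-x @ beta))
print(p)  # a probability in (0, 1)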

We then implement the regression with:

from statinf.regressions import GLM
from statinf.data import generate_dataset

# Generate a synthetic dataset
data = generate_dataset(coeffs=[1.2556, -6.465, 1.665414, -1.5444], n=2500, std_dev=10.5, binary=True)

# We split the data into train and test sets
train = data.iloc[0:1000]
test = data.iloc[1001:2000]


# We set the linear formula for Xb
formula = "Y ~ X0 + X1 + X2 + X3"
logit = GLM(formula, train, test_set=test)

# Fit the model
logit.fit(plot=False, maxit=10)

logit.get_weights()

The output will be:

==================================================================================
|                                  Logit summary                                 |
==================================================================================
| McFadden R²     =          0.67128 | McFadden Adj. R²    =              0.6424 |
| Log-Likelihood  =          -227.62 | Null Log-Likelihood =             -692.45 |
| LR test p-value =              0.0 | Covariance          =           nonrobust |
| n               =              999 | p                   =                   5 |
| Iterations      =                8 | Convergence         =                True |
==================================================================================
| Variables         | Coefficients   | Std. Errors  | t-values   | Probabilities |
==================================================================================
| X0                |       -1.13024 |      0.10888 |    -10.381 |     0.0   *** |
| X1                |        0.02963 |      0.07992 |      0.371 |   0.711       |
| X2                |       -1.40968 |       0.1261 |    -11.179 |     0.0   *** |
| X3                |         0.5253 |      0.08966 |      5.859 |     0.0   *** |
==================================================================================
| Significance codes: 0. < *** < 0.001 < ** < 0.01 < * < 0.05 < . < 0.1 < '' < 1 |
==================================================================================

3.3. Multi-Layer Perceptron

You can train a Neural Network using the MLP class. The example below shows how to train an MLP with a single linear layer, which is equivalent to fitting an OLS with gradient descent.

from statinf.data import generate_dataset
from statinf.ml import MLP, Layer

# Generate the synthetic dataset
data = generate_dataset(coeffs=[1.2556, -6.465, 1.665414, 1.5444], n=1000, std_dev=1.6)

Y = ['Y']
X = [c for c in data.columns if c not in Y]

# Initialize the network and its architecture
nn = MLP()
nn.add(Layer(4, 1, activation='linear'))

# Train the neural network
nn.train(data=data, X=X, Y=Y, epochs=1, learning_rate=0.001)

# Extract the network's weights
print(nn.get_weights())

Output:

{'weights 0': array([[ 1.32005564],
       [-6.38121934],
       [ 1.64515704],
       [ 1.48571785]]), 'bias 0': array([0.81190412])}
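
To see the equivalence with OLS, here is a minimal gradient-descent sketch in plain NumPy (made-up data and a squared loss; this illustrates the idea, not statinf's internals):

import numpy as np

# Made-up linear data
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
beta_true = np.array([1.2556, -6.465, 1.665414, 1.5444])
Y = X @ beta_true + rng.normal(scale=1.6, size=1000)

# Gradient descent on the mean squared error
beta, bias = np.zeros(4), 0.0
for _ in range(5000):
    err = X @ beta + bias - Y
    beta -= 0.001 * 2.0 * (X.T @ err) / len(Y)  # d(MSE)/d(beta)
    bias -= 0.001 * 2.0 * err.mean()            # d(MSE)/d(bias)

print(beta, bias)  # converges towards beta_true and a bias near 0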
