
Statistical modeling and machine learning framework

Project description

cleands

cleands is a Python package for statistical modeling and data science that unifies regression, classification, clustering, distribution modeling, and dimension reduction under a single interface.
It aims to provide a full open-source alternative to packages like Stata, SAS, SPSS, and MATLAB, while remaining extensible and Pythonic.

Features

  • Formula interface: Fit models directly from a Patsy-like formula, e.g. "y ~ x1 + x2 + x1:x2".
  • Supervised learning: Linear regression, logistic regression, Poisson regression, k-nearest neighbors, recursive partitioning trees, ensembles (bagging, random forests), shrinkage methods (lasso, ridge, elastic net), etc.
  • Classification: Logistic, multinomial, discriminant analysis, kNN classifiers, decision trees, random forests.
  • Unsupervised learning: k-means clustering; other clustering algorithms are planned.
  • Distributions: Parametric probability distributions with PDF, CDF, and likelihood-based inference.
  • Utilities: Cross-validation, bootstrap, gradient descent, Newton’s method, and more.
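The formula interface turns a string such as "y ~ x1 + x2 + x1:x2" into a response vector and a design matrix. As a rough illustration of that expansion (not the parser in cleands' formula.py, whose exact behavior is not described here), the hypothetical helper expand_formula below handles only "+" between terms and ":" for interactions, and always prepends an intercept column.

```python
import numpy as np


def expand_formula(formula, data):
    """Hypothetical helper: expand 'y ~ a + b + a:b' into (y, X, names).

    Supports only '+' between terms and ':' for interactions (elementwise
    products). An intercept column is always prepended. The left-hand side
    may be empty, as in '~x1+x2', in which case y is None.
    """
    lhs, rhs = (side.strip() for side in formula.split("~"))
    y = np.asarray(data[lhs], dtype=float) if lhs else None
    columns, names = [], []
    for term in rhs.split("+"):
        factors = [f.strip() for f in term.split(":") if f.strip()]
        if not factors:
            continue
        # An interaction a:b is the elementwise product of its component columns
        column = np.asarray(data[factors[0]], dtype=float)
        for factor in factors[1:]:
            column = column * np.asarray(data[factor], dtype=float)
        columns.append(column)
        names.append(":".join(factors))
    n = len(columns[0])
    X = np.column_stack([np.ones(n)] + columns)
    return y, X, ["(Intercept)"] + names


data = {"y": [1, 3, 2, 5, 4], "x1": [1, 2, 3, 4, 5], "x2": [2, 1, 2, 1, 2]}
y, X, names = expand_formula("y ~ x1 + x2 + x1:x2", data)
print(names)  # ['(Intercept)', 'x1', 'x2', 'x1:x2']
```

A pandas DataFrame can be passed in place of the dict, since the helper only looks columns up by name.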

See the TODO list for planned additions such as QDA, SVM, DBSCAN, Gaussian mixtures, quantile regression, splines, and LDV models (Tobit, truncated).


Installation

Install the latest release from PyPI:

pip install cleands

Documentation

Full API documentation, usage guides, and examples are planned for GitHub Pages or Read the Docs; links will be added here once they are deployed.


Quick Start

Fit a linear regression model using formula notation:

import pandas as pd
from cleands.Prediction import LeastSquaresRegressor

# Example DataFrame
df = pd.DataFrame({
    "y": [1, 3, 2, 5, 4],   # not an exact linear function of the regressors, so std errors are nonzero
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 1, 2, 1, 2]
})

# Fit model with interaction term
model = LeastSquaresRegressor("y ~ x1 + x2 + x1:x2", data=df)

print(model.tidy)   # Coefficients with std errors, t-stats, and p-values
print(model.glance) # Model summary (R², AIC, BIC, etc.)
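To see where the numbers in tidy and glance come from, the core least-squares quantities can be computed directly with NumPy. This is a generic OLS sketch on simulated data (the true coefficients are made up for illustration), not cleands' implementation. It assumes the classical homoskedastic covariance; p-values are omitted because they need a Student-t CDF (for example scipy.stats.t.sf), and AIC/BIC are omitted because packages differ in which constants they include.

```python
import numpy as np


def ols_summary(X, y):
    """Return OLS coefficients, standard errors, t-statistics and R²."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    sigma2 = rss / (n - p)                      # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # classical (homoskedastic) covariance
    se = np.sqrt(np.diag(cov))
    tss = np.sum((y - y.mean()) ** 2)
    return {"coef": beta, "se": se, "t": beta / se, "r2": 1.0 - rss / tss}


# Simulated data with an interaction term, mirroring "y ~ x1 + x2 + x1:x2"
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)

summary = ols_summary(X, y)
for name, b, s, t in zip(["(Intercept)", "x1", "x2", "x1:x2"],
                         summary["coef"], summary["se"], summary["t"]):
    print(f"{name:12s} {b:8.3f} {s:8.3f} {t:8.2f}")
print(f"R² = {summary['r2']:.3f}")
```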

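Logistic and Poisson regression use the same interface. Logistic regression expects a binary (0/1) response, so this example adds one:

from cleands.Prediction import LogisticRegressor, PoissonRegressor

df["d"] = [0, 1, 1, 0, 1]  # binary outcome for the logit model

logit_model = LogisticRegressor("d ~ x1 + x2", data=df)
pois_model  = PoissonRegressor("y ~ x1 + x2", data=df)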
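Logistic and Poisson regression are both generalized linear models with canonical links, so a single Newton's-method loop (one of the optimizers listed under Utilities) can fit either: only the mean function and the variance weights change. The sketch below is a generic NumPy illustration with a simple step-halving safeguard; how cleands actually fits these models is not stated here, and fit_glm_newton is a hypothetical name.

```python
import numpy as np


def fit_glm_newton(X, y, family="logit", tol=1e-8, max_iter=100):
    """Fit a canonical-link GLM ('logit' or 'poisson') by Newton's method."""
    if family not in ("logit", "poisson"):
        raise ValueError("family must be 'logit' or 'poisson'")

    def mean(eta):
        return 1.0 / (1.0 + np.exp(-eta)) if family == "logit" else np.exp(eta)

    def loglik(beta):
        eta = X @ beta
        if family == "logit":
            return np.sum(y * eta - np.logaddexp(0.0, eta))
        return np.sum(y * eta - np.exp(eta))  # Poisson, dropping the log(y!) constant

    beta = np.zeros(X.shape[1])
    current = loglik(beta)
    for _ in range(max_iter):
        mu = mean(X @ beta)
        grad = X.T @ (y - mu)                        # score vector
        if np.max(np.abs(grad)) < tol:
            break
        weights = mu * (1.0 - mu) if family == "logit" else mu  # Var(y_i) under the model
        info = X.T @ (X * weights[:, None])          # Fisher information = negative Hessian
        step = np.linalg.solve(info, grad)
        # Halve the step until the log-likelihood does not drop (small slack for rounding)
        t = 1.0
        while t > 1e-8:
            if loglik(beta + t * step) >= current - 1e-12 * abs(current):
                break
            t /= 2.0
        beta = beta + t * step
        current = loglik(beta)
    return beta


# Simulated binary and count outcomes sharing one design matrix
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_bin = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0])))))
y_cnt = rng.poisson(np.exp(X @ np.array([0.3, 0.5])))

beta_logit = fit_glm_newton(X, y_bin, "logit")
beta_pois = fit_glm_newton(X, y_cnt, "poisson")
print(beta_logit, beta_pois)
```

The convergence check is on the score (gradient), which is zero at the maximum-likelihood estimate; if the logit data were perfectly separable, no finite maximizer would exist and the loop would simply stop at max_iter.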

k-means clustering (unsupervised):

from cleands.Clustering import kMeans

kmeans = kMeans("~x1+x2", data=df, k=2)
print(kmeans.groups)   # Cluster assignments
print(kmeans.means)    # Cluster centroids
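k-means is most commonly fit with Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. The sketch below is a generic NumPy version with random initialization from the data points; whether cleands' kMeans uses the same initialization or stopping rule is not stated here, and the returned groups/means simply mirror the attribute names in the example above. Multiple restarts and k-means++ seeding are left out.

```python
import numpy as np


def lloyd_kmeans(X, k, max_iter=100, seed=0):
    """Cluster the rows of X into k groups with Lloyd's algorithm."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    def assign(means):
        # Squared Euclidean distance from every point to every centroid
        dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        return dist.argmin(axis=1)

    # Initialize centroids at k randomly chosen (distinct-index) data points
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        groups = assign(means)
        # Move each centroid to the mean of its points; keep it if the cluster is empty
        new_means = np.array([
            X[groups == j].mean(axis=0) if np.any(groups == j) else means[j]
            for j in range(k)
        ])
        if np.allclose(new_means, means):
            break
        means = new_means
    return assign(means), means


# Two well-separated simulated blobs
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(5.0, 0.3, size=(20, 2))])
groups, means = lloyd_kmeans(X, k=2)
print(groups)
print(means)
```

With the Quick Start DataFrame, the input matrix would be df[["x1", "x2"]].to_numpy().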

Directory Structure

cleands/
│
├── base.py              # Abstract base classes (prediction, classification, clustering, distribution)
├── formula.py           # Formula parser for Patsy-like expressions
├── utils.py             # Utility functions (cross-validation, bootstrap, optimizers, etc.)
│
├── Prediction/          # Regression and supervised prediction models
│   ├── glm.py           # Least squares, logistic, Poisson regressors
│   ├── knn.py           # k-nearest neighbors regressors
│   ├── shrinkage.py     # Lasso, ridge, elastic net, etc.
│   ├── tree.py          # Recursive partitioning regressors
│   └── ensemble.py      # Bagging, random forests
│
├── Classification/      # Classification models
│   ├── glm.py           # Logistic and multinomial classifiers
│   ├── knn.py           # kNN classifiers
│   ├── lda.py           # Linear discriminant analysis
│   ├── tree.py          # Recursive partitioning classifiers
│   └── ensemble.py      # Bagging and random forest classifiers
│
├── Clustering/          # Unsupervised clustering
│   └── kmeans.py        # k-means clustering
│
├── DimensionReduction/  # PCA, CCA, etc. (in progress)
└── Distribution/        # Probability distributions and tests
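utils.py is described as holding cross-validation and bootstrap helpers. As a generic illustration of those two techniques (not cleands' API; kfold_mse and bootstrap_se are hypothetical names), here is k-fold cross-validated mean squared error and a nonparametric bootstrap standard error, both for ordinary least squares.

```python
import numpy as np


def ols_fit(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def kfold_mse(X, y, k=5, seed=0):
    """Average of the per-fold out-of-sample MSE of OLS over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False              # hold this fold out of the fit
        beta = ols_fit(X[train], y[train])
        errors.append(np.mean((y[test_idx] - X[test_idx] @ beta) ** 2))
    return float(np.mean(errors))


def bootstrap_se(X, y, n_boot=500, seed=0):
    """Standard deviation of OLS coefficients across row-resampled fits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)     # sample rows with replacement
        draws[b] = ols_fit(X[idx], y[idx])
    return draws.std(axis=0, ddof=1)


# Simulated data: y = 1 + 2x + standard normal noise
rng = np.random.default_rng(42)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
cv = kfold_mse(X, y)
se = bootstrap_se(X, y)
print(cv, se)
```

Because the noise variance in this simulation is 1, the cross-validated MSE should land near 1, and the bootstrap standard errors should be close to the classical OLS ones.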

Roadmap

  • Stepwise model selection
  • Support for splines and GAMs
  • More clustering methods (DBSCAN, Gaussian mixtures, hierarchical)
  • Additional LDV models (Tobit, truncated regression)
  • Expanded distribution families
  • Neural networks and GLM trees

License

MIT License. See LICENSE for details.



Download files


Source Distribution

cleands-0.2.1.tar.gz (66.4 kB)

Uploaded Source

Built Distribution


cleands-0.2.1-py3-none-any.whl (76.7 kB)

Uploaded Python 3

File details

Details for the file cleands-0.2.1.tar.gz.

File metadata

  • Download URL: cleands-0.2.1.tar.gz
  • Upload date:
  • Size: 66.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for cleands-0.2.1.tar.gz
Algorithm     Hash digest
SHA256        dfd6ead8c8b1f80798eef24c5ad80e51fe73adff4d19744e87f95931f603d3bf
MD5           4f4a5adf03dda0f101d4af57960bc50c
BLAKE2b-256   4855af2879a347291653a7930877f80a10504adbbe539e1d42e2038e99af9b65


File details

Details for the file cleands-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: cleands-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 76.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for cleands-0.2.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        9d2134946996fc46f622e10485d372becb4383cb67df998f429429346a52b26f
MD5           824066b9ffeffa11af1728eccc8a16b7
BLAKE2b-256   b7978559f20275d5f72b7efb4794467a013fae6a511269582807ad08fdbe0b7d
