Statistical modeling and machine learning framework

cleands

cleands is a Python package for statistical modeling and data science that unifies regression, classification, clustering, distribution modeling, and dimension reduction under a single interface.
It aims to provide a full open-source alternative to packages like Stata, SAS, SPSS, and MATLAB, while remaining extensible and Pythonic.

Features

  • Formula interface: Fit models directly from a Patsy-like formula, e.g. "y ~ x1 + x2 + x1:x2".
  • Supervised learning: Linear regression, logistic regression, Poisson regression, k-nearest neighbors, recursive partitioning trees, ensembles (bagging, random forests), shrinkage methods (lasso, ridge, elastic net), etc.
  • Classification: Logistic, multinomial, discriminant analysis, kNN classifiers, decision trees, random forests.
  • Unsupervised learning: k-means clustering; additional clustering algorithms are planned.
  • Distributions: Parametric probability distributions with PDF, CDF, and likelihood-based inference.
  • Utilities: Cross-validation, bootstrap, gradient descent, Newton’s method, and more.
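
To make the formula notation concrete, the sketch below expands "y ~ x1 + x2 + x1:x2" into design-matrix columns by hand. It assumes the usual Patsy/R convention that an intercept is implied and that a:b denotes the elementwise product of a and b. The helper expand_formula is hypothetical, handles only "+" and ":" terms, and is not the parser shipped in cleands; whether cleands follows every detail of this convention is an assumption.

```python
from math import prod


def expand_formula(formula, rows):
    """Expand a formula with '+' and ':' terms into design-matrix rows.

    Hypothetical helper for illustration only; not part of cleands.
    `rows` is a list of dicts mapping column names to numbers.
    """
    lhs, rhs = (side.strip() for side in formula.split("~"))
    terms = [t.strip() for t in rhs.split("+")]
    names = ["(Intercept)"] + terms
    y = [row[lhs] for row in rows]
    X = []
    for row in rows:
        # A main effect is the column itself; an interaction a:b is the
        # product of its factors.
        cols = [prod(row[f.strip()] for f in t.split(":")) for t in terms]
        X.append([1.0] + cols)  # leading 1.0 is the implied intercept
    return names, y, X


rows = [
    {"y": 1, "x1": 1, "x2": 2},
    {"y": 2, "x1": 2, "x2": 1},
    {"y": 3, "x1": 3, "x2": 2},
]
names, y, X = expand_formula("y ~ x1 + x2 + x1:x2", rows)
print(names)  # ['(Intercept)', 'x1', 'x2', 'x1:x2']
print(X[0])   # [1.0, 1, 2, 2]
```

Each fitted coefficient then corresponds to one of these columns, which is why the interaction term gets its own row in a coefficient table.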

See the TODO list for planned additions such as QDA, SVM, DBSCAN, Gaussian mixtures, quantile regression, splines, and limited dependent variable (LDV) models (Tobit, truncated regression).


Installation

Install the latest release from PyPI:

pip install cleands

Documentation

Full API documentation, usage guides, and examples are available here:



Quick Start

Fit a linear regression model using formula notation:

import pandas as pd
from cleands.Prediction import LeastSquaresRegressor

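# Example DataFrame (toy data: y equals x1 exactly, so the fit is perfect
# and the reported standard errors are not meaningful)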
df = pd.DataFrame({
    "y": [1, 2, 3, 4, 5],
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 1, 2, 1, 2]
})

# Fit model with interaction term
model = LeastSquaresRegressor("y ~ x1 + x2 + x1:x2", data=df)

print(model.tidy)   # Coefficients with std errors, t-stats, and p-values
print(model.glance) # Model summary (R², AIC, BIC, etc.)
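
As a rough guide to the quantities named in the comments above (coefficients, standard errors, t-statistics, R²), the sketch below computes them by hand for a regression with a single predictor. It uses the textbook closed-form ordinary least squares formulas and made-up noisy data. The function simple_ols is hypothetical and is not how cleands computes or lays out its output.

```python
from math import sqrt


def simple_ols(x, y):
    """Closed-form OLS for y = intercept + slope * x (illustration only)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ssr = sum(e ** 2 for e in resid)            # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)    # total sum of squares
    sigma2 = ssr / (n - 2)                      # two estimated parameters
    se_slope = sqrt(sigma2 / sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": se_slope,
        "t_slope": slope / se_slope,
        "r_squared": 1.0 - ssr / sst,
    }


# Made-up noisy data, chosen so the fit is not perfect.
fit = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

With more than one predictor the same quantities come from the matrix form of these formulas, but the interpretation of each reported number is unchanged.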

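Logistic and Poisson regression use the same interface. In practice the response should be binary (0/1) for the logistic model and a non-negative count for the Poisson model; df is reused here only to show the call signature: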

from cleands.Prediction import LogisticRegressor, PoissonRegressor

logit_model = LogisticRegressor("y ~ x1 + x2", data=df)
pois_model  = PoissonRegressor("y ~ x1 + x2", data=df)
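
The feature list mentions Newton's method among the utilities. As an illustration of how a logistic model can be fit that way, the sketch below runs Newton iterations on the log-likelihood for one predictor plus an intercept. The function logistic_newton and the data are made up for this example; this README does not state which optimizer LogisticRegressor actually uses, so treat the connection as an assumption.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def logistic_newton(x, y, max_iter=25, tol=1e-10):
    """Fit P(y=1) = sigmoid(b0 + b1 * x) by Newton's method (illustration only)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood: X^T (y - p)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Negative Hessian: X^T W X with W = diag(p * (1 - p))
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system H d = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up binary data; the classes overlap, so the estimate is finite.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = logistic_newton(xs, ys)
print(b0, b1)
```

If the two classes were perfectly separated by x, the likelihood would have no finite maximizer and the iterations would not settle, which is why the example data overlap.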

k-means clustering (unsupervised):

from cleands.Clustering import kMeans

kmeans = kMeans("~x1+x2", data=df, k=2)
print(kmeans.groups)   # Cluster assignments
print(kmeans.means)    # Cluster centroids
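
For readers unfamiliar with what the cluster assignments and centroids represent, here is a minimal sketch of the standard Lloyd iteration for k-means in plain Python. The function kmeans below is hypothetical and unrelated to the kMeans class; in particular, seeding the centroids with the first k points is a simplification chosen to keep the example deterministic, not a description of how cleands initializes.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on a list of equal-length tuples (illustration only)."""
    # Simplistic, deterministic initialization: the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

The result of k-means in general depends on the starting centroids, so practical implementations usually randomize the initialization and may run several restarts.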

Directory Structure

cleands/
│
├── base.py              # Abstract base classes (prediction, classification, clustering, distribution)
├── formula.py           # Formula parser for Patsy-like expressions
├── utils.py             # Utility functions (cross-validation, bootstrap, optimizers, etc.)
│
├── Prediction/          # Regression and supervised prediction models
│   ├── glm.py           # Least squares, logistic, Poisson regressors
│   ├── knn.py           # k-nearest neighbors regressors
│   ├── shrinkage.py     # Lasso, ridge, elastic net, etc.
│   ├── tree.py          # Recursive partitioning regressors
│   ├── ensemble.py      # Bagging, random forests
│
├── Classification/      # Classification models
│   ├── glm.py           # Logistic and multinomial classifiers
│   ├── knn.py           # kNN classifiers
│   ├── lda.py           # Linear discriminant analysis
│   ├── tree.py          # Recursive partitioning classifiers
│   ├── ensemble.py      # Bagging and random forest classifiers
│
├── Clustering/          # Unsupervised clustering
│   ├── kmeans.py        # k-means clustering
│
├── DimensionReduction/  # PCA, CCA, etc. (in progress)
├── Distribution/        # Probability distributions and tests

Roadmap

  • Stepwise model selection
  • Support for splines and GAMs
  • More clustering methods (DBSCAN, Gaussian mixtures, hierarchical)
  • Additional LDV models (Tobit, truncated regression)
  • Expanded distribution families
  • Neural networks and GLM trees

License

MIT License. See LICENSE for details.
