
Statistical modeling and machine learning framework

Project description

cleands

cleands is a Python package for statistical modeling and data science that unifies regression, classification, clustering, distribution modeling, and dimension reduction under a single interface.
It aims to provide a full open-source alternative to packages like Stata, SAS, SPSS, and MATLAB, while remaining extensible and Pythonic.

Features

  • Formula interface: Fit models directly from a Patsy-like formula, e.g. "y ~ x1 + x2 + x1:x2".
  • Supervised learning: Linear regression, logistic regression, Poisson regression, k-nearest neighbors, recursive partitioning trees, ensembles (bagging, random forests), shrinkage methods (lasso, ridge, elastic net), etc.
  • Classification: Logistic, multinomial, discriminant analysis, kNN classifiers, decision trees, random forests.
  • Unsupervised learning: k-means clustering, with other clustering algorithms planned.
  • Distributions: Parametric probability distributions with PDF, CDF, and likelihood-based inference.
  • Utilities: Cross-validation, bootstrap, gradient descent, Newton’s method, and more.

See the TODO list for planned additions such as QDA, SVM, DBSCAN, Gaussian mixtures, quantile regression, splines, and limited dependent variable (LDV) models (Tobit, truncated regression).
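
To make the formula notation in the Features list concrete, here is a minimal, dependency-free sketch of how a formula such as "y ~ x1 + x2 + x1:x2" can be expanded into design-matrix columns. The helper expand_formula is hypothetical and is not part of cleands; it only handles "+" and ":" and assumes every name in the formula is a numeric column in the data.

```python
def expand_formula(formula, data):
    """Expand 'y ~ a + b + a:b' into (response, {column name: values}).

    Hypothetical illustration only; NOT the cleands formula parser.
    Supports '+' (add a term) and ':' (interaction) and nothing else.
    """
    lhs, rhs = (side.strip() for side in formula.split("~"))
    n = len(next(iter(data.values())))
    columns = {"(Intercept)": [1.0] * n}
    for term in (t.strip() for t in rhs.split("+")):
        # An interaction 'a:b' is the elementwise product of its factors;
        # a plain term 'a' is the product of a single factor.
        values = [1.0] * n
        for name in term.split(":"):
            values = [v * x for v, x in zip(values, data[name.strip()])]
        columns[term] = values
    # One-sided formulas such as '~ x1 + x2' have no response.
    response = list(data[lhs]) if lhs else None
    return response, columns


data = {"y": [1, 2, 3, 4, 5], "x1": [1, 2, 3, 4, 5], "x2": [2, 1, 2, 1, 2]}
response, design = expand_formula("y ~ x1 + x2 + x1:x2", data)
for name, values in design.items():
    print(name, values)
```

The x1:x2 column is simply the row-wise product of x1 and x2, which is what distinguishes an interaction term from the two main effects.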


Installation

Install the latest release from PyPI:

pip install cleands

Documentation

Full API documentation, usage guides, and examples are available here:

(Replace the links above with your actual GitHub Pages or Read the Docs URLs once deployed.)


Quick Start

Fit a linear regression model using formula notation:

import pandas as pd
from cleands.Prediction import LeastSquaresRegressor

# Example DataFrame
df = pd.DataFrame({
    "y": [1, 2, 3, 4, 5],
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 1, 2, 1, 2]
})

# Fit model with interaction term
model = LeastSquaresRegressor("y ~ x1 + x2 + x1:x2", data=df)

print(model.tidy)   # Coefficients with std errors, t-stats, and p-values
print(model.glance) # Model summary (R², AIC, BIC, etc.)
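
As a cross-check on what a least squares fit computes, the sketch below solves the normal equations (X'X)b = X'y for the same toy data in plain Python. The function names are illustrative assumptions and this is not how cleands is implemented. Because y equals x1 exactly in this example, the fit is perfect, so residual-based statistics such as standard errors are degenerate for this particular data.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(columns, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve_linear_system(xtx, xty)


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 2, 1, 2]
y = [1, 2, 3, 4, 5]
# Columns: intercept, x1, x2, and the x1:x2 interaction.
columns = [[1.0] * 5, x1, x2, [a * b for a, b in zip(x1, x2)]]
beta = least_squares(columns, y)
print(beta)  # approximately [0, 1, 0, 0], since y == x1
```

Replacing y with noisier values gives a more informative test of the coefficient table.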

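Logistic and Poisson regression use the same interface:

from cleands.Prediction import LogisticRegressor, PoissonRegressor

# Note: a logistic model expects a binary (0/1) response and a Poisson model
# expects non-negative counts; substitute a suitable response column for "y".
logit_model = LogisticRegressor("y ~ x1 + x2", data=df)
pois_model  = PoissonRegressor("y ~ x1 + x2", data=df)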
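
For readers curious about what a logistic fit does under the hood, here is a minimal sketch of Newton's method (listed among the utilities above) for a logistic regression with an intercept and one predictor. The function name and the toy data are assumptions made for illustration; this is not the cleands implementation, and it omits safeguards such as step halving or convergence checks.

```python
import math


def fit_logistic_newton(x, y, iterations=25):
    """Maximise the logistic log-likelihood for y ~ 1 + x by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient (score) of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + inverse(information) @ gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Illustrative binary data that is not perfectly separable.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_newton(x, y)
print(b0, b1)
```

The data must not be perfectly separable: if some threshold on x splits the 0s from the 1s exactly, the maximum likelihood estimate does not exist and the iterations diverge.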

k-means clustering (unsupervised):

from cleands.Clustering import kMeans

kmeans = kMeans("~x1+x2", data=df, k=2)
print(kmeans.groups)   # Cluster assignments
print(kmeans.means)    # Cluster centroids
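
The cluster assignments and centroids reported above can be understood through Lloyd's algorithm, sketched here in plain Python. This is an illustrative assumption about the general technique, not the cleands code: the function name, the first-k-points initialisation, and the sample points are all hypothetical.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    means = [tuple(p) for p in points[:k]]  # simplistic init: first k points
    groups = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_groups = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, means[j])))
            for p in points
        ]
        if new_groups == groups:  # assignments stable, so we have converged
            break
        groups = new_groups
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, g in zip(points, groups) if g == j]
            if members:  # keep the old centroid if a cluster empties
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return groups, means


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
groups, means = kmeans(points, k=2)
print(groups)  # [0, 0, 0, 1, 1, 1]
print(means)
```

Results depend on the starting centroids, which is why practical implementations usually randomise the initialisation and rerun several times.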

Directory Structure

cleands/
│
├── base.py              # Abstract base classes (prediction, classification, clustering, distribution)
├── formula.py           # Formula parser for Patsy-like expressions
├── utils.py             # Utility functions (cross-validation, bootstrap, optimizers, etc.)
│
├── Prediction/          # Regression and supervised prediction models
│   ├── glm.py           # Least squares, logistic, Poisson regressors
│   ├── knn.py           # k-nearest neighbors regressors
│   ├── shrinkage.py     # Lasso, ridge, elastic net, etc.
│   ├── tree.py          # Recursive partitioning regressors
│   └── ensemble.py      # Bagging, random forests
│
├── Classification/      # Classification models
│   ├── glm.py           # Logistic and multinomial classifiers
│   ├── knn.py           # kNN classifiers
│   ├── lda.py           # Linear discriminant analysis
│   ├── tree.py          # Recursive partitioning classifiers
│   └── ensemble.py      # Bagging and random forest classifiers
│
├── Clustering/          # Unsupervised clustering
│   └── kmeans.py        # k-means clustering
│
├── DimensionReduction/  # PCA, CCA, etc. (in progress)
└── Distribution/        # Probability distributions and tests

Roadmap

  • Stepwise model selection
  • Support for splines and GAMs
  • More clustering methods (DBSCAN, Gaussian mixtures, hierarchical)
  • Additional LDV models (Tobit, truncated regression)
  • Expanded distribution families
  • Neural networks and GLM trees

License

MIT License. See LICENSE for details.

Download files


Source Distribution

cleands-0.1.1.tar.gz (66.4 kB view details)

Uploaded Source

Built Distribution


cleands-0.1.1-py3-none-any.whl (76.7 kB view details)

Uploaded Python 3

File details

Details for the file cleands-0.1.1.tar.gz.

File metadata

  • Download URL: cleands-0.1.1.tar.gz
  • Size: 66.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for cleands-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        03aef453216b3e87233bfc1ac4d4cf387c1b85a3f75e8d72ff4a1b579a8f251b
MD5           a87895cf91ce8710715745c748c89660
BLAKE2b-256   16fe935fd00dabf1c9be180a5279e5393121958d5ebacbf13a928d9713a3139f


File details

Details for the file cleands-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: cleands-0.1.1-py3-none-any.whl
  • Size: 76.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for cleands-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        71c5ca325f98b64cb72f6a1ce33cd1329a7824ee54ff2a4d88c62e7b649eecc3
MD5           9bfb48d24ead582ff2a7558182a2e74b
BLAKE2b-256   05f24bc4e2723a7d0386217aa1411644f634047a002815fca35d9824f53855ea

