Statistical modeling and machine learning framework
cleands
cleands is a Python package for statistical modeling and data science that unifies regression, classification, clustering, distribution modeling, and dimension reduction under a single interface.
It aims to provide a full open-source alternative to packages like Stata, SAS, SPSS, and MATLAB, while remaining extensible and Pythonic.
Features
- Formula interface: Fit models directly from a Patsy-like formula, e.g. "y ~ x1 + x2 + x1:x2".
- Supervised learning: Linear regression, logistic regression, Poisson regression, k-nearest neighbors, recursive partitioning trees, ensembles (bagging, random forests), shrinkage methods (lasso, ridge, elastic net), etc.
- Classification: Logistic, multinomial, discriminant analysis, kNN classifiers, decision trees, random forests.
- Unsupervised learning: k-means clustering, with other clustering algorithms planned.
- Distributions: Parametric probability distributions with PDF, CDF, and likelihood-based inference.
- Utilities: Cross-validation, bootstrap, gradient descent, Newton’s method, and more.
See the TODO list for planned additions such as QDA, SVM, DBSCAN, Gaussian mixtures, quantile regression, splines, and limited dependent variable (LDV) models (Tobit, truncated regression).
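To make the cross-validation utility listed above concrete, here is a minimal sketch of k-fold index splitting in plain Python. It does not use or reproduce the cleands API: the function name `k_fold_indices` and its parameters are assumptions chosen for illustration only.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split range(n) into k shuffled (train, test) index pairs.

    Hypothetical helper for illustration; not part of cleands.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at offset i,
    # so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits

splits = k_fold_indices(n=5, k=2)
for train, test in splits:
    print("train:", sorted(train), "test:", sorted(test))
```

Each observation lands in exactly one test fold, so a model can be refit on every `train` set and scored on the matching `test` set.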
Installation
Install the latest release from PyPI:
pip install cleands
Documentation
Full API documentation, usage guides, and examples are available here:
Quick Start
Fit a linear regression model using formula notation:
import pandas as pd
from cleands.Prediction import LeastSquaresRegressor
# Example DataFrame
df = pd.DataFrame({
    "y": [1, 2, 3, 4, 5],
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 1, 2, 1, 2]
})
# Fit model with interaction term
model = LeastSquaresRegressor("y ~ x1 + x2 + x1:x2", data=df)
print(model.tidy) # Coefficients with std errors, t-stats, and p-values
print(model.glance) # Model summary (R², AIC, BIC, etc.)
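For readers new to formula notation, the sketch below shows which design-matrix columns a formula such as "y ~ x1 + x2 + x1:x2" conventionally expands to: an implicit intercept, each main effect, and the product of x1 and x2 for the interaction term. It assumes the Patsy convention for numeric columns and is not the cleands parser; `design_matrix` is a hypothetical helper written only for this illustration.

```python
# Same toy data as the Quick Start, as plain lists.
data = {
    "y": [1, 2, 3, 4, 5],
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 1, 2, 1, 2],
}

def design_matrix(rhs, data):
    """Build design-matrix rows for a right-hand side like 'x1 + x2 + x1:x2'.

    Hypothetical helper: handles only '+' between terms and ':' for
    products of numeric columns, and always prepends an intercept.
    """
    terms = [t.strip() for t in rhs.split("+")]
    n = len(next(iter(data.values())))
    rows = []
    for i in range(n):
        row = [1.0]  # intercept column
        for term in terms:
            value = 1.0
            for name in term.split(":"):
                value *= data[name.strip()][i]
            row.append(value)
        rows.append(row)
    return ["Intercept"] + terms, rows

response, rhs = "y ~ x1 + x2 + x1:x2".split("~")
columns, X = design_matrix(rhs, data)
print("response:", response.strip())
print(columns)
for row in X:
    print(row)
```

The model then has one coefficient per column, which is why an interaction term adds a single extra row to a coefficient table.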
Logistic and Poisson regression use the same interface:
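from cleands.Prediction import LogisticRegressor, PoissonRegressor
# Logistic regression models a binary (0/1) response; the y column of df above
# holds the values 1..5, so recode it to 0/1 before fitting the logistic model.
logit_model = LogisticRegressor("y ~ x1 + x2", data=df)
# Poisson regression models non-negative counts, which y already satisfies.
pois_model = PoissonRegressor("y ~ x1 + x2", data=df)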
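As a rough picture of what fitting a Poisson regression involves, the sketch below applies Newton's method to the Poisson log-likelihood for a model with an intercept and one predictor, log(mu) = b0 + b1 * x. It is a standalone illustration under that assumption, not the optimizer cleands uses; `fit_poisson_newton` is a hypothetical name.

```python
import math

def fit_poisson_newton(x, y, iterations=25):
    """Fit log(mu) = b0 + b1 * x by Newton's method (illustrative only)."""
    # Start at the intercept-only fit: mu equal to the mean of y.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), symmetric 2x2.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve information * step = score in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
b0, b1 = fit_poisson_newton(x, y)
print("intercept:", b0, "slope:", b1)
```

At the solution the score is zero, meaning the fitted means reproduce both the total of y and the total of x times y.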
k-means clustering (unsupervised):
from cleands.Clustering import kMeans
kmeans = kMeans("~x1+x2", data=df, k=2)
print(kmeans.groups) # Cluster assignments
print(kmeans.means) # Cluster centroids
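To show what cluster assignments and centroids mean, here is a minimal sketch of Lloyd's algorithm, the standard k-means procedure, on the same (x1, x2) points. It is not the cleands implementation: `k_means` is a hypothetical function, and using the first k points as starting centroids is an assumption made to keep the run deterministic.

```python
def k_means(points, k, max_iter=100):
    """Lloyd's algorithm with the first k points as starting centroids."""
    centroids = [tuple(p) for p in points[:k]]
    groups = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_groups = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_groups == groups:
            break  # assignments stopped changing
        groups = new_groups
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, g in zip(points, groups) if g == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return groups, centroids

# (x1, x2) pairs from the Quick Start DataFrame.
points = [(1, 2), (2, 1), (3, 2), (4, 1), (5, 2)]
groups, centroids = k_means(points, k=2)
print(groups)     # [0, 0, 1, 1, 1]
print(centroids)  # one (x1, x2) mean per cluster
```

Results of k-means depend on the starting centroids, so a library implementation with random initialization may label or split the clusters differently.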
Directory Structure
cleands/
│
├── base.py                 # Abstract base classes (prediction, classification, clustering, distribution)
├── formula.py              # Formula parser for Patsy-like expressions
├── utils.py                # Utility functions (cross-validation, bootstrap, optimizers, etc.)
│
├── Prediction/             # Regression and supervised prediction models
│   ├── glm.py              # Least squares, logistic, Poisson regressors
│   ├── knn.py              # k-nearest neighbors regressors
│   ├── shrinkage.py        # Lasso, ridge, elastic net, etc.
│   ├── tree.py             # Recursive partitioning regressors
│   └── ensemble.py         # Bagging, random forests
│
├── Classification/         # Classification models
│   ├── glm.py              # Logistic and multinomial classifiers
│   ├── knn.py              # kNN classifiers
│   ├── lda.py              # Linear discriminant analysis
│   ├── tree.py             # Recursive partitioning classifiers
│   └── ensemble.py         # Bagging and random forest classifiers
│
├── Clustering/             # Unsupervised clustering
│   └── kmeans.py           # k-means clustering
│
├── DimensionReduction/     # PCA, CCA, etc. (in progress)
└── Distribution/           # Probability distributions and tests
Roadmap
- Stepwise model selection
- Support for splines and GAMs
- More clustering methods (DBSCAN, Gaussian mixtures, hierarchical)
- Additional LDV models (Tobit, truncated regression)
- Expanded distribution families
- Neural networks and GLM trees
License
MIT License. See LICENSE for details.