A classifier using Student's t-distribution for binary and multiclass classification of univariate and multivariate continuous data.
TDistributionClassifier
Author: Abdul Mofique Siddiqui
License: MIT
Install via pip:
pip install TDistributionClassifier
Import it in your Python code:
from TDistributionClassifier import TDistributionClassifier
Overview
TDistributionClassifier is a machine learning classifier that models each class with a Student's t-distribution. Because the t-distribution has heavier tails than the Gaussian, the classifier is less sensitive to outliers, noise, and extreme values than Gaussian-based models. It handles both binary and multiclass classification tasks for both univariate (1D) and multivariate (multi-dimensional) data.
Installation
Install the package via pip:
pip install tdistributionclassifier
How It Works
- Univariate Mode: For 1D features, each class is modeled using a univariate t-distribution.
- Multivariate Mode: For multi-dimensional features, each class is modeled using a multivariate t-distribution.
- Uses log-probabilities and the log-sum-exp trick for numerical stability.
- Automatically detects the input dimensionality and selects the appropriate mode.
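The log-sum-exp normalization mentioned in the list above can be sketched as follows. This is a minimal illustration, not the package's code: it assumes each row of `log_joint` already holds per-class scores of the form log p(x | class) + log p(class), and the helper name `normalize_log_scores` is hypothetical.

```python
import numpy as np

def normalize_log_scores(log_joint):
    """Turn per-class log scores of shape (n_samples, n_classes) into probabilities.

    Subtracting the row-wise maximum before exponentiating keeps exp()
    from underflowing to zero when every log score is very negative.
    """
    log_joint = np.asarray(log_joint, dtype=float)
    row_max = log_joint.max(axis=1, keepdims=True)
    # log-sum-exp identity: log(sum(exp(a))) = m + log(sum(exp(a - m)))
    log_norm = row_max + np.log(np.exp(log_joint - row_max).sum(axis=1, keepdims=True))
    return np.exp(log_joint - log_norm)

# Scores this negative would all underflow to 0.0 with a naive np.exp
scores = np.array([[-1000.0, -1001.0],
                   [-2000.0, -1998.0]])
probs = normalize_log_scores(scores)
print(probs.round(4))  # row 0 ~ [0.7311, 0.2689], row 1 ~ [0.1192, 0.8808]
```

Only differences between class scores matter after normalization, which is why shifting by the row maximum changes nothing mathematically while avoiding underflow.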
Getting Started
1. Import the package
from TDistributionClassifier import TDistributionClassifier
2. Initialize the classifier
clf = TDistributionClassifier()
3. Fit the model
clf.fit(X_train, y_train)
- X_train: numpy array of shape (n_samples,) or (n_samples, n_features)
- y_train: class labels
4. Predict class probabilities
probs = clf.predict_proba(X_test)
- Returns a numpy array of shape (n_samples, n_classes) with class probabilities.
5. Predict class labels
labels = clf.predict(X_test)
- Returns an array of predicted class labels of shape (n_samples,).
API Reference
TDistributionClassifier()
Initializes the classifier. No arguments required.
.fit(X, y)
Fits the model to the training data.
- Parameters:
X: numpy array of training features. Shape: (n_samples,) or (n_samples, n_features)
y: class labels. Shape: (n_samples,)
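A rough sketch of what fitting per-class parameters could involve, based on the Internals notes (one t-distribution per class, a 1e-6 * I term added to covariance matrices, automatic handling of 1D input). The function `fit_class_params`, the use of the sample mean as location, the sample covariance standing in for the t scale matrix (a simplification), and the empirical class priors are all assumptions for illustration; the package may estimate these, and the degrees of freedom, differently.

```python
import numpy as np

def fit_class_params(X, y, reg=1e-6):
    """Hypothetical per-class parameter estimation for a t-based classifier."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        # 1D input of shape (n_samples,) is treated as a single feature
        X = X.reshape(-1, 1)
    y = np.asarray(y)
    classes = np.unique(y)
    params = {}
    for c in classes:
        Xc = X[y == c]
        loc = Xc.mean(axis=0)
        # rowvar=False: columns are features; atleast_2d keeps the 1-feature case as (1, 1)
        cov = np.atleast_2d(np.cov(Xc, rowvar=False))
        # Small ridge on the diagonal keeps the matrix invertible
        cov = cov + reg * np.eye(X.shape[1])
        prior = len(Xc) / len(X)
        params[c] = (loc, cov, prior)
    return classes, params

X = np.array([0.1, 0.3, -0.2, 5.0, 5.4, 4.8])
y = np.array([0, 0, 0, 1, 1, 1])
classes, params = fit_class_params(X, y)
print(classes, params[1][0])
```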
.predict_proba(X)
Returns predicted class probabilities.
- Input:
X: Features. Shape: (n_samples,) or (n_samples, n_features)
- Output:
probs: array of shape (n_samples, n_classes) with class probabilities
.predict(X)
Returns, for each sample, the class label with the highest predicted probability.
- Input:
X: Features. Shape: (n_samples,) or (n_samples, n_features)
- Output:
labels: array of shape (n_samples,) with class labels
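The relationship between .predict and .predict_proba described above amounts to a row-wise argmax. A minimal sketch, assuming the probability columns are ordered like the sorted unique training labels (as np.unique would return them); the helper `predict_from_proba` is hypothetical and not part of the package.

```python
import numpy as np

def predict_from_proba(probs, classes):
    """Map each row of class probabilities to the label with the highest one."""
    probs = np.asarray(probs)
    # Column j of probs is assumed to correspond to classes[j]
    return np.asarray(classes)[np.argmax(probs, axis=1)]

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
labels = predict_from_proba(probs, classes=np.array(["setosa", "versicolor", "virginica"]))
print(labels)
```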
Example Usage
Example 1: Binary Classification
from TDistributionClassifier import TDistributionClassifier
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
# Load data and binarize target
data = load_diabetes()
X = data.data
y = (data.target > 100).astype(int) # Binary classification
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# Initialize and train
clf = TDistributionClassifier()
clf.fit(X_train, y_train)
# Predict
probs = clf.predict_proba(X_test)
preds = clf.predict(X_test)
Example 2: Multiclass Classification
from TDistributionClassifier import TDistributionClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the Iris dataset (3 classes)
data = load_iris()
X = data.data
y = data.target # Multiclass labels (3 classes)
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# Initialize and train
clf = TDistributionClassifier()
clf.fit(X_train, y_train)
# Predict
probs = clf.predict_proba(X_test)
labels = clf.predict(X_test)
Internals
- PDF Estimation: Uses scipy.stats.t (univariate) or scipy.stats.multivariate_t (multivariate).
- Regularization: Adds a small constant to the diagonal of each covariance matrix (1e-6 * I) to ensure invertibility.
- Numerical Stability: Works with log-probabilities and normalizes them with log-sum-exp.
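Because the Internals section names scipy.stats.t and scipy.stats.multivariate_t, the sketch below writes out the multivariate t log-density they evaluate in plain NumPy, so the formula is visible: log f(x) = log Γ((ν+p)/2) - log Γ(ν/2) - (p/2) log(νπ) - (1/2) log|Σ| - ((ν+p)/2) log(1 + δ²/ν), where δ² is the squared Mahalanobis distance of x from the location μ. This is an illustrative re-derivation, not the package's code. The name `mvt_logpdf` is hypothetical, `shape` is the scale matrix Σ (for ν > 2 the covariance is ν/(ν-2)·Σ), and the univariate case is simply p = 1.

```python
import math
import numpy as np

def mvt_logpdf(X, loc, shape, df):
    """Log-density of a multivariate Student's t at each row of X (n_samples, p)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    loc = np.asarray(loc, dtype=float)
    shape = np.atleast_2d(np.asarray(shape, dtype=float))
    p = loc.size
    diff = X - loc
    # Cholesky factor: shape = L @ L.T; solving L z = diff gives the Mahalanobis term
    L = np.linalg.cholesky(shape)
    z = np.linalg.solve(L, diff.T)
    maha = np.sum(z ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return (math.lgamma((df + p) / 2.0) - math.lgamma(df / 2.0)
            - 0.5 * p * math.log(df * math.pi) - 0.5 * log_det
            - 0.5 * (df + p) * np.log1p(maha / df))

# With df=1 and p=1 this is the standard Cauchy density: log(1/pi) ~ -1.1447
print(mvt_logpdf([[0.0]], [0.0], [[1.0]], df=1.0))
print(mvt_logpdf([[0.0, 0.0], [3.0, -1.0]], [0.0, 0.0], [[2.0, 0.3], [0.3, 1.0]], df=4.0))
```

Using a Cholesky factor instead of an explicit inverse is a common choice here: it fails loudly if the matrix is not positive definite, which is exactly the situation the 1e-6 * I term is meant to prevent.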
Notes
- Supports both binary and multiclass classification.
- Multivariate mode is triggered when the input has more than one feature.
- The heavy tails of Student's t-distribution make it more resilient to extreme values than traditional Gaussian-based models.
- If the classes are not well separated in the original feature space, consider applying a feature transformation or dimensionality reduction before use.
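To make the heavy-tail note concrete, the sketch below compares standard normal and Student's t log-densities at increasing distances from the center. The choice of 3 degrees of freedom is arbitrary and for illustration only; the package's degrees-of-freedom setting is not documented here.

```python
import math

def gaussian_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a normal distribution."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def t_logpdf(x, df, mu=0.0, sigma=1.0):
    """Log-density of a univariate Student's t with location mu and scale sigma."""
    z = (x - mu) / sigma
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(sigma)
            - 0.5 * (df + 1) * math.log1p(z * z / df))

for x in (0.0, 3.0, 10.0):
    print(x, round(gaussian_logpdf(x), 2), round(t_logpdf(x, df=3), 2))
```

At x = 10 the Gaussian log-density is about -50.9 while the t log-density is about -8.1: a single extreme value is penalized far less under the t model, which is the intuition behind the robustness claim.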
Author
Abdul Mofique Siddiqui
License
This project is licensed under the MIT License.
Download files
File details
Details for the file tdistributionclassifier-1.2.0.tar.gz.
File metadata
- Download URL: tdistributionclassifier-1.2.0.tar.gz
- Upload date:
- Size: 4.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5bd01c6e4ab3934b4b7825ed071a147c4b07765bc60004df3880e014a22cad17 |
| MD5 | 43e61087565368ba4a69f7fca684d23d |
| BLAKE2b-256 | 5bb21c6ebcfada4e83714dcf9fd021c186616ab86afcb11b91f4d867a5d7cd17 |
File details
Details for the file TDistributionClassifier-1.2.0-py3-none-any.whl.
File metadata
- Download URL: TDistributionClassifier-1.2.0-py3-none-any.whl
- Upload date:
- Size: 5.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c8b844791efba73848f588f476a21d260e91eb21f2646c84b53b08062300f28b |
| MD5 | 135cedd2b3db29bd1c3cbd751ff9d288 |
| BLAKE2b-256 | 4b3c6f147a5f993bd30f05d67eede9e7d2045bd035626e6762dc4493c322b0eb |