
A classifier for binary and multiclass problems that models each class with Student's t-distribution, for univariate and multivariate continuous data.


TDistributionClassifier

Author: Abdul Mofique Siddiqui
License: MIT
Install via pip:

pip install TDistributionClassifier

Import it in your Python code:

from TDistributionClassifier import TDistributionClassifier

Overview

TDistributionClassifier is a machine learning classifier that models each class with a Student's t-distribution. Because the t-distribution has heavier tails than a Gaussian, the classifier is less sensitive to outliers and remains effective when the data contains noise or extreme values. It handles binary and multiclass classification on both univariate (1D) and multivariate (multi-dimensional) data.


Installation

Install the package via pip:

pip install tdistributionclassifier

How It Works

  • Univariate Mode: For 1D features, each class is modeled using a univariate t-distribution.
  • Multivariate Mode: For multi-dimensional features, each class is modeled using a multivariate t-distribution.
  • Uses log-probabilities and the log-sum-exp trick for numerical stability.
  • Automatically detects the input dimensionality and selects the appropriate mode.
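The normalization step can be illustrated with a minimal, standard-library-only sketch. This is not the package's actual code: the helper names (`t_logpdf`, `log_sum_exp`, `posterior`), the class parameters, the equal priors, and the fixed degrees of freedom are all assumptions chosen for illustration.

```python
import math

def t_logpdf(x, loc, scale, df):
    """Log-density of a location-scale Student's t-distribution at x."""
    z = (x - loc) / scale
    return (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1) / 2 * math.log1p(z * z / df)
    )

def log_sum_exp(values):
    """Compute log(sum(exp(v))) by factoring out the maximum first."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def posterior(x, class_params, log_priors):
    """Turn per-class log-prior + log-likelihood scores into probabilities."""
    scores = [
        lp + t_logpdf(x, loc, scale, df)
        for lp, (loc, scale, df) in zip(log_priors, class_params)
    ]
    norm = log_sum_exp(scores)
    return [math.exp(s - norm) for s in scores]

# Hypothetical parameters for two classes: (location, scale, degrees of freedom)
params = [(0.0, 1.0, 3.0), (4.0, 1.5, 3.0)]
log_priors = [math.log(0.5), math.log(0.5)]

probs = posterior(1.0, params, log_priors)
print(probs)  # two probabilities that sum to 1
```

Subtracting the maximum score before exponentiating keeps the computation finite even when every log-density is very negative, which is the point of working in log space.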

Getting Started

1. Import the package

from TDistributionClassifier import TDistributionClassifier

2. Initialize the classifier

clf = TDistributionClassifier()

3. Fit the model

clf.fit(X_train, y_train)
  • X_train: numpy array of shape (n_samples,) or (n_samples, n_features)
  • y_train: class labels

4. Predict class probabilities

probs = clf.predict_proba(X_test)
  • Returns a numpy array of shape (n_samples, n_classes) with class probabilities.

5. Predict class labels

labels = clf.predict(X_test)
  • Returns predicted class labels

API Reference

TDistributionClassifier()

Initializes the classifier. No arguments required.


.fit(X, y)

Fits the model to the training data.

  • Parameters:
    • X: numpy array of training features. Shape: (n_samples,) or (n_samples, n_features)
    • y: class labels. Shape: (n_samples,)
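The two accepted shapes for X suggest a simple dispatch on array dimensionality. The sketch below shows one way such a check could look; `detect_mode` is a hypothetical helper, and treating a single-column 2D array as univariate is an assumption, not documented behavior of the package.

```python
import numpy as np

def detect_mode(X):
    """Return 'univariate' or 'multivariate' based on the shape of X."""
    X = np.asarray(X, dtype=float)
    # Assumption: a (n_samples, 1) array counts as a single feature.
    if X.ndim == 1 or (X.ndim == 2 and X.shape[1] == 1):
        return "univariate"
    if X.ndim == 2:
        return "multivariate"
    raise ValueError(f"expected a 1D or 2D array, got {X.ndim} dimensions")

print(detect_mode(np.zeros(10)))       # univariate
print(detect_mode(np.zeros((10, 4))))  # multivariate
```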

.predict_proba(X)

Returns predicted class probabilities.

  • Input:
    • X: Features. Shape: (n_samples,) or (n_samples, n_features)
  • Output:
    • probs: array of shape (n_samples, n_classes) with class probabilities

.predict(X)

Returns predicted class labels based on highest probability.

  • Input:
    • X: Features. Shape: (n_samples,) or (n_samples, n_features)
  • Output:
    • labels: array of shape (n_samples,) with class labels
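Choosing the label with the highest probability amounts to a row-wise argmax over the output of predict_proba. The following sketch uses a made-up probability matrix and hypothetical class names, and assumes the columns follow the order of the class array.

```python
import numpy as np

# Hypothetical class labels, in the same order as the probability columns
classes = np.array(["setosa", "versicolor", "virginica"])

# Made-up probabilities with shape (n_samples, n_classes); each row sums to 1
probs = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.25, 0.50, 0.25],
])

# Index of the most probable class per row, mapped back to its label
labels = classes[np.argmax(probs, axis=1)]
print(labels)  # ['setosa' 'virginica' 'versicolor']
```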

Example Usage

Example 1: Binary Classification

from TDistributionClassifier import TDistributionClassifier
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Load data and binarize target
data = load_diabetes()
X = data.data
y = (data.target > 100).astype(int)  # Binary classification

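# Split dataset (fixed seed so the split is reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train
clf = TDistributionClassifier()
clf.fit(X_train, y_train)

# Predict class probabilities and labels
probs = clf.predict_proba(X_test)
preds = clf.predict(X_test)

# Evaluate on the held-out split
print("Accuracy:", (preds == y_test).mean())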

Example 2: Multiclass Classification

from TDistributionClassifier import TDistributionClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset (3 classes)
data = load_iris()
X = data.data
y = data.target  # Multiclass labels (3 classes)

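# Split dataset (fixed seed so the split is reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train
clf = TDistributionClassifier()
clf.fit(X_train, y_train)

# Predict class probabilities and labels
probs = clf.predict_proba(X_test)
labels = clf.predict(X_test)

# Evaluate on the held-out split
print("Accuracy:", (labels == y_test).mean())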

Internals

  • PDF Estimation: Uses scipy.stats.t (univariate) or scipy.stats.multivariate_t (multivariate).
  • Regularization: Adds a small diagonal term (1e-6 * I) to each covariance matrix to ensure invertibility.
  • Numerical Stability: Log-probabilities are normalized with the log-sum-exp trick to avoid underflow.
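The effect of the regularization term can be seen on a deliberately degenerate dataset. This is a sketch, not the package's code: the data is invented, with the second feature an exact multiple of the first so that the sample covariance is singular.

```python
import numpy as np

# Two perfectly collinear features give a singular sample covariance.
x = np.arange(5, dtype=float)
X = np.column_stack([x, 2.0 * x])
cov = np.cov(X, rowvar=False)

# Add 1e-6 to every diagonal entry (the 1e-6 * I term).
eps = 1e-6
cov_reg = cov + eps * np.eye(cov.shape[0])

# A Cholesky factorization only succeeds for positive-definite matrices,
# so this call doubles as an invertibility check on the regularized matrix.
L = np.linalg.cholesky(cov_reg)
min_eig = np.linalg.eigvalsh(cov_reg).min()

print("determinant before:", np.linalg.det(cov))
print("smallest eigenvalue after:", min_eig)
```

Adding eps * I shifts every eigenvalue up by eps, so a zero eigenvalue becomes roughly 1e-6 while the large ones are essentially unchanged.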

Notes

  • Supports both binary and multiclass classification.
  • Multivariate mode is triggered when the input has more than one feature.
  • The heavy tails of the Student's t-distribution make it more resilient to extreme values than traditional Gaussian-based models.
  • If the data is not linearly separable, consider applying a feature transformation or dimensionality reduction before use.
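The heavy-tails point can be made concrete by comparing log-densities of a standard normal and a standard t-distribution at increasingly extreme points. The sketch uses only the standard library; the choice of 3 degrees of freedom is an assumption for illustration.

```python
import math

def normal_logpdf(x):
    """Log-density of the standard normal distribution."""
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def t_logpdf(x, df):
    """Log-density of the standard Student's t-distribution."""
    return (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
        - (df + 1) / 2 * math.log1p(x * x / df)
    )

# Near the center the two are similar; far out, the normal log-density
# falls off quadratically while the t log-density falls off logarithmically.
for x in (0.0, 2.0, 10.0):
    print(f"x={x:5.1f}  normal={normal_logpdf(x):9.3f}  t(df=3)={t_logpdf(x, 3.0):9.3f}")
```

Because an extreme point is penalized far less under the t-distribution, a single outlier has less influence on which class looks most likely.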

Author

Abdul Mofique Siddiqui


License

This project is licensed under the MIT License.
