binlearn

A comprehensive binning and discretization library for machine learning

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

gykovacs

These details have not been verified by PyPI

Project links

Documentation

Project description

A modern, type-safe Python library for data binning and discretization with comprehensive error handling, sklearn compatibility, and DataFrame support.

🚀 Key Features

✨ Multiple Binning Methods

EqualWidthBinning - Equal-width intervals across data range
EqualFrequencyBinning - Equal-frequency (quantile-based) bins
KMeansBinning - K-means clustering-based discretization
GaussianMixtureBinning - Gaussian mixture model clustering-based binning
DBSCANBinning - Density-based clustering for natural groupings
EqualWidthMinimumWeightBinning - Weight-constrained equal-width binning
TreeBinning - Decision tree-based supervised binning for classification and regression
Chi2Binning - Chi-square statistic-based supervised binning for optimal class separation
IsotonicBinning - Isotonic regression-based supervised binning for monotonic relationships
ManualIntervalBinning - Custom interval boundary specification
ManualFlexibleBinning - Mixed interval and singleton bin definitions
SingletonBinning - Creates one bin per unique numeric value
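The two simplest unsupervised strategies in this list differ only in where the edges go. Equal-width edges are evenly spaced between the minimum and maximum. Equal-frequency edges sit at evenly spaced quantiles. The sketch below uses plain NumPy, not binlearn's implementation, and the helper names are hypothetical:

```python
import numpy as np


def equal_width_edges(x: np.ndarray, n_bins: int) -> np.ndarray:
    """Hypothetical helper: edges spaced uniformly between min(x) and max(x)."""
    return np.linspace(np.min(x), np.max(x), n_bins + 1)


def equal_frequency_edges(x: np.ndarray, n_bins: int) -> np.ndarray:
    """Hypothetical helper: edges at evenly spaced quantiles (~len(x)/n_bins per bin)."""
    return np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))


rng = np.random.default_rng(0)
income = rng.lognormal(10, 0.5, 1000)  # skewed, like the Quick Start 'income' column

ew = equal_width_edges(income, 5)
ef = equal_frequency_edges(income, 5)

# Samples per bin; np.histogram includes the right edge in the last bin.
print(np.histogram(income, bins=ew)[0])  # unbalanced on skewed data
print(np.histogram(income, bins=ef)[0])  # roughly 200 per bin
```

On skewed data such as a log-normal income column, equal-width bins crowd most samples into the lowest bins. This is one reason to prefer quantile-based binning for such features.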

🔧 Framework Integration

Pandas DataFrames - Native support with column name preservation
Polars DataFrames - High-performance columnar data support (optional)
NumPy Arrays - Efficient numerical array processing
Scikit-learn Pipelines - Full transformer compatibility

⚡ Modern Code Quality

Type Safety - 100% mypy compliance with comprehensive type annotations
Code Quality - 100% ruff compliance with modern Python syntax
Error Handling - Comprehensive validation with helpful error messages and suggestions
Test Coverage - 100% code coverage with 841 comprehensive tests
Documentation - Extensive examples and API documentation

📦 Installation

pip install binlearn

🔥 Quick Start

import numpy as np
import pandas as pd
from binlearn import EqualWidthBinning, SingletonBinning

# Create sample data (seeded so the results are reproducible)
rng = np.random.default_rng(42)
data = pd.DataFrame({
    'age': rng.normal(35, 10, 1000),
    'income': rng.lognormal(10, 0.5, 1000),
    'score': rng.uniform(0, 100, 1000)
})

# Equal-width binning with DataFrame preservation
binner = EqualWidthBinning(n_bins=5, preserve_dataframe=True)
data_binned = binner.fit_transform(data)

print(f"Original shape: {data.shape}")
print(f"Binned shape: {data_binned.shape}")
print(f"Bin edges for age: {binner.bin_edges_['age']}")

# SingletonBinning for numeric discrete values
numeric_discrete_data = pd.DataFrame({
    'category_id': [1, 2, 1, 3, 2, 1],
    'rating': [1, 2, 1, 3, 2, 1]
})

singleton_binner = SingletonBinning(preserve_dataframe=True)
numeric_binned = singleton_binner.fit_transform(numeric_discrete_data)
print(f"Numeric discrete binning: {numeric_binned.shape}")
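The Quick Start does not explain how fitted edges become bin labels. The mapping can be sketched with NumPy's `np.digitize` under two assumptions that this README does not document as `transform` behavior: each bin is a half-open interval [edge_i, edge_{i+1}), and out-of-range values are clipped to the outermost bins. `assign_bins` is a hypothetical helper:

```python
import numpy as np


def assign_bins(x: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map values to bin indices 0..len(edges)-2.

    Digitizing against the inner edges only means values below the first edge
    land in bin 0 and values above the last edge land in the last bin
    (an assumed clipping policy, chosen for illustration).
    """
    return np.digitize(x, edges[1:-1], right=False)


edges = np.array([0.0, 20.0, 40.0, 60.0, 80.0, 100.0])  # 5 equal-width bins on [0, 100]
scores = np.array([-5.0, 0.0, 19.9, 20.0, 55.0, 100.0, 120.0])
print(assign_bins(scores, edges))  # [0 0 0 1 2 4 4]
```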

🎯 Supervised Binning Example

from binlearn import TreeBinning
import numpy as np
from sklearn.datasets import make_classification

# Create classification dataset
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)

# Method 1: Using guidance_columns (binlearn style)
# Combine features and target into single dataset
X_with_target = np.column_stack([X, y])

sup_binner1 = TreeBinning(
    guidance_columns=[4],  # Use the target column to guide binning
    task_type='classification',
    tree_params={'max_depth': 3, 'min_samples_leaf': 20}
)
X_binned1 = sup_binner1.fit_transform(X_with_target)

# Method 2: Using X and y parameters (sklearn style)
# Pass features and target separately like sklearn
sup_binner2 = TreeBinning(
    task_type='classification',
    tree_params={'max_depth': 3, 'min_samples_leaf': 20}
)
sup_binner2.fit(X, y)  # y is automatically used as guidance
X_binned2 = sup_binner2.transform(X)

print(f"Method 1 - Input shape: {X_with_target.shape}, Output shape: {X_binned1.shape}")
print(f"Method 2 - Input shape: {X.shape}, Output shape: {X_binned2.shape}")
print(f"Both methods create the same bins: {np.array_equal(X_binned1, X_binned2)}")
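TreeBinning's internals are not described here. A common way to do tree-based supervised binning is to fit a shallow decision tree on a single feature against the target and use the tree's split thresholds as bin edges. The sketch below does this with scikit-learn directly. `tree_bin_edges` is a hypothetical helper, and the correspondence between its arguments and TreeBinning's `tree_params` is an assumption:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def tree_bin_edges(x: np.ndarray, y: np.ndarray, max_depth: int = 3,
                   min_samples_leaf: int = 20) -> np.ndarray:
    """Hypothetical helper: fit a tree on one feature, return its thresholds as edges."""
    tree = DecisionTreeClassifier(max_depth=max_depth,
                                  min_samples_leaf=min_samples_leaf,
                                  random_state=0)
    tree.fit(x.reshape(-1, 1), y)
    # Internal (split) nodes have a feature index >= 0; leaves use a negative sentinel.
    is_split = tree.tree_.feature >= 0
    thresholds = np.unique(tree.tree_.threshold[is_split])  # sorted, de-duplicated
    # The outer edges span the observed range of the feature.
    return np.concatenate(([x.min()], thresholds, [x.max()]))


X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)
edges = tree_bin_edges(X[:, 0], y)
print(edges)  # at most 2**max_depth bins, i.e. at most 9 edges
```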

🛠️ Scikit-learn Integration

from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from binlearn import EqualFrequencyBinning

# Recreate the classification dataset from the previous example
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create ML pipeline with binning preprocessing
pipeline = Pipeline([
    ('binning', EqualFrequencyBinning(n_bins=5)),
    ('classifier', RandomForestClassifier(random_state=42))
])

# Train and evaluate
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(f"Pipeline accuracy: {accuracy:.3f}")
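The pipeline works because the binner follows scikit-learn's fit/transform contract. Edges are learned from the training split in `fit` and reused unchanged on the test split in `transform`. The hypothetical `QuantileBinner` below is a minimal sketch of that contract built on `BaseEstimator` and `TransformerMixin`. It is not `EqualFrequencyBinning` itself, and it omits input validation:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class QuantileBinner(BaseEstimator, TransformerMixin):
    """Hypothetical equal-frequency binner that follows the fit/transform contract."""

    def __init__(self, n_bins: int = 5):
        # Store parameters unchanged so get_params(), clone() and grid search work.
        self.n_bins = n_bins

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        quantiles = np.linspace(0.0, 1.0, self.n_bins + 1)
        # One edge array per column, learned from the training data only.
        self.bin_edges_ = [np.quantile(X[:, j], quantiles) for j in range(X.shape[1])]
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Digitize against the inner edges so every value, including unseen
        # extremes, maps to an index in 0..n_bins-1.
        columns = [np.digitize(X[:, j], edges[1:-1]) for j, edges in enumerate(self.bin_edges_)]
        return np.column_stack(columns)


rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
X_test = rng.normal(size=(50, 3))

binner = QuantileBinner(n_bins=4).fit(X_train)
X_test_binned = binner.transform(X_test)  # edges from X_train, applied to X_test
print(X_test_binned[:5])
```

Because `__init__` only stores its arguments, `get_params`, `clone`, and grid search work without extra code. `fit_transform` is inherited from `TransformerMixin`.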

📚 Available Methods

Interval-based Methods (Unsupervised):

EqualWidthBinning - Creates bins of equal width across the data range
EqualFrequencyBinning - Creates bins with an approximately equal number of samples
KMeansBinning - Uses K-means clustering to determine bin boundaries
GaussianMixtureBinning - Uses Gaussian mixture models for probabilistic clustering
DBSCANBinning - Uses density-based clustering for natural groupings
EqualWidthMinimumWeightBinning - Equal-width bins with weight constraints
ManualIntervalBinning - Specify custom interval boundaries
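The README does not describe how EqualWidthMinimumWeightBinning enforces its weight constraint. The sketch below shows one plausible greedy approach and is not binlearn's algorithm. It starts from equal-width bins, then merges each underweight bin with its right-hand neighbours until every bin carries at least `min_weight` total sample weight. `min_weight_equal_width_edges` is a hypothetical helper:

```python
import numpy as np


def min_weight_equal_width_edges(x, weights, n_bins, min_weight):
    """Hypothetical helper: equal-width edges, merged until each bin has >= min_weight."""
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Total sample weight in each initial equal-width bin.
    bin_weight = np.histogram(x, bins=edges, weights=weights)[0]

    kept = [edges[0]]
    acc = 0.0
    for w, right_edge in zip(bin_weight, edges[1:]):
        acc += w
        if acc >= min_weight:
            # The merged bin is heavy enough: close it at this edge.
            kept.append(right_edge)
            acc = 0.0
    if kept[-1] != edges[-1]:
        if len(kept) > 1:
            # Fold the light tail into the previous merged bin.
            kept[-1] = edges[-1]
        else:
            # Total weight is below min_weight: fall back to a single bin.
            kept.append(edges[-1])
    return np.array(kept)


x = np.arange(10)
edges = min_weight_equal_width_edges(x, np.ones(10), n_bins=5, min_weight=3)
print(edges)
print(np.histogram(x, bins=edges, weights=np.ones(10))[0])  # [4. 6.]
```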

Supervised Methods:

TreeBinning - Decision tree-based binning optimized for target variables (classification and regression)
Chi2Binning - Chi-square statistic-based binning for optimal feature-target association
IsotonicBinning - Isotonic regression-based binning for monotonic relationships
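Chi-square-based binning is classically done with ChiMerge. The method starts from fine intervals and repeatedly merges the adjacent pair whose class distributions differ least, as measured by the chi-square statistic. Whether Chi2Binning follows ChiMerge exactly is an assumption. The sketch below only computes that statistic for two adjacent intervals, using a hypothetical `chi2_adjacent` helper:

```python
import numpy as np


def chi2_adjacent(counts) -> float:
    """Hypothetical helper: chi-square for a 2 x k table of class counts.

    Rows are two adjacent intervals and columns are classes. A low value means the
    intervals have similar class distributions, which makes them candidates to merge.
    """
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    col_totals = counts.sum(axis=0, keepdims=True)
    expected = row_totals * col_totals / counts.sum()
    # A class absent from both intervals has expected count 0; skip those cells.
    mask = expected > 0
    return float((((counts - expected) ** 2)[mask] / expected[mask]).sum())


print(chi2_adjacent([[10, 0], [0, 10]]))  # 20.0: classes perfectly separated, keep apart
print(chi2_adjacent([[5, 5], [5, 5]]))    # 0.0: identical distributions, merge
```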

Flexible Methods:

ManualFlexibleBinning - Define mixed interval and singleton bins
SingletonBinning - Creates one bin per unique numeric value
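"One bin per unique numeric value" amounts to mapping each distinct value to its own bin index. A minimal NumPy sketch using `np.unique` follows. It assumes bins are ordered by value, and binlearn's actual output encoding may differ:

```python
import numpy as np

ratings = np.array([1, 2, 1, 3, 2, 1])  # the 'rating' column from the Quick Start

# One bin per distinct value: `values` lists the bins, `codes` maps each row to its bin.
values, codes = np.unique(ratings, return_inverse=True)
print(values)  # [1 2 3]
print(codes)   # [0 1 0 2 1 0]
```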

⚙️ Requirements

Python Versions: 3.10, 3.11, 3.12, 3.13

Core Dependencies:

NumPy >= 1.21.0
SciPy >= 1.7.0
Scikit-learn >= 1.0.0
kmeans1d >= 0.3.0

Optional Dependencies:

Pandas >= 1.3.0 (for DataFrame support)
Polars >= 0.15.0 (for Polars DataFrame support)

Development Dependencies:

pytest >= 6.0 (for testing)
ruff >= 0.1.0 (for linting and formatting)
mypy >= 1.0.0 (for type checking)

🧪 Development Setup

# Clone repository
git clone https://github.com/TheDAALab/binlearn.git
cd binlearn

# Install in development mode with all dependencies
pip install -e ".[tests,dev,pandas,polars]"

# Run all tests
pytest

# Run code quality checks
ruff check binlearn/
mypy binlearn/ --ignore-missing-imports

# Build documentation
cd docs && make html

🏆 Code Quality Standards

✅ 100% Test Coverage - Comprehensive test suite with 841 tests
✅ 100% Type Safety - Complete mypy compliance with modern type annotations
✅ 100% Code Quality - Full ruff compliance with modern Python standards
✅ Comprehensive Documentation - Detailed API docs and examples
✅ Modern Python - Uses latest Python features and best practices
✅ Robust Error Handling - Helpful error messages with actionable suggestions

🤝 Contributing

We welcome contributions! Here’s how to get started:

Fork the repository on GitHub
Create a feature branch: git checkout -b feature/your-feature
Make your changes and add tests

Ensure all quality checks pass:

pytest                                    # Run tests
ruff check binlearn/                      # Check code quality
mypy binlearn/ --ignore-missing-imports   # Check types

Submit a pull request

Areas for Contribution:

🐛 Bug reports and fixes
✨ New binning algorithms
📚 Documentation improvements
🧪 Additional test cases
🎯 Performance optimizations

🔗 Links

GitHub Repository: https://github.com/TheDAALab/binlearn
Issue Tracker: https://github.com/TheDAALab/binlearn/issues
Documentation: https://binlearn.readthedocs.io/

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

Developed by TheDAALab

A modern, type-safe binning framework for Python data science workflows.


Release history

1.0.1 (this version) - Aug 10, 2025
0.1.14 - Aug 4, 2025
0.1.12 - Aug 4, 2025
0.1.11 - Aug 4, 2025
0.1.11.dev0 (pre-release) - Aug 4, 2025
0.1.10.dev0 (pre-release) - Aug 4, 2025
0.1.8.post0 - Aug 4, 2025
0.1.7.dev0 (pre-release) - Aug 4, 2025
0.1.6.dev0 (pre-release) - Aug 4, 2025
0.1.5.dev0 (pre-release) - Aug 4, 2025
0.1.2.dev0 (pre-release) - Aug 4, 2025

Download files

Download the file for your platform.

Source Distribution

binlearn-1.0.1.tar.gz (6.8 MB)

Uploaded Aug 10, 2025 Source

Built Distribution

binlearn-1.0.1-py3-none-any.whl (135.9 kB)

Uploaded Aug 10, 2025 Python 3

File details

Details for the file binlearn-1.0.1.tar.gz.

File metadata

Download URL: binlearn-1.0.1.tar.gz
Upload date: Aug 10, 2025
Size: 6.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for binlearn-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`e567b9246ab6d4d0bd034d68922ba7ef570ddd8de112edb5481c23462bf306a6`
MD5	`4b3b64d7ec1b7456b19602a97fe775a7`
BLAKE2b-256	`aa215f33d8efbc91f92d45df95980cc862593971d30174d65fdcd63c110fdacb`

Provenance

The following attestation bundles were made for binlearn-1.0.1.tar.gz:

Publisher: release.yml on TheDAALab/binlearn

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: binlearn-1.0.1.tar.gz
- Subject digest: e567b9246ab6d4d0bd034d68922ba7ef570ddd8de112edb5481c23462bf306a6
- Sigstore transparency entry: 377196914
- Sigstore integration time: Aug 10, 2025
Source repository:
- Permalink: TheDAALab/binlearn@869d550d309370fae0b0c58065ed2d2ff1e14f2e
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/TheDAALab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@869d550d309370fae0b0c58065ed2d2ff1e14f2e
- Trigger Event: release

File details

Details for the file binlearn-1.0.1-py3-none-any.whl.

File metadata

Download URL: binlearn-1.0.1-py3-none-any.whl
Upload date: Aug 10, 2025
Size: 135.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for binlearn-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8ed94b73f2bb34a54f7a7932134d080367013ee1048a71d05edb15a850458a04`
MD5	`8773c9d41cf4167fc6e6e9bbefa96a2e`
BLAKE2b-256	`02e8381631160bbb763f015ff879deac08fae2f1620b1269bda383526fc0b318`

Provenance

The following attestation bundles were made for binlearn-1.0.1-py3-none-any.whl:

Publisher: release.yml on TheDAALab/binlearn

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: binlearn-1.0.1-py3-none-any.whl
- Subject digest: 8ed94b73f2bb34a54f7a7932134d080367013ee1048a71d05edb15a850458a04
- Sigstore transparency entry: 377196930
- Sigstore integration time: Aug 10, 2025
Source repository:
- Permalink: TheDAALab/binlearn@869d550d309370fae0b0c58065ed2d2ff1e14f2e
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/TheDAALab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@869d550d309370fae0b0c58065ed2d2ff1e14f2e
- Trigger Event: release
