A Python package for preprocessing tabular data

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

📦 pretab

pretab is a modular, extensible, and scikit-learn-compatible preprocessing library for tabular data. It supports all sklearn transformers out of the box, and extends functionality with a rich set of custom encoders, splines, and neural basis expansions.

✨ Features

🔢 Numerical preprocessing via:
- Polynomial and spline expansions: B-splines, natural cubic splines, thin plate splines, tensor product splines, P-splines
- Neural-inspired basis: RBF, ReLU, Sigmoid, Tanh
- Custom binning: rule-based or tree-based
- Piecewise Linear Encoding (PLE)
🌤 Categorical preprocessing:
- Ordinal encodings
- One-hot encodings
- Language embeddings (pretrained vectorizers)
- Custom encoders like OneHotFromOrdinalTransformer
🔧 Composable pipeline interface:
- Fully compatible with sklearn.pipeline.Pipeline and sklearn.compose.ColumnTransformer
- Accepts all sklearn-native transformers and parameters seamlessly
🧠 Smart preprocessing:
- Automatically detects feature types (categorical vs numerical)
- Supports both pandas.DataFrame and numpy.ndarray inputs
🧪 Comprehensive test coverage
🤝 Community-driven and open to contributions

💠 Installation

Install via pip:

pip install pretab

Or install in editable mode for development:

git clone https://github.com/OpenTabular/pretab.git
cd pretab
pip install -e .

🚀 Quickstart

import pandas as pd
import numpy as np
from pretab.preprocessor import Preprocessor

# Simulated tabular dataset
df = pd.DataFrame({
    "age": np.random.randint(18, 65, size=100),
    "income": np.random.normal(60000, 15000, size=100).astype(int),
    "job": np.random.choice(["nurse", "engineer", "scientist", "teacher", "artist", "manager"], size=100),
    "city": np.random.choice(["Berlin", "Munich", "Hamburg", "Cologne"], size=100),
    "experience": np.random.randint(0, 40, size=100)
})

y = np.random.randn(100, 1)

# Optional feature-specific preprocessing config
config = {
    "age": "ple",
    "income": "rbf",
    "experience": "quantile",
    "job": "one-hot",
    "city": "none"
}

# Initialize Preprocessor
preprocessor = Preprocessor(
    feature_preprocessing=config,
    task="regression"
)

# Fit and transform the data into a dictionary of feature arrays
X_dict = preprocessor.fit_transform(df, y)

# Optionally get a stacked array instead of a dictionary
X_array = preprocessor.transform(df, return_array=True)

# Get feature metadata
preprocessor.get_feature_info(verbose=True)

🪰 Included Transformers

pretab includes both sklearn-native and custom-built transformers:

🌈 Splines

CubicSplineTransformer
NaturalCubicSplineTransformer
PSplineTransformer
TensorProductSplineTransformer
ThinPlateSplineTransformer

🧠 Feature Maps

RBFExpansionTransformer
ReLUExpansionTransformer
SigmoidExpansionTransformer
TanhExpansionTransformer
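
To give a feel for what a feature map of this kind computes, here is a minimal, dependency-free sketch of a radial basis function (RBF) expansion of a single value. It is not pretab's implementation: the function name `rbf_expand`, the fixed `centers`, and the `gamma` width parameter are assumptions chosen for illustration, and RBFExpansionTransformer may choose centers and widths differently.

```python
import math
from typing import List, Sequence


def rbf_expand(x: float, centers: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Map a scalar to one Gaussian bump per center (hypothetical helper).

    Each output is exp(-gamma * (x - c)^2): 1.0 when x sits on the center
    and decaying towards 0.0 as x moves away from it.
    """
    return [math.exp(-gamma * (x - c) ** 2) for c in centers]


# Illustrative centers; a real transformer would typically learn or derive them.
centers = [0.0, 1.0, 2.0]
features = rbf_expand(1.0, centers, gamma=1.0)
print(len(features))  # one feature per center
```

The ReLU, sigmoid, and tanh expansions follow the same pattern in spirit, with the Gaussian swapped for a different activation applied to a shifted input.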

📊 Encodings and Binning

PLETransformer
CustomBinTransformer
OneHotFromOrdinalTransformer
ContinuousOrdinalTransformer
LanguageEmbeddingTransformer
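
As the name suggests, OneHotFromOrdinalTransformer presumably turns integer category codes into one-hot rows. The sketch below shows that idea in plain Python. The helper `one_hot_from_ordinal` and its `n_categories` argument are hypothetical and are not taken from pretab's API.

```python
from typing import List, Sequence


def one_hot_from_ordinal(codes: Sequence[int], n_categories: int) -> List[List[int]]:
    """Convert ordinal codes 0..n_categories-1 into one-hot rows (hypothetical helper)."""
    rows = []
    for code in codes:
        if not 0 <= code < n_categories:
            raise ValueError(f"code {code} is outside 0..{n_categories - 1}")
        row = [0] * n_categories
        row[code] = 1  # mark the position of this category
        rows.append(row)
    return rows


encoded = one_hot_from_ordinal([2, 0, 1], n_categories=3)
print(encoded)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```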

🔧 Utilities

NoTransformer
ToFloatTransformer

In addition, any sklearn transformer can be passed directly, with full support for its hyperparameters.

Using Transformers

The transformers follow the standard scikit-learn fit/transform interface. For example, using PLE:

import numpy as np
from pretab.transformers import PLETransformer

x = np.random.randn(100, 1)
y = np.random.randn(100, 1)

x_ple = PLETransformer(n_bins=15, task="regression").fit_transform(x, y)

assert x_ple.shape[1] == 15
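
To make the shape check above concrete, here is a dependency-free sketch of one common formulation of piecewise linear encoding: given sorted bin edges, each bin contributes 1.0 if the value lies above it, 0.0 if the value lies below it, and the fractional position if the value falls inside it. This is an illustration only. The helper `ple_encode` and the hand-picked `edges` are assumptions, and PLETransformer may derive its bins (for example from the target) and handle the outer bins differently.

```python
from typing import List, Sequence


def ple_encode(x: float, edges: Sequence[float]) -> List[float]:
    """Piecewise linear encoding of a scalar (hypothetical helper).

    `edges` holds sorted bin boundaries, so T bins need T + 1 edges and
    the result has T components, each clipped to the range [0, 1].
    """
    if len(edges) < 2:
        raise ValueError("need at least two edges to form one bin")
    encoded = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if x >= hi:
            encoded.append(1.0)  # value lies past this bin
        elif x <= lo:
            encoded.append(0.0)  # value has not reached this bin
        else:
            encoded.append((x - lo) / (hi - lo))  # position inside the bin
    return encoded


edges = [0.0, 1.0, 2.0, 3.0, 4.0]  # four bins
print(ple_encode(2.5, edges))  # [1.0, 1.0, 0.5, 0.0]
```

With four bins the encoding has four components, which mirrors how n_bins=15 yields 15 output columns in the example above.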

For splines, the penalty matrix can be extracted via .get_penalty_matrix():

import numpy as np
from pretab.transformers import ThinPlateSplineTransformer

x = np.random.randn(100, 1)

tp = ThinPlateSplineTransformer(n_basis=15)

x_tp = tp.fit_transform(x)

assert x_tp.shape[1] == 15

penalty = tp.get_penalty_matrix()
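
A penalty matrix P is typically used to add a roughness term of the form lambda * beta' P beta to a regression loss on the basis coefficients beta. As a rough illustration of what such a matrix can look like, the sketch below builds the difference penalty commonly used for P-splines, P = D'D, where D takes second-order differences of adjacent coefficients. The helper names are hypothetical, and whether pretab's spline transformers return exactly this matrix is an assumption. The thin plate spline penalty in particular is constructed differently.

```python
from typing import List


def difference_matrix(n_basis: int, order: int = 2) -> List[List[int]]:
    """Return the order-th difference operator D as a list of rows."""
    if not 0 < order < n_basis:
        raise ValueError("order must be positive and smaller than n_basis")
    # Start from the identity and repeatedly difference neighbouring rows.
    d = [[1 if i == j else 0 for j in range(n_basis)] for i in range(n_basis)]
    for _ in range(order):
        d = [[nxt[j] - cur[j] for j in range(n_basis)] for cur, nxt in zip(d[:-1], d[1:])]
    return d


def difference_penalty(n_basis: int, order: int = 2) -> List[List[int]]:
    """Return P = D^T D, a symmetric penalty on rough coefficient vectors."""
    d = difference_matrix(n_basis, order)
    return [[sum(row[i] * row[j] for row in d) for j in range(n_basis)]
            for i in range(n_basis)]


def roughness(beta: List[float], penalty: List[List[int]]) -> float:
    """Evaluate the quadratic form beta^T P beta."""
    n = len(beta)
    return sum(beta[i] * penalty[i][j] * beta[j] for i in range(n) for j in range(n))


P = difference_penalty(4, order=2)
print(roughness([0, 1, 2, 3], P))  # 0: linear coefficients are not penalized
print(roughness([0, 0, 1, 0], P))  # 5: a single spike is penalized
```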

🧪 Running Tests

pytest --maxfail=2 --disable-warnings -v

🤝 Contributing

pretab is community-driven! Whether you're fixing bugs, adding new encoders, or improving the docs, contributions are welcome.

git clone https://github.com/OpenTabular/pretab.git
cd pretab
pip install -e ".[dev]"

Then create a pull request 🚀

📄 License

MIT License. See LICENSE for details.

❤️ Acknowledgements

pretab builds on the strengths of:

scikit-learn


Release history

0.0.3

Jul 2, 2025

This version

0.0.2

Apr 13, 2025

0.0.1

Apr 12, 2025

Download files


Source Distribution

pretab-0.0.2.tar.gz (28.8 kB)

Uploaded Apr 13, 2025 Source

Built Distribution


pretab-0.0.2-py3-none-any.whl (42.7 kB)

Uploaded Apr 13, 2025 Python 3

File details

Details for the file pretab-0.0.2.tar.gz.

File metadata

Download URL: pretab-0.0.2.tar.gz
Upload date: Apr 13, 2025
Size: 28.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.18

File hashes

Hashes for pretab-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`9b2429e41c0f9768e698dee605ee96ceb10ef819e52b28ff7360b4a4586536e4`
MD5	`54f237f11d2fb4a336af7ff5262555e8`
BLAKE2b-256	`664b7e8dc2d8421c023f35fe942ba75146b28cf3c25ae6c3f63330ba2bb91481`


File details

Details for the file pretab-0.0.2-py3-none-any.whl.

File metadata

Download URL: pretab-0.0.2-py3-none-any.whl
Upload date: Apr 13, 2025
Size: 42.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.18

File hashes

Hashes for pretab-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`83b39920aa5feac3b89225eed14c17ffd54476ddae612daccd39c2ee0ab4ba24`
MD5	`efeb504bd854df9d85ccddef653466a1`
BLAKE2b-256	`186e98bca39b4225c14cb8f08183d74ace48f5ffef2907350f0987cb81518261`

