tubular

Python package implementing ML feature engineering and pre-processing for polars or pandas dataframes.

Maintainers

LVGIG

Project description

Feature engineering on polars and pandas dataframes for machine learning!

tubular implements pre-processing steps for tabular data commonly used in machine learning pipelines.

The transformers are compatible with scikit-learn Pipelines. Each has a transform method to apply the pre-processing step to data and a fit method to learn the relevant information from the data, if applicable.
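
To make the fit/transform contract concrete, here is a minimal sketch in plain Python. The names `MeanFillSketch` and `run_steps` are hypothetical and invented for illustration; the sketch does not use tubular or scikit-learn, and only mimics the shape of the interface on a dict of lists.

```python
from statistics import mean


class MeanFillSketch:
    """Toy transformer (hypothetical, not part of tubular): fills None with the column mean."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, data):
        # Learn one fill value per column from the non-missing entries.
        self.impute_values_ = {
            col: mean(v for v in data[col] if v is not None) for col in self.columns
        }
        return self

    def transform(self, data):
        # Apply the learned values; columns not listed pass through unchanged.
        out = dict(data)
        for col in self.columns:
            fill = self.impute_values_[col]
            out[col] = [fill if v is None else v for v in data[col]]
        return out


def run_steps(steps, data):
    """Fit and apply each step in order, feeding each output to the next step."""
    for step in steps:
        data = step.fit(data).transform(data)
    return data


result = run_steps([MeanFillSketch(columns=["b"])], {"a": [1, 5], "b": [10, None]})
```

The point of the split is that `fit` stores what it learned on the object, so `transform` can later be applied to new data without refitting.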

The transformers in tubular are written with narwhals, so they are agnostic between pandas and polars dataframes and use the chosen (pandas/polars) API under the hood.

There are a variety of transformers to assist with:

capping
dates
imputation
mapping
categorical encoding
numeric operations

Here is a simple example of applying capping to two columns:

import polars as pl
from tubular.capping import CappingTransformer

transformer = CappingTransformer(
    capping_values={"a": [10, 20], "b": [1, 3]},
)

test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})

transformer.transform(test_df)
# ->
# shape: (4, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 10  ┆ 3   ┆ 1   │
# │ 15  ┆ 2   ┆ 2   │
# │ 18  ┆ 3   ┆ 3   │
# │ 20  ┆ 1   ┆ 4   │
# └─────┴─────┴─────┘
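
The capping logic in the output above can be reproduced with a few lines of plain Python. This is only a sketch of the arithmetic, not tubular's implementation; it assumes each entry in `capping_values` is a `[lower, upper]` pair, and `cap` is a hypothetical helper.

```python
def cap(values, lower, upper):
    """Clamp each value into the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


capping_values = {"a": [10, 20], "b": [1, 3]}
data = {"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]}

# Cap only the configured columns; "c" has no entry, so it is left untouched.
capped = {
    col: cap(vals, *capping_values[col]) if col in capping_values else list(vals)
    for col, vals in data.items()
}
```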

Tubular also supports saving and loading transformers and pipelines to and from JSON (goodbye .pkls!), as demonstrated below:

import polars as pl
from tubular.imputers import MeanImputer, MedianImputer
from sklearn.pipeline import Pipeline
from tubular.pipeline import dump_pipeline_to_json, load_pipeline_from_json

# Create a simple dataframe

df = pl.DataFrame({"a": [1, 5], "b": [10, None]})

# Add imputers
median_imputer = MedianImputer(columns=["b"])
mean_imputer = MeanImputer(columns=["b"])

# Create and fit the pipeline
original_pipeline = Pipeline(
    [("MedianImputer", median_imputer), ("MeanImputer", mean_imputer)]
)
original_pipeline = original_pipeline.fit(df)

# Dumping the pipeline to JSON
pipeline_json = dump_pipeline_to_json(original_pipeline)
pipeline_json

# Printed value:
# ->
# {
#     'MedianImputer': {
#         'tubular_version': '2.6.1',
#         'classname': 'MedianImputer',
#         'init': {
#             'columns': ['b'],
#             'copy': False,
#             'verbose': False,
#             'return_native': True,
#             'weights_column': None
#         },
#         'fit': {
#             'impute_values_': {'b': 10.0}
#         }
#     },
#     'MeanImputer': {
#         'tubular_version': '2.6.1',
#         'classname': 'MeanImputer',
#         'init': {
#             'columns': ['b'],
#             'copy': False,
#             'verbose': False,
#             'return_native': True,
#             'weights_column': None
#         },
#         'fit': {
#             'impute_values_': {'b': 10.0}
#         }
#     }
# }

# Load the pipeline from JSON
pipeline = load_pipeline_from_json(pipeline_json)

# Verify the reconstructed pipeline
print(pipeline)

# Printed value:
# Pipeline(steps=[('MedianImputer', MedianImputer(columns=['b'])),
#                 ('MeanImputer', MeanImputer(columns=['b']))])
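
If the dumped value is a plain dictionary of JSON-compatible values, as the printed output suggests, it could be written to disk and read back with the standard `json` module. The sketch below uses a hand-written, trimmed stand-in for that dictionary rather than calling tubular; `save_json` and `load_json` are hypothetical helper names.

```python
import json
import os
import tempfile

# Hand-written stand-in for the printed dictionary (trimmed to one step).
pipeline_json = {
    "MedianImputer": {
        "classname": "MedianImputer",
        "init": {"columns": ["b"], "copy": False, "verbose": False},
        "fit": {"impute_values_": {"b": 10.0}},
    }
}


def save_json(obj, path):
    """Write a JSON-compatible object to path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, indent=2)


def load_json(path):
    """Read a JSON document back into Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Round-trip through a temporary file to show that nothing is lost.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pipeline.json")
    save_json(pipeline_json, path)
    restored = load_json(path)
```

Unlike a pickle, the resulting file is human-readable text, so the fitted values can be inspected or diffed directly.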

We are currently rolling out support for polars lazyframes!

Track our progress below:

transformer	polars_compatible	pandas_compatible	jsonable	lazyframe_compatible
AggregateColumnsOverRowTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
AggregateRowsOverColumnTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
ArbitraryImputer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
BetweenDatesTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
CappingTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
ColumnDtypeSetter	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
CompareTwoColumnsTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
DateDifferenceTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
DatetimeComponentExtractor	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
DatetimeInfoExtractor	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
DatetimeSinusoidCalculator	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
DifferenceTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
ExtractStringComponentsTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
GroupRareLevelsTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
LowerCaseTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
MappingTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
MeanImputer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
MeanResponseTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
MedianImputer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
ModeImputer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
NullIndicator	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
OneDKmeansTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:x:
OneHotEncodingTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
OutOfRangeNullTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
RatioTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
RemoveCharactersTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
RenameColumnsTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
SetValueTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
StringContainsTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
ToDatetimeTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:
WhenThenOtherwiseTransformer	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:

Installation

The easiest way to get tubular is to install it directly from PyPI:

pip install tubular
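
One way to confirm that the install worked is to query the installed distribution's version from Python. This is a small sketch using only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if tubular is installed, otherwise None.
print(installed_version("tubular"))
```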

Documentation

The documentation for tubular can be found on readthedocs.

Instructions for building the docs locally can be found in docs/README.

Examples

We utilise doctest to keep valid usage examples in the docstrings of transformers in the package, so please see these for getting started!
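
For readers unfamiliar with doctest, the sketch below shows the general pattern: a usage example lives in the docstring and is executed as a test. The function `cap_sketch` and the helper `run_docstring_example` are hypothetical and are not taken from tubular's source.

```python
import doctest


def cap_sketch(values, lower, upper):
    """Clamp each value into the closed interval [lower, upper].

    >>> cap_sketch([1, 15, 18, 25], 10, 20)
    [10, 15, 18, 20]
    """
    return [min(max(v, lower), upper) for v in values]


def run_docstring_example(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    result = doctest.DocTestRunner(verbose=False).run(test)
    return result.failed, result.attempted


failed, attempted = run_docstring_example(cap_sketch)
```

If the documented output ever drifts from what the code returns, `failed` becomes non-zero, which is what keeps docstring examples valid.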

Issues

For bugs and feature requests please open an issue.

Build and test

The test framework we are using for this project is pytest. To build the package locally and run the tests, follow the steps below.

First, clone the repo and move to the root directory:

git clone https://github.com/azukds/tubular.git
cd tubular

Next, install tubular and the development dependencies:

pip install . -r requirements-dev.txt

Finally, run the test suite with pytest:

pytest

Contribute

tubular is under active development, and we're super excited if you're interested in contributing!

See the CONTRIBUTING file for the full details of our working practices.

Release history

3.7.1

Jun 2, 2026

This version

3.7.0

May 29, 2026

3.6.0

May 26, 2026

3.5.0

May 21, 2026

3.4.0

May 13, 2026

3.3.0

Apr 30, 2026

3.2.0

Apr 14, 2026

3.1.0

Mar 20, 2026

3.0.0

Mar 19, 2026

2.8.0

Feb 23, 2026

2.7.0

Jan 20, 2026

2.6.0

Dec 19, 2025

2.5.0

Dec 16, 2025

2.4.0

Dec 1, 2025

2.3.0

Nov 18, 2025

2.2.0

Nov 11, 2025

2.1.0

Oct 30, 2025

2.0.0

Oct 16, 2025

1.4.8

Sep 3, 2025

1.4.7

Aug 21, 2025

1.4.6

Aug 19, 2025

1.4.5

Aug 19, 2025

1.4.4

Jun 24, 2025

1.4.3

Jun 2, 2025

1.4.2

Apr 8, 2025

1.4.1

Dec 2, 2024

1.4.0

Oct 15, 2024

1.3.1

Jul 18, 2024

1.3.0

Jun 13, 2024

1.2.2

Feb 20, 2024

1.2.1

Feb 8, 2024

1.2.0

Feb 6, 2024

1.1.1

Jan 18, 2024

1.1.0

Dec 19, 2023

1.0.0

Jul 24, 2023

0.3.8

Jul 10, 2023

0.3.7

Jul 5, 2023

0.3.6

May 24, 2023

0.3.5

Apr 27, 2023

0.3.4

Mar 21, 2023

0.3.3

Jan 19, 2023

0.3.2

Jan 28, 2022

0.3.1

Nov 9, 2021

0.3.0

Nov 3, 2021

0.2.15

Oct 6, 2021

0.2.14

Apr 23, 2021

0.0.0

Jan 14, 2021

Download files

Source Distribution

tubular-3.7.0.tar.gz (272.3 kB)

Uploaded May 29, 2026 Source

Built Distribution

tubular-3.7.0-py3-none-any.whl (96.5 kB)

Uploaded May 29, 2026 Python 3

File details

Details for the file tubular-3.7.0.tar.gz.

File metadata

Download URL: tubular-3.7.0.tar.gz
Upload date: May 29, 2026
Size: 272.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tubular-3.7.0.tar.gz
Algorithm	Hash digest
SHA256	`ef7041f358a3bd9ff38648fbd3c7fb7383c9e635f4a4da81cc77a2f62d845211`
MD5	`f0743aa9bc8238b3c148a53c11a0b272`
BLAKE2b-256	`ea8d79e2005f6498fa8fd95ebe3e55298785777f5bf1c44fd3ad229c03821c6c`
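
A downloaded file can be checked against the published SHA256 digest with the standard `hashlib` module. The sketch below assumes the archive has been saved locally under its original file name; `sha256_of_file` and `matches_published_hash` are hypothetical helper names.

```python
import hashlib

# SHA256 digest published above for tubular-3.7.0.tar.gz.
EXPECTED_SHA256 = "ef7041f358a3bd9ff38648fbd3c7fb7383c9e635f4a4da81cc77a2f62d845211"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call (assumes the file exists in the working directory):
# matches_published_hash("tubular-3.7.0.tar.gz", EXPECTED_SHA256)
```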

Provenance

The following attestation bundles were made for tubular-3.7.0.tar.gz:

Publisher: release.yml on azukds/tubular

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tubular-3.7.0.tar.gz
- Subject digest: ef7041f358a3bd9ff38648fbd3c7fb7383c9e635f4a4da81cc77a2f62d845211
- Sigstore transparency entry: 1669320848
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: azukds/tubular@ba8d1758f31981f827137931f65db4f16044e962
- Branch / Tag: refs/tags/v3.7.0
- Owner: https://github.com/azukds
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ba8d1758f31981f827137931f65db4f16044e962
- Trigger Event: release

File details

Details for the file tubular-3.7.0-py3-none-any.whl.

File metadata

Download URL: tubular-3.7.0-py3-none-any.whl
Upload date: May 29, 2026
Size: 96.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tubular-3.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`eede72e7991fef00e4513f87283c7df04a928918aec3be92cf9b48cc0759ae53`
MD5	`31483112c3096f88f7240de568075ff5`
BLAKE2b-256	`6d1f60296a57847ed001ab505330c51485d0dc17c962a312fd9b7552fbb385f9`

Provenance

The following attestation bundles were made for tubular-3.7.0-py3-none-any.whl:

Publisher: release.yml on azukds/tubular

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tubular-3.7.0-py3-none-any.whl
- Subject digest: eede72e7991fef00e4513f87283c7df04a928918aec3be92cf9b48cc0759ae53
- Sigstore transparency entry: 1669320968
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: azukds/tubular@ba8d1758f31981f827137931f65db4f16044e962
- Branch / Tag: refs/tags/v3.7.0
- Owner: https://github.com/azukds
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ba8d1758f31981f827137931f65db4f16044e962
- Trigger Event: release
