
Package to perform pre-processing steps for machine learning models


Tubular pre-processing for machine learning!


PyPI Read the Docs GitHub GitHub last commit GitHub issues Build Binder

tubular implements pre-processing steps for tabular data commonly used in machine learning pipelines.

The transformers are compatible with scikit-learn Pipelines. Each has a transform method to apply the pre-processing step to data and a fit method to learn the relevant information from the data, if applicable.

The transformers in tubular work with data in pandas DataFrames.
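The fit/transform contract described above can be sketched without any dependencies. The example below is illustrative only and is not tubular's implementation: `MeanImputerSketch` is a hypothetical class, and a plain dict of lists stands in for a pandas DataFrame. Here `fit` learns a per-column mean and `transform` uses it to fill missing values.

```python
from statistics import mean


class MeanImputerSketch:
    """Hypothetical transformer illustrating the fit/transform contract."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.means_ = {}

    def fit(self, X, y=None):
        # Learn the mean of the non-missing values in each configured column.
        for col in self.columns:
            observed = [v for v in X[col] if v is not None]
            self.means_[col] = mean(observed)
        return self  # returning self allows fit(...).transform(...) chaining

    def transform(self, X):
        # Return a copy with missing values replaced by the learned means.
        out = {col: list(values) for col, values in X.items()}
        for col in self.columns:
            out[col] = [self.means_[col] if v is None else v for v in out[col]]
        return out


# A dict of lists stands in for a DataFrame (assumption made for this sketch).
train = {"age": [20, None, 40], "rooms": [3, 4, 5]}
imputer = MeanImputerSketch(columns=["age"]).fit(train)
imputed = imputer.transform(train)
```

A transformer that needs no learned state, such as capping with fixed values, could keep `fit` as a no-op that returns `self`, which is what "if applicable" amounts to in practice.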

There are a variety of transformers to assist with:

  • capping
  • dates
  • imputation
  • mapping
  • categorical encoding
  • numeric operations

Here is a simple example of applying capping to two columns:

from tubular.capping import CappingTransformer
import pandas as pd
from sklearn.datasets import fetch_california_housing

# load the California housing dataset
cali = fetch_california_housing()
X = pd.DataFrame(cali['data'], columns=cali['feature_names'])

# initialise a capping transformer for 2 columns
capper = CappingTransformer(capping_values={'AveOccup': [0, 10], 'HouseAge': [0, 50]})

# transform the data
X_capped = capper.transform(X)
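To make the effect of the capping values concrete, here is a dependency-free sketch of what capping a column to a pair of bounds does. It assumes each pair in `capping_values` is read as `[lower, upper]`; `cap_column` is a hypothetical helper, not part of tubular, and tubular's handling of edge cases such as null values may differ.

```python
def cap_column(values, lower, upper):
    """Clamp each value into [lower, upper]; hypothetical helper, not tubular's API."""
    return [min(max(v, lower), upper) for v in values]


# Illustrative values only, not taken from the California housing data.
house_age = [-2.0, 15.0, 52.0]
capped = cap_column(house_age, lower=0, upper=50)
```

Values already inside the bounds pass through unchanged; only values outside the range are moved to the nearest bound.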

Installation

The easiest way to get tubular is directly from PyPI with:

pip install tubular
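After installing, one way to confirm which version is present is to query the package metadata from the standard library. This is a minimal sketch; `installed_version` is a hypothetical helper name introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the installed tubular version, or None if it is not installed.
print(installed_version("tubular"))
```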

Documentation

The documentation for tubular can be found on Read the Docs.

Instructions for building the docs locally can be found in docs/README.

Examples

To help you get started, there are example notebooks in the examples folder in the repo that show how to use each transformer.

To open the example notebooks in Binder, click here or click on the launch binder shield above, then click on the directory button in the sidebar on the left to navigate to the specific notebook.

Issues

For bugs and feature requests, please open an issue.

Build and test

This project uses pytest as its test framework. To build the package locally and run the tests, follow the steps below.

First, clone the repo and move to the root directory:

git clone https://github.com/lvgig/tubular.git
cd tubular

Next, install tubular and the development dependencies:

pip install . -r requirements-dev.txt

Finally, run the test suite with pytest:

pytest

Contribute

tubular is under active development, and we're super excited if you're interested in contributing!

See the CONTRIBUTING file for the full details of our working practices.
