Type Preserving Scaler

type_preserving_scaler provides a small wrapper around scikit-learn's StandardScaler for projects that want predictable output container types.

What it is

The package exposes one class:

from type_preserving_scaler import StandardScalerThatPreservesInputType

StandardScalerThatPreservesInputType behaves like sklearn.preprocessing.StandardScaler, with one added convention:

  • fit on a pandas DataFrame, and transform() / fit_transform() return a pandas DataFrame
  • fit on a NumPy array, and transform() / fit_transform() return a NumPy ndarray

For DataFrame inputs, the tested behavior is that the output keeps the input's shape, columns, and index.

Why it exists

By default, plain StandardScaler returns NumPy arrays even when the caller passes pandas data. Downstream code then has to rebuild DataFrames by hand or carry column and index metadata separately.

This package keeps the common "pandas in, pandas out" workflow while retaining the familiar StandardScaler API.
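
As a rough illustration of the manual rebuilding described above, the sketch below uses plain StandardScaler and assumes scikit-learn's default output configuration has not been changed globally. The sample data and variable names are made up for this example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data with a non-default index.
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]},
    index=["x", "y", "z"],
)

# Under the default output configuration the result is a bare ndarray:
# column names and index are not carried over.
raw = StandardScaler().fit_transform(df)

# The caller has to reattach the metadata by hand.
rebuilt = pd.DataFrame(raw, columns=df.columns, index=df.index)
```

This bookkeeping step is what the package is meant to make unnecessary.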

How it works

The class subclasses sklearn.preprocessing.StandardScaler. During fit(), it delegates to the parent scaler, then calls scikit-learn's set_output() API:

  • set_output(transform="pandas") when fit() receives a pandas DataFrame
  • set_output(transform="default") otherwise

The output type is therefore determined by the data passed to fit(), not by each later call to transform().

Installation

pip install type_preserving_scaler

The package requires Python 3.8 or newer and depends on NumPy, pandas, and scikit-learn.

Usage

import pandas as pd
from type_preserving_scaler import StandardScalerThatPreservesInputType

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

scaler = StandardScalerThatPreservesInputType()
scaled = scaler.fit_transform(df)

assert isinstance(scaled, pd.DataFrame)
assert scaled.columns.equals(df.columns)
assert scaled.index.equals(df.index)

NumPy inputs keep NumPy outputs:

import numpy as np
from type_preserving_scaler import StandardScalerThatPreservesInputType

array = np.array([[1.0, 2.0], [3.0, 4.0]])
scaled = StandardScalerThatPreservesInputType().fit_transform(array)

assert isinstance(scaled, np.ndarray)

Development

Install the development dependencies and the package in editable mode:

pip install -r requirements_dev.txt
pip install -e .

Common local commands:

make test
make lint
make docs

Limitations

  • This package only wraps StandardScaler; it is not a general adapter for all scikit-learn transformers.
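  • The output type is chosen when fit() runs; later transform() calls do not re-inspect the type of their input.
  • The installed scikit-learn version must provide the set_output() API used by the implementation (introduced in scikit-learn 1.2).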
  • The package preserves the output container type. It does not change StandardScaler's scaling behavior.

License

MIT.

Changelog

0.0.1

  • First release on PyPI.
