Tool to automagically save scikit-learn scaler properties to a portable, readable format.


Project description

bridgescaler

Bridge your scikit-learn scaler parameters between Python sessions and users. Bridgescaler saves the properties of a fitted scikit-learn scaler object to a JSON file and can then repopulate a new scaler object with the same properties.

Dependencies

scikit-learn
numpy
pandas

Installation

For a stable version of bridgescaler, you can install from PyPI.

pip install bridgescaler

For the latest version of bridgescaler, install from GitHub.

git clone https://github.com/NCAR/bridgescaler.git
cd bridgescaler
pip install .

Usage

bridgescaler supports all the common scikit-learn scaler classes:

StandardScaler
RobustScaler
MinMaxScaler
MaxAbsScaler
QuantileTransformer
PowerTransformer
SplineTransformer

First, create some synthetic data to transform.

import numpy as np
import pandas as pd

# specify distribution parameters for each variable
locs = np.array([0, 5, -2, 350.5], dtype=np.float32)
scales = np.array([1.0, 10, 0.1, 5000.0])
names = ["A", "B", "C", "D"]
num_examples = 205
x_data_dict = {}
for name, loc, scale in zip(names, locs, scales):
    # sample each column from a normal distribution with its own parameters
    x_data_dict[name] = np.random.normal(loc=loc, scale=scale, size=num_examples)
x_data = pd.DataFrame(x_data_dict)

Now, let's fit and transform the data with StandardScaler.

from sklearn.preprocessing import StandardScaler
from bridgescaler import save_scaler, load_scaler
scaler = StandardScaler()
x_transformed = scaler.fit_transform(x_data)
filename = "x_standard_scaler.json"
# save the fitted scaler's properties to a JSON file
save_scaler(scaler, filename)

# create a new StandardScaler from the JSON file
new_scaler = load_scaler(filename)  # new_scaler is a StandardScaler object
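
To make the round trip above more concrete, here is a minimal sketch of the general idea using only scikit-learn and the standard json module. It is not bridgescaler's implementation, and the layout of the `state` dictionary is an assumption made for illustration; the actual file written by save_scaler may look different.

```python
import json

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x = rng.normal(loc=[0.0, 5.0], scale=[1.0, 10.0], size=(50, 2))

scaler = StandardScaler().fit(x)

# Collect the fitted attributes as plain Python types so they are JSON-safe.
# The key names here are illustrative, not bridgescaler's file format.
state = {
    "type": "StandardScaler",
    "mean_": scaler.mean_.tolist(),
    "scale_": scaler.scale_.tolist(),
    "var_": scaler.var_.tolist(),
    "n_samples_seen_": int(scaler.n_samples_seen_),
}
text = json.dumps(state, indent=2)  # readable text that could be written to a file

# Rebuild a scaler from the text, as a new Python session would.
loaded = json.loads(text)
restored = StandardScaler()
restored.mean_ = np.array(loaded["mean_"])
restored.scale_ = np.array(loaded["scale_"])
restored.var_ = np.array(loaded["var_"])
restored.n_samples_seen_ = loaded["n_samples_seen_"]
restored.n_features_in_ = len(loaded["mean_"])

# The restored scaler should transform data the same way as the original.
roundtrip_ok = np.allclose(restored.transform(x), scaler.transform(x))
```

Because the fitted state is plain text, it can be inspected, diffed, or shared without unpickling arbitrary Python objects.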

Group Scaler

The group scalers use the same scaling parameters for a group of similar variables rather than scaling each column independently. This is useful for situations where variables are related, such as temperatures at different height levels.

Groups are specified as a list of column ids, which can be column names for pandas dataframes or column indices for numpy arrays.

For example:

from bridgescaler.group import GroupStandardScaler
import pandas as pd
import numpy as np
x_rand = np.random.random(size=(100, 5))
data = pd.DataFrame(data=x_rand, 
                    columns=["a", "b", "c", "d", "e"])
groups = [["a", "b"], ["c", "d"], "e"]
group_scaler = GroupStandardScaler()
x_transformed = group_scaler.fit_transform(data, groups=groups)

Columns "a" and "b" form a single group, so all values from both columns are pooled when calculating the mean and standard deviation for that group.
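
The pooled statistics described above can be sketched in plain NumPy. This is an illustration of the idea under stated assumptions, not GroupStandardScaler's code: it uses column indices for groups and the population standard deviation (`ddof=0`), either of which may differ from the library's internals.

```python
import numpy as np

rng = np.random.default_rng(1)
# Columns 0 and 1 share a group; column 2 stands alone.
x = np.column_stack([
    rng.normal(10.0, 2.0, 200),
    rng.normal(12.0, 2.0, 200),
    rng.normal(-3.0, 0.5, 200),
])
groups = [[0, 1], [2]]

x_scaled = np.empty_like(x)
for cols in groups:
    block = x[:, cols]
    # One mean and one standard deviation over every value in the group.
    group_mean = block.mean()
    group_sd = block.std()
    x_scaled[:, cols] = (block - group_mean) / group_sd

# All values of the first group taken together are standardized.
pooled = x_scaled[:, [0, 1]]
```

Taken together, the grouped columns have zero mean and unit standard deviation, while each individual column keeps its offset relative to the others in its group.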

Deep Scaler

The deep scalers are designed to scale 2- or 3-dimensional fields that are input into a deep learning model such as a convolutional neural network. The scalers assume that the last dimension is the channel/variable dimension and scale the values of each channel accordingly. The scalers support 2D or 3D patches with no change in code structure.

Example:

from bridgescaler.deep import DeepStandardScaler
import numpy as np
np.random.seed(352680)
n_ex = 5000
n_channels = 4
dim = 32
means = np.array([1, 5, -4, 2.5], dtype=np.float32)
sds = np.array([10, 2, 43.4, 32.], dtype=np.float32)
x = np.zeros((n_ex, dim, dim, n_channels), dtype=np.float32)
for chan in range(n_channels):
    x[..., chan] = np.random.normal(means[chan], sds[chan], (n_ex, dim, dim))
dss = DeepStandardScaler()
dss.fit(x)
x_transformed = dss.transform(x)
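
As a rough sketch of what channel-wise scaling means, the NumPy snippet below reduces over every axis except the last, so the same function handles 2D and 3D patches. The helper `channel_standardize` is hypothetical and written only for illustration; it is not part of bridgescaler and may not match how DeepStandardScaler computes its statistics.

```python
import numpy as np


def channel_standardize(x):
    """Standardize over every axis except the last (channel) axis."""
    reduce_axes = tuple(range(x.ndim - 1))
    means = x.mean(axis=reduce_axes)  # one mean per channel
    sds = x.std(axis=reduce_axes)     # one standard deviation per channel
    return (x - means) / sds, means, sds


rng = np.random.default_rng(2)
x_2d = rng.normal(5.0, 3.0, size=(20, 8, 8, 4))      # (example, y, x, channel)
x_3d = rng.normal(-1.0, 0.5, size=(20, 4, 8, 8, 4))  # (example, z, y, x, channel)

scaled_2d, means_2d, sds_2d = channel_standardize(x_2d)
scaled_3d, means_3d, sds_3d = channel_standardize(x_3d)
```

Broadcasting against the trailing axis is what lets the per-channel statistics apply to inputs of either rank without any change to the function.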


Release history

0.7.0: Apr 29, 2024
0.6.0: Mar 29, 2024
0.5.1: Mar 21, 2024
0.5.0: Mar 20, 2024
0.4.2: Feb 12, 2024
0.4.1: Feb 12, 2024
0.4: Feb 9, 2024
0.3: Nov 14, 2023
0.2: Apr 25, 2023 (this version)
0.1b2 (pre-release): Dec 6, 2022
0.1b1 (pre-release): Dec 6, 2022

Download files

Source Distribution

bridgescaler-0.2.tar.gz (6.2 kB), uploaded Apr 25, 2023

Built Distribution

bridgescaler-0.2-py3-none-any.whl (7.3 kB), uploaded Apr 25, 2023, Python 3

Hashes for bridgescaler-0.2.tar.gz
Algorithm	Hash digest
SHA256	`0934d4fd5180339d9b9822a346e749ca23549787895af454f41ff65e01277be0`
MD5	`8695a2f854f6ba3fdd103f32863b5ee2`
BLAKE2b-256	`dcc7093ee9e6a533d170d82a9db7013df6ffd095ace0ea696ed3a91393a50b25`

Hashes for bridgescaler-0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7c903ac1e9a623043b55c28cdb77f2ee216b40d8aabba7461d82693b9e875297`
MD5	`4da330d49cb77881fc9b83fdf67f48f3`
BLAKE2b-256	`6e1139f58552d645e4590d50e78d53b3903fbeb965b18c8184efa642b59565ea`