Balanced splitting utility
balanced-splits
A utility library for splitting datasets into partitions that are balanced with respect to several features.
Installation
pip install balanced-splits
Usage
import numpy as np
import pandas as pd
from balanced_splits.split import optimized_split

sample_size = 100
df = pd.DataFrame({
    'age': np.random.normal(loc=45, scale=7., size=sample_size),
    'skill': 1 - np.random.power(4, size=sample_size),
    'type': np.random.choice(['T1', 'T2', 'T3'], size=sample_size)
})

A, B = optimized_split(df)

print('Partition 1\n===========\n')
print(A.describe())
print(A['type'].value_counts())

print('\n\n')

print('Partition 2\n===========\n')
print(B.describe())
print(B['type'].value_counts())
Check out the "examples" section for more examples.
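Comparing the printed summaries of the two partitions by eye gets tedious, so a small helper can condense the comparison into one number per feature. The sketch below is an illustration only: `balance_report` is a hypothetical helper, not part of balanced-splits, and the split it inspects is a plain random halving used as a stand-in so the snippet runs without the library installed. The choice of gap measures (difference of means for numeric columns, largest difference in category share for the rest) is likewise an assumption, not necessarily the criterion the library optimizes.

```python
import numpy as np
import pandas as pd


def balance_report(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Compare two partitions column by column.

    Hypothetical helper for illustration; not part of balanced-splits.
    """
    rows = []
    for col in a.columns:
        if pd.api.types.is_numeric_dtype(a[col]):
            # Numeric feature: absolute difference of the partition means.
            gap = abs(a[col].mean() - b[col].mean())
            rows.append({'feature': col, 'measure': 'mean difference', 'gap': gap})
        else:
            # Categorical feature: largest difference in category share.
            share_a = a[col].value_counts(normalize=True)
            share_b = b[col].value_counts(normalize=True)
            gap = share_a.sub(share_b, fill_value=0.0).abs().max()
            rows.append({'feature': col, 'measure': 'max share difference', 'gap': gap})
    return pd.DataFrame(rows)


# Same kind of synthetic data as in the usage example, with a fixed seed.
rng = np.random.default_rng(0)
sample_size = 100
df = pd.DataFrame({
    'age': rng.normal(loc=45, scale=7., size=sample_size),
    'skill': 1 - rng.power(4, size=sample_size),
    'type': rng.choice(['T1', 'T2', 'T3'], size=sample_size),
})

# Stand-in split: a plain random halving, used only so this sketch is
# self-contained. It is NOT how optimized_split works.
shuffled = df.sample(frac=1.0, random_state=0)
A, B = shuffled.iloc[:sample_size // 2], shuffled.iloc[sample_size // 2:]

report = balance_report(A, B)
print(report)
```

To inspect the library's own result, replace the stand-in split with `A, B = optimized_split(df)` and call `balance_report(A, B)` again; smaller gaps indicate partitions that resemble each other more closely on that feature.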