Balanced splitting utility
balanced-splits
A utility library for splitting datasets into partitions that are balanced with regard to several features.
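"Balanced" can be made concrete in several ways. One common approach scores a candidate split by how far apart the two partitions are. For numeric columns it uses the standardized mean difference; for categorical columns it uses the total variation distance between category proportions. The sketch below tries random half/half splits and keeps the one with the lowest score. It is an illustration under that assumption only, not the algorithm used by `optimized_split`. The names `imbalance` and `random_search_split` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical helpers for illustration; not part of balanced_splits.


def imbalance(a: pd.DataFrame, b: pd.DataFrame) -> float:
    """Score how different two partitions are (0.0 means identical summaries)."""
    score = 0.0
    for col in a.columns:
        if pd.api.types.is_numeric_dtype(a[col]):
            # Standardized mean difference, using the pooled standard deviation.
            pooled = np.sqrt((a[col].var() + b[col].var()) / 2)
            if pooled > 0:
                score += abs(a[col].mean() - b[col].mean()) / pooled
        else:
            # Total variation distance between the two category distributions;
            # categories missing from one partition count as proportion 0.
            pa = a[col].value_counts(normalize=True)
            pb = b[col].value_counts(normalize=True)
            pa, pb = pa.align(pb, fill_value=0.0)
            score += 0.5 * (pa - pb).abs().sum()
    return score


def random_search_split(df: pd.DataFrame, n_trials: int = 200, seed: int = 0):
    """Try n_trials random half/half splits and return the most balanced pair."""
    rng = np.random.default_rng(seed)
    half = len(df) // 2
    best, best_score = None, np.inf
    for _ in range(n_trials):
        perm = rng.permutation(len(df))
        a, b = df.iloc[perm[:half]], df.iloc[perm[half:]]
        s = imbalance(a, b)
        if s < best_score:
            best, best_score = (a, b), s
    return best
```

With the 100-row DataFrame from the usage example below, `A, B = random_search_split(df)` returns two DataFrames of 50 rows each. Raising `n_trials` trades runtime for a (usually) lower imbalance score.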
Installation
```shell
pip install balanced-splits
```
Usage
```python
import numpy as np
import pandas as pd
from balanced_splits.split import optimized_split

# Build a toy dataset with two numeric features and one categorical feature.
sample_size = 100
df = pd.DataFrame({
    'age': np.random.normal(loc=45, scale=7., size=sample_size),
    'skill': 1 - np.random.power(4, size=sample_size),
    'type': np.random.choice(['T1', 'T2', 'T3'], size=sample_size)
})

# Split the rows into two partitions balanced with regard to the features.
A, B = optimized_split(df)

# Print summary statistics and category counts for each partition.
print('Partition 1\n===========\n')
print(A.describe())
print(A['type'].value_counts())
print('\n\n')
print('Partition 2\n===========\n')
print(B.describe())
print(B['type'].value_counts())
```
Check out the "examples" section for more examples.
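The usage example compares the partitions through two separate `describe()` tables. A compact alternative is one side-by-side table of per-column statistics with their differences. The helper below, `compare_partitions`, is a hypothetical name and not part of balanced_splits. It uses only pandas and assumes nothing about `optimized_split` beyond its returning two DataFrames with the same columns, as in the example.

```python
import numpy as np
import pandas as pd

# Hypothetical helper for illustration; not part of balanced_splits.


def compare_partitions(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Return one row per statistic, with columns A, B and their difference."""
    rows = {}
    for col in a.columns:
        if pd.api.types.is_numeric_dtype(a[col]):
            rows[f'{col} mean'] = (a[col].mean(), b[col].mean())
            rows[f'{col} std'] = (a[col].std(), b[col].std())
        else:
            # Share of each category in each partition; missing categories count as 0.
            pa = a[col].value_counts(normalize=True)
            pb = b[col].value_counts(normalize=True)
            pa, pb = pa.align(pb, fill_value=0.0)
            for cat in pa.index:
                rows[f'{col} share {cat}'] = (pa[cat], pb[cat])
    out = pd.DataFrame.from_dict(rows, orient='index', columns=['A', 'B'])
    out['diff'] = out['A'] - out['B']
    return out
```

For example, calling `print(compare_partitions(A, B))` after `A, B = optimized_split(df)` prints a single table. Rows with a large `diff` point to features that are less well balanced between the two partitions.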
Download files
Source distribution: balanced-splits-0.2.0.tar.gz (4.4 kB)
Built distribution: balanced_splits-0.2.0-py3-none-any.whl

Hashes for balanced_splits-0.2.0-py3-none-any.whl:

Algorithm | Hash digest
---|---
SHA256 | 0b098e9c2b0e7e984aec304b61aa57a8b7de25b594b636fa5ea572a9ea0010f6
MD5 | 6e4090df6eec49115fe639ab2004c600
BLAKE2b-256 | aef0e93330681b328a1c62d54d108ad6f83c0d335df6641c9ca1d0aa1c4fb0d1
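To check that a downloaded wheel matches the SHA256 digest listed for it, you can hash the file locally. The sketch below uses only Python's standard `hashlib` module. The function name `sha256_of` and the constant `EXPECTED_SHA256` are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for balanced_splits-0.2.0-py3-none-any.whl.
EXPECTED_SHA256 = '0b098e9c2b0e7e984aec304b61aa57a8b7de25b594b636fa5ea572a9ea0010f6'


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.sha256()
    with Path(path).open('rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()
```

For an unmodified download, `sha256_of('balanced_splits-0.2.0-py3-none-any.whl') == EXPECTED_SHA256` should evaluate to `True`.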