Balanced splitting utility
balanced-splits
A utility library for splitting datasets into partitions that are balanced with regard to several features.
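"Balanced with regard to several features" is not defined further on this page. One common way to quantify it is the standardized mean difference for numeric columns and the largest gap in category proportions for categorical ones. The sketch below assumes that criterion. It is not taken from balanced-splits and may differ from what `optimized_split` actually optimizes. `imbalance` and `random_restart_split` are hypothetical names, and the random-restart search is a naive baseline, not the library's method.

```python
import numpy as np
import pandas as pd


def imbalance(a: pd.DataFrame, b: pd.DataFrame) -> float:
    """Worst-case imbalance between two partitions (lower is better).

    Numeric columns: absolute standardized mean difference.
    Other columns: largest absolute gap in category proportions.
    """
    worst = 0.0
    for col in a.columns:
        if pd.api.types.is_numeric_dtype(a[col]):
            pooled_sd = np.sqrt((a[col].var() + b[col].var()) / 2)
            diff = abs(a[col].mean() - b[col].mean())
            # Fall back to the raw difference if the column has no spread.
            score = diff / pooled_sd if pooled_sd > 0 else diff
        else:
            pa = a[col].value_counts(normalize=True)
            pb = b[col].value_counts(normalize=True)
            # Categories missing from one side count as proportion 0.
            score = pa.sub(pb, fill_value=0).abs().max()
        worst = max(worst, float(score))
    return worst


def random_restart_split(df: pd.DataFrame, n_trials: int = 200, seed: int = 0):
    """Hypothetical baseline: try random halvings, keep the most balanced one."""
    rng = np.random.default_rng(seed)
    half = len(df) // 2
    best = None
    for _ in range(n_trials):
        idx = rng.permutation(len(df))
        a, b = df.iloc[idx[:half]], df.iloc[idx[half:]]
        score = imbalance(a, b)
        if best is None or score < best[0]:
            best = (score, a, b)
    return best[1], best[2]
```

Called as `A, B = random_restart_split(df)`, it evaluates `n_trials` random halvings, so its cost grows linearly with the number of trials; a dedicated optimizer can search far more efficiently.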
Installation
pip install balanced-splits
Usage
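import numpy as np
import pandas as pd
from balanced_splits.split import optimized_split
# Synthetic dataset with two numeric features and one categorical feature
sample_size = 100
df = pd.DataFrame({
    'age': np.random.normal(loc=45, scale=7., size=sample_size),  # bell-shaped around 45
    'skill': 1 - np.random.power(4, size=sample_size),  # values in [0, 1], skewed toward 0
    'type': np.random.choice(['T1', 'T2', 'T3'], size=sample_size)  # three categories
})
# Split df into two partitions that are balanced across its features
A, B = optimized_split(df)
# Summarize each partition to compare the feature distributions
print('Partition 1\n===========\n')
print(A.describe())
print(A['type'].value_counts())
print('\n\n')
print('Partition 2\n===========\n')
print(B.describe())
print(B['type'].value_counts())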
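The `describe()` and `value_counts()` printouts above show each partition separately. A more compact comparison puts the per-feature summaries side by side. The sketch below assumes only that `A` and `B` are ordinary pandas DataFrames with the same columns; `compare_partitions` is a hypothetical helper, not part of balanced-splits.

```python
import pandas as pd


def compare_partitions(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """One row per numeric mean or category share, one column per partition."""
    rows = {}
    for col in a.columns:
        if pd.api.types.is_numeric_dtype(a[col]):
            rows[f"{col} (mean)"] = (a[col].mean(), b[col].mean())
        else:
            pa = a[col].value_counts(normalize=True)
            pb = b[col].value_counts(normalize=True)
            # Report every category seen in either partition; absent means share 0.
            for cat in sorted(set(pa.index) | set(pb.index)):
                rows[f"{col}={cat} (share)"] = (pa.get(cat, 0.0), pb.get(cat, 0.0))
    out = pd.DataFrame.from_dict(rows, orient="index", columns=["A", "B"])
    out["diff"] = out["A"] - out["B"]  # near 0 means the feature is well balanced
    return out
```

With the usage example, `print(compare_partitions(A, B).round(3))` prints one table in which the `diff` column makes any imbalance easy to spot.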
Check out the "examples" section for more examples.
Download files
Source Distribution
balanced-splits-0.1.0.tar.gz (4.3 kB)
Built Distribution
Hashes for balanced_splits-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9b371661c2d9df13aef9a9955b81e811bb77a5e501e726436f206ee8058afd0b
MD5 | 424f182e49e599cc74895881798e649b
BLAKE2b-256 | 656c8e26e094593a8adf0a1942ad3562b11fca677ad14198fb9d9326bbb68a70