Balanced splitting utility

Project description

balanced-splits

A utility library for splitting datasets into partitions that are balanced with regard to several features.

Installation

pip install balanced-splits

Usage

import numpy as np
import pandas as pd
from balanced_splits.split import optimized_split

# Build a synthetic dataset with two numeric features and one categorical feature.
sample_size = 100
df = pd.DataFrame({
    'age': np.random.normal(loc=45, scale=7., size=sample_size),
    'skill': 1 - np.random.power(4, size=sample_size),
    'type': np.random.choice(['T1', 'T2', 'T3'], size=sample_size)
})

# Split the DataFrame into two partitions, A and B.
A, B = optimized_split(df)

print('Partition 1\n===========\n')
print(A.describe())
print(A['type'].value_counts())

print('\n\n')

print('Partition 2\n===========\n')
print(B.describe())
print(B['type'].value_counts())
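
To compare the two partitions feature by feature rather than reading the describe() output by eye, a small report helper can be useful. The following is a minimal sketch and not part of balanced-splits: the function name balance_report is hypothetical, and plain random halves drawn with pandas stand in for the output of optimized_split so that the snippet runs without the library installed. It assumes numeric columns are compared by mean and all other columns by category share.

```python
import numpy as np
import pandas as pd


def balance_report(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Compare two partitions column by column.

    Hypothetical helper for illustration; not part of balanced-splits.
    """
    rows = []
    for col in a.columns:
        if pd.api.types.is_numeric_dtype(a[col]):
            # Numeric feature: absolute difference between partition means.
            gap = abs(a[col].mean() - b[col].mean())
            rows.append({'feature': col, 'kind': 'numeric', 'gap': gap})
        else:
            # Categorical feature: largest difference in category share.
            share_a = a[col].value_counts(normalize=True)
            share_b = b[col].value_counts(normalize=True)
            gap = share_a.subtract(share_b, fill_value=0.0).abs().max()
            rows.append({'feature': col, 'kind': 'categorical', 'gap': gap})
    return pd.DataFrame(rows)


# Same synthetic dataset as in the usage example, with a fixed seed.
rng = np.random.default_rng(0)
sample_size = 100
df = pd.DataFrame({
    'age': rng.normal(loc=45, scale=7., size=sample_size),
    'skill': 1 - rng.power(4, size=sample_size),
    'type': rng.choice(['T1', 'T2', 'T3'], size=sample_size),
})

# Assumption: random halves stand in for the result of optimized_split(df).
A = df.sample(frac=0.5, random_state=0)
B = df.drop(A.index)

report = balance_report(A, B)
print(report)
```

Smaller gaps indicate partitions that are closer on that feature. Passing the A and B returned by optimized_split instead of the random halves should give a comparable report for the library's own split.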

Check out the "examples" section for more examples.

Download files

Source Distribution

balanced-splits-0.2.0.tar.gz (4.4 kB)

Uploaded Source

Built Distribution

balanced_splits-0.2.0-py3-none-any.whl (6.3 kB)

Uploaded Python 3

File details

Details for the file balanced-splits-0.2.0.tar.gz.

File metadata

  • Download URL: balanced-splits-0.2.0.tar.gz
  • Upload date:
  • Size: 4.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.5

File hashes

Hashes for balanced-splits-0.2.0.tar.gz
Algorithm Hash digest
SHA256 dee302aa2f6d4b4c01617dbf1ff2c5141f298122a5bbc26ed457c45d455984c4
MD5 69a43b39042b7c997c8b562df431a1c9
BLAKE2b-256 d12ce6a2dce1d7670d62c7f5242b20e339c9b0089deb726c151837962d0c73fc

File details

Details for the file balanced_splits-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: balanced_splits-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 6.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.5

File hashes

Hashes for balanced_splits-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0b098e9c2b0e7e984aec304b61aa57a8b7de25b594b636fa5ea572a9ea0010f6
MD5 6e4090df6eec49115fe639ab2004c600
BLAKE2b-256 aef0e93330681b328a1c62d54d108ad6f83c0d335df6641c9ca1d0aa1c4fb0d1
