
Sample Things EVENly

Project description

Steven

Steven (Sample Things EVENly) helps you sample your data in nice easy ways, evenly across the range of the data!

Steven is available on PyPI: pip install steven.

How to use Steven

The main function in steven is sample_data_evenly. It takes a sequence-like object such as a list, tuple, np.ndarray or pd.Series and samples from it so that the returned items are evenly distributed across the range of the data.
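The README does not spell out the algorithm, so what follows is only a rough sketch of one way "even" sampling can work, not steven's actual implementation. It splits the value range into n_bins equal-width bins, then shares the sample budget as equally as possible among the non-empty bins. The function name even_sample_sketch is hypothetical, and it uses plain Python lists so it runs without steven or NumPy.

```python
import random
from collections import defaultdict


def even_sample_sketch(values, n_bins, sample_size, seed=None):
    """Hypothetical sketch of even sampling (NOT steven's implementation).

    Splits [min, max] into n_bins equal-width bins and draws as equal a
    number of items as possible from every non-empty bin.
    Returns (sampled_values, sampled_indices).
    """
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid dividing by zero if all values are equal

    # Map each bin number to the positions of the values that fall in it.
    bins = defaultdict(list)
    for i, v in enumerate(values):
        b = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        bins[b].append(i)

    # Hand out the sample budget one item per bin per round, so every bin
    # gets a fair share; full bins simply stop receiving items.
    order = sorted(bins)
    quota = dict.fromkeys(order, 0)
    remaining = min(sample_size, len(values))
    while remaining > 0:
        for b in order:
            if remaining == 0:
                break
            if quota[b] < len(bins[b]):
                quota[b] += 1
                remaining -= 1

    # Draw each bin's quota without replacement.
    ixs = []
    for b in order:
        ixs.extend(rng.sample(bins[b], quota[b]))
    return [values[i] for i in ixs], ixs


# Toy data: eight zeros and two large values, split into two bins.
values = [0, 0, 0, 0, 0, 0, 0, 0, 9, 10]
sampled, ixs = even_sample_sketch(values, n_bins=2, sample_size=4, seed=0)
```

In this sketch, a bin with fewer items than its share gives up all of them and the leftover budget goes to the other bins; when the budget does not divide evenly, the extra items go to the lower bins. steven may resolve these cases differently.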

This is useful for balancing both continuous and discrete data for machine learning applications, among other things!
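For discrete data (for example class labels), "evenly" can be read as giving each distinct value the same share of the sample. Below is a minimal sketch of that idea as plain per-class undersampling. balance_discrete_sketch and its per_class parameter are hypothetical and are not part of steven's API.

```python
import random
from collections import Counter, defaultdict


def balance_discrete_sketch(labels, per_class, seed=None):
    """Hypothetical helper (not part of steven): undersample so that each
    distinct label contributes at most `per_class` items.

    Returns the sampled positions, grouped by label in first-seen order.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)  # label -> positions where it occurs
    for i, lab in enumerate(labels):
        by_label[lab].append(i)

    ixs = []
    for positions in by_label.values():
        k = min(per_class, len(positions))  # rare classes keep all their items
        ixs.extend(rng.sample(positions, k))
    return ixs


# A heavily imbalanced toy label set.
labels = ["cat"] * 90 + ["dog"] * 9 + ["fox"]
ixs = balance_discrete_sketch(labels, per_class=5, seed=0)
print(Counter(labels[i] for i in ixs))
```

Here "fox" occurs only once, so it keeps its single item, while the two larger classes are capped at five each.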

Let's set up an example and plot the distribution.

import numpy as np
import matplotlib.pyplot as plt

from steven.sampling import sample_data_evenly

# Seed for reproducibility
seed = 8675309
np.random.seed(seed)

# Create some data...
data = np.exp(np.random.rand(100_000))
plt.hist(data, bins=50, range=[data.min(), data.max()], label='All data')

# Now sample the data...
data_sampled = sample_data_evenly(data, n_bins=50, sample_size=20_000, random_state=seed)
plt.hist(data_sampled, bins=50, range=[data.min(), data.max()], label='Sampled')

plt.title('Sample data evenly, example')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()
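To check the balance numerically instead of by eye, you can count how many values land in each bin. The sketch below is a hypothetical plain-Python helper, bin_counts, that uses the same equal-width bins over [lo, hi] as the plt.hist calls above; following numpy.histogram, each bin is half-open except the last, which also includes the upper edge.

```python
def bin_counts(values, lo, hi, n_bins):
    """Count values per equal-width bin over [lo, hi].

    Bins are half-open [a, b) except the last, which also includes hi
    (the numpy.histogram convention); values outside [lo, hi] are ignored.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if v < lo or v > hi:
            continue
        b = min(int((v - lo) / width), n_bins - 1)  # v == hi falls in the last bin
        counts[b] += 1
    return counts


counts = bin_counts([0, 1, 2, 2, 3, 4], lo=0, hi=4, n_bins=4)
```

Applied to the example, bin_counts(data_sampled, data.min(), data.max(), 50) should give counts that are close to one another if the sampling behaves as described, whereas the counts for data itself fall off towards larger values.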

The result should look like this:

[Figure: "Sample data evenly, example": histograms of "All data" and "Sampled"]

Keeping track of sampled indices

Optionally, sample_data_evenly accepts a return_ixs argument, which lets you keep track of which indices of the input data were sampled. Continuing with the above example, we can do:

sampled_data, ixs = sample_data_evenly(data, n_bins=50, sample_size=10, random_state=seed, return_ixs=True)

This returns the sampled data and the indices as a tuple:

>>> sampled_data, ixs 
(array([2.29744662, 1.56124329, 1.75257412, 1.39012692, 1.04761057,
        1.32016874, 1.98088368, 1.84552982, 2.6627304 , 1.5303134 ]),
 [49023, 44730, 83142, 98395, 37441, 81177, 9769, 38017, 3088, 59028])
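The values in ixs above are all below 100,000, the length of data, which suggests they are positional indices into the input. Assuming that, they can be used to pull the matching rows from any parallel sequence, such as labels, or to find the items that were left out. The lists below are made-up stand-ins for real steven output.

```python
# Made-up stand-ins: in practice `data` would be the input to
# sample_data_evenly and `ixs` the indices it returned with return_ixs=True.
data = [2.5, 1.1, 3.7, 0.4, 2.9]
labels = ["b", "a", "c", "a", "b"]
ixs = [3, 0, 4]

# Positional indices select the same rows from any parallel sequence.
sampled_data = [data[i] for i in ixs]
sampled_labels = [labels[i] for i in ixs]

# The complement gives the items that were not sampled, e.g. for a hold-out set.
chosen = set(ixs)
held_out = [i for i in range(len(data)) if i not in chosen]
```

With NumPy arrays the same selection can be written data[ixs] (integer-array indexing); for a pandas Series with a non-default index, use .iloc[ixs] so the indices are treated as positions.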

Download files

Download the file for your platform.

Source Distribution

steven-0.3.2.tar.gz (8.4 kB)

Uploaded Source

Built Distribution

steven-0.3.2-py3-none-any.whl (6.6 kB)

Uploaded Python 3

File details

Details for the file steven-0.3.2.tar.gz.

File metadata

  • Download URL: steven-0.3.2.tar.gz
  • Upload date:
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for steven-0.3.2.tar.gz
Algorithm Hash digest
SHA256 9096da8257ee1e58a6423ae6abf09d9eb39a64b2d7226470423f97d8aafe5e21
MD5 badf20020115f09fcc1298f3c69dfd6b
BLAKE2b-256 93c9310dc7c5b4aabe5d3fce26a124ba369da59b2a4d7b18aa3a94ff6b2135d2

File details

Details for the file steven-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: steven-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 6.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for steven-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 f7476b5c9e854f83166b25f5d3137a45e50a88366c9801d6026f12e741b4fd98
MD5 08f21e9bbdf8c79760e700ba0e983bde
BLAKE2b-256 d3420c42942264badb3f4ea8bc44dfe350715eeb01b6eb2838fa52fa8d99de40
