Sample Things EVENly

Steven

Steven (Sample Things EVENly) helps you sample your data in nice easy ways, evenly across the range of the data!

Steven is available on PyPI: pip install steven.

How to use Steven

The main function in steven is sample_data_evenly. It takes a sequence-like object such as a list, tuple, np.ndarray or pd.Series, and samples it so that the returned items are spread evenly across the range of the data.

This is useful for balancing both continuous and discrete data for machine learning applications, among other things!
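
To make the idea of "even" sampling concrete, here is a minimal sketch of one way it could work: split the data range into equal-width bins, then draw items from the bins in turn so that each bin contributes roughly the same number of samples. This is only an illustration and not necessarily how steven implements it. The name sample_evenly_sketch, the equal-width binning and the round-robin draw are all assumptions made for this example.

```python
import random
from collections import defaultdict


def sample_evenly_sketch(values, n_bins, sample_size, seed=None):
    """Illustrative only (hypothetical helper, not part of steven).

    Bins values into equal-width bins, then draws from the bins in turn.
    Returns (sampled_values, sampled_indices).
    """
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal

    # Group the index of every value by the equal-width bin it falls into.
    bins = defaultdict(list)
    for ix, value in enumerate(values):
        bin_id = min(int((value - lo) / width), n_bins - 1)  # clamp the max into the last bin
        bins[bin_id].append(ix)

    # Shuffle each bin so that popping from it is a random draw without replacement.
    for members in bins.values():
        rng.shuffle(members)

    # Round-robin over the bins, taking one item per non-empty bin on each pass.
    chosen = []
    while len(chosen) < sample_size and bins:
        for bin_id in sorted(bins):
            if len(chosen) == sample_size:
                break
            chosen.append(bins[bin_id].pop())
        bins = {b: m for b, m in bins.items() if m}  # drop exhausted bins

    return [values[ix] for ix in chosen], chosen
```

With this approach, sparse bins are used up first and the remaining samples come from the fuller bins, so the result is as flat as the data allows. If fewer than sample_size items exist, the sketch simply returns everything.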

Let's set up an example and plot the distribution.

import numpy as np
import matplotlib.pyplot as plt

from steven.sampling import sample_data_evenly

# Seed for reproducibility
seed = 8675309
np.random.seed(seed)

# Create some data...
data = np.exp(np.random.rand(100_000))
plt.hist(data, bins=50, range=[data.min(), data.max()], label='All data')

# Now sample the data...
data_sampled = sample_data_evenly(data, n_bins=50, sample_size=20_000, random_state=seed)
plt.hist(data_sampled, bins=50, range=[data.min(), data.max()], label='Sampled')

plt.title('Sample data evenly, example')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()

The result should look like this:

[Figure: histograms of 'All data' and 'Sampled', titled 'Sample data evenly, example']

Keeping track of sampled indices

Optionally, sample_data_evenly accepts a return_ixs argument, which lets us keep track of which indices of the input data were sampled. Continuing with the above example, we can do:

sampled_data, ixs = sample_data_evenly(data, n_bins=50, sample_size=10, random_state=seed, return_ixs=True)

This returns the sampled data and the sampled indices as a tuple:

>>> sampled_data, ixs 
(array([2.29744662, 1.56124329, 1.75257412, 1.39012692, 1.04761057,
        1.32016874, 1.98088368, 1.84552982, 2.6627304 , 1.5303134 ]),
 [49023, 44730, 83142, 98395, 37441, 81177, 9769, 38017, 3088, 59028])
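
One reason to keep the indices is to select the matching rows from another sequence that is aligned with the data, such as labels, or to find out which items were left out. The sketch below shows this with a small stand-in list of indices. The helper split_by_indices and the labels list are hypothetical names made up for this example and are not part of steven.

```python
def split_by_indices(items, ixs):
    """Return (selected, remainder) for any indexable sequence (hypothetical helper)."""
    chosen = set(ixs)
    selected = [items[i] for i in ixs]  # same order as ixs
    remainder = [item for i, item in enumerate(items) if i not in chosen]
    return selected, remainder


# Hypothetical labels aligned item-for-item with the data that was sampled.
labels = ["a", "b", "c", "d", "e", "f"]
ixs = [4, 0, 3]  # stand-in for the indices returned with return_ixs=True

picked, rest = split_by_indices(labels, ixs)
print(picked)  # ['e', 'a', 'd']
print(rest)    # ['b', 'c', 'f']
```

For NumPy arrays the selection step can also be written as labels[ixs], since indexing an array with a list of integers returns the items at those positions.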
