Sample Things EVENly
Project description
Steven
Steven (Sample Things EVENly) helps you sample your data in nice easy ways, evenly across the range of the data!
Steven is available on PyPI: pip install steven.
How to use Steven
The main function in steven is sample_data_evenly. It takes a sequence-like object such as a list, tuple, np.ndarray or pd.Series and samples it so that the returned items are distributed evenly across the data range.
This is useful for balancing both continuous and discrete data for machine learning applications, among other things!
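To make the idea of even sampling concrete, here is a minimal sketch of one way to do it: split the data range into equal-width bins and draw the same number of items from each bin. This is not Steven's actual implementation. The function name sample_evenly_sketch and its parameters are hypothetical, and it uses only NumPy.

```python
import numpy as np


def sample_evenly_sketch(data, n_bins=50, sample_size=1000, seed=None):
    """Illustrative only (hypothetical helper, not part of steven):
    draw up to sample_size // n_bins items from each equal-width bin."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)

    # Equal-width bin edges spanning the full data range.
    edges = np.linspace(data.min(), data.max(), n_bins + 1)
    # Digitizing against the interior edges gives bin ids 0..n_bins-1,
    # with the maximum value falling in the last bin.
    bin_ids = np.digitize(data, edges[1:-1])

    per_bin = sample_size // n_bins
    chosen = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_ids == b)
        # A sparse bin cannot supply more items than it holds.
        take = min(per_bin, members.size)
        if take:
            chosen.append(rng.choice(members, size=take, replace=False))

    ixs = np.concatenate(chosen) if chosen else np.array([], dtype=int)
    return data[ixs], ixs


# Skewed example data, similar in shape to the example in this README.
rng = np.random.default_rng(0)
data = np.exp(rng.random(100_000))
sampled, ixs = sample_evenly_sketch(data, n_bins=50, sample_size=20_000, seed=0)

# Compare how many items fall in the fullest and emptiest bins.
counts_all, _ = np.histogram(data, bins=50, range=(data.min(), data.max()))
counts_sampled, _ = np.histogram(sampled, bins=50, range=(data.min(), data.max()))
print("all data:", counts_all.min(), counts_all.max())
print("sampled: ", counts_sampled.min(), counts_sampled.max())
```

In this sketch, a bin holding fewer than sample_size // n_bins items contributes everything it has, so the result can be smaller than sample_size. How Steven handles sparse bins is not stated here, so treat that as a design choice of the sketch only.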
Let's set up an example and plot the distribution.
import numpy as np
import matplotlib.pyplot as plt
from steven.sampling import sample_data_evenly
# Seed for reproducibility
seed = 8675309
np.random.seed(seed)
# Create some data...
data = np.exp(np.random.rand(100_000))
plt.hist(data, bins=50, range=[data.min(), data.max()], label='All data')
# Now sample the data...
data_sampled = sample_data_evenly(data, n_bins=50, sample_size=20_000, random_state=seed)
plt.hist(data_sampled, bins=50, range=[data.min(), data.max()], label='Sampled')
plt.title('Sample data evenly, example')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()
The result should look like this:
[Figure: overlaid histograms of 'All data' and 'Sampled', titled 'Sample data evenly, example']
Keeping track of sampled indices
Optionally, sample_data_evenly accepts a return_ixs argument, which lets us keep track of which indices of the input data were sampled. Continuing with the above example, we can do:
sampled_data, ixs = sample_data_evenly(data, n_bins=50, sample_size=10, random_state=seed, return_ixs=True)
This returns the sampled data and the sampled indices as a tuple:
>>> sampled_data, ixs
(array([2.29744662, 1.56124329, 1.75257412, 1.39012692, 1.04761057,
1.32016874, 1.98088368, 1.84552982, 2.6627304 , 1.5303134 ]),
[49023, 44730, 83142, 98395, 37441, 81177, 9769, 38017, 3088, 59028])
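One reason to keep the indices is to select the matching entries from another array that is aligned with the data, such as labels or feature rows. The sketch below uses only NumPy. The labels array and the index values are made up for illustration, and it assumes the returned indices are positional offsets into the input, as the output above suggests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in values and a hypothetical companion array with one label per value.
values = np.exp(rng.random(1_000))
labels = rng.integers(0, 2, size=values.size)

# Stand-in for the index list returned with return_ixs=True
# (these numbers are invented for the example).
ixs = [12, 407, 3, 998, 251]

# Indexing both arrays with the same list keeps values and labels aligned.
values_sampled = values[ixs]
labels_sampled = labels[ixs]

for i, v, y in zip(ixs, values_sampled, labels_sampled):
    print(i, round(float(v), 4), int(y))
```

For a pd.Series input, positional selection of this kind would be done with .iloc[ixs] rather than plain square brackets.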
Download files
File details
Details for the file steven-0.3.2.tar.gz.
File metadata
- Download URL: steven-0.3.2.tar.gz
- Upload date:
- Size: 8.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9096da8257ee1e58a6423ae6abf09d9eb39a64b2d7226470423f97d8aafe5e21 |
| MD5 | badf20020115f09fcc1298f3c69dfd6b |
| BLAKE2b-256 | 93c9310dc7c5b4aabe5d3fce26a124ba369da59b2a4d7b18aa3a94ff6b2135d2 |
File details
Details for the file steven-0.3.2-py3-none-any.whl.
File metadata
- Download URL: steven-0.3.2-py3-none-any.whl
- Upload date:
- Size: 6.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f7476b5c9e854f83166b25f5d3137a45e50a88366c9801d6026f12e741b4fd98 |
| MD5 | 08f21e9bbdf8c79760e700ba0e983bde |
| BLAKE2b-256 | d3420c42942264badb3f4ea8bc44dfe350715eeb01b6eb2838fa52fa8d99de40 |