Simple Python package to generate and cache both random and chromosomal holdouts with arbitrary nesting depth.
How do I get this package?
As usual, just use pip:
pip install holdouts_generator
Generating random holdouts
Suppose you want to generate 3 nested layers of holdouts, with test sizes of 0.3, 0.2 and 0.1 and with 5, 3 and 2 holdouts per layer, respectively:
import pandas as pd
from holdouts_generator import holdouts_generator, random_holdouts

dataset = pd.read_csv("path/to/my/dataset.csv")
generator = holdouts_generator(
    dataset,
    holdouts=random_holdouts(
        [0.3, 0.2, 0.1],  # Test size of each layer, outermost first
        [5, 3, 2]  # Number of holdouts in each layer
    ),
    cache=False,  # Set this parameter to True to enable automatic caching
    cache_dir=".holdouts_cache"  # This is the default cache directory
)

for (training, testing), inner_holdouts in generator():
    for (inner_train, inner_test), small_holdouts in inner_holdouts():
        for (small_train, small_test), _ in small_holdouts():
            # do what you need :)
            pass
Generating chromosomal holdouts
Suppose you want to generate 2 layers of holdouts: two outer holdouts, on chromosomes 17 and 18, each with three inner holdouts on the other of those two chromosomes, 20 and 21:
import pandas as pd
from holdouts_generator import holdouts_generator, chromosomal_holdouts

dataset = pd.read_csv("path/to/my/genomic_dataset.csv")
generator = holdouts_generator(
    dataset,
    holdouts=chromosomal_holdouts([
        # (outer chromosomes, [(inner chromosomes, deeper holdouts or None), ...])
        ([17], [([18], None), ([20], None), ([21], None)]),
        ([18], [([17], None), ([20], None), ([21], None)])
    ]),
    cache=False,  # Set this parameter to True to enable automatic caching
    cache_dir=".holdouts_cache"  # This is the default cache directory
)

for (training, testing), inner_holdouts in generator():
    for (inner_train, inner_test), _ in inner_holdouts():
        # do what you need :)
        pass
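As a rough illustration of what a chromosomal holdout means, the sketch below splits rows by chromosome using only the standard library. It is not the package's code: the `chrom` key, the toy rows and the `chromosomal_split` helper are hypothetical, and it assumes the listed chromosomes form the test set while the inner split is taken from the outer training rows.

```python
def chromosomal_split(rows, test_chromosomes, key="chrom"):
    """Put rows on the given chromosomes in the test set, the rest in training."""
    held_out = set(test_chromosomes)
    train = [row for row in rows if row[key] not in held_out]
    test = [row for row in rows if row[key] in held_out]
    return train, test


# Toy dataset: three rows on each of chromosomes 17, 18, 20 and 21
rows = [{"chrom": c, "value": i} for i, c in enumerate([17, 18, 20, 21] * 3)]

# Outer holdout on chromosome 17, then an inner holdout on chromosome 18
training, testing = chromosomal_split(rows, [17])
inner_train, inner_test = chromosomal_split(training, [18])

print(len(training), len(testing))        # 9 3
print(len(inner_train), len(inner_test))  # 6 3
```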
Clearing the holdouts cache
Just call the function clear_holdouts_cache:
from holdouts_generator import clear_holdouts_cache

clear_holdouts_cache(
    cache_dir=".holdouts_cache"  # This is the default cache directory
)
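If the package is not importable in some environment, a comparable cleanup can be sketched with the standard library. This assumes the cache is nothing more than a directory of files, which this README does not confirm; `remove_cache_dir` and the `holdout.pkl` file name are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def remove_cache_dir(cache_dir=".holdouts_cache"):
    """Delete cache_dir and its contents; return True if something was removed."""
    path = Path(cache_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


# Demonstrate on a throwaway directory rather than a real cache
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / ".holdouts_cache"
    demo.mkdir()
    (demo / "holdout.pkl").write_bytes(b"")  # placeholder cached file
    removed = remove_cache_dir(demo)
    exists_after = demo.exists()
    removed_again = remove_cache_dir(demo)  # nothing left to remove
```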