Simple python package to generate and cache holdouts with arbitrary depth.

Simple python package to generate and cache both random and chromosomal holdouts with arbitrary depth.

How do I get this package?

As usual, just use pip:

pip install holdouts_generator

Generating random holdouts

Suppose you want to generate 3 layers of holdouts, with test sizes of 0.3, 0.2 and 0.1 and quantities of 5, 3 and 2 respectively:

import pandas as pd
from holdouts_generator import holdouts_generator, random_holdouts

dataset = pd.read_csv("path/to/my/dataset.csv")
generator = holdouts_generator(
    dataset,
    holdouts=random_holdouts(
        [0.3, 0.2, 0.1],  # Test size of each layer
        [5, 3, 2]  # Number of holdouts in each layer
    ),
    cache=False,  # Set this parameter to True to enable automatic caching
    cache_dir=".holdouts"  # This is the default cache directory
)

for (training, testing), inner_holdouts in generator():
    for (inner_train, inner_test), small_holdouts in inner_holdouts():
        for (small_train, small_test), _ in small_holdouts():
            pass  # do what you need :)
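
To make the nesting concrete, here is a minimal standard-library sketch of layered random splits. It is not the package's implementation: the function name nested_random_splits, the seed parameter, the rounding of the test size, and the use of None as the innermost value are all assumptions made for illustration. It only shows how each inner layer can be drawn from the training part of the layer above it.

```python
import random


def nested_random_splits(rows, test_sizes, quantities, seed=0):
    """Return a callable yielding ((train, test), inner) pairs (illustrative only)."""
    def generator():
        rng = random.Random(seed)
        # Assumption: the test set size is the rounded fraction of the rows.
        n_test = round(len(rows) * test_sizes[0])
        for _ in range(quantities[0]):
            shuffled = list(rows)
            rng.shuffle(shuffled)
            test, train = shuffled[:n_test], shuffled[n_test:]
            if len(test_sizes) > 1:
                # Inner layers are drawn from the training part only.
                inner = nested_random_splits(
                    train, test_sizes[1:], quantities[1:],
                    seed=rng.randrange(2**32),
                )
            else:
                inner = None  # Assumption: nothing below the last layer.
            yield (train, test), inner
    return generator


rows = list(range(100))
generator = nested_random_splits(rows, [0.3, 0.2, 0.1], [5, 3, 2])

sizes = []
for (training, testing), inner_holdouts in generator():
    for (inner_train, inner_test), small_holdouts in inner_holdouts():
        for (small_train, small_test), _ in small_holdouts():
            sizes.append((len(testing), len(inner_test), len(small_test)))

print(len(sizes))  # 5 * 3 * 2 innermost holdouts
```

With quantities of 5, 3 and 2, the innermost loop body runs 5 * 3 * 2 = 30 times, which is worth keeping in mind when each iteration trains a model.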

Generating chromosomal holdouts

Suppose you want to generate 2 layers of holdouts: two outer holdouts, on chromosomes 17 and 18, each with three inner holdouts on the other of chromosomes 17/18, on chromosome 20 and on chromosome 21:

import pandas as pd
from holdouts_generator import holdouts_generator, chromosomal_holdouts

dataset = pd.read_csv("path/to/my/genomic_dataset.csv")
generator = holdouts_generator(
    dataset,
    holdouts=chromosomal_holdouts([
        ([17], [([18], None), ([20], None), ([21], None)]),
        ([18], [([17], None), ([20], None), ([21], None)])
    ]),
    cache=False,  # Set this parameter to True to enable automatic caching
    cache_dir=".holdouts"  # This is the default cache directory
)

for (training, testing), inner_holdouts in generator():
    for (inner_train, inner_test), _ in inner_holdouts():
        pass  # do what you need :)
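
The idea behind a chromosomal split can be sketched with plain Python. This is not the package's code: the helper chromosomal_split, the "chrom" key, and the reading that the listed chromosomes form the test set are assumptions for illustration, since the description above does not say how the dataset identifies chromosomes.

```python
def chromosomal_split(rows, test_chromosomes, key="chrom"):
    """Split rows into (train, test) by chromosome label (illustrative only)."""
    held_out = set(test_chromosomes)
    # Assumption: rows on the listed chromosomes go to the test set.
    train = [row for row in rows if row[key] not in held_out]
    test = [row for row in rows if row[key] in held_out]
    return train, test


# Toy dataset: one dict per row, with a hypothetical "chrom" column.
rows = [
    {"chrom": chrom, "value": index}
    for index, chrom in enumerate([17, 17, 18, 20, 21, 21, 1])
]

# Outer holdout on chromosome 17, then an inner holdout on chromosome 18
# taken from the outer training rows only.
outer_train, outer_test = chromosomal_split(rows, [17])
inner_train, inner_test = chromosomal_split(outer_train, [18])
```

Unlike a random split, the result is fully determined by the chromosome lists, so repeating the split always gives the same partitions.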

Clearing the holdouts cache

Just call the function clear_holdouts_cache:

from holdouts_generator import clear_holdouts_cache

clear_holdouts_cache(
    cache_dir=".holdouts" # This is the default cache directory
)
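
If you ever need to clear a cache directory by hand, the following sketch shows one way to do it with the standard library. It assumes the cache is nothing more than files stored under cache_dir, which the description above does not confirm; the helper clear_directory_cache is hypothetical and is not part of the package.

```python
import shutil
import tempfile
from pathlib import Path


def clear_directory_cache(cache_dir):
    """Remove cache_dir and its contents; return False if it does not exist."""
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


# Demonstrate on a throwaway directory rather than on a real cache.
base = Path(tempfile.mkdtemp())
cache = base / ".holdouts"
cache.mkdir()
(cache / "example.pkl").write_bytes(b"placeholder")

removed_first = clear_directory_cache(cache)  # Directory existed
removed_again = clear_directory_cache(cache)  # Already gone
shutil.rmtree(base)
```

Prefer clear_holdouts_cache whenever it is available, since it is the entry point the package itself provides.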

Source distribution: holdouts_generator-0.0.6.tar.gz (4.8 kB)
