Simple Python package to generate and cache both random and chromosomal holdouts with arbitrary depth.
How do I install this package?
As usual, just install it with pip:
pip install holdouts_generator
Generating random holdouts
Suppose you want to generate 3 nested layers of holdouts, with test sizes of 0.3, 0.2 and 0.1 and quantities of 5, 3 and 2 respectively:
import pandas as pd
from holdouts_generator import holdouts_generator, random_holdouts

dataset = pd.read_csv("path/to/my/dataset.csv")

generator = holdouts_generator(
    dataset,
    holdouts=random_holdouts(
        [0.3, 0.2, 0.1],  # test size of each layer
        [5, 3, 2]         # number of holdouts in each layer
    ),
    cache=False,  # Set this parameter to True to enable automatic caching
    memory_cache=False,  # Set this parameter to True to enable automatic in-memory caching, useful when you reload the objects multiple times
    cache_dir=".holdouts"  # This is the default cache directory
)

for (training, testing), inner_holdouts in generator():
    for (inner_train, inner_test), small_holdouts in inner_holdouts():
        for (small_train, small_test), _ in small_holdouts():
            # do what you need :)
            pass
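To make the nesting concrete, here is a small, library-independent sketch that works on row indices instead of DataFrames. It is not how holdouts_generator is implemented: `nested_random_holdouts` and its `seed` parameter are hypothetical, and it assumes that each inner layer is split only from the training part of the holdout above it. Each inner layer is exposed as a zero-argument callable, mirroring the `inner_holdouts()` calls in the loop.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple

# A holdout is a (train indices, test indices) pair.
Split = Tuple[List[int], List[int]]


def nested_random_holdouts(
    indices: Sequence[int],
    test_sizes: Sequence[float],
    quantities: Sequence[int],
    seed: int = 42,
) -> Iterator[Tuple[Split, Callable[[], Iterator]]]:
    """Yield ((train, test), inner) for the first layer; calling inner()
    yields the next layer, split from that holdout's training indices only."""
    if not test_sizes:  # past the deepest layer: nothing left to split
        return
    for i in range(quantities[0]):
        layer_seed = seed * 100 + i  # distinct, reproducible seed per holdout
        shuffled = list(indices)
        random.Random(layer_seed).shuffle(shuffled)
        n_test = round(len(shuffled) * test_sizes[0])
        train, test = sorted(shuffled[n_test:]), sorted(shuffled[:n_test])
        # Default arguments bind the current train set and seed to this callable.
        inner = lambda train=train, s=layer_seed: nested_random_holdouts(
            train, test_sizes[1:], quantities[1:], seed=s
        )
        yield (train, test), inner


if __name__ == "__main__":
    count = 0
    for (train, test), inner in nested_random_holdouts(range(100), [0.3, 0.2, 0.1], [5, 3, 2]):
        for (inner_train, inner_test), small in inner():
            for (small_train, small_test), _ in small():
                count += 1
    print(count)  # 5 * 3 * 2 = 30 innermost holdouts
```

With 100 rows, each outer holdout tests on 30 rows, each inner one on 14 of the remaining 70, and each innermost one on 6 of the remaining 56. Making inner layers lazy callables means deeper splits are only computed when the loop actually reaches them.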
Generating chromosomal holdouts
Suppose you want to generate 2 layers of chromosomal holdouts: two outer holdouts, on chromosomes 17 and 18 respectively, each containing 3 inner holdouts, on chromosome 18 (17 for the second outer holdout), 20 and 21:
import pandas as pd
from holdouts_generator import holdouts_generator, chromosomal_holdouts

dataset = pd.read_csv("path/to/my/genomic_dataset.csv")

generator = holdouts_generator(
    dataset,
    holdouts=chromosomal_holdouts([
        # Each entry: (chromosomes, inner holdouts); None marks the deepest layer
        ([17], [([18], None), ([20], None), ([21], None)]),
        ([18], [([17], None), ([20], None), ([21], None)])
    ]),
    cache=False,  # Set this parameter to True to enable automatic caching
    memory_cache=False,  # Set this parameter to True to enable automatic in-memory caching, useful when you reload the objects multiple times
    cache_dir=".holdouts"  # This is the default cache directory
)

for (training, testing), inner_holdouts in generator():
    for (inner_train, inner_test), _ in inner_holdouts():
        # do what you need :)
        pass
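The nested specification can be read as a tree of chromosome splits. The following sketch walks such a tree over toy data; it is an illustration only, assuming that rows on the listed chromosomes form the test set and that inner holdouts are drawn from the outer training rows. `nested_chromosomal_holdouts` is a hypothetical name, not part of holdouts_generator.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

# A spec entry: (chromosomes to hold out, nested spec or None for no deeper layer)
Spec = List[Tuple[List[int], Optional[list]]]
Split = Tuple[List[int], List[int]]


def nested_chromosomal_holdouts(
    chromosomes: Sequence[int],
    indices: Sequence[int],
    spec: Optional[Spec],
) -> Iterator[Tuple[Split, Callable[[], Iterator]]]:
    """Yield ((train, test), inner) pairs; in this sketch the rows lying on
    the held-out chromosomes form the test set (an assumption)."""
    for held_out, inner_spec in spec or []:
        held = set(held_out)
        test = [i for i in indices if chromosomes[i] in held]
        train = [i for i in indices if chromosomes[i] not in held]
        # The inner layer only ever sees the training rows of this holdout.
        inner = lambda train=train, inner_spec=inner_spec: nested_chromosomal_holdouts(
            chromosomes, train, inner_spec
        )
        yield (train, test), inner


if __name__ == "__main__":
    # Toy data: row i lies on chromosome chromosomes[i].
    chromosomes = [17, 17, 18, 18, 20, 21, 21, 22]
    spec = [
        ([17], [([18], None), ([20], None), ([21], None)]),
        ([18], [([17], None), ([20], None), ([21], None)]),
    ]
    for (train, test), inner in nested_chromosomal_holdouts(chromosomes, range(len(chromosomes)), spec):
        for (inner_train, inner_test), _ in inner():
            print(test, inner_test)
```

Holding out whole chromosomes, rather than random rows, keeps nearby and possibly correlated genomic positions from appearing in both training and test data.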
Clearing the holdouts cache
Just call the clear_cache function:
from holdouts_generator import clear_cache

clear_cache(
    cache_dir=".holdouts"  # This is the default cache directory
)
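Setting `cache=True` keeps generated holdouts under `cache_dir` so that later runs can reuse them, and clear_cache discards what was stored there. The sketch below illustrates the general idea of such a disk cache with the standard library only; `cached` and `clear_disk_cache` are hypothetical helpers, and the key scheme (a SHA-256 of the JSON-encoded parameters) is an assumption, not holdouts_generator's actual file layout.

```python
import hashlib
import json
import os
import pickle
import shutil
from typing import Any, Callable


def cached(cache_dir: str, params: dict, compute: Callable[[], Any]) -> Any:
    """Return the pickled result stored for `params` in `cache_dir`,
    computing and storing it on a cache miss. `params` must be JSON-serialisable."""
    # sort_keys makes the key independent of the dict's insertion order.
    key = hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()
    path = os.path.join(cache_dir, key + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


def clear_disk_cache(cache_dir: str = ".holdouts") -> None:
    """Delete the whole cache directory; a missing directory is not an error."""
    shutil.rmtree(cache_dir, ignore_errors=True)
```

Keying on the generation parameters means a changed test size or quantity simply misses the cache instead of returning stale holdouts.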
Clearing the holdouts memory cache
Just call the clear_memory_cache function:
from holdouts_generator import clear_memory_cache

clear_memory_cache(
    cache_dir=".holdouts"  # This is the default cache directory
)
Source Distribution
Hashes for holdouts_generator-0.0.13.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 0905b8deb544214516c05aec33635e8860cd6b5d0d03b98ac301fe8a1f610a5a
MD5 | 10bb89658bac221b35f5e2b02ae686b4
BLAKE2b-256 | b00089f9ef5f1cb04e10fbdc82a30f893e646c7dad0bb91aee0348b88e2ded6c
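The digests above let you check a downloaded archive before installing it. A minimal sketch using only Python's `hashlib`; `sha256_of` is a hypothetical helper name, and the usage below assumes the archive was saved under its published file name in the current directory.

```python
import hashlib

EXPECTED_SHA256 = "0905b8deb544214516c05aec33635e8860cd6b5d0d03b98ac301fe8a1f610a5a"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For an intact download, `sha256_of("holdouts_generator-0.0.13.tar.gz") == EXPECTED_SHA256` should be `True`. pip can also enforce this itself through its hash-checking mode (`--require-hashes`).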