Consistently partitions a dataset into a training set and a test set
Data Partitioner
A simple project for consistently partitioning a dataset into two parts: a training set and a test set. It also provides helper methods for partitioning a dataset into more than two groups.
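The multi-group helpers are not documented here. As an illustration only, here is a minimal, self-contained sketch of one way to split a dataset consistently into any number of groups. It uses a shuffle seeded with a fixed value, so the same inputs always give the same groups. The function `partition_into_groups` and its `fractions` and `seed` parameters are hypothetical and are not part of data_partitioner.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_into_groups(dataset: Sequence[T], fractions: Sequence[float],
                          seed: int = 0) -> List[List[T]]:
    """Hypothetical helper (not part of data_partitioner): split `dataset`
    into len(fractions) groups whose sizes follow `fractions`.

    The same dataset, fractions, and seed always give the same groups.
    """
    if not fractions or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must be non-empty and sum to 1")
    size = len(dataset)
    order = list(range(size))
    random.Random(seed).shuffle(order)  # reproducible shuffle for a fixed seed

    groups: List[List[T]] = []
    start, cumulative = 0, 0.0
    for position, fraction in enumerate(fractions):
        cumulative += fraction
        # Cumulative boundaries keep rounding errors from dropping elements;
        # the last group always runs to the end of the dataset.
        end = size if position == len(fractions) - 1 else round(cumulative * size)
        # Sort the chosen indices so each group keeps the dataset's original order.
        groups.append([dataset[i] for i in sorted(order[start:end])])
        start = end
    return groups
```

For example, `partition_into_groups(list(range(10)), [0.6, 0.2, 0.2], seed=42)` returns three groups of 6, 2 and 2 elements. Calling it again with the same arguments returns the same groups.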
Installation
The easiest way to install this module is to install it via pip:
$ pip install data_partitioner
Usage
Using this module is dead simple. The main class, DatasetSuplier, offers two methods: training_set(), which returns the training set, and test_set(), which returns the test set. Both methods are consistent: no matter how many times you call them on the same object, they return the same elements.
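Consistency only requires the split to be decided once per object, or deterministically. The following hypothetical stand-in illustrates the idea and is not the library's implementation. It assigns every element at construction time using a seeded pseudorandom generator, so that training_set() and test_set() return the same elements on every call. The class name `ConsistentSplit` and the `seed` parameter are assumptions made for this example.

```python
import random
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


class ConsistentSplit(Generic[T]):
    """Hypothetical stand-in for DatasetSuplier's behaviour (not its code)."""

    def __init__(self, dataset: Sequence[T], training_percent: float = 0.8,
                 seed: int = 0) -> None:
        rng = random.Random(seed)  # private generator; global random state is untouched
        self._training: List[T] = []
        self._test: List[T] = []
        # Decide each element's side exactly once, at construction time.
        for element in dataset:
            if rng.random() < training_percent:
                self._training.append(element)
            else:
                self._test.append(element)

    def training_set(self) -> List[T]:
        return list(self._training)  # a copy, so callers cannot alter the split

    def test_set(self) -> List[T]:
        return list(self._test)
```

Because each element gets an independent draw, the training set holds roughly (not exactly) training_percent of the elements. Repeated calls still always return identical lists.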
There are two configuration options you can specify:
- training_percent: the fraction of the dataset used for the training set. Defaults to 0.8 (80%).
- partitioning_function: the callable used to partition the dataset.
  It defaults to data_partitioner.pseudorandom_function, which pseudorandomly assigns every element of the dataset to either the training set or the test set.
  Another useful built-in option is data_partitioner.LinearFakeRandomFunction, which ensures that no element of the training set comes after any element of the test set.
  You can also write this callable yourself. It takes a single parameter: the index of the element currently being considered.
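The text does not say what a partitioning_function must return. The sketch below therefore rests on an assumption: the callable maps an element index to a number in [0, 1), and the element goes to the training set when that number is below training_percent. Under that assumption, it contrasts two strategies. The first is a pseudorandom function that is hash-based, so the same index always lands on the same side. The second is a linear function that keeps every training element ahead of every test element. All names here (`pseudorandom_fn`, `make_linear_fn`, `split`) are hypothetical and are not the library's API.

```python
import hashlib
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
# Assumed (hypothetical) contract: index -> value in [0, 1).
PartitionFn = Callable[[int], float]


def pseudorandom_fn(index: int) -> float:
    """Deterministic 'random' value for an index, derived from its SHA-256 hash."""
    digest = hashlib.sha256(str(index).encode()).digest()
    # Keep 53 bits so the division is exact and the result stays below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def make_linear_fn(size: int) -> PartitionFn:
    """Value grows with the index, so the lowest indices fill the training set."""
    return lambda index: index / size


def split(dataset: Sequence[T], training_percent: float,
          fn: PartitionFn) -> Tuple[List[T], List[T]]:
    training: List[T] = []
    test: List[T] = []
    for index, element in enumerate(dataset):
        (training if fn(index) < training_percent else test).append(element)
    return training, test
```

For example, `split(list(range(10)), 0.8, make_linear_fn(10))` returns `([0, 1, 2, 3, 4, 5, 6, 7], [8, 9])`. In this sketch the linear variant needs the dataset size up front, while the hash-based variant needs only the index.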
Example
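```python
from data_partitioner import DatasetSuplier


def do_train(value):
    """Placeholder: replace with your own training step."""


def do_evaluate(value):
    """Placeholder: replace with your own evaluation step."""


dataset = [
    ('Alice', 10, 23, 401),
    ('Bob', 20, 40, 812),
    ('Christine', 41, 92, 533),
    ('Dave', 843, 12, -5),
    ('Elizabeth', 682, 33, -7),
    ('Fred', 95, 642, 34),
]

suplier = DatasetSuplier(dataset)

# Every pass sees the same split, because training_set() and test_set()
# return the same elements on each call on the same object.
for _ in range(100):
    for element in suplier.training_set():
        do_train(element[1])
    for element in suplier.test_set():
        do_evaluate(element[1])
```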
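Whatever partitioning function you use, a split should put every element on exactly one side. The training and test sets should not overlap, and together they should cover the whole dataset. A small check that does not depend on the library can catch mistakes when you write your own partitioning_function. The helper `is_valid_partition` below is hypothetical and is not part of data_partitioner. It compares element counts, so duplicate rows in the dataset are handled correctly.

```python
from collections import Counter
from typing import Hashable, Sequence


def is_valid_partition(dataset: Sequence[Hashable], training: Sequence[Hashable],
                       test: Sequence[Hashable]) -> bool:
    """True if training and test together contain each dataset element exactly
    as often as the dataset does, so nothing is lost, duplicated, or shared."""
    return Counter(training) + Counter(test) == Counter(dataset)
```

With the example above, you could check `is_valid_partition(dataset, suplier.training_set(), suplier.test_set())` once before training starts.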
Download files
File details
Details for the file data_partitioner-0.1.tar.gz.
File metadata
- Download URL: data_partitioner-0.1.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | dd7a9bea91adb4655986197fde06d172b0a9f5f55696982a373e41969921f1c1
MD5 | 2878e365f05186bfd2eb80db802313ce
BLAKE2b-256 | 3237c00f967e6f48d3507eb23af76d4cb8232263c2b0443419cd71d7de13c074
File details
Details for the file data_partitioner-0.1-py3-none-any.whl.
File metadata
- Download URL: data_partitioner-0.1-py3-none-any.whl
- Upload date:
- Size: 6.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 794f0244fa3bcee17e0b878f8f7e11b33ba72717e84b48cdc26df7f48f4be4eb
MD5 | c22acf4b57bb87cd423603036fb2b4aa
BLAKE2b-256 | 2fb151a87731b6d6f3745ffcc9ddecb9e9193f047f7404eceb6a1259cc223041
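To confirm that a downloaded distribution file matches the SHA256 digests listed above, you can hash it locally with Python's standard hashlib module and compare the results. The helper name `sha256_of_file` is illustrative. The constant below simply copies the published digest for the source distribution.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest published for data_partitioner-0.1.tar.gz.
EXPECTED_SDIST_SHA256 = "dd7a9bea91adb4655986197fde06d172b0a9f5f55696982a373e41969921f1c1"
```

After downloading, `sha256_of_file("data_partitioner-0.1.tar.gz") == EXPECTED_SDIST_SHA256` should be `True`. Reading in chunks keeps memory use flat even for large files.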