Consistently partitions a dataset into a training set and a test set

Data Partitioner

A simple project for consistently partitioning a dataset into two parts: a training set and a test set. It also includes helper methods for partitioning a dataset into more than two groups of elements.

Installation

The easiest way to install this module is via pip:

$ pip install data_partitioner

Usage

Using this module is straightforward. The main class (DatasetSuplier) offers two methods: training_set() returns the training set and test_set() returns the test set. Both methods are consistent: no matter how many times you call them on the same object, they return the same set of elements.
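To illustrate what "consistent" means here, the following is a minimal, self-contained sketch. It does not use or reproduce the package's code: the class name ConsistentSplit, the seed parameter, and the decision to fix the assignment at construction time are all assumptions made for illustration.

```python
import random


class ConsistentSplit:
    """Hypothetical stand-in for illustration; not the data_partitioner code."""

    def __init__(self, dataset, training_percent=0.8, seed=0):
        rng = random.Random(seed)  # fixed seed: same assignment on every run
        self._training, self._test = [], []
        # Decide each element's group exactly once, at construction time,
        # so later calls cannot reshuffle the partition.
        for element in dataset:
            if rng.random() < training_percent:
                self._training.append(element)
            else:
                self._test.append(element)

    def training_set(self):
        # Return a copy so callers cannot mutate the stored partition.
        return list(self._training)

    def test_set(self):
        return list(self._test)


split = ConsistentSplit(range(10))
first_call = split.training_set()
second_call = split.training_set()
print(first_call == second_call)
```

Because the assignment is computed once and stored, repeated calls on the same object return identical lists, and every element lands in exactly one of the two sets.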

You can specify two configuration options:

  • training_percent - the fraction of the dataset used for the training set. It defaults to 0.8 (that is, 80%).

  • partitioning_function - the function used to partition the dataset.

      • It defaults to data_partitioner.pseudorandom_function, which randomly assigns every element of the dataset to either the test set or the training set.

      • Another built-in option is data_partitioner.LinearFakeRandomFunction, which ensures that no element of the training set comes after any element of the test set.

      • You can also write this callable yourself. It takes one parameter as input: the index of the element currently being considered.
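The two partitioning strategies described above can be sketched without the package. This is only an illustration under stated assumptions: the helper names (hashed_assignment, linear_assignment, partition) are hypothetical, and the convention that the callable returns True for "training" is an assumption of this sketch, not the documented contract of partitioning_function. Check the package source for the actual return value it expects.

```python
import hashlib


def hashed_assignment(training_percent=0.8):
    """Pseudorandom but repeatable: the same index always gets the same answer."""
    def assign(index):
        digest = hashlib.sha256(str(index).encode("utf-8")).digest()
        # Map the first 8 bytes of the hash to a number in [0, 1).
        fraction = int.from_bytes(digest[:8], "big") / 2**64
        return fraction < training_percent
    return assign


def linear_assignment(size, training_percent=0.8):
    """Linear split: every training element precedes every test element."""
    cutoff = int(size * training_percent)
    def assign(index):
        return index < cutoff
    return assign


def partition(dataset, assign):
    """Split dataset using a callable that receives only the element index."""
    training, test = [], []
    for index, element in enumerate(dataset):
        # Assumed convention: True means training, False means test.
        (training if assign(index) else test).append(element)
    return training, test


names = ["Alice", "Bob", "Christine", "Dave", "Elizabeth", "Fred"]
linear_training, linear_test = partition(names, linear_assignment(len(names)))
hashed_training, hashed_test = partition(names, hashed_assignment())
print(linear_training, linear_test)
```

Both callables depend only on the index, so calling partition again with the same arguments yields the same split.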

Example

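from data_partitioner import DatasetSuplier


def do_train(value):
    """Placeholder: replace with your own training step."""


def do_evaluate(value):
    """Placeholder: replace with your own evaluation step."""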

dataset = [
    ('Alice', 10, 23, 401),
    ('Bob', 20, 40, 812),
    ('Christine', 41, 92, 533),
    ('Dave', 843, 12, -5),
    ('Elizabeth', 682, 33, -7),
    ('Fred', 95, 642, 34),
]
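suplier = DatasetSuplier(dataset)

# Train for 100 passes; training_set() returns the same elements on every call.
for iteration in range(100):
    for element in suplier.training_set():
        do_train(element[1])

# Evaluate once on the held-out test set.
for element in suplier.test_set():
    do_evaluate(element[1])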

Version: 0.1

