Python package to stratify split datasets based on endpoint distributions
Ivers
Ivers offers a suite of tools designed for managing data splits while maintaining endpoint distributions, and introduces two novel temporal split techniques: 'Leaky' and 'All for Free'. This library ensures that data splits are suitable for realistic scenarios and rigorous testing needs in various applications. It was utilized to generate data splits in the research outlined in the linked paper.
Features
- Temporal Leaky: Simulates real-world scenarios by allowing forward-leakage in data, which might subtly influence future models.
- Temporal AllForFree: Ensures strict temporal separation, keeping training data completely independent of the test set—ideal for accurate long-term model predictions.
- Temporal Fold Split: Progressively increases the training set size across multiple folds, adhering to the temporal sequence, enhancing model robustness over time.
- Stratified Endpoint Split: Introduces a stratified approach to splitting, crucial for consistent endpoint distribution across different categories in datasets—beneficial in fields like cheminformatics and bioinformatics.
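The expanding-window idea behind Temporal Fold Split can be sketched in plain Python. This is a minimal illustration, not the Ivers implementation: the function name expanding_window_folds, the date_key parameter, and the equal-sized chunking are all assumptions made for this example.

```python
from datetime import date


def expanding_window_folds(records, date_key, n_folds):
    """Yield (train, test) lists; the training window grows with each fold.

    Hypothetical helper for illustration only, not part of the Ivers API.
    """
    ordered = sorted(records, key=lambda r: r[date_key])
    # Cut the time-ordered records into n_folds + 1 contiguous chunks.
    chunk = len(ordered) // (n_folds + 1)
    if chunk == 0:
        raise ValueError("not enough records for the requested number of folds")
    for k in range(1, n_folds + 1):
        train = ordered[: k * chunk]
        # The last fold absorbs any remainder left over by integer division.
        end = (k + 1) * chunk if k < n_folds else len(ordered)
        test = ordered[k * chunk : end]
        yield train, test


# Toy data: ten records dated on consecutive days.
records = [{"id": i, "date": date(2020, 1, 1 + i)} for i in range(10)]
folds = list(expanding_window_folds(records, "date", n_folds=4))
```

In every fold, all training records predate all test records, so no future information reaches the model being evaluated on that fold.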
Code Functions
The library includes several functions tailored for different splitting strategies:
- stratify_endpoint, stratify_split_and_cv: These functions generate train/test and cross-validation splits that respect endpoint distribution.
- leaky_endpoint_split, allforone_endpoint_split: Used for generating a single train/test split with respective temporal dynamics.
- allforone_folds_endpoint_split, leaky_folds_endpoint_split: Enable multiple sectional splits, increasing training data size consistently.
- balanced_scaffold_cv: Supports balanced scaffold cross-validation, enhancing data representativeness in splits.
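To make the idea of a split that "respects endpoint distribution" concrete, here is a rough sketch in plain Python. It assumes that a row's stratum is the pattern of endpoints with a measured (non-missing) value; that definition, the function name stratified_split, and its parameters are assumptions for illustration and may differ from how the Ivers functions work.

```python
import random
from collections import defaultdict


def stratified_split(rows, endpoints, test_fraction=0.2, seed=0):
    """Split rows so each endpoint-availability pattern keeps its share.

    Hypothetical helper for illustration only, not part of the Ivers API.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        # Stratum key: which endpoints have a measured (non-None) value.
        key = tuple(row.get(e) is not None for e in endpoints)
        groups[key].append(row)

    train, test = [], []
    for key in sorted(groups):
        members = groups[key][:]
        rng.shuffle(members)
        # Take the same fraction from every stratum.
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy data: ten rows with only "logD" measured, ten with only "sol".
rows = [{"id": i, "logD": 1.0, "sol": None} for i in range(10)]
rows += [{"id": 10 + i, "logD": None, "sol": 2.0} for i in range(10)]
train, test = stratified_split(rows, ["logD", "sol"], test_fraction=0.2, seed=0)
```

With this toy data the test set receives the same number of rows from each stratum, so both endpoints are represented in train and test in the same proportion as in the full dataset.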
Integration with Chemprop
- Activating the chemprop configuration allows the library to generate splits that are directly compatible with the Chemprop framework, facilitating seamless integration and usage.
Getting Started or Contributing
To begin using Ivers, clone the repository and set up the necessary dependencies:
git clone https://github.com/IversOhlsson/ivers.git
cd ivers
pip install -r requirements.txt
Installation via pip
You can also install the package via pip:
pip install ivers
We welcome contributions! Feel free to open issues or pull requests on our GitHub repository.
Guide
Reference
When using this library, please cite the following paper:
@article{Ivers_1,
title={PlaceHolder},
author={PlaceHolder},
journal={PlaceHolder},
volume={PlaceHolder},
number={PlaceHolder},
pages={PlaceHolder},
year={PlaceHolder},
publisher={PlaceHolder}
}