Python package for stratified dataset splitting based on endpoint distributions
Ivers
This project provides tools for splitting datasets while preserving endpoint distributions, and introduces two novel temporal split techniques: 'leaky' and 'all for free' splits. See the explanation below.
Note: This library was used to generate the data splits in the paper PlaceHolder.
Features
- Temporal Leaky: Allows for forward-leakage in your data to simulate real-world scenarios where future data might influence the model subtly.
- Temporal AllForFree: Provides a stricter temporal separation, ensuring that the training data is entirely independent of the test set, suitable for rigorous testing of model predictions over time.
- Temporal Fold Split: Implements a novel approach that successively increases the training set size across multiple folds, following the temporal order of the data.
- Stratified Endpoint Split: Maintains a consistent distribution of data across the different categories or endpoints in your datasets. This is especially useful where endpoint distributions are critical, such as in cheminformatics and bioinformatics.
- Cross-Validation Support: Integrates capabilities to ensure that each cross-validation split maintains endpoint distribution, ideal for developing models that are generalizable across varied data conditions.
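The ideas behind the temporal fold split and the stratified endpoint split can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's API: the function names `expanding_temporal_folds` and `stratified_endpoint_split`, the record layout, and the endpoint names `logD` and `sol` are all hypothetical, and the way each test set is chosen is an assumption, since the list above does not specify it.

```python
from collections import defaultdict
import random


def expanding_temporal_folds(records, date_key, n_folds):
    """Yield (train, test) pairs whose training set grows fold by fold.

    Assumption: records are sorted by date and cut into n_folds + 1 chunks;
    fold i trains on the first i chunks and tests on the chunk that follows.
    """
    ordered = sorted(records, key=lambda r: r[date_key])
    chunk = len(ordered) // (n_folds + 1)
    if chunk == 0:
        raise ValueError("not enough records for the requested folds")
    for i in range(1, n_folds + 1):
        # The last fold's test set absorbs any remainder rows.
        end = (i + 1) * chunk if i < n_folds else len(ordered)
        yield ordered[: i * chunk], ordered[i * chunk : end]


def stratified_endpoint_split(records, endpoints, test_fraction, seed=0):
    """Split so train and test keep similar coverage of each endpoint.

    Assumption: a record "has" an endpoint when its value is not None.
    Records are grouped by which endpoints they have, and every group is
    split separately with the same test fraction.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        pattern = tuple(rec.get(e) is not None for e in endpoints)
        groups[pattern].append(rec)
    train, test = [], []
    for pattern in sorted(groups):
        members = groups[pattern]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical data: 12 monthly records with sparsely measured endpoints.
records = [
    {
        "id": i,
        "date": f"2020-{i + 1:02d}-01",
        "logD": 1.0 if i % 2 == 0 else None,
        "sol": 0.5 if i % 3 == 0 else None,
    }
    for i in range(12)
]

folds = list(expanding_temporal_folds(records, "date", n_folds=3))
train, test = stratified_endpoint_split(records, ["logD", "sol"], test_fraction=0.5)
```

With this toy data the three temporal folds train on 3, 6, and 9 records, and every training record predates the test records of its fold. Grouping by the pattern of measured endpoints is one simple way to keep sparse endpoints represented on both sides of a split; the library may use a different strategy.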
Integration with Chemprop
- By setting the `chemprop` variable to `true`, the library will generate splits compatible with the Chemprop library. This ensures that the features and train-test splits are generated in a way that can easily be used with Chemprop.
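As a rough picture of what a Chemprop-friendly split file can look like, the sketch below renders one split as CSV text with the SMILES column first and one column per endpoint, leaving unmeasured endpoints empty. It assumes the common Chemprop input layout; the exact files this library writes are not described here, and the helper `to_chemprop_csv`, the column names, and the sample molecules are hypothetical.

```python
import csv
import io


def to_chemprop_csv(records, smiles_key, endpoints):
    """Render records as CSV text: SMILES first, then one column per endpoint."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([smiles_key, *endpoints])
    for rec in records:
        # Unmeasured endpoints become empty cells.
        targets = ["" if rec.get(e) is None else rec[e] for e in endpoints]
        writer.writerow([rec[smiles_key], *targets])
    return buffer.getvalue()


# Hypothetical training split with two endpoints.
train_split = [
    {"smiles": "CCO", "logD": -0.3, "sol": None},
    {"smiles": "c1ccccc1", "logD": 2.1, "sol": 0.8},
]
train_csv = to_chemprop_csv(train_split, "smiles", ["logD", "sol"])
```

Writing each split (train, validation, test) to its own file with the same header keeps the splits fixed instead of letting the downstream tool re-split the data.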
Getting Started or Contributing
To get started with this library, clone the repository and install the required dependencies:
git clone https://github.com/IversOhlsson/ivers.git
cd ivers
pip install -r requirements.txt
Installation via pip
You can also install the package via pip:
pip install ivers
We welcome contributions! Feel free to open issues or pull requests on our GitHub repository.
Guide
Reference
When using this library, please cite the following paper:
@article{Ivers_1,
title={PlaceHolder},
author={PlaceHolder},
journal={PlaceHolder},
volume={PlaceHolder},
number={PlaceHolder},
pages={PlaceHolder},
year={PlaceHolder},
publisher={PlaceHolder}
}