Python package to stratify split datasets based on endpoint distributions

Project description

Ivers

This project offers tools for managing data splits while preserving endpoint distributions. It also presents two novel temporal split techniques, the 'leaky' and 'all for free' splits, which are described under Features below.
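To make "maintaining endpoint distributions" concrete, here is a minimal sketch in plain Python. It is not the ivers API or its algorithm: the function name `stratified_endpoint_split`, its parameters, and the toy columns `a` and `b` are all hypothetical. It assumes each row may be measured for only some endpoints, and it splits within each group of rows that share the same set of measured endpoints.

```python
import random
from collections import defaultdict


def stratified_endpoint_split(rows, endpoints, test_fraction=0.2, seed=0):
    """Hypothetical helper, not part of ivers.

    rows: list of dicts; a missing or None value means "not measured".
    endpoints: names of the endpoint columns.
    Returns (train, test) with each endpoint-availability pattern
    split in roughly the same proportion.
    """
    rng = random.Random(seed)

    # Group rows by which endpoints they have a measurement for.
    groups = defaultdict(list)
    for row in rows:
        pattern = tuple(row.get(e) is not None for e in endpoints)
        groups[pattern].append(row)

    train, test = [], []
    for pattern in sorted(groups):
        members = groups[pattern][:]
        rng.shuffle(members)
        # Take the same fraction of every group for the test set.
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy data: 10 rows measured only for endpoint "a", 10 only for "b".
rows = [{"id": i, "a": float(i), "b": None} for i in range(10)]
rows += [{"id": i, "a": None, "b": float(i)} for i in range(10, 20)]

train, test = stratified_endpoint_split(rows, ["a", "b"], test_fraction=0.2)
```

With this toy data, both endpoints contribute the same share of rows to the test set, which a purely random split does not guarantee.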

Note: This library was used to generate the data splits in the paper PlaceHolder.

Features

  • Temporal Leaky: Allows forward leakage in your data, simulating real-world scenarios where future data might subtly influence the model.
  • Temporal AllForFree: Provides a stricter temporal separation, ensuring that the training data is entirely independent of the test set. Suitable for rigorous testing of model predictions over time.
  • Temporal Fold Split: Implements a novel approach that successively increases the training set size across multiple folds, following the temporal order of the data.
  • Stratified Endpoint Split: Maintains a consistent distribution of data across the different categories or endpoints in your datasets. This is especially useful where endpoint distributions are critical, such as in cheminformatics and bioinformatics.
  • Cross-Validation Support: Ensures that each cross-validation split maintains the endpoint distribution, which helps when developing models that should generalize across varied data conditions.
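The idea of a training set that grows fold by fold in time order can be sketched as follows. This is a generic expanding-window split written in plain Python, not the ivers implementation; the function name `expanding_temporal_folds`, its parameters, and the `measured` column are assumptions made for illustration.

```python
from datetime import date


def expanding_temporal_folds(rows, time_key, n_folds):
    """Hypothetical helper, not part of ivers.

    Rows are sorted by time and cut into n_folds + 1 consecutive blocks.
    Fold i trains on blocks 0..i-1 and tests on block i, so the test
    block is never earlier than anything in the training set.
    """
    ordered = sorted(rows, key=lambda r: r[time_key])
    n_blocks = n_folds + 1
    if len(ordered) < n_blocks:
        raise ValueError("need at least n_folds + 1 rows")
    # Block boundaries as evenly spaced indices into the sorted rows.
    bounds = [len(ordered) * k // n_blocks for k in range(n_blocks + 1)]
    for i in range(1, n_blocks):
        train = ordered[: bounds[i]]
        test = ordered[bounds[i] : bounds[i + 1]]
        yield train, test


# Toy data: eight rows, one per month.
rows = [{"id": i, "measured": date(2020, 1 + i, 1)} for i in range(8)]
folds = list(expanding_temporal_folds(rows, "measured", n_folds=3))
```

Each fold reuses all earlier blocks for training, so later folds see more data while still being evaluated only on rows that come after the training period.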

Integration with Chemprop

  • Setting the chemprop variable to true makes the library generate splits compatible with the Chemprop library, so that the generated features and train-test splits can easily be used with Chemprop.

Getting Started or Contributing

To get started with this library, clone the repository and install the required dependencies:

git clone https://github.com/IversOhlsson/ivers.git
cd ivers
pip install -r requirements.txt

Installation via pip

You can also install the package via pip:

pip install ivers

We welcome contributions! Feel free to open issues or pull requests on our GitHub repository.

Guide

Reference

When using this library, please cite the following paper:

@article{Ivers_1,
  title={PlaceHolder},
  author={PlaceHolder},
  journal={PlaceHolder},
  volume={PlaceHolder},
  number={PlaceHolder},
  pages={PlaceHolder},
  year={PlaceHolder},
  publisher={PlaceHolder}
}
