WILDS distribution shift data

Project description

This repository provides a simpler interface to access the Wild-Time datasets in PyTorch. In contrast to the original repository, this repository contains only code relevant for data loading and has fewer dependencies.

Installation

Wild-Time-Data is available via PyPI.

pip install wild-time-data

Usage

The following code returns a PyTorch dataset for the training partition of the arXiv dataset in 2023. The data is downloaded to wild-time-data/ unless it is already present in that folder.

from wild_time_data import load_dataset

load_dataset(dataset_name="arxiv", time_step=2023, split="train", data_dir="wild-time-data")
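Since load_dataset takes the split as an argument, the training and test partitions for one time step can be fetched in a single pass. The sketch below assumes nothing beyond the keyword arguments shown above; load_splits and its loader parameter are hypothetical names introduced for illustration and are not part of wild_time_data.

```python
from typing import Any, Callable, Dict, Optional


def load_splits(
    dataset_name: str,
    time_step: int,
    data_dir: str = "wild-time-data",
    loader: Optional[Callable[..., Any]] = None,
) -> Dict[str, Any]:
    """Return {"train": ..., "test": ...} for one dataset and time step.

    Hypothetical helper, not part of wild_time_data. `loader` defaults to
    wild_time_data.load_dataset and can be replaced with a stub for testing.
    """
    if loader is None:
        # Imported lazily so the helper can be defined without the package installed.
        from wild_time_data import load_dataset as loader
    return {
        split: loader(
            dataset_name=dataset_name,
            time_step=time_step,
            split=split,
            data_dir=data_dir,
        )
        for split in ("train", "test")
    }
```

Calling load_splits("arxiv", 2023) would then return both partitions. Because the returned objects are PyTorch datasets, each one can presumably be passed to torch.utils.data.DataLoader for batching.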

In the following we provide details about the available argument options.

  • dataset_name: The options are arxiv, drug, fmow, huffpost, and yearbook. This list can be accessed via

    from wild_time_data import list_datasets
    
    list_datasets()
  • time_step: Most datasets are grouped by year. This argument selects the time interval to load. The available range differs from dataset to dataset. Use the following command to get a list of available time steps:

    from wild_time_data import available_time_steps
    
    available_time_steps("arxiv")
  • split: Selects the partition. Can either be train or test.

  • data_dir: Directory in which the data is stored. By default, the data is downloaded to ~/wild-time-data/.
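A common way to combine these arguments is to train on earlier time steps and evaluate on later ones. The following is a minimal sketch of that partitioning step only. split_time_steps is a hypothetical helper, and the hard-coded list is the one this README reports for huffpost; in practice it would come from available_time_steps("huffpost").

```python
from typing import List, Sequence, Tuple


def split_time_steps(
    steps: Sequence[int], last_train_step: int
) -> Tuple[List[int], List[int]]:
    """Partition time steps into (steps <= last_train_step, steps > last_train_step).

    Hypothetical helper, not part of wild_time_data.
    """
    ordered = sorted(steps)
    past = [s for s in ordered if s <= last_train_step]
    future = [s for s in ordered if s > last_train_step]
    return past, future


# Time steps documented for "huffpost" in this README.
huffpost_steps = [2012, 2013, 2014, 2015, 2016, 2017, 2018]
train_steps, eval_steps = split_time_steps(huffpost_steps, last_train_step=2015)
print(train_steps)  # [2012, 2013, 2014, 2015]
print(eval_steps)   # [2016, 2017, 2018]
```

Each step in train_steps could then be loaded with split="train" and each step in eval_steps with split="test", passing the step as time_step to load_dataset.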

Other Useful Functions

Several other functions can be imported from wild_time_data.

from wild_time_data import available_time_steps, input_dim, list_datasets, num_outputs

  • available_time_steps: Given the dataset name, a sorted list of available time steps is returned. Example: available_time_steps("huffpost") returns [2012, 2013, 2014, 2015, 2016, 2017, 2018].

  • input_dim: Given the dataset name, the input dimensionality is returned. For image datasets the shape of the image is returned. For text datasets the maximum number of words separated by spaces is returned. Example: input_dim("yearbook") returns (1, 32, 32).

  • list_datasets: Returns the list of all available datasets. Example: list_datasets() returns ["arxiv", "drug", "fmow", "huffpost", "yearbook"].

  • num_outputs: Given the dataset name, returns the number of classes for a classification dataset, or 1 for a regression dataset. Example: num_outputs("arxiv") returns 172.
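The values returned by input_dim and num_outputs are enough to decide on a model's input size and output head. The sketch below illustrates one way to interpret them, using the example values given above. task_type and flat_input_size are hypothetical helpers, and treating the text-dataset value as a plain integer is an assumption.

```python
import math
from typing import Sequence, Union


def task_type(n_outputs: int) -> str:
    """Interpret num_outputs: 1 means regression, anything larger means classification."""
    if n_outputs < 1:
        raise ValueError("n_outputs must be a positive integer")
    return "regression" if n_outputs == 1 else "classification"


def flat_input_size(in_dim: Union[int, Sequence[int]]) -> int:
    """Number of input elements for an image shape (tuple) or a text length.

    Assumption: text datasets report a single integer.
    """
    if isinstance(in_dim, int):
        return in_dim
    return math.prod(in_dim)


# Example values taken from this README:
# num_outputs("arxiv") returns 172 and input_dim("yearbook") returns (1, 32, 32).
print(task_type(172))                # classification
print(flat_input_size((1, 32, 32)))  # 1024
```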

Licenses

All additional code for Wild-Time-Data is available under the Apache 2.0 license. The license for each Wild-Time dataset is listed below:

Furthermore, this repository is loosely based on the Wild-Time repository which is licensed under the MIT License.
