Companion app for the `time-split` library.

Time Split

Time-based k-fold validation splits for heterogeneous data.


Folds plotted on a two-by-two grid. See the examples page for more.

About this image

The Time Split application (available here) is designed to help evaluate the effects of different parameters. To start it locally, run

docker run -p 8501:8501 rsundqvist/time-split

or

pip install time-split[app]
python -m time_split app start

in the terminal. You may use create_explorer_link() to build application URLs with preselected splitting parameters.

Documentation

Click here for documentation of the most important types, functions and classes used by the application.

Custom dataset loaders

Dataset loaders are a flexible way to load or create datasets that require user input. The existing images (>=0.7.0) can be extended to use custom loaders:

FROM python:3.13

RUN pip install --no-cache --compile time-split[app]
RUN pip install --no-cache --compile your-dependencies

ENV DATASET_LOADER=custom_dataset_loader:CustomDatasetLoader
COPY custom_dataset_loader.py .

# Entrypoint etc.

Loaders must implement the DataLoaderWidget interface. You may use

python -m time_split app new

to create a template project to get you started.

Custom datasets

To bundle datasets, specify a configuration file (e.g. DATASETS_CONFIG_PATH='s3://my-bucket/data/datasets.toml') with the following keys:

| Key | Type | Required | Description |
|---|---|---|---|
| label | string | | Name shown in the UI. Defaults to section header (i.e. "my-dataset" below). |
| path | string | Required | First argument to the pandas read function. |
| index | string | Required | Datetime-like column. Will be converted using pandas.to_datetime(). |
| aggregations | dict[str, str] | | Determines function to use in the 📈 Aggregations per fold tab. |
| description | string | | Markdown. The first line will be used as the summary in the UI. |
| read_function_kwargs | dict[str, Any] | | Keyword arguments for the pandas read function used. |

ℹ️ The read function is chosen automatically based on the path.
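
As a rough illustration of how a read function could be picked from a path, the sketch below maps file suffixes to the names of pandas readers. The `READERS` table and `pick_read_function()` are hypothetical names invented for this example; the application's actual selection logic may differ (for instance in how it treats compressed files).

```python
from pathlib import PurePosixPath

# Hypothetical suffix-to-reader table; not taken from the application.
READERS = {
    ".csv": "read_csv",
    ".json": "read_json",
    ".parquet": "read_parquet",
    ".feather": "read_feather",
}


def pick_read_function(path: str) -> str:
    """Return the name of a pandas read function for the given path."""
    # Drop a URL scheme such as "s3://" before inspecting the suffix.
    name = path.split("://", 1)[-1]
    suffix = PurePosixPath(name).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"no read function known for suffix {suffix!r}") from None


print(pick_read_function("s3://my-bucket/data/title_basics.csv"))  # read_csv
```

The returned name could then be resolved with `getattr(pandas, name)` and called with the configured `read_function_kwargs`.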

ℹ️ Additional dependencies are required for remote filesystems. You may use EXTRA_PIP_PACKAGES=s3fs to install dependencies for the S3 paths used below.

See the DatasetConfig class for the internal representation.

[my-dataset]
label = "Dataset name"
path = "s3://my-bucket/data/title_basics.csv"
index = "from"
aggregations = { runtimeMinutes = "min", isAdult = "mean" }
description = """This is the summary.

Simplified version of the
[Title basics](https://developer.imdb.com/non-commercial-datasets/#titlebasicstsvgz) IMDB
dataset. The description supports Markdown syntax.

Last updated: `2019-05-11T20:30:00+00:00`
"""
[my-dataset.read_function_kwargs]
# Valid options depend on the read function used (pandas.read_csv, in this case).

Multiple datasets may be configured in their own top-level sections. Labels must be unique.

Updating datasets

Datasets may be updated while the app is running. This is best done by changing the datasets config TOML file (e.g. by writing a timestamp, as above).

Default timings:

  • The dataframes returned by the dataset loader are cached for config.DATASET_CACHE_TTL seconds (default = 12 hours).
  • The dataset configuration file is read every config.DATASET_CONFIG_CACHE_TTL seconds (default = 30 seconds).
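
To make the two timings concrete, here is a minimal sketch of a time-to-live cache: the value is reloaded only once the TTL has elapsed. `TtlCache` is a hypothetical class written for this example and is not the caching mechanism the application uses; the injectable `clock` exists only so the behavior can be demonstrated without waiting.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TtlCache(Generic[T]):
    """Cache a loaded value and reload it once ttl_seconds have passed."""

    def __init__(
        self,
        load: Callable[[], T],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[T] = None
        self._expires_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # First call, or the cached value has expired: reload.
            self._value = self._load()
            self._expires_at = now + self._ttl
        return self._value  # type: ignore[return-value]


# Demonstration with a fake clock and a 30-second TTL.
fake_now = [0.0]
load_calls = []
cache = TtlCache(lambda: load_calls.append(1) or len(load_calls), 30, lambda: fake_now[0])

cache.get()
cache.get()         # Served from the cache; no reload.
fake_now[0] = 31.0  # Move past the TTL.
cache.get()         # Reloads.
print(len(load_calls))  # 2
```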

All datasets are reloaded immediately when the configuration changes. Changes that only affect comments or formatting are ignored.

Environment variables

See config.py for configurable values.

User choice

Users may lower some configured values by using the Performance tweaker widget in the ❔ About tab of the application. To set a lower default, add a DEFAULT_-prefix to the regular name.

PLOT_AGGREGATIONS_PER_FOLD=true
DEFAULT_PLOT_AGGREGATIONS_PER_FOLD=false

This will disable the (expensive) per-column fold aggregation figures, but users who need them can turn them back on.
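
The relationship between a configured value and its DEFAULT_-prefixed counterpart can be sketched as follows for boolean settings. `read_bool()` and `resolve()` are hypothetical helpers, the accepted spellings of "true" and the fallback of `True` for an unset variable are assumptions, and the real parsing lives in config.py.

```python
import os
from typing import Mapping


def read_bool(name: str, fallback: bool, env: Mapping[str, str] = os.environ) -> bool:
    """Read a boolean variable; the accepted spellings are an assumption."""
    raw = env.get(name)
    if raw is None:
        return fallback
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def resolve(name: str, env: Mapping[str, str] = os.environ) -> tuple[bool, bool]:
    """Return (allowed, initial) for a boolean setting."""
    # Assumption: an unset variable means the feature is allowed.
    allowed = read_bool(name, True, env)
    # Without a DEFAULT_ override, the initial value equals the configured one.
    initial = read_bool(f"DEFAULT_{name}", allowed, env)
    # The default may lower the configured value but never raise it.
    return allowed, allowed and initial


env = {
    "PLOT_AGGREGATIONS_PER_FOLD": "true",
    "DEFAULT_PLOT_AGGREGATIONS_PER_FOLD": "false",
}
print(resolve("PLOT_AGGREGATIONS_PER_FOLD", env))  # (True, False)
```

Here the feature is allowed but starts out disabled, matching the behavior described above.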
