Companion app for the `time-split` library.
# Time Split
Time-based k-fold validation splits for heterogeneous data.
*Figure: folds plotted on a two-by-two grid. See the examples page for more.*
The Time Split application (available here) is designed to help evaluate the effects of different parameters. To start it locally, run

```sh
docker run -p 8501:8501 rsundqvist/time-split
```

or

```sh
pip install time-split[app]
python -m time_split app start
```

in the terminal. You may use `create_explorer_link()` to build application URLs with preselected splitting parameters.
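Such a link is essentially the application URL plus a query string. The sketch below builds one by hand with the standard library to illustrate the idea; the parameter names (`schedule`, `n_splits`) are assumptions for this sketch, so consult the `create_explorer_link()` documentation for the real signature and parameters.

```python
from urllib.parse import urlencode


def make_explorer_link(host: str, **params: str) -> str:
    """Append preselected splitting parameters as a query string."""
    return f"{host}/?{urlencode(params)}"


link = make_explorer_link(
    "http://localhost:8501",
    schedule="7d",
    n_splits="3",
)
print(link)  # http://localhost:8501/?schedule=7d&n_splits=3
```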
## Custom dataset loaders

Dataset loaders are a flexible way to load or create datasets that require user input. The existing images (`>=0.7.0`) can be extended to use custom loaders:
```dockerfile
FROM python:3.13
RUN pip install --no-cache --compile time-split[app]
RUN pip install --no-cache --compile your-dependencies
ENV DATASET_LOADER=custom_dataset_loader:CustomDatasetLoader
COPY custom_dataset_loader.py .
# Entrypoint etc.
```
Loaders must implement the `DataLoaderWidget` interface.
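A custom loader might look something like the sketch below. The method name `load()` and the return shape are assumptions made for illustration; check the `DataLoaderWidget` source in the time-split-app package for the real interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CustomDatasetLoader:
    """Creates a tiny synthetic dataset instead of reading a file.

    Sketch only: the real DataLoaderWidget interface may require
    different methods, e.g. for rendering Streamlit input widgets.
    """

    n_rows: int = 96

    def load(self) -> dict[str, list]:
        # One row per hour, starting from a fixed timestamp.
        start = datetime(2019, 5, 11)
        return {
            "timestamp": [start + timedelta(hours=i) for i in range(self.n_rows)],
            "value": [float(i % 24) for i in range(self.n_rows)],
        }
```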
## Custom datasets

To bundle datasets, mount a configuration file (determined by `DATASETS_CONFIG_PATH='/home/streamlit/datasets.toml'`). The `DatasetConfig` struct has the following keys:
| Key | Type | Required | Description |
|---|---|---|---|
| `label` | `string` | | Name shown in the UI. Defaults to the section header (i.e. `"my-dataset"` below). |
| `path` | `string` | Required | First argument to the pandas read function. |
| `index` | `string` | Required | Datetime-like column. Will be converted using `pandas.to_datetime()`. |
| `aggregations` | `dict[str, str]` | | Determines the function to use in the 📈 *Aggregations per fold* tab. |
| `description` | `string` | | Markdown. The first line will be used as the summary in the UI. |
| `read_function_kwargs` | `dict[str, Any]` | | Keyword arguments for the pandas read function used. |
The read function is chosen automatically based on the path.

> ℹ️ Additional dependencies are required for remote filesystems. You may use `EXTRA_PIP_PACKAGES=s3fs` to install dependencies for the S3 paths used below.
```toml
[my-dataset]
label = "Dataset name"
path = "s3://my-bucket/data/title_basics.csv"
index = "from"
aggregations = { runtimeMinutes = "min", isAdult = "mean" }
description = """This is the summary.

Simplified version of the
[Title basics](https://developer.imdb.com/non-commercial-datasets/#titlebasicstsvgz) IMDB
dataset. The description supports Markdown syntax.

Last updated: `2019-05-11T20:30:00+00:00`
"""

[my-dataset.read_function_kwargs]
# Valid options depend on the read function used (pandas.read_csv, in this case).
```
Multiple datasets may be configured in their own top-level sections. Labels must be unique.
## Mounted datasets

A convenient way to keep datasets up to date without relying on network storage is to mount a dataset folder on the local machine, using e.g. a CRON job to update the data. To start the image with datasets mounted, run

```sh
docker run \
  -p 8501:8501 \
  -v ./data:/home/streamlit/data:ro \
  -v ./datasets.toml:/home/streamlit/datasets.toml:ro \
  -e REQUIRE_DATASETS=true \
  rsundqvist/time-split
```

in the terminal. The tomli-w package may be used to emit TOML files if using Python.
- The dataframes returned by the dataset loader are cached for `config.DATASET_CACHE_TTL` seconds (default = 12 hours).
- The dataset configuration file is read every `config.DATASET_CONFIG_CACHE_TTL` seconds (default = 30 seconds).

All datasets are reloaded immediately if the configuration changes, ignoring comments and formatting.
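The update job mentioned above can be sketched with the standard library alone. The file name and columns below are illustrative; any format the pandas read functions understand will do.

```python
import csv
from datetime import datetime, timedelta, timezone
from pathlib import Path


def write_dataset(path: Path, hours: int = 24) -> int:
    """Refresh a CSV in the mounted data folder; returns the row count."""
    start = datetime.now(timezone.utc) - timedelta(hours=hours)
    rows = [
        {"from": (start + timedelta(hours=i)).isoformat(), "value": i}
        for i in range(hours)
    ]
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["from", "value"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Run e.g. from CRON; the app picks up changes on its own schedule.
write_dataset(Path("data/my-dataset.csv"))
```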
## Environment variables

See config.py for configurable values. Use `true|false` for boolean variables. Documentation for the underlying framework (Streamlit) is available here.
### User choice

Users may lower some configured values by using the *Performance tweaker* widget in the ❔ About tab of the application. To set a lower default, add a `DEFAULT_` prefix to the regular name.

```sh
PLOT_AGGREGATIONS_PER_FOLD=true
DEFAULT_PLOT_AGGREGATIONS_PER_FOLD=false
```

This will disable the (expensive) per-column fold aggregation figures, but users who need them can turn them back on.
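One way to manage several such variables is a Docker env file. The fragment below only uses variables that appear in this document; see config.py for the full list.

```sh
# app.env - pass to the container with: docker run --env-file app.env ...
DATASETS_CONFIG_PATH=/home/streamlit/datasets.toml
REQUIRE_DATASETS=true
PLOT_AGGREGATIONS_PER_FOLD=true
DEFAULT_PLOT_AGGREGATIONS_PER_FOLD=false
```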