Dataset preparation library with Python bindings for sliding window tensors

Project description

Rindle for Python

Rindle turns collections of per-ticker CSV files into contiguous sliding-window tensors that are ready for deep learning workflows. The Python extension wraps the C++20 data preparation engine behind a small, NumPy-friendly API so you can configure builds, materialize datasets, and recover fitted scalers directly from notebooks or training scripts.

Highlights

  • Deterministic dataset builds – declare the window geometry, scaler, and input schema with rindle.create_config and let the engine emit consistent results across runs.
  • Multi-threaded – both build_dataset and get_dataset parallelize work across tickers and windows using a C++ thread pool. An optional thread_count parameter (default 0 = auto) gives explicit control. The Python GIL is released during these calls so other threads stay responsive.
  • Manifest-driven reloads – rehydrate tensors on demand with rindle.get_dataset using the in-memory manifest returned by a build or a saved manifest.json file.
  • NumPy integration – feature (Dataset.X) and target (Dataset.Y) tensors are exposed as NumPy arrays with shape (windows, sequence_length, features) and float32 precision for direct use with frameworks such as PyTorch or TensorFlow.
  • Scaler introspection – fetch the fitted scaler for any ticker/feature pair to invert predictions or understand the normalization that was applied.
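
The "0 = auto" thread-count convention and the order-stable fan-out across tickers can be illustrated with a small pure-Python sketch. It is not Rindle's implementation (the real pool lives in the C++ engine); `resolve_thread_count`, `count_rows`, and `process_tickers` are hypothetical names used only for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_thread_count(thread_count: int) -> int:
    """Map 0 to the number of logical CPUs; pass positive values through."""
    if thread_count < 0:
        raise ValueError("thread_count must be >= 0")
    if thread_count == 0:
        return os.cpu_count() or 1
    return thread_count


def count_rows(item):
    """Stand-in for per-ticker work; here it only counts rows."""
    ticker, rows = item
    return ticker, len(rows)


def process_tickers(series_by_ticker, thread_count=0):
    """Fan per-ticker work out to a pool and collect results in input order."""
    workers = resolve_thread_count(thread_count)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of its inputs, so the
        # output does not depend on which thread finishes first.
        return list(pool.map(count_rows, sorted(series_by_ticker.items())))


toy = {"MSFT": [1.0, 2.0], "AAPL": [3.0, 4.0, 5.0]}
print(process_tickers(toy, thread_count=2))
```

Sorting the tickers before dispatch and collecting results in input order is one simple way to keep a parallel build repeatable from run to run.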

Installation

Pre-built wheels are published where available. If no wheel matches your platform, pip falls back to the source distribution, which is compiled locally and needs a C++20 toolchain.

pip install rindle

Building from source requires a compiler with C++20 support, CMake 3.18+, and Python 3.9 or newer. When working from a clone of the repository:

python -m pip install --upgrade pip
python -m pip install build
python -m build
python -m pip install dist/rindle-*.whl

Quickstart

from pathlib import Path
import rindle

config = rindle.create_config(
    input_dir=Path("data/raw_prices"),
    output_dir=Path("data/processed"),
    feature_columns=["Open", "High", "Low", "Close", "Volume"],
    seq_length=64,
    future_horizon=8,
    target_column="Close",
    time_mode=rindle.TimeMode.UTC_NS,
    row_major=False,
    scaler_kind=rindle.ScalerKind.Standard,
)

manifest = rindle.build_dataset(config)  # parallelized across tickers

# Load full dataset (default)
dataset = rindle.get_dataset(manifest)  # parallelized tensor fill

# Load a random 10% sample (maintains ticker distribution)
dataset_small = rindle.get_dataset(manifest, percentage=0.1)

# Explicit thread count (0 = auto-detect)
dataset = rindle.get_dataset(manifest, thread_count=4)

X = dataset.X  # NumPy array: (windows, seq_length, n_features), dtype=float32
Y = dataset.Y  # NumPy array aligned with X when targets are enabled
meta = dataset.meta  # List of WindowMeta objects with ticker provenance
print("total windows:", dataset.n_windows())
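
To make the window geometry concrete, here is a minimal pure-Python sketch of how `seq_length` and `future_horizon` could carve one series into feature and target windows. It assumes a stride of 1 and targets taken from the `future_horizon` rows immediately after each window; neither assumption is confirmed by the Rindle documentation, and `make_windows` is a hypothetical helper, not part of the library.

```python
def make_windows(rows, target, seq_length, future_horizon):
    """Slice one series into (features, targets) window pairs with stride 1.

    rows   -- list of per-timestep feature lists
    target -- list of target values, same length as rows
    """
    if len(rows) != len(target):
        raise ValueError("rows and target must have the same length")
    n_windows = len(rows) - seq_length - future_horizon + 1
    X, Y = [], []
    for start in range(max(n_windows, 0)):
        end = start + seq_length
        X.append(rows[start:end])                   # seq_length timesteps
        Y.append(target[end:end + future_horizon])  # the steps that follow
    return X, Y


# Ten timesteps with two features each; the second feature doubles as target.
rows = [[float(t), float(t) * 10.0] for t in range(10)]
target = [r[1] for r in rows]

X, Y = make_windows(rows, target, seq_length=4, future_horizon=2)
print(len(X), len(X[0]), len(X[0][0]))  # windows, seq_length, features
print(Y[0])
```

Under these assumptions a series of n rows yields n - seq_length - future_horizon + 1 windows, so a ticker with fewer than seq_length + future_horizon rows contributes none.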

The manifest stores the configuration, aggregate statistics, and ticker-level metadata. A copy is written to <output_dir>/manifest.json during the build so you can reload tensors later without repeating the pipeline:

from pathlib import Path
import rindle

# `config` is the DatasetConfig used for the build above
manifest_path = Path(config.output_dir) / "manifest.json"
reloaded = rindle.get_dataset(manifest_path)

Inspecting manifests and scalers

Each ManifestContent instance exposes the fields captured during the build, including feature_columns, total_windows, and ticker_stats. The helper method find_stats("AAPL") returns the TickerStats record for a ticker. If you mutate ticker_stats manually, call build_ticker_index() afterwards so that ticker lookups reflect your changes.

To invert normalized values or apply identical scaling elsewhere:

scaler = rindle.get_feature_scaler(manifest, ticker="AAPL", feature="Close")
original_value = rindle.inverse_transform_value(scaler, value=0.42)

The returned FittedScaler exposes transform and inverse_transform methods as well as a params property that includes summary statistics (mean, standard deviation, quartiles, and min/max bounds).
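
For intuition about what transform and inverse_transform do, the sketch below implements conventional z-score scaling in pure Python. It assumes ScalerKind.Standard means (x - mean) / std, which the documentation does not spell out; `StandardScalerSketch` is a hypothetical stand-in, not Rindle's FittedScaler.

```python
import statistics


class StandardScalerSketch:
    """Hypothetical stand-in for a fitted z-score scaler (not Rindle's class)."""

    def __init__(self, values):
        self.mean = statistics.fmean(values)
        # Population standard deviation; a real implementation may differ.
        self.std = statistics.pstdev(values) or 1.0  # avoid divide-by-zero

    def transform(self, x):
        return (x - self.mean) / self.std

    def inverse_transform(self, z):
        return z * self.std + self.mean


closes = [100.0, 102.0, 104.0, 106.0, 108.0]
scaler = StandardScalerSketch(closes)

z = scaler.transform(106.0)
restored = scaler.inverse_transform(z)
print(round(restored, 6))
```

The round trip is the property that matters when mapping model predictions back to price units: applying inverse_transform to a scaled value recovers the original up to floating-point error.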

Data layout

  • Dataset.X and Dataset.Y are three-dimensional float32 NumPy arrays backed by the underlying C++ tensors. When row_major=False (the default), the layout is [window][time][feature] with contiguous storage, so each window is a single unbroken block of memory that can be passed directly to recurrent and convolutional models.
  • Dataset.meta is a list of WindowMeta objects describing where each window originated. Fields include ticker, start_row, end_row, and optional target_start / target_end indices.
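
The [window][time][feature] layout can be read as an index formula over a flat buffer. The following sketch assumes the feature index varies fastest, as the bracket order suggests; `flat_offset` is a hypothetical helper for illustration only.

```python
def flat_offset(window, time, feature, seq_length, n_features):
    """Offset of element [window][time][feature] in a contiguous buffer."""
    return (window * seq_length + time) * n_features + feature


n_windows, seq_length, n_features = 3, 4, 2

# Fill a flat buffer so each element records its own (w, t, f) index.
buffer = [
    (w, t, f)
    for w in range(n_windows)
    for t in range(seq_length)
    for f in range(n_features)
]

offset = flat_offset(2, 1, 1, seq_length, n_features)
print(offset, buffer[offset])
```

With this ordering, all features of one timestep sit next to each other and each window occupies seq_length * n_features consecutive elements.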

API reference snapshot

| Function | Description |
| --- | --- |
| rindle.create_config(...) | Validate paths, choose feature columns, configure window geometry and scaling. Returns a DatasetConfig. |
| rindle.build_dataset(config, thread_count=0) | Run discovery → scaling → windowing in parallel and return a ManifestContent. |
| rindle.get_dataset(manifest_or_path, percentage=1.0, thread_count=0) | Load feature/target tensors in parallel. Optional percentage (0.0 < p <= 1.0) loads a random subset of windows per ticker. |
| rindle.get_feature_scaler(manifest_or_path, ticker, feature) | Retrieve the fitted scaler for a ticker/feature pair to apply or invert scaling. |
| rindle.inverse_transform_value(scaler, value) | Convenience helper to undo scaling with a FittedScaler. |
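
The per-ticker behaviour of the percentage option can be pictured as stratified sampling: each ticker keeps roughly the same fraction of its windows, so the ticker mix of the subset matches the full dataset. The sketch below is one plausible reading, not Rindle's algorithm; the rounding rule, the keep-at-least-one rule, and the seed handling are all assumptions, and `sample_per_ticker` is a hypothetical helper.

```python
import random


def sample_per_ticker(window_counts, percentage, seed=0):
    """Pick a random subset of window indices for each ticker.

    window_counts -- dict mapping ticker to its number of windows
    percentage    -- fraction to keep, 0.0 < percentage <= 1.0
    """
    if not 0.0 < percentage <= 1.0:
        raise ValueError("percentage must satisfy 0.0 < p <= 1.0")
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    picked = {}
    for ticker in sorted(window_counts):
        n = window_counts[ticker]
        # Keep at least one window so no ticker disappears from the sample.
        k = max(1, round(n * percentage)) if n > 0 else 0
        picked[ticker] = sorted(rng.sample(range(n), k))
    return picked


counts = {"AAPL": 200, "MSFT": 100, "TSLA": 50}
subset = sample_per_ticker(counts, percentage=0.1)
print({ticker: len(idx) for ticker, idx in subset.items()})
```

Sampling within each ticker, rather than across the pooled windows, prevents tickers with short histories from being dropped by chance in a small subset.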

Additional classes such as DatasetConfig, ManifestContent, Dataset, and TickerStats expose their fields as Python attributes for straightforward inspection or serialization.

Project resources

Although the core engine is implemented in C++, the Python package provides a self-contained workflow for assembling time-series datasets without leaving the Python ecosystem.
