A powerful library for creating time series datasets for machine learning models

TSDC - Time Series Dataset Creator

A powerful and intuitive Python library for creating time series datasets ready for machine learning models like LSTM, GRU, and Transformers. No more manual data preprocessing - just load your data and start training!

Why TSDC?

When working with time series models (especially LSTM), you always need to:

Convert raw data into sliding window sequences
Split data temporally (not randomly!)
Normalize/scale features properly
Handle multivariate inputs with single target outputs
Create proper shapes for neural networks

TSDC automates all of this in just a few lines of code.

Table of Contents

Installation
Quick Start
Core Concepts
API Reference
Examples
Advanced Usage
Development
Contributing

Installation

From source (recommended for development)

git clone https://github.com/DeepPythonist/tsdc.git
cd tsdc
pip install -e .

For additional features

pip install -e ".[examples]"

This includes yfinance for financial data loading and matplotlib for visualization.

Quick Start

Basic Example: Single Variable

import numpy as np
from tsdc import TimeSeriesDataset

bitcoin_prices = np.random.randn(1000) * 1000 + 40000

dataset = TimeSeriesDataset(
    data=bitcoin_prices,
    lookback=60,
    horizon=1
)
dataset.prepare()

X_train, y_train = dataset.get_train()
X_val, y_val = dataset.get_val()
X_test, y_test = dataset.get_test()

print(f"X_train shape: {X_train.shape}")  # (samples, 60, 1)
print(f"y_train shape: {y_train.shape}")  # (samples, 1)

Multivariate Example with Target Column

import pandas as pd
from tsdc import TimeSeriesDataset

data = pd.DataFrame({
    'temperature': [...],
    'humidity': [...],
    'pressure': [...]
})

dataset = TimeSeriesDataset(
    data=data,
    lookback=24,
    horizon=6,
    target_column='temperature',
    scaler_type='minmax'
)
dataset.prepare()

X_train, y_train = dataset.get_train()

Core Concepts

1. Lookback and Horizon

lookback: Number of past timesteps to use as input
horizon: Number of future timesteps to predict

lookback=60, horizon=1   # Use 60 past points to predict next 1 point
lookback=24, horizon=12  # Use 24 hours to predict next 12 hours
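
To make the two parameters concrete, here is a minimal NumPy sketch of sliding-window slicing. It is an illustration only, not TSDC's internal implementation; the helper name `make_windows` is hypothetical, and the output shapes are assumed to follow the Quick Start comments.

```python
import numpy as np

def make_windows(series, lookback, horizon, stride=1):
    """Slice a 1-D series into (X, y) pairs of past and future windows."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    # The last valid start leaves room for lookback inputs plus horizon targets.
    for start in range(0, len(series) - lookback - horizon + 1, stride):
        X.append(series[start:start + lookback])
        y.append(series[start + lookback:start + lookback + horizon])
    # Add a trailing feature axis so X has shape (samples, lookback, 1).
    return np.array(X)[..., np.newaxis], np.array(y)

series = np.arange(10)
X, y = make_windows(series, lookback=4, horizon=2)
print(X.shape, y.shape)   # (5, 4, 1) (5, 2)
print(X[0, :, 0], y[0])   # [0. 1. 2. 3.] [4. 5.]
```

Each sample pairs `lookback` consecutive points with the `horizon` points that immediately follow them.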

2. Stride

Control how windows overlap:

stride=1   # Maximum overlap, windows shift by 1 timestep
stride=5   # Less overlap, windows shift by 5 timesteps
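
A larger stride also means fewer training samples. The sketch below counts how many complete windows fit in a series, assuming windows that would run past the end of the data are dropped (whether TSDC does exactly this is an assumption). `count_windows` is a hypothetical helper, not part of the library.

```python
def count_windows(n_points, lookback, horizon, stride=1):
    """Number of complete (lookback + horizon) windows in n_points samples."""
    usable = n_points - lookback - horizon
    return 0 if usable < 0 else usable // stride + 1

for stride in (1, 5, 60):
    print(stride, count_windows(1000, lookback=60, horizon=1, stride=stride))
# 1 940
# 5 188
# 60 16
```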

3. Train/Val/Test Splits

IMPORTANT: TSDC uses temporal (sequential) splitting, NOT random splitting!

Time series splitting preserves temporal order to prevent data leakage:

TimeSeriesDataset(
    data=data,
    train_split=0.7,   # First 70% for training (oldest data)
    val_split=0.15,    # Next 15% for validation (middle data)
    test_split=0.15    # Last 15% for testing (newest data)
)

# Train → Val → Test (sequential, no shuffling)
# This prevents training on future data and testing on past data!
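
The idea can be sketched in a few lines of NumPy. This is an illustration of sequential splitting in general, not TSDC's own code; `temporal_split` is a hypothetical name, and giving the remainder to the test set is an assumption.

```python
import numpy as np

def temporal_split(data, train_split=0.7, val_split=0.15):
    """Split along the time axis in order; the remainder becomes the test set."""
    n = len(data)
    train_end = round(n * train_split)
    val_end = train_end + round(n * val_split)
    return data[:train_end], data[train_end:val_end], data[val_end:]

data = np.arange(100)  # stand-in for 100 time-ordered observations
train, val, test = temporal_split(data)
print(len(train), len(val), len(test))  # 70 15 15
```

Because the slices are taken in order, every training point is older than every validation point, and every validation point is older than every test point.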

4. Scaling Options

scaler_type='minmax'    # Scale to [0, 1]
scaler_type='standard'  # Zero mean, unit variance
scaler_type='robust'    # Robust to outliers
scaler_type='none'      # No scaling
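
For reference, the conventional formulas behind these three names are sketched below in NumPy. The assumption here is that TSDC follows the usual definitions (as in scikit-learn's MinMaxScaler, StandardScaler, and RobustScaler); the README does not state which implementation it uses, and the function names are illustrative.

```python
import numpy as np

def minmax_scale(x):
    """Map the smallest value to 0 and the largest to 1."""
    return (x - x.min()) / (x.max() - x.min())

def standard_scale(x):
    """Subtract the mean and divide by the standard deviation."""
    return (x - x.mean()) / x.std()

def robust_scale(x):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return (x - median) / (q3 - q1)

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one large outlier
print(minmax_scale(x).min(), minmax_scale(x).max())  # 0.0 1.0
print(robust_scale(x)[2])                            # 0.0
```

With the outlier at 100, min-max scaling squeezes the first four values into roughly the bottom 3% of the range, while the robust version keeps them spread between -1 and 0.5.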

API Reference

TimeSeriesDataset

Main class for dataset creation.

TimeSeriesDataset(
    data: Union[np.ndarray, pd.DataFrame, pd.Series, str],
    lookback: int = 10,
    horizon: int = 1,
    stride: int = 1,
    target_column: Optional[Union[int, str]] = None,
    scaler_type: str = "minmax",
    train_split: float = 0.7,
    val_split: float = 0.15,
    test_split: float = 0.15
)

Methods:

prepare(preprocess=True): Prepare the dataset
get_train(): Returns (X_train, y_train)
get_val(): Returns (X_val, y_val)
get_test(): Returns (X_test, y_test)
get_all(): Returns dictionary with all splits
get_info(): Get dataset information
inverse_transform_predictions(predictions): Convert scaled predictions back to the original scale

Sequencer

Low-level API for creating sequences.

from tsdc import Sequencer

sequencer = Sequencer(lookback=10, horizon=5, stride=1)
X, y = sequencer.create_sequences(data)

Preprocessor

Standalone preprocessing utilities.

from tsdc import Preprocessor

preprocessor = Preprocessor(
    scaler_type='minmax',
    handle_missing='forward_fill',
    remove_outliers=True,
    outlier_threshold=3.0
)
scaled_data = preprocessor.fit_transform(data)
original_data = preprocessor.inverse_transform(scaled_data)
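
As a rough picture of what options like these usually mean, the sketch below forward-fills missing values and flags outliers by z-score. Reading `outlier_threshold` as a z-score cutoff is an assumption, as is everything else about how Preprocessor works internally; both helper functions are hypothetical.

```python
import numpy as np

def forward_fill(values):
    """Replace each NaN with the most recent non-NaN value before it."""
    filled = np.asarray(values, dtype=float).copy()
    for i in range(1, len(filled)):
        if np.isnan(filled[i]):
            filled[i] = filled[i - 1]
    return filled

def zscore_outlier_mask(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    values = np.asarray(values, dtype=float)
    z = np.abs(values - values.mean()) / values.std()
    return z > threshold

# 19 ordinary readings (one of them missing) and one spike at the end.
raw = np.array([10.0] * 9 + [np.nan] + [10.0] * 9 + [1000.0])
filled = forward_fill(raw)
mask = zscore_outlier_mask(filled, threshold=3.0)
print(np.isnan(filled).any())  # False
print(np.flatnonzero(mask))    # [19]
```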

FinancialLoader

Load financial data from Yahoo Finance.

from tsdc.loaders import FinancialLoader

loader = FinancialLoader()
btc_data = loader.load(
    symbol="BTC-USD",
    start_date="2023-01-01",
    end_date="2024-01-01",
    source="yahoo"
)

btc_data = loader.add_technical_indicators(
    sma_periods=[20, 50],
    ema_periods=[12, 26],
    rsi_period=14,
    macd=True
)
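
For readers unfamiliar with these indicators, here is a minimal NumPy sketch of the two moving averages. It uses the textbook definitions and is not taken from FinancialLoader; the function names are illustrative, and the EMA seeding (starting from the first price) is one common convention among several.

```python
import numpy as np

def sma(prices, period):
    """Simple moving average; the first period - 1 entries are NaN."""
    prices = np.asarray(prices, dtype=float)
    out = np.full(len(prices), np.nan)
    window_sums = np.convolve(prices, np.ones(period), mode="valid")
    out[period - 1:] = window_sums / period
    return out

def ema(prices, period):
    """Exponential moving average with smoothing factor 2 / (period + 1)."""
    prices = np.asarray(prices, dtype=float)
    alpha = 2.0 / (period + 1)
    out = np.empty(len(prices))
    out[0] = prices[0]  # seed with the first observation
    for i in range(1, len(prices)):
        out[i] = alpha * prices[i] + (1 - alpha) * out[i - 1]
    return out

prices = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(sma(prices, 3)[2:])   # [2. 3. 4.]
print(ema(prices, 3)[-1])   # 4.0625
```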

Examples

Example 1: Bitcoin Price Prediction with LSTM

import numpy as np
from tsdc import TimeSeriesDataset
from tsdc.loaders import FinancialLoader

loader = FinancialLoader()
btc_data = loader.load(symbol="BTC-USD", start_date="2022-01-01")

dataset = TimeSeriesDataset(
    data=btc_data[['Close', 'Volume']],
    lookback=60,
    horizon=1,
    target_column='Close',
    scaler_type='minmax'
)
dataset.prepare()

X_train, y_train = dataset.get_train()

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential([
    LSTM(100, return_sequences=True, input_shape=(60, 2)),
    Dropout(0.2),
    LSTM(50, return_sequences=False),
    Dropout(0.2),
    Dense(1)
])

model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=50, batch_size=32)

Example 2: Walk-Forward Validation

import numpy as np
from tsdc import Sequencer
from tsdc.utils.splitters import walk_forward_validation

data = np.random.randn(1000, 3)
sequencer = Sequencer(lookback=20, horizon=1)
X, y = sequencer.create_sequences(data)

# `model` is assumed to be a compiled Keras model such as the one in Example 1,
# with an input shape that matches lookback=20 and 3 features.
for X_train, y_train, X_test, y_test in walk_forward_validation(X, y, n_splits=5):
    model.fit(X_train, y_train)
    score = model.evaluate(X_test, y_test)
    print(f"Test Score: {score}")
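
To show what a walk-forward scheme does with the data, here is a self-contained sketch in NumPy. It assumes an expanding training set followed by a fixed-size test fold, similar in spirit to scikit-learn's TimeSeriesSplit; the actual behavior of `walk_forward_validation` in TSDC may differ, and `walk_forward_splits` is a hypothetical name.

```python
import numpy as np

def walk_forward_splits(X, y, n_splits=5):
    """Yield (X_train, y_train, X_test, y_test) with a growing training set."""
    fold_size = len(X) // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold_size
        test_end = train_end + fold_size
        yield (X[:train_end], y[:train_end],
               X[train_end:test_end], y[train_end:test_end])

X = np.arange(60).reshape(60, 1)
y = np.arange(60)
folds = list(walk_forward_splits(X, y, n_splits=5))
for X_tr, y_tr, X_te, y_te in folds:
    # Training always ends right before the test fold begins.
    print(len(X_tr), len(X_te), y_tr[-1], y_te[0])
```

Each fold trains on everything seen so far and tests on the block that comes next, so no fold is ever evaluated on data older than its training set.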

Example 3: Loading from CSV

from tsdc import TimeSeriesDataset

dataset = TimeSeriesDataset(
    data="path/to/data.csv",
    lookback=30,
    horizon=7,
    target_column="sales"
)
dataset.prepare()

Example 4: Custom Preprocessing

from tsdc import Preprocessor

preprocessor = Preprocessor(
    scaler_type='robust',
    handle_missing='interpolate',
    remove_outliers=True,
    outlier_threshold=2.5
)

# raw_data is a placeholder for your own array or DataFrame of observations
cleaned_data = preprocessor.fit_transform(raw_data)

Advanced Usage

Multi-step Forecasting

Predict multiple timesteps ahead:

dataset = TimeSeriesDataset(
    data=data,
    lookback=48,
    horizon=24,
    target_column='price'
)
dataset.prepare()
X_train, y_train = dataset.get_train()

Custom Splits with Indices

from tsdc.utils.splitters import expanding_window_split

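# X, y are sequences created earlier, e.g. with Sequencer.create_sequences
for X_train, y_train, X_test, y_test in expanding_window_split(
    X, y,
    initial_train_size=100,
    test_size=20,
    step=10
):
    pass  # train and evaluate a model on each split here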
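
The index arithmetic behind an expanding window can be sketched as follows. This is an assumption about what the three parameters mean, based on their names, and not the code in tsdc.utils.splitters; `expanding_window_indices` is a hypothetical helper.

```python
def expanding_window_indices(n_samples, initial_train_size, test_size, step):
    """Yield (train_range, test_range) pairs; the training range grows by `step`."""
    train_end = initial_train_size
    # Stop once a full test block no longer fits after the training range.
    while train_end + test_size <= n_samples:
        yield range(0, train_end), range(train_end, train_end + test_size)
        train_end += step

splits = list(expanding_window_indices(150, initial_train_size=100,
                                       test_size=20, step=10))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx.start, test_idx.stop)
# 100 100 120
# 110 110 130
# 120 120 140
# 130 130 150
```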

Inverse Transform Predictions

predictions = model.predict(X_test)
original_scale = dataset.inverse_transform_predictions(predictions)
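
Conceptually, the inverse transform undoes the scaling that was fitted on the target column. The sketch below shows this for min-max scaling only, with hypothetical helper names; how TSDC stores and applies its scaler statistics is not described in this README.

```python
import numpy as np

# Statistics that a min-max scaler would have learned from the target column.
train_target = np.array([100.0, 150.0, 200.0])
t_min, t_max = train_target.min(), train_target.max()

def scale(values):
    """Forward min-max transform using the fitted minimum and maximum."""
    return (values - t_min) / (t_max - t_min)

def inverse_scale(scaled):
    """Undo the min-max transform to recover values in original units."""
    return scaled * (t_max - t_min) + t_min

predictions_scaled = np.array([0.0, 0.5, 0.25])
print(inverse_scale(predictions_scaled))  # [100. 150. 125.]
```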

Project Structure

tsdc/
├── tsdc/
│   ├── __init__.py
│   ├── core/
│   │   ├── dataset.py       # Main TimeSeriesDataset class
│   │   ├── sequencer.py     # Sliding window operations
│   │   └── preprocessor.py  # Data preprocessing
│   ├── loaders/
│   │   ├── base.py         # Base loader class
│   │   └── financial.py    # Financial data loaders
│   └── utils/
│       ├── validators.py   # Input validation
│       └── splitters.py    # Time series splitting
├── examples/
│   ├── basic_usage.py      # Basic examples
│   ├── lstm_bitcoin.py     # Bitcoin prediction
│   └── quick_start.py      # Quick start guide
├── tests/
│   └── test_core.py        # Unit tests
├── setup.py
├── requirements.txt
└── README.md

Development

Running Tests

pytest tests/ -v

Running Examples

python examples/basic_usage.py
python examples/lstm_bitcoin.py

Code Style

This project follows PEP 8 guidelines. Format your code with:

black tsdc/
flake8 tsdc/

Features

✅ Easy sequence creation for LSTM/GRU/Transformer models
✅ Built-in preprocessing and normalization
✅ Proper train/validation/test splitting for time series
✅ Support for univariate and multivariate data
✅ Target column selection for multivariate inputs
✅ Financial data loaders with technical indicators
✅ Walk-forward and expanding window validation
✅ Flexible sliding window operations
✅ Missing value handling
✅ Outlier detection and removal
✅ Inverse transform for predictions
✅ Multiple scaling methods

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use TSDC in your research, please cite:

@software{tsdc2024,
  title={TSDC: Time Series Dataset Creator},
  author={DeepPythonist},
  year={2024},
  url={https://github.com/DeepPythonist/tsdc}
}

Support

For issues and questions:

Open an issue on GitHub Issues
Check the examples/ directory for usage examples

Roadmap

Add more data loaders (crypto, weather, etc.)
Add data augmentation techniques
Support for irregular time series
Integration with PyTorch DataLoader
Built-in visualization tools
Automated hyperparameter tuning for lookback/horizon

Made with ❤️ for the ML community

Project details

This version

0.1.0

Oct 3, 2025

Download files

Source Distribution

tsdc-0.1.0.tar.gz (39.3 kB view details)

Uploaded Oct 3, 2025 Source

Built Distribution

tsdc-0.1.0-py3-none-any.whl (25.2 kB view details)

Uploaded Oct 3, 2025 Python 3

File details

Details for the file tsdc-0.1.0.tar.gz.

File metadata

Download URL: tsdc-0.1.0.tar.gz
Upload date: Oct 3, 2025
Size: 39.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tsdc-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`2c63aa435e9d1550022e12276c735963af473317134a166b9ce18576b06a15d6`
MD5	`4f835a50e61b221985fc9c2895a9c3af`
BLAKE2b-256	`76a0dba9a0ea20fcab2b966dc38665aacbd5efa289725e814e7f694e2550d0a2`

Provenance

The following attestation bundles were made for tsdc-0.1.0.tar.gz:

Publisher: publish.yml on DeepPythonist/tsdc

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tsdc-0.1.0.tar.gz
- Subject digest: 2c63aa435e9d1550022e12276c735963af473317134a166b9ce18576b06a15d6
- Sigstore transparency entry: 583049398
- Sigstore integration time: Oct 3, 2025
Source repository:
- Permalink: DeepPythonist/tsdc@13b81126a0c66fe3ef6f97ead31d87a9e5ee7415
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/DeepPythonist
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@13b81126a0c66fe3ef6f97ead31d87a9e5ee7415
- Trigger Event: release

File details

Details for the file tsdc-0.1.0-py3-none-any.whl.

File metadata

Download URL: tsdc-0.1.0-py3-none-any.whl
Upload date: Oct 3, 2025
Size: 25.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tsdc-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`37f773ece2fc9ea0b71f9c9be85a337cd19e5b67274545acee1665f5f7dd039a`
MD5	`98519a39a78e1423e32a5d79e3416499`
BLAKE2b-256	`c5be04cecba800f171a33f8d92b3184a3eec17e68782322cd78160fd43262328`

Provenance

The following attestation bundles were made for tsdc-0.1.0-py3-none-any.whl:

Publisher: publish.yml on DeepPythonist/tsdc

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tsdc-0.1.0-py3-none-any.whl
- Subject digest: 37f773ece2fc9ea0b71f9c9be85a337cd19e5b67274545acee1665f5f7dd039a
- Sigstore transparency entry: 583049402
- Sigstore integration time: Oct 3, 2025
Source repository:
- Permalink: DeepPythonist/tsdc@13b81126a0c66fe3ef6f97ead31d87a9e5ee7415
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/DeepPythonist
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@13b81126a0c66fe3ef6f97ead31d87a9e5ee7415
- Trigger Event: release
