Skip to main content

STREAM (Sequential Tasks Review to Evaluate Artificial Memory) is a dataset of 12 diverse sequential tasks to assess neural networks’ memory. Scalable in complexity and sequence length, it covers pattern completion, copy tasks, forecasting, bracket matching, and sorting—ideal for comparing architectures on memory retention and sequential reasoning.

Project description

Stream Dataset

A comprehensive dataset suite for evaluating sequence modeling capabilities of neural networks, particularly focusing on memory, long-term dependencies, and temporal reasoning tasks.

🚀 Features

  • 12 Diverse Tasks: From simple memory tests to complex pattern recognition
  • Multiple Difficulty Levels: Small, medium and large configurations. Advice : if you're building an architecture, you should begin with small.
  • Unified Interface: Consistent API across all tasks with standardized evaluation metrics
  • Ready-to-Use: Pre-configured datasets with train/validation/test splits
  • Flexible: Support for both classification/multi-classification and regression tasks

📦 Installation

pip install stream-dataset

Or install from source:

git clone https://github.com/Naowak/stream-dataset.git
cd stream-dataset
pip install -e .

🎯 Quick Start

import stream_dataset as sd

# Build a task
task_data = sd.build_task('simple_copy', difficulty='small', seed=0)

# Access the data
X_train = task_data['X_train']  # Training inputs
Y_train = task_data['Y_train']  # Training targets
T_train = task_data['T_train']  # Prediction timesteps

# Train your model (example with dummy predictions)
Y_pred = your_model.predict(X_train)

# Evaluate performance
score = sd.compute_score(
    Y=Y_train, 
    Y_hat=Y_pred, 
    prediction_timesteps=T_train,
    category=task_data['category']
)
print(f"Score: {score}")

📚 Available Tasks

Memory Tests

Postcasting Tasks

  • discrete_postcasting: Copy discrete sequences after a delay

    discrete_postcasting

  • continuous_postcasting: Copy continuous sequences after a delay

    continuous_postcasting

Signal Processing

  • sinus_forecasting: Predict frequency-modulated sinusoidal signals

    sinus_forecasting

  • chaotic_forecasting: Forecast Lorenz system dynamics

    chaotic_forecasting

Long-term Dependencies

  • discrete_pattern_completion: Complete masked repetitive patterns (discrete)

    discrete_pattern_completion

  • continuous_pattern_completion: Complete masked repetitive patterns (continuous)

    continuous_pattern_completion

  • simple_copy: Memorize and reproduce sequences after delay + trigger

    simple_copy

  • selective_copy: Memorize only marked elements and reproduce them

    selective_copy

Information Manipulation

  • adding_problem: Add numbers at marked positions

    adding_problem

  • sorting_problem: Sort sequences according to given positions

    sorting_problem

  • bracket_matching: Validate parentheses sequences

    bracket_matching

  • cross_situation: Classify objects based on multiple attributes (color, shape, position)

    cross_situation

🔧 Task Configuration

Each task supports three difficulty levels (the following numbers may vary in function of the task):

Small

  • Reduced sequence lengths and sample counts
  • Suitable for quick experiments and debugging
  • Example: 100 training samples, sequences of ~50-100 timesteps

Medium

  • Realistic problem sizes
  • Suitable for thorough model evaluation
  • Example: 1,000 training samples, sequences of ~100-200 timesteps

Large

  • Big Data configurations
  • Suitable for high-performance models
  • Example: 10,000 training samples, sequences of ~200-500 timesteps

For the tasks that allow it, the difficulty is also adjusted by the dimensions of input & output.

# Small configuration (fast)
task_small = sd.build_task('bracket_matching', difficulty='small', seed=0)

# Medium configuration (thorough)
task_medium = sd.build_task('bracket_matching', difficulty='medium', seed=0)

# Large configuration (comprehensive)
task_large = sd.build_task('bracket_matching', difficulty='large', seed=0)

📊 Data Format

All tasks return a standardized dictionary:

{
    'X_train': np.ndarray,      # Training inputs [batch, time, features]
    'Y_train': np.ndarray,      # Training targets [batch, time, outputs]
    'T_train': np.ndarray,      # Training prediction timesteps [batch, n_predictions]
    'X_valid': np.ndarray,      # Validation inputs
    'Y_valid': np.ndarray,      # Validation targets  
    'T_valid': np.ndarray,      # Validation prediction timesteps
    'X_test': np.ndarray,       # Test inputs
    'Y_test': np.ndarray,       # Test targets
    'T_test': np.ndarray,       # Test prediction timesteps
    'category': str      # 'classification', 'multi_classification', or 'regression'
}

🎨 Example: Complete Evaluation Pipeline

import stream_dataset as sd
import numpy as np
from MyModel import MyModel

def evaluate_model_on_all_tasks(model, difficulty='small'):
    """Evaluate a model on all available tasks."""
    
    results = {}
    task_names = [
        'simple_copy', 'selective_copy', 'adding_problem',
        'discrete_postcasting', 'continuous_postcasting',
        'discrete_pattern_completion', 'continuous_pattern_completion',
        'bracket_matching', 'sorting_problem', 'cross_situation',
        'sinus_forecasting', 'chaotic_forecasting'
    ]
    
    for task_name in task_names:
        print(f"Evaluating on {task_name}...")
        
        # Load task
        task_data = sd.build_task(task_name, difficulty=difficulty)
        
        # Train model (simplified)
        model = MyModel(...)
        model.train(task_data['X_train'], task_data['Y_train'], task_data['T_train'])
        # If you want your model to learn all timesteps, including the ones that are not evaluated :
        # Comment the previous line and uncomment the following one
        # model.train(task_data['X_train'], task_data['Y_train'])

        # Predict on test set
        Y_pred = model.predict(task_data['X_test'])
        
        # Compute score
        score = sd.compute_score(
            Y=task_data['Y_test'],
            Y_hat=Y_pred,
            prediction_timesteps=task_data['T_test'],
            category=task_data['category']
        )
        
        results[task_name] = score
        print(f"  Score: {score:.4f}")
    
    return results

# Usage
# results = evaluate_model_on_all_tasks(your_model, difficulty='medium')

📈 Evaluation Metrics

  • Classification tasks: Error rate (1 - accuracy)
  • Regression tasks: Mean Squared Error (MSE)
  • Multi-class classification tasks: Exact match accuracy (1 - exact match accuracy)

Lower scores indicate better performance for both metrics.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use Stream Dataset in your research, please cite:

(paper in progress...)

@software{stream_dataset,
  title={Stream Dataset: Sequential Task Review to Evaluate Artificial Memory},
  author={Yannis Bendi-Ouis, Xavier Hinaut},
  year={2025},
  url={https://github.com/Naowak/stream-dataset}
}

🙏 Acknowledgments

  • Inspired by classic sequence modeling datasets
  • Built with NumPy and Hugging Face Datasets

📞 Support


Stream Dataset - Advancing sequence modeling evaluation, one task at a time.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

stream_dataset-0.1.3.tar.gz (678.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

stream_dataset-0.1.3-py3-none-any.whl (15.8 kB view details)

Uploaded Python 3

File details

Details for the file stream_dataset-0.1.3.tar.gz.

File metadata

  • Download URL: stream_dataset-0.1.3.tar.gz
  • Upload date:
  • Size: 678.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for stream_dataset-0.1.3.tar.gz
Algorithm Hash digest
SHA256 8de3cba1df436539cd5e0c2c4408d032658b44aa877864a1b9913c3a039b8960
MD5 1ba7d5cb9e9f26d09f1ef033a66ee922
BLAKE2b-256 18370c2c628ef01688b8abcb2ed4c7ba635e19cba2a92c427d497ada52b94c7c

See more details on using hashes here.

File details

Details for the file stream_dataset-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: stream_dataset-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 15.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for stream_dataset-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 f833424878aa98244cf89a17f76abd637a27224412982b461fb13c773ecf4140
MD5 a6b234b4fb1247de7d677367c5fdf242
BLAKE2b-256 16e638a88ad67e3ad3b1d2258fa5377d96246605e320ecc1c50650329747b9e9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page