
STREAM (Sequential Tasks Review to Evaluate Artificial Memory) is a dataset of 12 diverse sequential tasks to assess neural networks’ memory. Scalable in complexity and sequence length, it covers pattern completion, copy tasks, forecasting, bracket matching, and sorting—ideal for comparing architectures on memory retention and sequential reasoning.


Stream Dataset

A comprehensive dataset suite for evaluating sequence modeling capabilities of neural networks, particularly focusing on memory, long-term dependencies, and temporal reasoning tasks.

🚀 Features

  • 12 Diverse Tasks: From simple memory tests to complex pattern recognition
  • Multiple Difficulty Levels: Small, medium, and large configurations; if you are building a new architecture, start with small
  • Unified Interface: Consistent API across all tasks with standardized evaluation metrics
  • Ready-to-Use: Pre-configured datasets with train/validation/test splits
  • Flexible: Support for both classification and regression tasks

📦 Installation

pip install stream-dataset

Or install from source:

git clone https://github.com/Naowak/stream-dataset.git
cd stream-dataset
pip install -e .

🎯 Quick Start

import stream_dataset as sd

# Build a task
task_data = sd.build_task('simple_copy', difficulty='small', seed=0)

# Access the data
X_train = task_data['X_train']  # Training inputs
Y_train = task_data['Y_train']  # Training targets
T_train = task_data['T_train']  # Prediction timesteps

# Predict with your model (placeholder: any object with a .predict method)
Y_pred = your_model.predict(X_train)

# Evaluate performance
score = sd.compute_score(
    Y=Y_train, 
    Y_hat=Y_pred, 
    prediction_timesteps=T_train,
    classification=task_data['classification']
)
print(f"Score: {score}")
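The `T_train` array tells you which timesteps are actually scored. Under the `[batch, time, features]` layout used throughout, gathering targets and predictions at those timesteps is a single fancy-indexing operation. A minimal NumPy sketch with toy stand-in arrays (independent of the library; shapes and values are made up):

```python
import numpy as np

# Toy stand-ins for task_data arrays, following the [batch, time, features] layout
rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(4, 10, 3)).astype(float)      # targets
Y_hat = rng.integers(0, 2, size=(4, 10, 3)).astype(float)  # dummy predictions
T = np.tile(np.array([7, 8, 9]), (4, 1))                   # score only the last 3 steps

# Gather the scored timesteps for each sequence in the batch
batch_idx = np.arange(Y.shape[0])[:, None]  # [batch, 1], broadcasts against T
Y_sel = Y[batch_idx, T]                     # [batch, n_predictions, features]
Y_hat_sel = Y_hat[batch_idx, T]

print(Y_sel.shape)  # (4, 3, 3)
```

Only these gathered slices enter the score; everything the model outputs at other timesteps is ignored.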

📚 Available Tasks

Memory Tests

Postcasting Tasks

  • discrete_postcasting: Copy discrete sequences after a delay
  • continuous_postcasting: Copy continuous sequences after a delay

Signal Processing

  • sinus_forecasting: Predict frequency-modulated sinusoidal signals
  • chaotic_forecasting: Forecast Lorenz system dynamics

Long-term Dependencies

  • discrete_pattern_completion: Complete masked repetitive patterns (discrete)
  • continuous_pattern_completion: Complete masked repetitive patterns (continuous)
  • simple_copy: Memorize and reproduce sequences after a delay and a trigger
  • selective_copy: Memorize only marked elements and reproduce them

Information Manipulation

  • adding_problem: Add numbers at marked positions
  • sorting_problem: Sort sequences according to given positions
  • bracket_matching: Validate parentheses sequences
  • cross_situation: Classify objects based on multiple attributes (color, shape, position)
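To make one of these concrete, here is a toy stand-alone sketch of a selective-copy instance: the model sees a sequence of symbols plus a marker channel, and must reproduce only the marked symbols, in order. This is an illustrative sketch, not the library's actual generator; all names and sizes here are made up:

```python
import numpy as np

def toy_selective_copy(seq_len=20, n_marked=4, n_symbols=8, seed=0):
    """Toy selective-copy instance (illustration only, not the library's generator).

    Returns the symbol sequence, a 0/1 marker channel, and the target:
    the marked symbols in their original order.
    """
    rng = np.random.default_rng(seed)
    symbols = rng.integers(1, n_symbols + 1, size=seq_len)
    marked = np.sort(rng.choice(seq_len, size=n_marked, replace=False))
    markers = np.zeros(seq_len, dtype=int)
    markers[marked] = 1
    target = symbols[marked]  # what the model should emit after the sequence ends
    return symbols, markers, target

symbols, markers, target = toy_selective_copy()
print(target)  # the marked symbols, in order of appearance
```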

🔧 Task Configuration

Each task supports three difficulty levels (the exact numbers below may vary from task to task):

Small

  • Reduced sequence lengths and sample counts
  • Suitable for quick experiments and debugging
  • Example: 100 training samples, sequences of ~50-100 timesteps

Medium

  • Realistic problem sizes
  • Suitable for thorough model evaluation
  • Example: 1,000 training samples, sequences of ~100-200 timesteps

Large

  • Large-scale configurations
  • Suitable for high-performance models
  • Example: 10,000 training samples, sequences of ~200-500 timesteps

For tasks that support it, the difficulty level also scales the input and output dimensions.

# Small configuration (fast)
task_small = sd.build_task('bracket_matching', difficulty='small', seed=0)

# Medium configuration (thorough)
task_medium = sd.build_task('bracket_matching', difficulty='medium', seed=0)

# Large configuration (comprehensive)
task_large = sd.build_task('bracket_matching', difficulty='large', seed=0)

📊 Data Format

All tasks return a standardized dictionary:

{
    'X_train': np.ndarray,      # Training inputs [batch, time, features]
    'Y_train': np.ndarray,      # Training targets [batch, time, outputs]
    'T_train': np.ndarray,      # Training prediction timesteps [batch, n_predictions]
    'X_valid': np.ndarray,      # Validation inputs
    'Y_valid': np.ndarray,      # Validation targets  
    'T_valid': np.ndarray,      # Validation prediction timesteps
    'X_test': np.ndarray,       # Test inputs
    'Y_test': np.ndarray,       # Test targets
    'T_test': np.ndarray,       # Test prediction timesteps
    'classification': bool      # True for classification, False for regression; used to select the corresponding loss
}
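If you construct or transform data in this format yourself, a small sanity-check helper can verify the layout directly against the dict. This is an illustrative sketch under the shape conventions listed above, not part of the library:

```python
import numpy as np

def check_task_data(task_data):
    """Sanity-check the standardized task dict layout (illustrative sketch)."""
    for split in ('train', 'valid', 'test'):
        X, Y, T = (task_data[f'{k}_{split}'] for k in ('X', 'Y', 'T'))
        assert X.ndim == 3 and Y.ndim == 3, "expected [batch, time, features]"
        assert X.shape[0] == Y.shape[0] == T.shape[0], "batch sizes must match"
        assert T.max() < Y.shape[1], "prediction timesteps must index the time axis"
    assert isinstance(task_data['classification'], bool)

# Fabricated dict with the expected layout (stands in for sd.build_task output)
X, Y = np.zeros((2, 10, 3)), np.zeros((2, 10, 1))
T = np.array([[8, 9], [8, 9]])
data = {f'{k}_{s}': v for s in ('train', 'valid', 'test')
        for k, v in (('X', X), ('Y', Y), ('T', T))}
data['classification'] = False
check_task_data(data)  # passes silently
```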

🎨 Example: Complete Evaluation Pipeline

import stream_dataset as sd
import numpy as np
from MyModel import MyModel

def evaluate_model_on_all_tasks(model_cls, difficulty='small'):
    """Evaluate a model (one fresh instance per task) on all available tasks."""

    results = {}
    task_names = [
        'simple_copy', 'selective_copy', 'adding_problem',
        'discrete_postcasting', 'continuous_postcasting',
        'discrete_pattern_completion', 'continuous_pattern_completion',
        'bracket_matching', 'sorting_problem', 'cross_situation',
        'sinus_forecasting', 'chaotic_forecasting'
    ]

    for task_name in task_names:
        print(f"Evaluating on {task_name}...")

        # Load task
        task_data = sd.build_task(task_name, difficulty=difficulty)

        # Train a fresh model per task (simplified)
        model = model_cls()
        model.train(task_data['X_train'], task_data['Y_train'], task_data['T_train'])
        # To train on all timesteps, including those that are not evaluated,
        # comment out the previous line and uncomment the following one:
        # model.train(task_data['X_train'], task_data['Y_train'])

        # Predict on test set
        Y_pred = model.predict(task_data['X_test'])

        # Compute score
        score = sd.compute_score(
            Y=task_data['Y_test'],
            Y_hat=Y_pred,
            prediction_timesteps=task_data['T_test'],
            classification=task_data['classification']
        )

        results[task_name] = score
        print(f"  Score: {score:.4f}")

    return results

# Usage
# results = evaluate_model_on_all_tasks(MyModel, difficulty='medium')

📈 Evaluation Metrics

  • Classification tasks: Error rate (1 - accuracy)
  • Regression tasks: Mean Squared Error (MSE)

Lower scores indicate better performance for both metrics.
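Both metrics are evaluated only at the prediction timesteps. An illustrative NumPy reimplementation of this scoring rule (not the library's actual `compute_score`, which may differ in details such as how classes are encoded; this sketch assumes one-hot targets for classification):

```python
import numpy as np

def score_at_timesteps(Y, Y_hat, T, classification):
    """Sketch of the two metrics, applied only at the scored timesteps.
    Illustrative reimplementation, not the library's compute_score."""
    b = np.arange(Y.shape[0])[:, None]       # batch index, broadcasts against T
    y, y_hat = Y[b, T], Y_hat[b, T]          # [batch, n_predictions, features]
    if classification:
        # Error rate (1 - accuracy), assuming one-hot encoded classes
        return float(np.mean(y.argmax(-1) != y_hat.argmax(-1)))
    # Mean squared error over the scored timesteps
    return float(np.mean((y - y_hat) ** 2))
```

With perfect predictions both metrics return 0.0; larger values always mean worse performance.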

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use Stream Dataset in your research, please cite:

(paper in progress...)

@software{stream_dataset,
  title={Stream Dataset: Sequential Task Review to Evaluate Artificial Memory},
  author={Bendi-Ouis, Yannis and Hinaut, Xavier},
  year={2025},
  url={https://github.com/Naowak/stream-dataset}
}

🙏 Acknowledgments

  • Inspired by classic sequence modeling datasets
  • Built with NumPy and Hugging Face Datasets

Stream Dataset - Advancing sequence modeling evaluation, one task at a time.
