CogScale: Scalable Benchmark for Sequence Processing is a dataset of 14 diverse sequential tasks to assess neural networks’ memory. Scalable in complexity and sequence length, it covers pattern completion, copy tasks, forecasting, bracket matching, and sorting—ideal for comparing architectures on memory retention and sequential reasoning.

Project description

CogScale: Scalable Benchmark for Sequence Processing

A lightweight, fully synthetic benchmark designed as a cognitive "sanity check" for evaluating the sequence modeling capabilities of neural networks. CogScale isolates specific cognitive and memory abilities to let you rapidly validate architectural innovations before committing to massive, environmentally costly large-scale training.

🚀 Features

14 Diverse Tasks: Signal Forecasting, Memory & Retention, Pattern Recognition, and Manipulation & Reasoning.
Parametrizable Scalability: Test architectures across different difficulty levels (Small, Medium, Large) by scaling sequence lengths, delays, and vocabulary sizes.
Unified Interface: Consistent API across all tasks with standardized evaluation metrics (MSE, Error Rate).
Zero Disk Storage: Data is generated dynamically during training, bypassing loading bottlenecks and preventing simple memorization.
Strict Baseline Evaluation: Evaluate how your model compares against established architectures (Transformers, Mamba, xLSTM, LSTM, GRU, ESN) under strict parameter budgets.

📊 Baseline Performances: The Cognitive Radar

We provide a solid baseline by evaluating 7 distinct architectures under strict parameter budgets (1k, 10k, and 100k). The Cognitive Radar visualizes their peak performances (accuracy) across a selective subset of tasks and scales, demonstrating how attention models (Transfomers) and state space models (Mamba) maintain robust reasoning under increased cognitive loads, while simple reservoir models (ESN) offer striking efficiency for basic retention tasks at an ultra-low parameter scale.

(See the paper for detailed insights on scalability and parameter efficiency).

📦 Installation

pip install cogscale

Or install from source:

git clone [https://github.com/Naowak/cogscale.git](https://github.com/Naowak/cogscale.git)
cd cogscale
pip install -e .

🎯 Quick Start

import cogscale as cog

# Build a task
task_data = cog.build_task('simple_copy', difficulty='small', seed=42)

# Access the dynamically generated data
X_train = task_data['X_train']  # Training inputs
Y_train = task_data['Y_train']  # Training targets
T_train = task_data['T_train']  # Prediction timesteps

# Train your model (example with dummy predictions)
Y_pred = your_model.predict(X_train)

# Evaluate performance using the unified metric
score = cog.compute_score(
    Y=Y_train, 
    Y_hat=Y_pred, 
    prediction_timesteps=T_train,
    category=task_data['category']
)
print(f"Score: {score}")

📚 Available Tasks

1. Signal Processing and Forecasting

`sinus_forecasting`:

Predict the future evolution of a sinusoidal signal.

`chaotic_forecasting`:

Forecast the future state of a three-dimensional chaotic Lorenz system.

2. Memory and Retention

`discrete_postcasting`:

Reproduce a discrete sequence identically after a specified time shift.

`continuous_postcasting`:

Reproduce a continuous sequence identically after a specified time shift.

`simple_copy`:

Read a sequence, hold it in memory during a silent delay, and reproduce it after a trigger token.

`selective_copy`:

Memorize only a specific subset of marked tokens amidst distractions and output them at the end.

`associative_recall`:

Memorize a sequence of key-value pairs and retrieve the correct value when queried with a seen key.

3. Pattern Recognition and Completion

`discrete_pattern_completion`:

Identify and infer missing components within a masked discrete periodic motif.

`continuous_pattern_completion`:

Identify and infer missing components within a masked continuous periodic motif.

`induction_heads`:

Recognize in-context duplicated sequence structures to predict the next token.

4. Reasoning and Algorithmic Manipulation

`adding_problem`:

Compute and output the sum of only the marked numbers within a random sequence.

`sorting_problem`:

Output a randomized sequence sorted into the correct positional order after a trigger.

`bracket_matching`:

Determine if a mutated string of parentheses represents a valid hierarchy.

`cross_situation`:

Infer logical roles and attributes (objects, colors, positions) from a simplified natural language reasoning problem encoded in one-hot vectors.

🔧 Task Configuration

Each task supports three modular difficulty levels, designed to verify if scaling a model's parameters genuinely translates to better cognitive capabilities:

Small (`SM`)

Reduced sequence lengths, delays, and vocabulary sizes.
Ideal for quick experiments, debugging, and testing models at the 1k-10k parameter scale.

Medium (`MD`)

Realistic problem sizes with increased cognitive load.
Suitable for thorough model evaluation and scalability testing at the 10k-100k parameter scale.

Large (`LG`)

Highly complex configurations.
Designed to push high-performance and large-scale architectures to their representational limits.

# Small configuration (fast & lightweight)
task_small = cog.build_task('bracket_matching', difficulty='small')

# Medium configuration (thorough scalability test)
task_medium = cog.build_task('bracket_matching', difficulty='medium')

📊 Data Format

All tasks return a standardized dictionary containing NumPy arrays:

{
    'X_train': np.ndarray,      # Training inputs [batch, time, features]
    'Y_train': np.ndarray,      # Training targets [batch, time, outputs]
    'T_train': np.ndarray,      # Training prediction timesteps [batch, n_predictions]
    'X_valid': np.ndarray,      # Validation inputs
    'Y_valid': np.ndarray,      # Validation targets  
    'T_valid': np.ndarray,      # Validation prediction timesteps
    'X_test': np.ndarray,       # Test inputs
    'Y_test': np.ndarray,       # Test targets
    'T_test': np.ndarray,       # Test prediction timesteps
    'category': str             # 'classification', 'multi_classification', or 'regression'
}

🎨 Example: Complete Evaluation Pipeline

import cogscale as cog
from MyModel import MyModel

def evaluate_model_on_all_tasks(model, difficulty='small'):
    """Evaluate an architecture across the full CogScale cognitive spectrum."""
    
    results = {}
    task_names = [
        'sinus_forecasting', 'chaotic_forecasting',
        'discrete_postcasting', 'continuous_postcasting', 
        'simple_copy', 'selective_copy', 'associative_recall',
        'discrete_pattern_completion', 'continuous_pattern_completion', 'induction_heads',
        'adding_problem', 'sorting_problem', 'bracket_matching', 'cross_situation'
    ]
    
    for task_name in task_names:
        print(f"Evaluating on {task_name}...")
        
        # Load dynamically generated task
        task_data = cog.build_task(task_name, difficulty=difficulty)
        
        # Train model
        model = MyModel(...)
        model.train(task_data['X_train'], task_data['Y_train'], task_data['T_train'])

        # Predict on test set
        Y_pred = model.predict(task_data['X_test'])
        
        # Compute unified score
        score = cog.compute_score(
            Y=task_data['Y_test'],
            Y_hat=Y_pred,
            prediction_timesteps=task_data['T_test'],
            category=task_data['category']
        )
        
        results[task_name] = score
        print(f"  Score: {score:.4f}")
    
    return results

📈 Evaluation Metrics

The evaluation metrics automatically adapt based on the task category:

Regression tasks: Mean Squared Error (MSE)
Classification tasks: Error rate (1 - accuracy)
Multi-class classification tasks: Label-based error rate (1 - label-based accuracy)

Lower scores indicate better performance across all tasks.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use CogScale in your research or find it helpful as a sanity check for your architectures, please cite:

@inproceedings{cogscale2026,
  title={CogScale: Scalable Benchmark for Sequence Processing},
  author={Bendi-Ouis Yannis, De Coudenhove Romain and Hinaut Xavier},
  year={2026},
  url={https://github.com/Naowak/cogscale}
}

📞 Support

🐛 Issues: GitHub Issues
💬 Discussions: GitHub Discussions

CogScale - Democratizing architectural research by ensuring foundational cognitive abilities before massive scaling.

Project details

Release history Release notifications | RSS feed

0.1.1

May 13, 2026

This version

0.1.0

May 13, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cogscale-0.1.0.tar.gz (3.8 MB view details)

Uploaded May 13, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

cogscale-0.1.0-py3-none-any.whl (17.4 kB view details)

Uploaded May 13, 2026 Python 3

File details

Details for the file cogscale-0.1.0.tar.gz.

File metadata

Download URL: cogscale-0.1.0.tar.gz
Upload date: May 13, 2026
Size: 3.8 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for cogscale-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`400d3d995c55d928d4399478cf63a2f404d18e187570b94d0531c439f9874a32`
MD5	`1aa87e9a45ee653117b833d833d896c1`
BLAKE2b-256	`c4821342cf334fce38dfd0ec3d8afb29f15e6e0fc0061c207ca9ab8e0ea6dd3c`

See more details on using hashes here.

File details

Details for the file cogscale-0.1.0-py3-none-any.whl.

File metadata

Download URL: cogscale-0.1.0-py3-none-any.whl
Upload date: May 13, 2026
Size: 17.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for cogscale-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d37e3cdea19979dfb05e14c3ebc9a9608d7dd38a37fd7cf3ec4242acd9db95bc`
MD5	`de5d0bed09be7edcbfc2112f1f6d092d`
BLAKE2b-256	`613402a02687a85fb3e101a045f924ca374fabfb1b4e3667e62939cf2783ec25`

See more details on using hashes here.

cogscale 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

CogScale: Scalable Benchmark for Sequence Processing

🚀 Features

📊 Baseline Performances: The Cognitive Radar

📦 Installation

🎯 Quick Start

📚 Available Tasks

1. Signal Processing and Forecasting

sinus_forecasting:

chaotic_forecasting:

2. Memory and Retention

discrete_postcasting:

continuous_postcasting:

simple_copy:

selective_copy:

associative_recall:

3. Pattern Recognition and Completion

discrete_pattern_completion:

continuous_pattern_completion:

induction_heads:

4. Reasoning and Algorithmic Manipulation

adding_problem:

sorting_problem:

bracket_matching:

cross_situation:

🔧 Task Configuration

Small (SM)

Medium (MD)

Large (LG)

📊 Data Format

🎨 Example: Complete Evaluation Pipeline

📈 Evaluation Metrics

📄 License

📚 Citation

📞 Support

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`sinus_forecasting`:

`chaotic_forecasting`:

`discrete_postcasting`:

`continuous_postcasting`:

`simple_copy`:

`selective_copy`:

`associative_recall`:

`discrete_pattern_completion`:

`continuous_pattern_completion`:

`induction_heads`:

`adding_problem`:

`sorting_problem`:

`bracket_matching`:

`cross_situation`:

Small (`SM`)

Medium (`MD`)

Large (`LG`)