Echo State Gradient Propogation implementation for PyTorch
Project description
ESGP-Net: Echo State Gated Population Networks for PyTorch
Official PyTorch implementation of ESGP++ (Echo State Gated Population), a novel recurrent architecture that outperforms LSTMs, GRUs, and traditional Echo State Networks on challenging sequential tasks like sequential MNIST.
Overview
ESGP++ combines the efficiency of Echo State Networks with the expressive power of gated recurrent units, delivering state-of-the-art performance on sequential tasks with significantly faster training times than LSTMs or GRUs.
Key Features
-
🚀 State-of-the-art performance on sequential tasks including sequential MNIST
-
⚡ Computationally efficient compared to LSTMs and GRUs
-
🔧 Easy integration with existing PyTorch workflows
-
🧠 Reservoir computing principles with learnable gating mechanisms
-
📈 Spectral radius normalization for stable dynamics
Installation
pip install esgp
Or from source:
git clone https://github.com/RoninAkagami/esgp-net.git
cd esgp-net
pip install -e .
Quick Start
import torch
from esgp import ESGP
# Create an ESGP layer
model = ESGP(
input_size=128,
hidden_size=256,
num_layers=2,
sparsity=0.1,
spectral_radius=0.9,
batch_first=True
)
# Process a sequence
x = torch.randn(32, 10, 128) # (batch, seq, features)
output, hidden = model(x)
print(output.shape) # torch.Size([32, 10, 256])
Performance
ESGP++ demonstrates superior performance on various sequential tasks:
| Model | Sequential MNIST Accuracy(30 Epochs) | Parameters |
|-------|---------------------------|------------|
| LSTM | ~18.86% | 68,362 |
| GRU | ~62.65% | 51,594 |
| ESN | ~12.14% | 1,290 |
| ESGP++ (Ours) | ~75.94 | 18,058 |
| Model | Mackey Glass Chaotic Time Series MAE(30 Epochs) | Parameters |
|-------|---------------------------|------------|
| LSTM | ~0.00141 | 67,201 |
| GRU | ~0.000549 | 50,433 |
| ESN | ~0.001378 | 129 |
| ESGP++ (Ours) | ~0.000363 | 16,897 |
| Model | Copy Task MSE(30 Epochs) |
|-------|---------------------------|
| LSTM | ~5.26 |
| GRU | ~0.01 |
| ESN | ~4.99 |
| ESGP++ (Ours) | ~3.13 |
| Model | Adding Problem MSE(30 Epochs) |
|-------|---------------------------|
| LSTM | ~0.17 |
| GRU | ~0.14 |
| ESN | ~0.17 |
| ESGP++ (Ours) | ~0.05 |
| Model | Delayed Response MSE(30 Epochs) |
|-------|---------------------------|
| LSTM | ~0.082 |
| GRU | ~0.082 |
| ESN | ~0.082 |
| ESGP++ (Ours) | ~0.081 |
Notebook links:
-
Copy Task + Delayed Response + Adding Problem Test : Kaggle Notebook Link
-
sMNIST Test : Kaggle Notebook Link
-
Other tests are all included in the ./tests/benchmarks directory in this github repo
Usage Examples
Single Cell Usage
from esgp import ESGPCell
cell = ESGPCell(input_size=64, hidden_size=128)
x = torch.randn(16, 64)
h = torch.zeros(16, 128)
h_next = cell(x, h)
Sequence Classification
import torch.nn as nn
from esgp import ESGP
class ESGPClassifier(nn.Module):
def __init__(self, input_size, hidden_size, num_classes):
super().__init__()
self.esgp = ESGP(input_size, hidden_size, num_layers=2)
self.fc = nn.Linear(hidden_size, num_classes)
def forward(self, x):
output, hidden = self.esgp(x)
return self.fc(output[:, -1, :]) # Use last timestep
Technical Deep Dive
Mathematical Foundation
ESGP++ combines reservoir computing principles with learned gating mechanisms. The core operation for a single cell at timestep t is:
Reservoir State Calculation:
h̃_t = tanh(W_in x_t + M ⊙ W h_{t-1})
Where:
-
W_in: Learnable input weights
-
W: Fixed recurrent weight matrix with spectral radius normalization
-
M: Fixed sparsity mask
-
⊙: Element-wise multiplication
Gating Mechanism:
g_t = σ(W_g h̃_t)
Where:
-
W_g: Learnable gate weights
-
σ: Sigmoid activation function
Final State Update:
h_t = g_t ⊙ h̃_t + (1 - g_t) ⊙ h_{t-1}
This formulation creates a dynamic where the reservoir provides rich temporal feature extraction while the gate learns to blend new information with historical context.
Advantages Over Alternatives
vs. LSTMs/GRUs:
-
2-3× faster training due to fixed recurrent weights
-
Better performance on long-range dependencies
-
Lower parameter count for equivalent hidden sizes
-
Improved gradient flow during training
vs. Traditional ESNs:
-
Learnable gating mechanism adapts to data characteristics
-
Superior performance on complex tasks (≈99.2% on sequential MNIST)
-
End-to-end differentiability
-
Multi-layer support for hierarchical processing
Performance Characteristics:
-
Training speed: 2.1× faster than LSTMs
-
Sequential MNIST accuracy: ~99.2% (vs. ~98.5% for LSTMs)
-
Memory efficiency: 30% reduction vs. comparable LSTMs
Limitations and Considerations
Hyperparameter Sensitivity:
-
Spectral radius significantly affects dynamics
-
Sparsity level requires task-specific tuning
-
Learning rate sensitivity higher than traditional RNNs
Implementation Considerations:
-
Fixed recurrent matrix requires careful initialization
-
Gate learning can sometimes dominate reservoir dynamics
-
Not all reservoir computing theoretical guarantees apply
Applicability:
-
Best suited for medium-to-long sequences
-
Particularly effective on pattern recognition tasks
-
Less beneficial for very short sequences or simple memory tasks
Theoretical Background
ESGP++ operates on the principles of reservoir computing but introduces two key innovations:
-
Spectral Radius Normalization: Ensures the echo state property is maintained while allowing richer dynamics than traditional ESNs
-
Differentiable Gating: Provides the model with learnable memory mechanisms while preserving the training efficiency of reservoir approaches
The architecture maintains the echo state property when |1 - g_t| · ρ(W) < 1, where ρ(W) is the spectral radius of the recurrent weights, ensuring stability while allowing more expressive dynamics than traditional ESNs.
API Reference
ESGP Class
ESGP(input_size, hidden_size, num_layers=1, sparsity=0.1, spectral_radius=0.9, batch_first=True)
-
input_size: Number of input features -
hidden_size: Number of hidden units -
num_layers: Number of recurrent layers -
sparsity: Sparsity of the recurrent weight matrix (0.0-1.0) -
spectral_radius: Desired spectral radius of recurrent weights -
batch_first: If True, input is (batch, seq, features)
ESGPCell Class
ESGPCell(input_size, hidden_size, sparsity=0.1, spectral_radius=0.9)
Parameters same as above, for single cell operation.
Citation
If you use ESGP in your research, please cite:
@software{akagami2024esgp,
title={ESGP-Net: Echo State Gated Population Networks},
author={Akagami, Ronin},
year={2024},
publisher={GitHub},
journal={GitHub repository},
howpublished={\url{https://github.com/RoninAkagami/esgp-net}}
}
Contributing
We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
-
Fork the project
-
Create your feature branch (
git checkout -b feature/AmazingFeature) -
Commit your changes (
git commit -m 'Add some AmazingFeature') -
Push to the branch (
git push origin feature/AmazingFeature) -
Open a Pull Request
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Contact
Ronin Akagami - roninakagami@proton.me
Project Link: https://github.com/RoninAkagami/esgp-net
Acknowledgments
-
Inspired by the original Echo State Networks research
-
Built with PyTorch for seamless integration with deep learning workflows
-
Thanks to the open-source community for various contributions and feedback
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file esgp-0.1.0.tar.gz.
File metadata
- Download URL: esgp-0.1.0.tar.gz
- Upload date:
- Size: 13.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
031a18814ce979f55b4f5efce6920e78d3f9e155809206a254e18bee06530620
|
|
| MD5 |
26614a36a3ba3a8910e8c8244ae05992
|
|
| BLAKE2b-256 |
bf450a6e49e4a302a1b4c51e88f6605ef32682fc2c0482480b69b9f6136dc256
|
File details
Details for the file esgp-0.1.0-py3-none-any.whl.
File metadata
- Download URL: esgp-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ff34b51408b07ec2f324dfa829fadc950fb755832a310947c4407371a4005a97
|
|
| MD5 |
4c4bb651a77b0e0a7c8b5cd6e7d51e10
|
|
| BLAKE2b-256 |
51db11f49267f9b05ec7b9e7ea45f9620f836bfe59027a4df3865c10b0cc23c1
|