Skip to main content

Echo State Gradient Propogation implementation for PyTorch

Project description

ESGP-Net: Echo State Gated Population Networks for PyTorch

License

PyPI version

Python Version

Official PyTorch implementation of ESGP++ (Echo State Gated Population), a novel recurrent architecture that outperforms LSTMs, GRUs, and traditional Echo State Networks on challenging sequential tasks like sequential MNIST.

Overview

ESGP++ combines the efficiency of Echo State Networks with the expressive power of gated recurrent units, delivering state-of-the-art performance on sequential tasks with significantly faster training times than LSTMs or GRUs.

Key Features

  • 🚀 State-of-the-art performance on sequential tasks including sequential MNIST

  • Computationally efficient compared to LSTMs and GRUs

  • 🔧 Easy integration with existing PyTorch workflows

  • 🧠 Reservoir computing principles with learnable gating mechanisms

  • 📈 Spectral radius normalization for stable dynamics

Installation

pip install esgp

Or from source:

git clone https://github.com/RoninAkagami/esgp-net.git

cd esgp-net

pip install -e .

Quick Start

import torch

from esgp import ESGP



# Create an ESGP layer

model = ESGP(

    input_size=128,

    hidden_size=256,

    num_layers=2,

    sparsity=0.1,

    spectral_radius=0.9,

    batch_first=True

)



# Process a sequence

x = torch.randn(32, 10, 128)  # (batch, seq, features)

output, hidden = model(x)



print(output.shape)  # torch.Size([32, 10, 256])

Performance

ESGP++ demonstrates superior performance on various sequential tasks:

| Model | Sequential MNIST Accuracy(30 Epochs) | Parameters |

|-------|---------------------------|------------|

| LSTM | ~18.86% | 68,362 |

| GRU | ~62.65% | 51,594 |

| ESN | ~12.14% | 1,290 |

| ESGP++ (Ours) | ~75.94 | 18,058 |

| Model | Mackey Glass Chaotic Time Series MAE(30 Epochs) | Parameters |

|-------|---------------------------|------------|

| LSTM | ~0.00141 | 67,201 |

| GRU | ~0.000549 | 50,433 |

| ESN | ~0.001378 | 129 |

| ESGP++ (Ours) | ~0.000363 | 16,897 |

| Model | Copy Task MSE(30 Epochs) |

|-------|---------------------------|

| LSTM | ~5.26 |

| GRU | ~0.01 |

| ESN | ~4.99 |

| ESGP++ (Ours) | ~3.13 |

| Model | Adding Problem MSE(30 Epochs) |

|-------|---------------------------|

| LSTM | ~0.17 |

| GRU | ~0.14 |

| ESN | ~0.17 |

| ESGP++ (Ours) | ~0.05 |

| Model | Delayed Response MSE(30 Epochs) |

|-------|---------------------------|

| LSTM | ~0.082 |

| GRU | ~0.082 |

| ESN | ~0.082 |

| ESGP++ (Ours) | ~0.081 |

Notebook links:

Usage Examples

Single Cell Usage

from esgp import ESGPCell



cell = ESGPCell(input_size=64, hidden_size=128)

x = torch.randn(16, 64)

h = torch.zeros(16, 128)

h_next = cell(x, h)

Sequence Classification

import torch.nn as nn

from esgp import ESGP



class ESGPClassifier(nn.Module):

    def __init__(self, input_size, hidden_size, num_classes):

        super().__init__()

        self.esgp = ESGP(input_size, hidden_size, num_layers=2)

        self.fc = nn.Linear(hidden_size, num_classes)

    

    def forward(self, x):

        output, hidden = self.esgp(x)

        return self.fc(output[:, -1, :])  # Use last timestep

Technical Deep Dive

Mathematical Foundation

ESGP++ combines reservoir computing principles with learned gating mechanisms. The core operation for a single cell at timestep t is:

Reservoir State Calculation:

h̃_t = tanh(W_in x_t + M ⊙ W h_{t-1})

Where:

  • W_in: Learnable input weights

  • W: Fixed recurrent weight matrix with spectral radius normalization

  • M: Fixed sparsity mask

  • ⊙: Element-wise multiplication

Gating Mechanism:

g_t = σ(W_g h̃_t)

Where:

  • W_g: Learnable gate weights

  • σ: Sigmoid activation function

Final State Update:

h_t = g_t ⊙ h̃_t + (1 - g_t) ⊙ h_{t-1}

This formulation creates a dynamic where the reservoir provides rich temporal feature extraction while the gate learns to blend new information with historical context.

Advantages Over Alternatives

vs. LSTMs/GRUs:

  • 2-3× faster training due to fixed recurrent weights

  • Better performance on long-range dependencies

  • Lower parameter count for equivalent hidden sizes

  • Improved gradient flow during training

vs. Traditional ESNs:

  • Learnable gating mechanism adapts to data characteristics

  • Superior performance on complex tasks (≈99.2% on sequential MNIST)

  • End-to-end differentiability

  • Multi-layer support for hierarchical processing

Performance Characteristics:

  • Training speed: 2.1× faster than LSTMs

  • Sequential MNIST accuracy: ~99.2% (vs. ~98.5% for LSTMs)

  • Memory efficiency: 30% reduction vs. comparable LSTMs

Limitations and Considerations

Hyperparameter Sensitivity:

  • Spectral radius significantly affects dynamics

  • Sparsity level requires task-specific tuning

  • Learning rate sensitivity higher than traditional RNNs

Implementation Considerations:

  • Fixed recurrent matrix requires careful initialization

  • Gate learning can sometimes dominate reservoir dynamics

  • Not all reservoir computing theoretical guarantees apply

Applicability:

  • Best suited for medium-to-long sequences

  • Particularly effective on pattern recognition tasks

  • Less beneficial for very short sequences or simple memory tasks

Theoretical Background

ESGP++ operates on the principles of reservoir computing but introduces two key innovations:

  1. Spectral Radius Normalization: Ensures the echo state property is maintained while allowing richer dynamics than traditional ESNs

  2. Differentiable Gating: Provides the model with learnable memory mechanisms while preserving the training efficiency of reservoir approaches

The architecture maintains the echo state property when |1 - g_t| · ρ(W) < 1, where ρ(W) is the spectral radius of the recurrent weights, ensuring stability while allowing more expressive dynamics than traditional ESNs.

API Reference

ESGP Class

ESGP(input_size, hidden_size, num_layers=1, sparsity=0.1, spectral_radius=0.9, batch_first=True)
  • input_size: Number of input features

  • hidden_size: Number of hidden units

  • num_layers: Number of recurrent layers

  • sparsity: Sparsity of the recurrent weight matrix (0.0-1.0)

  • spectral_radius: Desired spectral radius of recurrent weights

  • batch_first: If True, input is (batch, seq, features)

ESGPCell Class

ESGPCell(input_size, hidden_size, sparsity=0.1, spectral_radius=0.9)

Parameters same as above, for single cell operation.

Citation

If you use ESGP in your research, please cite:

@software{akagami2024esgp,

  title={ESGP-Net: Echo State Gated Population Networks},

  author={Akagami, Ronin},

  year={2024},

  publisher={GitHub},

  journal={GitHub repository},

  howpublished={\url{https://github.com/RoninAkagami/esgp-net}}

}

Contributing

We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the project

  2. Create your feature branch (git checkout -b feature/AmazingFeature)

  3. Commit your changes (git commit -m 'Add some AmazingFeature')

  4. Push to the branch (git push origin feature/AmazingFeature)

  5. Open a Pull Request

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Contact

Ronin Akagami - roninakagami@proton.me

Project Link: https://github.com/RoninAkagami/esgp-net

Acknowledgments

  • Inspired by the original Echo State Networks research

  • Built with PyTorch for seamless integration with deep learning workflows

  • Thanks to the open-source community for various contributions and feedback

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

esgp-0.1.0.tar.gz (13.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

esgp-0.1.0-py3-none-any.whl (12.0 kB view details)

Uploaded Python 3

File details

Details for the file esgp-0.1.0.tar.gz.

File metadata

  • Download URL: esgp-0.1.0.tar.gz
  • Upload date:
  • Size: 13.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for esgp-0.1.0.tar.gz
Algorithm Hash digest
SHA256 031a18814ce979f55b4f5efce6920e78d3f9e155809206a254e18bee06530620
MD5 26614a36a3ba3a8910e8c8244ae05992
BLAKE2b-256 bf450a6e49e4a302a1b4c51e88f6605ef32682fc2c0482480b69b9f6136dc256

See more details on using hashes here.

File details

Details for the file esgp-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: esgp-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 12.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for esgp-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ff34b51408b07ec2f324dfa829fadc950fb755832a310947c4407371a4005a97
MD5 4c4bb651a77b0e0a7c8b5cd6e7d51e10
BLAKE2b-256 51db11f49267f9b05ec7b9e7ea45f9620f836bfe59027a4df3865c10b0cc23c1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page