Fast Weight Attention

An attention-based fast weight episodic memory, in the same vein as the memory MLP from TTT / Titans and the fast weight PKM from Sakana AI.

Install

$ pip install fast-weight-attention

Usage

import torch
from fast_weight_attention import FastWeightAttention

mem = FastWeightAttention(512, causal = True)

tokens = torch.randn(1, 64, 512)

past_mem = None

# each call retrieves from the fast weight memory and returns the updated memories,
# which are fed back in as past_mem on the next call

retrieved, next_mem = mem(tokens, past_mem = past_mem, return_next_memories = True)
retrieved, next_mem = mem(tokens, past_mem = next_mem, return_next_memories = True)
retrieved, next_mem = mem(tokens, past_mem = next_mem, return_next_memories = True)

assert retrieved.shape == tokens.shape
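
The same two return values support streaming over a longer sequence: feed the segments in order and carry the returned memories into the next call. A minimal sketch using only the calls shown above (the segment loop and variable names are illustrative, not part of the package API):

import torch
from fast_weight_attention import FastWeightAttention

mem = FastWeightAttention(512, causal = True)

# a longer sequence, processed 64 tokens at a time
long_tokens = torch.randn(1, 256, 512)

past_mem = None
retrieved_segments = []

for segment in long_tokens.split(64, dim = 1):
    # past_mem carries the fast weight state accumulated over earlier segments
    retrieved, past_mem = mem(segment, past_mem = past_mem, return_next_memories = True)
    retrieved_segments.append(retrieved)

retrieved = torch.cat(retrieved_segments, dim = 1)
assert retrieved.shape == long_tokens.shape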

With chunked processing (automatically segments the sequence and carries memory across chunks):

import torch
from fast_weight_attention import ChunkedFastWeightAttention

mem = ChunkedFastWeightAttention(
    512,
    causal = True,
    chunk_size = 64   # process 64 tokens at a time, carrying fast weight memories across chunks
)

tokens = torch.randn(1, 512, 512)

retrieved, next_mem = mem(tokens, return_next_memories = True)

assert retrieved.shape == tokens.shape
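
As a usage sketch, the chunked module can be dropped into a residual block like any attention layer. The wrapper below is illustrative only; MemoryBlock, the pre-norm and the residual connection are assumptions, not part of the package:

import torch
from torch import nn
from fast_weight_attention import ChunkedFastWeightAttention

class MemoryBlock(nn.Module):
    # hypothetical pre-norm residual wrapper around the chunked fast weight memory
    def __init__(self, dim = 512, chunk_size = 64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mem = ChunkedFastWeightAttention(dim, causal = True, chunk_size = chunk_size)

    def forward(self, tokens):
        retrieved, _ = self.mem(self.norm(tokens), return_next_memories = True)
        return tokens + retrieved

block = MemoryBlock(512, chunk_size = 64)
tokens = torch.randn(1, 512, 512)

out = block(tokens)
assert out.shape == tokens.shape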

Citations

@article{zhang2026loger,
    title   = {LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory},
    author  = {Zhang, Junyi and Herrmann, Charles and Hur, Junhwa and Sun, Chen and Yang, Ming-Hsuan and Cole, Forrester and Darrell, Trevor and Sun, Deqing},
    journal = {arXiv preprint arXiv:2603.03269},
    year    = {2026}
}
@misc{zhao2026fastweightproductkeymemory,
    title   = {Fast-weight Product Key Memory},
    author  = {Tianyu Zhao and Llion Jones},
    year    = {2026},
    eprint  = {2601.00671},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL},
    url     = {https://arxiv.org/abs/2601.00671},
}
@misc{jordan2024muon,
    author  = {Keller Jordan and Yuchen Jin and Vlado Boza and Jiacheng You and Franz Cesista and Laker Newhouse and Jeremy Bernstein},
    title   = {Muon: An optimizer for hidden layers in neural networks},
    year    = {2024},
    url     = {https://kellerjordan.github.io/posts/muon/}
}
@article{Yaghoubietal2026,
    author  = {Yaghoubi, Mohammad and Nieto-Posadas, Andres and Mosser, Coralie-Anne and Gisiger, Thomas and Wilson, Émmanuel and Williams, Sylvain and Brandon, Mark P.},
    title   = {Predictive coding of reward in the hippocampus},
    journal = {Nature},
    year    = {2026},
    doi     = {10.1038/s41586-025-09958-0}
}


