Fast Weight Attention

An attention-based fast-weight episodic memory, in the same vein as the memory MLP from TTT / Titans and the fast-weight PKM from Sakana AI.

Install

$ pip install fast-weight-attention

Usage

import torch
from fast_weight_attention import FastWeightAttention

mem = FastWeightAttention(512, causal = True)

tokens = torch.randn(1, 64, 512)

past_mem = None

retrieved, next_mem = mem(tokens, past_mem = past_mem, return_next_memories = True)
retrieved, next_mem = mem(tokens, past_mem = next_mem, return_next_memories = True)
retrieved, next_mem = mem(tokens, past_mem = next_mem, return_next_memories = True)

assert retrieved.shape == tokens.shape

# you can then retrieve without fast weight updating

retrieved = mem(tokens, return_next_memories = False)
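The fast weight memory being carried through past_mem / next_mem above is, at its core, a set of weights written to at inference time through attention-style reads and writes. Below is a minimal, purely illustrative sketch of one such update (a Hebbian outer-product write with a linear-attention read). It is not the package's actual implementation; the helper name, feature map, and update rule are all assumptions made only for illustration.

import torch

def fast_weight_memory_step(W, x, to_q, to_k, to_v):
    # hypothetical helper, for illustration only - not part of fast_weight_attention
    # W: (batch, dim, dim) fast weight matrix carried across calls (the episodic memory)
    # x: (batch, seq, dim) incoming tokens
    q, k, v = to_q(x), to_k(x), to_v(x)
    k = k.softmax(dim = -1)  # simple feature map on the write keys (an assumption)

    outputs = []
    for t in range(x.shape[1]):
        # read: query the current fast weights
        outputs.append(torch.einsum('bd,bde->be', q[:, t], W))
        # write: store the new key/value association as an outer product
        W = W + torch.einsum('bd,be->bde', k[:, t], v[:, t])

    return torch.stack(outputs, dim = 1), W

dim = 512
to_q, to_k, to_v = (torch.nn.Linear(dim, dim) for _ in range(3))

W = torch.zeros(1, dim, dim)   # empty memory, analogous to past_mem = None
x = torch.randn(1, 64, dim)

retrieved, W = fast_weight_memory_step(W, x, to_q, to_k, to_v)
assert retrieved.shape == x.shape

Carrying W across calls in this sketch plays the same role as passing next_mem back in as past_mem in the usage above.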

With chunked processing, which automatically segments the sequence and carries the fast weight memories across chunks:

import torch
from fast_weight_attention import ChunkedFastWeightAttention

mem = ChunkedFastWeightAttention(
    512,
    causal = True,
    chunk_size = 64   # process 64 tokens at a time, carrying fast weight memories across chunks
)

tokens = torch.randn(1, 512, 512)

retrieved, next_mem = mem(tokens, return_next_memories = True)

assert retrieved.shape == tokens.shape
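Conceptually, the chunked variant is roughly equivalent to splitting the sequence yourself and threading the returned memories through the plain FastWeightAttention module, along the lines of the sketch below (illustrative only, using just the API shown above):

import torch
from fast_weight_attention import FastWeightAttention

mem = FastWeightAttention(512, causal = True)

tokens = torch.randn(1, 512, 512)

past_mem = None
outputs = []

# process 64 tokens at a time, carrying the fast weight memories across chunks
for chunk in tokens.split(64, dim = 1):
    retrieved, past_mem = mem(chunk, past_mem = past_mem, return_next_memories = True)
    outputs.append(retrieved)

retrieved = torch.cat(outputs, dim = 1)
assert retrieved.shape == tokens.shape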

Citations

@article{zhang2026loger,
    title   = {LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory},
    author  = {Zhang, Junyi and Herrmann, Charles and Hur, Junhwa and Sun, Chen and Yang, Ming-Hsuan and Cole, Forrester and Darrell, Trevor and Sun, Deqing},
    journal = {arXiv preprint arXiv:2603.03269},
    year    = {2026}
}
@misc{zhao2026fastweightproductkeymemory,
    title   = {Fast-weight Product Key Memory},
    author  = {Tianyu Zhao and Llion Jones},
    year    = {2026},
    eprint  = {2601.00671},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL},
    url     = {https://arxiv.org/abs/2601.00671},
}
@misc{jordan2024muon,
    author  = {Keller Jordan and Yuchen Jin and Vlado Boza and Jiacheng You and Franz Cesista and Laker Newhouse and Jeremy Bernstein},
    title   = {Muon: An optimizer for hidden layers in neural networks},
    year    = {2024},
    url     = {https://kellerjordan.github.io/posts/muon/}
}
@article{Yaghoubietal2026,
    author  = {Yaghoubi, Mohammad and Nieto-Posadas, Andres and Mosser, Coralie-Anne and Gisiger, Thomas and Wilson, Émmanuel and Williams, Sylvain and Brandon, Mark P.},
    title   = {Predictive coding of reward in the hippocampus},
    journal = {Nature},
    year    = {2026},
    doi     = {10.1038/s41586-025-09958-0}
}
@article{volchkov2026cliptogrok,
    title   = {Clip to Grok: Weight Norm Clipping for Accelerated Generalization},
    author  = {Volchkov, Vladimir and Rivlin, Aviad},
    year    = {2026},
    journal = {arXiv preprint},
    note    = {Implementation available at \url{https://github.com/NiftyliuS/cliptogrok}}
}

Download files


Source Distribution

fast_weight_attention-0.1.8.tar.gz (9.2 kB)

    SHA256      fa77d7c37649e6465f8d3eb5f4ae455c70b5516af4e4f79f36f1db0696f53d58
    MD5         f74fa2e17e4e1bd8a6490b7595a1dc6c
    BLAKE2b-256 c18ed70f0241a4dba125ac16899948a4782743d69be54d9376b8c544470aaf4d

Built Distribution

fast_weight_attention-0.1.8-py3-none-any.whl (9.1 kB)

    SHA256      fa542aa6e3b1198241b7effcb159c8a01bafd96d37370e35afaf49f73ed74a68
    MD5         309d9016a4d70e7ea9070d4481b8ebc8
    BLAKE2b-256 d72812cbf5bb10f336ddcc6af1bd7f78d71912c7e428a7cde8eb90cac5a55c58
