
SelfExtendAttn - PyTorch

Project description


SelfExtendAttn

Implementation of SelfExtendAttn from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning", written in PyTorch and Zeta. This implementation is based mostly on the pseudocode listed in Algorithm 1 on page 4 of the paper.

Install

pip install selfextend

Usage

import torch
from se_attn import SelfExtendAttn

# Example usage
dim = 512  # Dimension of model
g_size = 2  # Group size
w_size = 4  # Window size for neighbor tokens
self_extend = SelfExtendAttn(dim, g_size, w_size, qk_norm=True)

# Example tensors for q, k, v with shape (batch, seq_len, dim) and pos with shape (batch, seq_len)
q = torch.randn(1, 10, dim)
k = torch.randn(1, 10, dim)
v = torch.randn(1, 10, dim)
pos = torch.arange(0, 10).unsqueeze(0)  # Example positional indices

output = self_extend(q, k, v, pos)
print(output)

Technical Architecture

Key Concepts

  • Grouped Attention: This mechanism divides the input sequence into groups and applies attention within each group. It applies a floor operation to token positions, so distant tokens share coarser relative positions and longer sequences can be handled efficiently (see the sketch after this list).

  • Normal Attention: Standard self-attention used in transformers, focusing on nearby tokens within a specified window.
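
As a minimal sketch (illustrative code, not the package's internals), the floor operation can be shown directly on a position tensor:

import torch

# Illustrative only: grouped attention replaces exact token positions with
# their floor-divided group index, so distant tokens share coarser positions.
g_size = 2
pos = torch.arange(10)        # exact positions: 0 through 9
grouped_pos = pos // g_size   # grouped positions via the floor operation

print(pos.tolist())           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(grouped_pos.tolist())   # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]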

Attention Mechanism

The SelfExtendAttn module integrates these two attention strategies:

  1. Normal Attention is applied to tokens within a neighborhood window, maintaining precise positional information for closely related tokens.

  2. Grouped Attention is used for tokens outside this neighborhood window. It reduces the granularity of positional information for distant tokens, which is less critical but still contributes to overall context understanding (a sketch of the two position schemes follows this list).
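
To make the two position schemes concrete, here is a small sketch; the variable names are illustrative and not taken from the library:

import torch

# Illustrative only: normal attention keeps exact relative offsets, while
# grouped attention uses floor-divided positions, so far-away tokens get
# coarser offsets that stay within the range seen during training.
seq_len, g_size = 8, 2
pos = torch.arange(seq_len)

normal_rel = pos.unsqueeze(1) - pos.unsqueeze(0)
grouped_rel = (pos // g_size).unsqueeze(1) - (pos // g_size).unsqueeze(0)

print(normal_rel[-1])   # tensor([7, 6, 5, 4, 3, 2, 1, 0])
print(grouped_rel[-1])  # tensor([3, 3, 2, 2, 1, 1, 0, 0])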

Merge Strategy

The attention values outside the neighborhood window are replaced by those obtained from the grouped attention. This merging strategy ensures a smooth transition and efficient processing of longer sequences while preserving the essential context captured by the normal attention within the neighborhood window.
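
A minimal sketch of this merge, assuming a boolean neighbor-window mask and placeholder score matrices (all names here are hypothetical):

import torch

# Illustrative only: the window mask decides which values survive the merge;
# random matrices stand in for the real normal and grouped attention scores.
seq_len, w_size = 8, 4
pos = torch.arange(seq_len)
rel = pos.unsqueeze(1) - pos.unsqueeze(0)   # query-minus-key offsets

in_window = (rel >= 0) & (rel < w_size)     # True inside the neighbor window

normal_scores = torch.randn(seq_len, seq_len)
grouped_scores = torch.randn(seq_len, seq_len)

# Keep normal-attention values near the diagonal, grouped values elsewhere.
merged = torch.where(in_window, normal_scores, grouped_scores)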

Positional Encoding

Sine and cosine functions generate positional encodings, ensuring that the model retains an understanding of token order and position.
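
For reference, the classic sine/cosine scheme from "Attention Is All You Need" can be written as below; this is a sketch of the general technique, not the package's exact code:

import math
import torch

# Standard sinusoidal positional encoding: even channels use sine, odd
# channels use cosine, with geometrically spaced frequencies.
def sinusoidal_encoding(seq_len: int, dim: int) -> torch.Tensor:
    pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float) * (-math.log(10000.0) / dim))
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

pe = sinusoidal_encoding(10, 512)   # same sizes as the usage example above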

Implementation Details

  • Module Class: SelfExtendAttn is implemented as a subclass of nn.Module in PyTorch.
  • Configurability: Key parameters such as group size and neighbor window size are configurable.
  • Causal Masking: Ensures that the attention mechanism respects the autoregressive property of language models (see the sketch after this list).
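
As an illustration of the last point, a causal mask can be applied to raw attention scores like this (a sketch, not the module's code):

import torch

# Illustrative only: each query may attend to keys at its own position or
# earlier, which preserves the autoregressive (left-to-right) property.
seq_len = 5
scores = torch.randn(seq_len, seq_len)

mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))
attn = scores.softmax(dim=-1)   # future positions receive zero weight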

Citation

@misc{jin2024llm,
    title={LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning}, 
    author={Hongye Jin and Xiaotian Han and Jingfeng Yang and Zhimeng Jiang and Zirui Liu and Chia-Yuan Chang and Huiyuan Chen and Xia Hu},
    year={2024},
    eprint={2401.01325},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

License

MIT



Download files

Download the file for your platform.

Source Distribution

selfextend-0.0.1.tar.gz (5.3 kB)

Uploaded Source

Built Distribution

selfextend-0.0.1-py3-none-any.whl (5.2 kB)

Uploaded Python 3

File details

Details for the file selfextend-0.0.1.tar.gz.

File metadata

  • Download URL: selfextend-0.0.1.tar.gz
  • Upload date:
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.11.0 Darwin/22.4.0

File hashes

Hashes for selfextend-0.0.1.tar.gz

  • SHA256: 14f1598ca15294456935dba5de810315357a9930d94d5b4e185a1e38bbd5f2a0
  • MD5: 30a291653c2d977289898e8f0af6c02e
  • BLAKE2b-256: a130d463aa2670d9703f1fdbdd9ea4baca2f1aa2134894d57b23a56cae27ba07


File details

Details for the file selfextend-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: selfextend-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.11.0 Darwin/22.4.0

File hashes

Hashes for selfextend-0.0.1-py3-none-any.whl

  • SHA256: 4703ae6e96c626dd5d3f4e95d3b0de625fc97975e719849ce74f5e4063064aa2
  • MD5: 0c18d8c6c830c3ade1fc65a370876659
  • BLAKE2b-256: be220c098f4e39b1d7904c9707eb584c7a9cd4cd61362b4fc8f5ace604ec070a

