Zero-Training Context Extension for Transformer Encoders via Nonlinear Absolute Positional Embeddings Interpolation
Official implementation of "Zero-Training Context Extension for Transformer Encoders via Nonlinear Absolute Positional Embeddings Interpolation". A preprint of the paper is coming soon.
This implementation currently supports only models compatible with the Sentence Transformers library.
Models
Models are available on Hugging Face:
| Model | Context length | Language |
|---|---|---|
| idanylenko/e5-large-v2-ctx1024 | 1024 | English |
Installation
To install the package, use pip:
pip install context-extension
Usage
After installing the package, you can use the extend-context script to interpolate positional embeddings. The script modifies the positional embeddings of a model and saves the updated model to the specified directory. You can then upload the resulting model to Hugging Face or use it locally for inference.
The recommended setting is --interpolation_type=cubic, which produces a smoother interpolation than the linear option. For models like RoBERTa that use special tokens in the first few positions, remember to set the --offset argument accordingly. Setting --max_seq_length too high may degrade performance.
Use extend-context --help to see all available options and parameters.
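To make the idea above concrete, here is a minimal NumPy sketch of stretching a positional embedding table. It is an illustration under stated assumptions, not the package's code: `resample_positions` is a hypothetical helper, the first `offset` rows are assumed to be copied unchanged, the old and new position grids are assumed to share endpoints, and a Catmull-Rom cubic stands in for whatever spline the `cubic` option actually uses.

```python
import numpy as np


def resample_positions(emb, new_len, offset=0, kind="cubic"):
    """Stretch a (num_positions, dim) embedding table to new_len rows.

    Hypothetical helper for illustration only. The first `offset` rows are
    copied unchanged; the remaining rows are resampled on a grid that shares
    its endpoints with the original positions.
    """
    head, body = emb[:offset], emb[offset:]
    n_old, n_new = body.shape[0], new_len - offset
    if n_old < 2 or n_new < 2:
        raise ValueError("need at least two positions to interpolate")

    # Fractional source coordinate for every target position.
    t = np.linspace(0.0, n_old - 1, n_new)
    i = np.minimum(np.floor(t).astype(int), n_old - 2)  # left neighbor index
    u = (t - i)[:, None]                                 # weight in [0, 1]

    p1, p2 = body[i], body[i + 1]
    if kind == "linear":
        new_body = (1.0 - u) * p1 + u * p2
    elif kind == "cubic":
        # Catmull-Rom cubic: also uses the outer neighbors, clamped at the ends.
        p0 = body[np.maximum(i - 1, 0)]
        p3 = body[np.minimum(i + 2, n_old - 1)]
        new_body = 0.5 * (
            2.0 * p1
            + (p2 - p0) * u
            + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * u**2
            + (3.0 * p1 - p0 - 3.0 * p2 + p3) * u**3
        )
    else:
        raise ValueError(f"unknown kind: {kind}")

    return np.concatenate([head, new_body], axis=0)


rng = np.random.default_rng(0)
old = rng.normal(size=(512, 8))   # random stand-in for learned embeddings
new = resample_positions(old, 1024, offset=0, kind="cubic")
print(new.shape)                  # (1024, 8)
```

Both variants pass through the original embeddings wherever a new position lands exactly on an old one. The cubic variant additionally looks at the two outer neighbors, which is what gives it a smoother curve between learned positions.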
Spline Interpolation
Use this for smooth, nonlinear interpolation:
extend-context \
--model_name_or_path="intfloat/e5-large-v2" \
--max_seq_length=1024 \
--embeddings_attr_name="embeddings.position_embeddings" \
--offset=0 \
--interpolation_type=cubic \
--output_dir="intfloat/e5-large-v2-ctx1024-spline"
Linear Interpolation
Use this for linear interpolation:
extend-context \
--model_name_or_path="intfloat/e5-large-v2" \
--max_seq_length=1024 \
--embeddings_attr_name="embeddings.position_embeddings" \
--offset=0 \
--interpolation_type=linear \
--output_dir="intfloat/e5-large-v2-ctx1024-linear"
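For intuition, the linear case can be written out explicitly. The notation below is our own and assumes that the old and new position grids share endpoints; the package may align positions differently. Let $E \in \mathbb{R}^{N \times d}$ be the original positional embeddings (after the offset) and $N'$ the target number of positions. For each new index $j \in \{0, \dots, N'-1\}$:

$$
t_j = j \cdot \frac{N - 1}{N' - 1}, \qquad
i = \lfloor t_j \rfloor, \qquad
\alpha = t_j - i, \qquad
E'_j = (1 - \alpha)\, E_i + \alpha\, E_{i+1}.
$$

With $N = 512$ and $N' = 1024$, as in the commands above, $t_1 = 511/1023 \approx 0.4995$, so the second new embedding is roughly the midpoint of the first two original ones. Cubic interpolation replaces the straight segment between $E_i$ and $E_{i+1}$ with a smooth curve that also takes neighboring embeddings into account.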