Measure layer-wise token embedding cosine similarity (embedding condensation diagnostic).

embedding-condensation

Minimal library for the embedding condensation diagnostic from LM-Dispersion: layer-wise token cosine-similarity matrices and optional heatmaps.

Install

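To install the released package from PyPI:

pip install embedding-condensation

For a development install with the test extras, run from a clone of the repository:

cd pypi
pip install -e ".[test]"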

Usage

from transformers import AutoModel, AutoTokenizer
from embedding_condensation import measure_embedding_condensation

model = AutoModel.from_pretrained("gpt2").eval()
tokenizer = AutoTokenizer.from_pretrained("gpt2")

result = measure_embedding_condensation(
    model,
    tokenizer,
    texts=["Your long input text here. " * 200],
    repetitions=1,
    plot=False,
)
print(result.mean_cossim_by_layer)
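
To make the diagnostic concrete, here is a dependency-free sketch of the kind of quantity a layer-wise mean cosine similarity reports. It is not the library's implementation: the helper names (`cosine_similarity`, `mean_pairwise_cossim`, `toy_mean_cossim_by_layer`) are hypothetical, and averaging over distinct token pairs is an assumption about how the per-layer mean is defined. The toy "hidden states" are hand-written lists rather than model outputs.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cossim(token_vectors):
    """Average cosine similarity over all distinct token pairs in one layer.

    Assumption: self-similarities (the diagonal of the similarity matrix)
    are excluded, since they are always 1 and would inflate the mean.
    """
    pairs = list(combinations(token_vectors, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def toy_mean_cossim_by_layer(hidden_states):
    """hidden_states: one list of token vectors per layer (hypothetical layout)."""
    return [mean_pairwise_cossim(layer) for layer in hidden_states]


# Two toy layers with three 2-d token embeddings each.
layers = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],  # tokens point in different directions
    [[1.0, 0.0], [1.0, 0.0], [2.0, 0.0]],   # tokens all point the same way
]
scores = toy_mean_cossim_by_layer(layers)
print(scores)
```

A value near 1 in a layer means the token embeddings there point in almost the same direction, which is the condensation the diagnostic is meant to surface; lower values mean the embeddings are more spread out.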

PyPI upload

cd pypi
pip install build twine
python -m build
twine upload dist/*

Test

cd pypi
pytest

Download files

Source Distribution

embedding_condensation-0.1.0.tar.gz (4.2 kB)

Uploaded Source

Built Distribution

embedding_condensation-0.1.0-py3-none-any.whl (4.8 kB)

Uploaded Python 3

File details

Details for the file embedding_condensation-0.1.0.tar.gz.

File metadata

  • Download URL: embedding_condensation-0.1.0.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for embedding_condensation-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d94db6d79eb1de8eb29179f897bbb458d2c3accb7794ae4aa5b410be4e265624
MD5 415f779612b638ba862f26fa618bca53
BLAKE2b-256 2d986cb21f1fb414623b0ccbe2449d38b93c7e325c0bd4d33045974e83600f76

File details

Details for the file embedding_condensation-0.1.0-py3-none-any.whl.

File hashes

Hashes for embedding_condensation-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b2f024382c3ee83d8cb9d522f9c8742eb57ca848bd48430d76e88d98973ba016
MD5 e67ea0d1e345c150b7d0ae916cb20dec
BLAKE2b-256 f5d6c8bfaf68e195ca62677c846ec782853cf7847046260b9caa28417eb90d7b
