A Python tool for evaluating the alignment and uniformity of sentence embeddings

AlignUnformEval

This is a Python tool for evaluating the alignment and uniformity of sentence embeddings, following the SimCSE paper.

The SimCSE paper explains alignment and uniformity as follows:

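Given a distribution of positive pairs p_pos, alignment calculates expected distance between embeddings of the paired instances (assuming representations are already normalized):

$$\ell_{\mathrm{align}} \triangleq \mathbb{E}_{(x, x^{+}) \sim p_{\mathrm{pos}}} \left\| f(x) - f(x^{+}) \right\|^{2}$$

On the other hand, uniformity measures how well the embeddings are uniformly distributed:

$$\ell_{\mathrm{uniform}} \triangleq \log \mathbb{E}_{x, y \overset{\mathrm{i.i.d.}}{\sim} p_{\mathrm{data}}} e^{-2 \left\| f(x) - f(y) \right\|^{2}}$$

where p_data denotes the data distribution and f(x) is the embedding of x.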
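The two quantities described above can be sketched in a few lines of NumPy. This is a minimal illustration of the definitions from the SimCSE paper (mean squared distance between positive pairs, and the log of the mean Gaussian potential with a factor of 2 over pairs of embeddings), not the code this library actually runs. The function names `l2_normalize`, `alignment`, and `uniformity` are hypothetical, and averaging over distinct pairs only is an assumption of this sketch.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def alignment(x: np.ndarray, y: np.ndarray) -> float:
    """Mean squared Euclidean distance between paired rows of x and y."""
    x, y = l2_normalize(x), l2_normalize(y)
    return float(np.mean(np.sum((x - y) ** 2, axis=1)))


def uniformity(x: np.ndarray, t: float = 2.0) -> float:
    """Log of the mean of exp(-t * ||x_i - x_j||^2) over distinct pairs i < j."""
    x = l2_normalize(x)
    # For unit vectors, ||a - b||^2 = 2 - 2 * (a . b).
    sq_dists = np.clip(2.0 - 2.0 * (x @ x.T), 0.0, None)
    i, j = np.triu_indices(len(x), k=1)  # assumption: skip self-pairs
    return float(np.log(np.mean(np.exp(-t * sq_dists[i, j]))))
```

Lower is better for both values: alignment is 0 when every positive pair maps to the same point, and uniformity becomes more negative as the embeddings spread out over the unit sphere.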

Install

by pip

pip install alignunformeval

by source

pip install git+https://github.com/akiFQC/AlignUnformEval

Usage

You can easily evaluate alignment and uniformity with this library.
This minimal example evaluates alignment and uniformity on the STS Benchmark.

from alignunformeval import STSBEval

evaluator = STSBEval(sentence_encoder)
# sentence_encoder is a callable from List[str] to numpy.array. The output numpy.array must be [dimension_of_sentence_vector].
result = evaluator.eval_summary()
# result = {"alignment": value_of_alignment, "uniformity": value_of_uniformity}

STSBEval takes a callable whose input is a list of str and whose output is an n-dimensional numpy.array.
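To make the expected callable concrete, here is a sketch of a toy encoder that could be passed in place of `sentence_encoder`. The name `toy_sentence_encoder` and its hashed bag-of-words scheme are hypothetical stand-ins for a real model. It also assumes the encoder returns one row per input sentence, i.e. an array of shape (number_of_sentences, dim); the README does not spell out the batch dimension, so check the library's tests before relying on that.

```python
import hashlib
from typing import List

import numpy as np


def toy_sentence_encoder(sentences: List[str], dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in encoder: hashed bag-of-words, one row per sentence."""
    vectors = np.zeros((len(sentences), dim), dtype=np.float32)
    for row, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            # A stable hash keeps the mapping reproducible across runs.
            digest = hashlib.md5(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "little") % dim
            vectors[row, bucket] += 1.0
    return vectors
```

Under these assumptions, the function would be handed to the evaluator the same way as any other encoder, for example `STSBEval(toy_sentence_encoder)`.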

Dataset

STS Benchmark

This dataset (specifically sts-dev.csv) was used in the SimCSE paper. In the paper, the similarity-score threshold was set at 4.0: only sentence pairs whose similarity score is higher than 4.0 are used to evaluate alignment. You can set a different threshold, as in the following example.

from alignunformeval import STSBEval
# sentence_encoder: a function from List[str] to np.array[dimension_of_sentence_vector]
evaluator = STSBEval(sentence_encoder, threshold=3.0) # set threshold at 3.0
result = evaluator.eval_summary()

Please see test/test_stsb.py if you want more details.
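The effect of the threshold can be illustrated with a small standalone sketch. The helper `select_positive_pairs` below is hypothetical and is not part of this library; it only shows the idea of keeping pairs whose gold similarity score is strictly higher than the threshold, which is how the rule is worded above.

```python
from typing import List, Sequence, Tuple


def select_positive_pairs(
    pairs: Sequence[Tuple[str, str]],
    scores: Sequence[float],
    threshold: float = 4.0,
) -> List[Tuple[str, str]]:
    """Keep pairs whose similarity score is strictly above the threshold."""
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    return [pair for pair, score in zip(pairs, scores) if score > threshold]
```

Lowering the threshold from 4.0 to 3.0 admits more, but looser, paraphrase pairs into the alignment estimate, so alignment values computed at different thresholds are not directly comparable.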

Tokyo Metropolitan University Paraphrase Corpus (TMUP)

Tokyo Metropolitan University Paraphrase Corpus (TMUP) is a Japanese paraphrase dataset.

from alignunformeval import TMUPEval
# sentence_encoder: a function from List[str] to np.array[dimension_of_sentence_vector]
evaluator = TMUPEval(sentence_encoder)
result = evaluator.eval_summary()

License

The license of this tool follows that of each dataset. Please read the documentation of the datasets you use.

Reference

  1. https://arxiv.org/pdf/2104.08821.pdf
  2. https://ixa2.si.ehu.eus/stswiki/index.php/STSbenchmark
  3. https://github.com/tmu-nlp/paraphrase-corpus
