Costra

This is a tool for the automatic evaluation of Czech sentence embeddings using the Costra 1.1 dataset.

More information can be found in the paper listed in the Citation section.

The presentation of the paper with the accompanying video can be found here.

Installation

$ pip install costra

Usage

  1. Get the sentences from Costra:
from costra import costra
CostraEvaluator = costra.CostraEvaluator()
sentences = CostraEvaluator.get_sentences()
  2. Generate the embeddings (example with SentenceTransformers):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Seznam/retromae-small-cs")
embeddings = model.encode(sentences)
  3. Evaluate the embeddings:
results = CostraEvaluator.evaluate(embeddings)
  4. The results have the following format, with costra being the overall score:
{
    'basic': 0.063,
    'modality': 0.079,
    'time': 0.692,
    'style': 0.634,
    'generalization': 0.695,
    'opposite_meaning': 0.751,
    'costra': 0.486
}

For more detail about Costra categories, refer to the original paper.
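
As a small illustration of how the result dictionary can be read, the sketch below ranks the categories and compares their unweighted mean with the overall score. It reuses the example values shown above rather than calling the evaluator. The names `category_scores`, `ranked` and `mean_score` are introduced here for illustration only. For these example values the mean happens to round to the reported `costra` score; whether the tool computes the overall score this way is an assumption, not something this description states.

```python
# Example values copied from the result format above (not freshly computed).
results = {
    'basic': 0.063,
    'modality': 0.079,
    'time': 0.692,
    'style': 0.634,
    'generalization': 0.695,
    'opposite_meaning': 0.751,
    'costra': 0.486,
}

# Separate the per-category scores from the overall 'costra' score.
category_scores = {name: score for name, score in results.items() if name != 'costra'}

# Rank categories from highest to lowest score.
ranked = sorted(category_scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f"{name:<18}{score:.3f}")

# Unweighted mean of the six categories, for comparison with the overall score.
mean_score = sum(category_scores.values()) / len(category_scores)
print(f"mean of categories: {mean_score:.3f}")
print(f"reported costra:    {results['costra']:.3f}")
```

Ranking the categories this way makes it easy to see where a model is weakest; in the example above, `basic` and `modality` are far below the other categories.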

Citation

If you use the tool, please consider citing the following paper:

@inproceedings{Costra,
  author    = {Petra Baran{\v{c}}{\'{\i}}kov{\'{a}} and Ond{\v{r}}ej Bojar},
  editor    = {Petr Sojka and Ivan Kope{\v{c}}ek and Karel Pala and Ales Hor{\'{a}}k},
  title     = {Costra 1.1: An Inquiry into Geometric Properties of Sentence Spaces},
  booktitle = {Text, Speech, and Dialogue - 23rd International Conference, {TSD}
               2020, Brno, Czech Republic, September 8-11, 2020, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {12284},
  pages     = {135--143},
  publisher = {Springer},
  year      = {2020},
  url       = {https://doi.org/10.1007/978-3-030-58323-1\_14},
  doi       = {10.1007/978-3-030-58323-1\_14},
}

License

The data is distributed under the Creative Commons Attribution 4.0 (CC BY 4.0) license.
