Costra
This is a tool for automatic evaluation of Czech sentence embeddings using the Costra 1.1 dataset.
More information can be found in the following paper:
- Petra Barančíková and Ondřej Bojar: Costra 1.1: An Inquiry into Geometric Properties of Sentence Spaces. In: TSD 2020. Lecture Notes in Computer Science, vol 12284. Springer, Cham.
The presentation of the paper with the accompanying video can be found here.
Installation
$ pip install costra
Usage
- Get sentences from Costra:
from costra import costra
CostraEvaluator = costra.CostraEvaluator()
sentences = CostraEvaluator.get_sentences()
- Generate embeddings (example with SentenceTransformers):
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Seznam/retromae-small-cs")
embeddings = model.encode(sentences)
- Evaluate the embeddings:
results = CostraEvaluator.evaluate(embeddings)
- Results have the following format, with 'costra' being the overall score:
{
'basic': 0.063,
'modality': 0.079,
'time': 0.692,
'style': 0.634,
'generalization': 0.695,
'opposite_meaning': 0.751,
'costra': 0.486
}
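The result dictionary can be inspected with plain Python. The sketch below is only an illustration: it ranks the categories from strongest to weakest and compares their unweighted mean with the overall score. The helper name `summarize` is hypothetical and not part of the package. In the example values above the mean of the six categories rounds to the reported 'costra' score, but treating the overall score as a plain average in general is an assumption, not something this README states.

```python
from statistics import mean

# Example scores copied from the results shown above.
results = {
    "basic": 0.063,
    "modality": 0.079,
    "time": 0.692,
    "style": 0.634,
    "generalization": 0.695,
    "opposite_meaning": 0.751,
    "costra": 0.486,
}


def summarize(results, overall_key="costra"):
    """Return (categories ranked from strongest to weakest, their unweighted mean).

    Hypothetical helper for illustration; it is not provided by costra.
    """
    categories = {k: v for k, v in results.items() if k != overall_key}
    ranked = sorted(categories.items(), key=lambda item: item[1], reverse=True)
    return ranked, mean(categories.values())


ranked, category_mean = summarize(results)
for name, score in ranked:
    print(f"{name:<18}{score:.3f}")
print(f"{'mean':<18}{category_mean:.3f}")
print(f"{'costra':<18}{results['costra']:.3f}")
```

A ranking like this makes it easy to see which transformation types a model handles poorly (here 'basic' and 'modality').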
For more detail about Costra categories, refer to the original paper.
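The usage steps above can be wrapped in a small helper that works with any encoder. This is a minimal sketch, not part of the package: the name `evaluate_encoder` is hypothetical, and the check that the encoder returns exactly one embedding per sentence, in the same order, is an assumption inferred from the usage shown above. The stand-in functions exist only so that the example runs without a model or the dataset installed.

```python
def evaluate_encoder(sentences, encode, evaluate):
    """Encode the sentences, sanity-check the output, and evaluate it.

    encode:   callable mapping a list of sentences to a sequence of vectors
    evaluate: callable mapping that sequence to a dict of scores
    """
    embeddings = encode(sentences)
    # Assumption: the evaluator expects one embedding per input sentence.
    if len(embeddings) != len(sentences):
        raise ValueError(
            f"expected {len(sentences)} embeddings, got {len(embeddings)}"
        )
    return evaluate(embeddings)


# Stand-ins (hypothetical) so the sketch runs on its own.
demo_sentences = ["první věta", "druhá věta"]
demo_encode = lambda batch: [[float(len(s)), 1.0] for s in batch]
demo_evaluate = lambda vectors: {"n_embeddings": len(vectors)}

print(evaluate_encoder(demo_sentences, demo_encode, demo_evaluate))
```

With the objects from the usage section, the call would presumably look like `evaluate_encoder(sentences, model.encode, CostraEvaluator.evaluate)`, which makes it straightforward to compare several models on the same sentences.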
Citation
If you use the tool, please consider citing the following paper:
@inproceedings{Costra,
author = {Petra Baran{\v{c}}{\'{\i}}kov{\'{a}} and Ond{\v{r}}ej Bojar},
editor = {Petr Sojka and Ivan Kope{\v{c}}ek and Karel Pala and Ales Hor{\'{a}}k},
title = {Costra 1.1: An Inquiry into Geometric Properties of Sentence Spaces},
booktitle = {Text, Speech, and Dialogue - 23rd International Conference, {TSD}
2020, Brno, Czech Republic, September 8-11, 2020, Proceedings},
series = {Lecture Notes in Computer Science},
volume = {12284},
pages = {135--143},
publisher = {Springer},
year = {2020},
url = {https://doi.org/10.1007/978-3-030-58323-1\_14},
doi = {10.1007/978-3-030-58323-1\_14},
}
License
The data is distributed under the Creative Commons Attribution 4.0 (CC BY 4.0) license.