Automatic Evaluation for SignWriting Machine Learning Outputs

SignWriting Evaluation

The lack of automatic SignWriting evaluation metrics is a major obstacle in the development of SignWriting transcription and translation [1] models.

Goals

The primary objective of this repository is to house a suite of automatic evaluation metrics specifically tailored for SignWriting. This includes standard metrics like BLEU [2], chrF [3], and CLIPScore [4], as well as custom-developed metrics unique to our approach. We recognize the distinct challenges in evaluating single signs versus continuous signing, and our methods reflect this differentiation.

To qualitatively demonstrate the efficacy of these evaluation metrics, we implement a nearest-neighbor search for selected signs from the SignBank corpus. The rationale is straightforward: the more closely a metric's top-ranked neighbors for a sign match the signs that are actually most similar to it, the better the metric captures the nuances of sign language transcription and translation.

Evaluation Metrics

  • Tokenized BLEU - BLEU score for tokenized SignWriting FSW strings.
  • chrF - chrF score for untokenized SignWriting FSW strings.
  • CLIPScore - CLIPScore between SignWriting images, using the original CLIP model.
  • Similarity - symbol distance score for SignWriting FSW strings.
  • Similarity v2 - the improved, name-aware symbol distance (README).
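To make the difference between tokenized and untokenized scoring concrete, here is a minimal sketch using only the standard library. It is not this package's implementation: the token scheme (`tokenize_fsw`), the symbol pattern, and the plain n-gram F-score (`ngram_f_score`) are assumptions chosen for illustration, and the score is a simplification of both BLEU and chrF rather than either metric proper.

```python
import re
from collections import Counter

# Assumed FSW symbol layout: "S" + 5 hex-like characters, then "XXXxYYY" coordinates.
SYMBOL_RE = re.compile(r"(S[123][0-9a-f]{2}[0-5][0-9a-f])(\d{3})x(\d{3})")


def tokenize_fsw(fsw):
    """Split an FSW string into symbol and coordinate tokens (hypothetical scheme)."""
    tokens = []
    for symbol, x, y in SYMBOL_RE.findall(fsw):
        tokens.extend([symbol, f"x{x}", f"y{y}"])
    return tokens


def ngrams(seq, n):
    """Count the n-grams of a sequence (a token list or a raw string)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def ngram_f_score(hyp, ref, n=2):
    """Harmonic mean of n-gram precision and recall, in [0, 1]."""
    hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
    overlap = sum((hyp_counts & ref_counts).values())  # clipped matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


hyp = "M518x529S14c20481x471S27106503x489"
ref = "M518x529S14c20481x471S27106503x480"
token_level = ngram_f_score(tokenize_fsw(hyp), tokenize_fsw(ref), n=2)
char_level = ngram_f_score(hyp, ref, n=3)  # strings are scored character by character
```

Passing token lists gives a tokenized, BLEU-like view in which a changed coordinate costs whole tokens, while passing the raw strings gives a chrF-like view in which the same change costs only a few characters.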

Qualitative Evaluation

Distribution of Scores

Using a sample of the corpus, we compute the any-to-any scores for each metric. Intuitively, we expect a good metric to assign a low score to two randomly chosen signs, since most signs are unrelated. The distribution of scores should therefore be skewed towards lower values.
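The any-to-any computation can be sketched as follows. This is an illustrative outline, not the repository's script: `toy_metric` is a placeholder for a real metric, the sample strings are invented, and scoring each unordered pair once (without self-comparisons) is an assumption.

```python
from itertools import combinations


def toy_metric(a, b):
    """Placeholder metric: Jaccard overlap of character sets, in [0, 1]."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def any_to_any_scores(signs, metric):
    """Score every unordered pair of distinct items in the sample."""
    return [metric(a, b) for a, b in combinations(signs, 2)]


def histogram(scores, bins=5):
    """Bucket scores in [0, 1] into equal-width bins; 1.0 falls in the last bin."""
    counts = [0] * bins
    for score in scores:
        counts[min(int(score * bins), bins - 1)] += 1
    return counts


sample = ["abc", "abd", "xyz"]  # stand-ins for sampled FSW strings
scores = any_to_any_scores(sample, toy_metric)
counts = histogram(scores)  # most mass should sit in the low bins
```

A metric that behaves as expected concentrates the counts in the first bins; a metric that piles mass into the high bins is rating unrelated signs as similar.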

[Figure: Distribution of scores]

Nearest Neighbor Search

It is well known that the SignBank corpus contains many forms of the sign for "hello". We carefully select some of these signs and evaluate our metrics by looking for their closest matches in the corpus, which contains around 230k single signs.
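A brute-force version of this search can be sketched as below. It is a minimal illustration under stated assumptions, not the repository's code: `nearest_neighbors` is a hypothetical helper, `stand_in_metric` uses `difflib.SequenceMatcher` in place of a real SignWriting metric, and higher scores are assumed to mean more similar.

```python
import heapq
from difflib import SequenceMatcher


def stand_in_metric(hypothesis, reference):
    """Placeholder similarity in [0, 1]; a real metric would be plugged in here."""
    return SequenceMatcher(None, hypothesis, reference).ratio()


def nearest_neighbors(query, corpus, metric, k=10):
    """Return the k highest-scoring (score, candidate) pairs, best first."""
    scored = ((metric(query, candidate), candidate) for candidate in corpus)
    return heapq.nlargest(k, scored)


corpus = ["abc", "abd", "xyz"]  # stand-ins for FSW strings from the corpus
top = nearest_neighbors("abc", corpus, stand_in_metric, k=2)
```

Each query scores every corpus entry once, so the cost grows linearly with corpus size and with the cost of a single metric call, which is why a slow metric becomes impractical at this scale.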

The problems of each metric are revealed when comparing the top 10 nearest neighbors for each sign. For each sign and metric, either the first match is incorrect, or there is a more correct match further down the list. The table compares the name-aware Similarity v2, the original Similarity, Tokenized BLEU, and chrF; CLIPScore is omitted here, as encoding ~230k SignWriting images per query is prohibitively slow.

[Table: top 10 nearest neighbors (ranks 1 to 10) for each of three reference signs, with one column per metric: SymbolsDistancesV2, SymbolsDistances, TokenizedBLEU, and CHRF. Each cell is an image of the retrieved sign.]

Cite

If you use our toolkit in your research or projects, please consider citing the work.

@misc{signwriting-evaluation2024,
    title={SignWriting Evaluation: Metrics for Evaluating SignWriting Transcription and Translation Models},
    author={Moryossef, Amit and Zilberman, Rotem and Langer, Ohad},
    howpublished={\url{https://github.com/sign-language-processing/signwriting-evaluation}},
    year={2024}
}

References

  1. Amit Moryossef, Zifan Jiang. 2023. SignBank+: Preparing a Multilingual Sign Language Dataset for Machine Translation Using Large Language Models.

  2. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.

  3. Maja Popović. 2015. chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 392–395, Lisbon, Portugal. Association for Computational Linguistics.

  4. Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. 2021. CLIPScore: A Reference-free Evaluation Metric for Image Captioning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7514–7528, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
