
Automated Audio Captioning metrics with PyTorch.


Audio Captioning metrics (aac-metrics)


Audio Captioning metrics source code, designed for PyTorch.

This package is a tool to evaluate captions produced by automatic models for images or audio. The results of BLEU [1], ROUGE-L [2], METEOR [3], CIDEr [4], SPICE [5] and SPIDEr [6] are consistent with https://github.com/audio-captioning/caption-evaluation-tools.

Installation

Install the pip package:

pip install aac-metrics

Download the external code needed for METEOR, SPICE and PTBTokenizer:

aac-metrics-download

Examples

Evaluate all metrics

from aac_metrics import aac_evaluate

candidates = ["a man is speaking", ...]
mult_references = [["a man speaks.", "someone speaks.", "a man is speaking while a bird is chirping in the background"], ...]

global_scores, _ = aac_evaluate(candidates, mult_references)
print(global_scores)
# dict containing the score of each aac metric: "bleu_1", "bleu_2", "bleu_3", "bleu_4", "rouge_l", "meteor", "cider_d", "spice", "spider"
# {"bleu_1": tensor(0.7), "bleu_2": ..., ...}

Evaluate a specific metric

from aac_metrics.functional import coco_cider_d

candidates = [...]
mult_references = [[...], ...]

global_scores, local_scores = coco_cider_d(candidates, mult_references)
print(global_scores)
# {"cider_d": tensor(0.1)}
print(local_scores)
# {"cider_d": tensor([0.9, ...])}

Experimental SPIDEr-max metric

from aac_metrics.functional import spider_max

# several candidate captions per audio file, instead of a single one
mult_candidates = [[...], ...]
mult_references = [[...], ...]

global_scores, local_scores = spider_max(mult_candidates, mult_references)
print(global_scores)
# {"spider": tensor(0.1)}
print(local_scores)
# {"spider": tensor([0.9, ...])}

Requirements

Python packages

These requirements are installed automatically when installing the package with pip.

torch >= 1.10.1
numpy >= 1.21.2
pyyaml >= 6.0
tqdm >= 4.64.0

External requirements

  • java >= 1.8 is required to compute METEOR and SPICE and to use the PTBTokenizer. Most of the related functions accept a java_path argument to specify the Java executable (a quick availability check is sketched after this list).

  • The unzip command is required to extract the SPICE zipped files.
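
Since METEOR, SPICE and the PTBTokenizer wrap Java programs, a quick way to confirm the external requirement is met is to check the Java runtime from Python. The snippet below is a minimal sketch using only the standard library; it is not part of the aac-metrics API.

import shutil
import subprocess

# Minimal sanity check (not part of aac-metrics): verify a Java executable is
# reachable before computing METEOR or SPICE.
java_path = shutil.which("java")
if java_path is None:
    raise RuntimeError("java >= 1.8 is required for METEOR, SPICE and the PTBTokenizer.")

# "java -version" prints its version string to stderr.
version_output = subprocess.run([java_path, "-version"], capture_output=True, text=True).stderr
print(version_output.splitlines()[0])  # e.g. 'openjdk version "11.0.16" ...'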

Metrics

Coco metrics

Metric       Origin               Range     Short description
BLEU [1]     machine translation  [0, 1]    Precision of n-grams
ROUGE-L [2]  machine translation  [0, 1]    Longest common subsequence
METEOR [3]   machine translation  [0, 1]    Cosine-similarity of frequencies
CIDEr [4]    image captioning     [0, 10]   Cosine-similarity of TF-IDF
SPICE [5]    image captioning     [0, 1]    FScore of semantic graph
SPIDEr [6]   image captioning     [0, 5.5]  Mean of CIDEr and SPICE
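
As the last row indicates, SPIDEr is simply the mean of the CIDEr and SPICE scores, which explains its [0, 5.5] range. The tiny sketch below only illustrates this arithmetic; spider_from_parts is a hypothetical helper, not a function of the package.

# Illustration only: SPIDEr averages CIDEr (range [0, 10]) and SPICE (range [0, 1]).
def spider_from_parts(cider_d_score: float, spice_score: float) -> float:
    return (cider_d_score + spice_score) / 2.0

print(spider_from_parts(10.0, 1.0))  # 5.5, the upper bound of the SPIDEr range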

Other metrics

Metric       Origin            Range     Short description
SPIDEr-max   audio captioning  [0, 5.5]  Max of SPIDEr scores over multiple candidates
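
Conceptually, SPIDEr-max scores every candidate caption of an audio file with SPIDEr and keeps only the best one per file. The sketch below illustrates that reduction with made-up numbers; it is not the library implementation, and the corpus-level aggregation is assumed here to be the mean of the per-file maxima.

# Illustration only (made-up scores, not the library internals).
per_file_candidate_spider = [
    [0.21, 0.35, 0.18],  # SPIDEr of each candidate caption for file 1
    [0.40, 0.39],        # SPIDEr of each candidate caption for file 2
]
local_spider_max = [max(scores) for scores in per_file_candidate_spider]
print(local_spider_max)  # [0.35, 0.40] -> one score per file
# Assumed corpus-level aggregation: mean of the per-file maxima.
print(sum(local_spider_max) / len(local_spider_max))  # 0.375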

References

[1] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU: a method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL ’02. Philadelphia, Pennsylvania: Association for Computational Linguistics, 2001, p. 311. [Online]. Available: http://portal.acm.org/citation.cfm?doid=1073083.1073135

[2] C.-Y. Lin, “ROUGE: A package for automatic evaluation of summaries,” in Text Summarization Branches Out. Barcelona, Spain: Association for Computational Linguistics, Jul. 2004, pp. 74–81. [Online]. Available: https://aclanthology.org/W04-1013

[3] M. Denkowski and A. Lavie, “Meteor Universal: Language Specific Translation Evaluation for Any Target Language,” in Proceedings of the Ninth Workshop on Statistical Machine Translation. Baltimore, Maryland, USA: Association for Computational Linguistics, 2014, pp. 376–380. [Online]. Available: http://aclweb.org/anthology/W14-3348

[4] R. Vedantam, C. L. Zitnick, and D. Parikh, “CIDEr: Consensus-based Image Description Evaluation,” arXiv:1411.5726 [cs], Jun. 2015, arXiv: 1411.5726. [Online]. Available: http://arxiv.org/abs/1411.5726

[5] P. Anderson, B. Fernando, M. Johnson, and S. Gould, “SPICE: Semantic Propositional Image Caption Evaluation,” arXiv:1607.08822 [cs], Jul. 2016, arXiv: 1607.08822. [Online]. Available: http://arxiv.org/abs/1607.08822

[6] S. Liu, Z. Zhu, N. Ye, S. Guadarrama, and K. Murphy, “Improved Image Captioning via Policy Gradient optimization of SPIDEr,” 2017 IEEE International Conference on Computer Vision (ICCV), pp. 873–881, Oct. 2017, arXiv: 1612.00370. [Online]. Available: http://arxiv.org/abs/1612.00370

Cite the aac-metrics package

The associated paper has been accepted and will be published after the DCASE2022 workshop.

If you use this code, you can cite it with the following temporary citation:

@inproceedings{Labbe2022,
    author = "Labbe, Etienne and Pellegrini, Thomas and Pinquier, Julien",
    title = "IS MY AUTOMATIC AUDIO CAPTIONING SYSTEM SO BAD? SPIDEr-max: A METRIC TO CONSIDER SEVERAL CAPTION CANDIDATES",
    month = "November",
    year = "2022",
}
