Package implementing a revamped version of the libri-light ABX evaluation.
libri-light-abx2
The ABX phonetic evaluation metric for unsupervised representation learning as used by the ZeroSpeech challenge, now with context-type options (on-triphone, within-context, any-context). This module is a reworking of https://github.com/zerospeech/libri-light-abx, which in turn is a wrapper around https://github.com/facebookresearch/libri-light/tree/main/eval.
Installation
You can install this module directly from pip using the following command:
pip install zerospeech-libriabx2
Or you can install from source by cloning this repository and running:
pip install .
As a final alternative, you can install into a conda environment by running:
conda install -c conda-forge -c pytorch -c coml zerospeech-libriabx2 pytorch::pytorch
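Once installed, the zrc-abx2 command-line tool described below should be available; you can verify the installation by printing its help message:
zrc-abx2 -h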
Usage
From command line
usage: zrc-abx2 [-h] [--path_checkpoint PATH_CHECKPOINT]
                [--file_extension {.pt,.npy,.wav,.flac,.mp3,.npz,.txt}]
                [--feature_size FEATURE_SIZE] [--cuda]
                [--speaker_mode {all,within,across}]
                [--context_mode {all,within,any}]
                [--distance_mode {euclidian,euclidean,cosine,kl,kl_symmetric}]
                [--max_size_group MAX_SIZE_GROUP]
                [--max_x_across MAX_X_ACROSS] [--out OUT] [--seed SEED]
                [--pooling {none,mean,hamming}] [--seq_norm]
                [--max_size_seq MAX_SIZE_SEQ] [--strict]
                path_data path_item_file
ABX metric
positional arguments:
  path_data             Path to directory containing the submission data
  path_item_file        Path to the .item file containing the timestamps and
                        transcriptions

optional arguments:
  -h, --help            show this help message and exit
  --path_checkpoint PATH_CHECKPOINT
                        Path to a CPC checkpoint. If set, apply the model to
                        the input data to compute the features
  --file_extension {.pt,.npy,.wav,.flac,.mp3,.npz,.txt}
  --feature_size FEATURE_SIZE
                        Size (in s) of one feature
  --cuda                Use the GPU to compute distances
  --speaker_mode {all,within,across}
                        Choose the speaker mode of the ABX score to compute
  --context_mode {all,within,any}
                        Choose the context mode of the ABX score to compute
  --distance_mode {euclidian,euclidean,cosine,kl,kl_symmetric}
                        Choose the kind of distance to use to compute the ABX
                        score.
  --max_size_group MAX_SIZE_GROUP
                        Max size of a group while computing the ABX score. A
                        small value will make the code faster but less
                        precise.
  --max_x_across MAX_X_ACROSS
                        When computing the ABX across score, maximum number
                        of speaker X to sample per couple A,B. A small value
                        will make the code faster but less precise.
  --out OUT             Path where the results should be saved
  --seed SEED           Seed to use in random sampling.
  --pooling {none,mean,hamming}
                        Type of pooling over frame representations of items.
  --seq_norm            Used for CPC features only. If activated, normalize
                        each batch of features across the time channel before
                        computing ABX.
  --max_size_seq MAX_SIZE_SEQ
                        Used for CPC features only. Maximal number of frames
                        to consider when computing a batch of features.
  --strict              Used for CPC features only. If activated, each batch
                        of features will contain exactly max_size_seq frames.
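For example, to compute an across-speaker, within-context ABX score over .npy feature files (the paths below are placeholders), you could run:
zrc-abx2 /path/to/representations /path/to/phonemes.item --file_extension .npy --speaker_mode across --context_mode within --out /path/to/results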
Python API
You can also call the ABX evaluation from Python code, as in the following example:
import zrc_abx2

args = zrc_abx2.EvalArgs(
    path_data="/location/to/representations/",
    path_item_file="/location/to/file.item",
    **other_options
)

result = zrc_abx2.EvalABX().eval_abx(args)
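The EvalArgs fields beyond the two paths are not documented here; the following sketch assumes they mirror the CLI flags above (speaker_mode, context_mode, and distance_mode are assumed parameter names, not confirmed API):

import zrc_abx2

# Sketch only: path_data and path_item_file are confirmed by the example
# above; the remaining keywords are assumed to mirror the CLI flags.
args = zrc_abx2.EvalArgs(
    path_data="/location/to/representations/",
    path_item_file="/location/to/file.item",
    speaker_mode="across",   # assumed keyword, mirroring --speaker_mode
    context_mode="within",   # assumed keyword, mirroring --context_mode
    distance_mode="cosine",  # assumed keyword, mirroring --distance_mode
)
result = zrc_abx2.EvalABX().eval_abx(args)
print(result)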
Information on evaluation conditions
A new variable in this version of ABX is context. In the within-context condition, a, b, and x all have the same surrounding context (i.e. the same preceding and following phoneme). In the any-context condition, the surrounding context is ignored and will typically vary.
For the within-context and any-context comparisons, use an item file that extracts phonemes (rather than XYZ triphones). For the on-triphone condition, which is still available, use an item file that extracts triphones (just like in the previous ABX evaluation), and then run it within-context (the default behavior of the previous ABX evaluation). any-context is not used for the on-triphone version because of the excessive noise it would include in the representation.
As in the previous version, it is also possible to run within-speaker (a, b, and x are all from the same speaker) and across-speaker (a and b are from the same speaker, x is from another) evaluations. So there are four phoneme-based evaluation combinations in total: within_s-within_c, within_s-any_c, across_s-within_c, across_s-any_c; and two triphone-based combinations: within_s-within_c and across_s-within_c.
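To illustrate how these combinations map onto the options, here is a hypothetical loop over the four phoneme-based conditions (it again assumes speaker_mode and context_mode keywords on EvalArgs that mirror the CLI flags):

import zrc_abx2

# Hypothetical sweep over the four phoneme-based conditions listed above.
# speaker_mode/context_mode are assumed keywords mirroring the CLI flags.
for speaker_mode in ("within", "across"):
    for context_mode in ("within", "any"):
        args = zrc_abx2.EvalArgs(
            path_data="/location/to/representations/",
            path_item_file="/location/to/phonemes.item",
            speaker_mode=speaker_mode,
            context_mode=context_mode,
        )
        result = zrc_abx2.EvalABX().eval_abx(args)
        print(f"{speaker_mode}_s-{context_mode}_c:", result)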