Allophant
Allophant is a multilingual phoneme recognizer trained on spoken sentences in 34 languages, capable of generalizing zero-shot to unseen phoneme inventories.
This implementation was used in our INTERSPEECH 2023 paper "Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes" (see Citation).
Checkpoints
Pre-trained checkpoints for all evaluated models can be found on Hugging Face:
Model Name | UCLA Phonetic Corpus (PER) | UCLA Phonetic Corpus (AER) | Common Voice (PER) | Common Voice (AER) |
---|---|---|---|---|
Multitask | 45.62% | 19.44% | 34.34% | 8.36% |
Hierarchical | 46.09% | 19.18% | 34.35% | 8.56% |
Multitask Shared | 46.05% | 19.52% | 41.20% | 8.88% |
Baseline Shared | 48.25% | - | 45.35% | - |
Baseline | 57.01% | - | 46.95% | - |
Note that our baseline models were trained without phonetic feature classifiers and therefore only support phoneme recognition.
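To put the table in perspective, the following minimal sketch computes the relative PER reduction of the Multitask model over the Baseline. The PER values are copied from the table above; the helper name relative_reduction is only illustrative and not part of the allophant package.

```python
# PER values in percent, copied from the checkpoint table above.
PER = {
    "Multitask": {"ucla": 45.62, "common_voice": 34.34},
    "Baseline": {"ucla": 57.01, "common_voice": 46.95},
}


def relative_reduction(baseline: float, model: float) -> float:
    """Relative error rate reduction in percent (illustrative helper)."""
    return 100.0 * (baseline - model) / baseline


for corpus in ("ucla", "common_voice"):
    reduction = relative_reduction(PER["Baseline"][corpus], PER["Multitask"][corpus])
    print(f"{corpus}: {reduction:.2f}% relative PER reduction")
```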
Result Files
JSON files containing detailed error rates and statistics for all languages can be found in the interspeech_results directory. Results on the UCLA Phonetic Corpus are stored in files ending in "ucla", while files containing results on the training subset of languages from Mozilla Common Voice end in "commonvoice". See Error Rates for more information.
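As a rough sketch of how the naming convention above could be used, the snippet below groups result files by corpus. It assumes that the "ucla" or "commonvoice" suffix sits at the end of the file name before a .json extension; the function name and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def group_by_corpus(file_names):
    """Group result file names by the corpus suffix at the end of their stem."""
    groups = {"ucla": [], "commonvoice": []}
    for name in sorted(file_names):
        stem = Path(name).stem  # file name without the extension
        for suffix in groups:
            if stem.endswith(suffix):
                groups[suffix].append(name)
    return groups


# Only list the directory if it exists in the current working directory
results_dir = Path("interspeech_results")
if results_dir.is_dir():
    print(group_by_corpus(path.name for path in results_dir.glob("*.json")))
```

Files that match neither suffix are ignored, so unrelated JSON files in the directory do not end up in either group.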
Installation
System Dependencies
For most Linux and macOS systems, pre-built binaries are available via pip. For installation on other platforms or when building from source, a Rust compiler is required for building the native pyo3 extension. Rustup is recommended for managing Rust installations.
Optional
Torchaudio mp3 support requires ffmpeg to be installed on the system. E.g. for Debian-based Linux distributions:
sudo apt update && sudo apt install ffmpeg
For transcribing training and evaluation data with eSpeak NG G2P, the espeak-ng package is required:
sudo apt install espeak-ng
We transcribed Common Voice using version 1.51 for our paper.
Allophant Package
Allophant can be installed via pip:
pip install allophant
Note that the package currently requires Python >= 3.10 and was tested on 3.12. For use on GPU, torch and torchaudio may need to be manually installed for your required CUDA or ROCm version. (PyTorch installation)
For development, an editable package can be installed as follows:
git clone https://github.com/kgnlp/allophant
cd allophant
pip install -e .
Usage
Inference With Pre-trained Models
A pre-trained model can be loaded with the allophant package from a Hugging Face checkpoint or a local file:
from allophant.estimator import Estimator
device = "cpu"
model, attribute_indexer = Estimator.restore("kgnlp/allophant", device=device)
supported_features = attribute_indexer.feature_names
# The phonetic feature categories supported by the model, including "phonemes"
print(supported_features)
Allophant supports decoding custom phoneme inventories, which can be constructed in multiple ways:
# 1. For a single language:
inventory = attribute_indexer.phoneme_inventory("es")
# 2. For multiple languages, e.g. in code-switching scenarios
inventory = attribute_indexer.phoneme_inventory(["es", "it"])
# 3. Any custom selection of phones for which features are available in the Allophoible database
inventory = ['a', 'ai̯', 'au̯', 'b', 'e', 'eu̯', 'f', 'ɡ', 'l', 'ʎ', 'm', 'ɲ', 'o', 'p', 'ɾ', 's', 't̠ʃ']
Audio files can then be loaded, resampled and transcribed using the given inventory by first computing the log probabilities for each classifier:
import torch
import torchaudio
from allophant.dataset_processing import Batch
# Load an audio file and resample the first channel to the sample rate used by the model
audio, sample_rate = torchaudio.load("utterance.wav")
audio = torchaudio.functional.resample(audio[:1], sample_rate, model.sample_rate)
# Construct a batch of 0-padded single channel audio, lengths and language IDs
# Language ID can be 0 for inference
batch = Batch(audio, torch.tensor([audio.shape[1]]), torch.zeros(1))
model_outputs = model.predict(
    batch.to(device),
    attribute_indexer.composition_feature_matrix(inventory).to(device)
)
Finally, the log probabilities can be decoded into the recognized phonemes or phonetic features:
from allophant import predictions
# Create a feature mapping for your inventory and CTC decoders for the desired feature set
inventory_indexer = attribute_indexer.attributes.subset(inventory)
ctc_decoders = predictions.feature_decoders(inventory_indexer, feature_names=supported_features)
for feature_name, decoder in ctc_decoders.items():
    decoded = decoder(model_outputs.outputs[feature_name].transpose(1, 0), model_outputs.lengths)
    # Print the feature name and values for each utterance in the batch
    for [hypothesis] in decoded:
        # NOTE: token indices are offset by one due to the <BLANK> token used during decoding
        recognized = inventory_indexer.feature_values(feature_name, hypothesis.tokens - 1)
        print(feature_name, recognized)
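The index offset in the note above can be illustrated without the library. The following is a minimal sketch of greedy CTC collapsing, assuming the blank token occupies index 0; it is not the decoder used by allophant, and the toy inventory and function names are made up for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the <BLANK> token


def ctc_greedy_collapse(frame_labels):
    """Merge consecutive repeated frame labels, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != BLANK]


def to_inventory_indices(tokens):
    """Shift token indices down by one to skip the blank slot."""
    return [token - 1 for token in tokens]


inventory = ["a", "b", "e"]  # toy inventory, index 0 here corresponds to token 1
frames = [0, 1, 1, 0, 0, 2, 2, 2, 0, 1]  # per-frame argmax labels
tokens = ctc_greedy_collapse(frames)
print([inventory[index] for index in to_inventory_indices(tokens)])
```

Because the blank takes the first slot of the classifier output, every real class is shifted up by one, which is why the decoding example subtracts one before looking up feature values.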
Configuration
To specify options for preprocessing, training, and the model architecture, a configuration file in TOML format can be passed to most commands.
For automation purposes, JSON configuration files can be used instead with the --config-json-data/-j flag.
To start, a default configuration file with comments can be generated as follows:
allophant generate-config [path/to/config]
Preprocessing
The allophant-data command contains all functionality for corpus processing and management available in allophant.
For training, corpora without phoneme-level transcriptions have to be transcribed beforehand with a grapheme-to-phoneme model.
Transcription
Phoneme transcriptions for a supported corpus format can be generated with transcribe.
For instance, for transcribing the German and English subsets of a corpus with eSpeak NG and PHOIBLE features from Allophoible using a batch size of 512 and at most 15,000 utterances per language:
allophant-data transcribe -p -e espeak-ng -b 512 -l de,en -t 15000 -f phoible path/to/corpus -o transcribed_data
Note that no audio data is moved or copied in this process.
All commands that load corpora also accept a path to the *.bin transcription file directly instead of a directory. This allows loading only specific splits, such as loading only the test split for evaluation.
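When transcription is scripted, the command shown above can be assembled programmatically. This is a minimal sketch that only rearranges the flags from the example; the function and its parameter names are hypothetical and not part of allophant.

```python
import shlex


def transcribe_command(corpus, output, languages, batch_size=512, max_utterances=15000):
    """Build the argument list for the transcription example above."""
    return [
        "allophant-data", "transcribe", "-p",
        "-e", "espeak-ng",
        "-b", str(batch_size),
        "-l", ",".join(languages),
        "-t", str(max_utterances),
        "-f", "phoible",
        str(corpus),
        "-o", str(output),
    ]


command = transcribe_command("path/to/corpus", "transcribed_data", ["de", "en"])
print(shlex.join(command))
# To actually run it: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.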
Utterance Lengths
As an optional step, utterance lengths can be extracted from a transcribed corpus for more memory efficient batching. If a subset of the corpus was transcribed, lengths will only be stored for the transcribed utterances.
allophant-data save-lengths [-c /path/to/config.toml] path/to/transcribed_corpus path/to/output
Training
During training, the best checkpoint is saved after each evaluation step to the path provided via the --save-path/-s flag. To save every checkpoint instead, pass a directory to --save-path/-s and include the --save-all/-a flag. The number of worker threads is auto-detected from the number of available CPU threads but can be set manually with -w number. To train only on the CPU instead of using CUDA, use the --cpu flag. Finally, progress logging to stderr can be disabled with --no-progress.
allophant train [-c /path/to/config.toml] [-w number] [--cpu] [--no-progress] [--save-all]
[-s /path/to/checkpoint.pt] [-l /path/to/lengths] path/to/transcribed_corpus
Note that the --lengths/-l flag with a path to previously computed utterance lengths has to be specified when the "frames" batching mode is enabled.
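To show why frame-based batching needs utterance lengths ahead of time, here is a minimal sketch of the general idea: utterances are sorted by length and grouped so that the padded batch stays within a frame budget. This illustrates the technique only and is an assumption about how such batching typically works, not allophant's actual implementation; the function name and the max_frames parameter are hypothetical.

```python
def frame_batches(lengths, max_frames):
    """Group utterance indices so each padded batch holds at most max_frames frames."""
    order = sorted(range(len(lengths)), key=lambda index: lengths[index])
    batches, current = [], []
    for index in order:
        # Lengths are ascending, so the newest utterance is the longest in the
        # batch and determines the padded length of every row.
        padded_size = lengths[index] * (len(current) + 1)
        if current and padded_size > max_frames:
            batches.append(current)
            current = []
        current.append(index)
    if current:
        batches.append(current)
    return batches


lengths = [120, 30, 80, 40, 200, 35]
print(frame_batches(lengths, max_frames=240))
```

Short utterances end up in large batches and long ones in small batches, which keeps memory use roughly constant across batches. An utterance longer than the budget is still emitted in a batch of its own.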
Evaluation
Test Data Inference
For evaluation, test data can be transcribed with the predict sub-command. The resulting file contains metadata, transcriptions for phonemes and features, and gold standard labels from the test data.
allophant predict [--cpu] [-w number] [-t {ucla-phonetic,common-voice}] [-f phonemes,feature1,feature2]
[--fix-unicode] [--training-languages {include,exclude,only}] [-m {frames,utterances}] [-s number]
[--language-phonemes] [--no-progress] [-c] [-o /path/to/prediction_file.jsonl] /path/to/dataset huggingface/model_id or /path/to/checkpoint
Use --dataset-type/-t to select the dataset type. Note that only Common Voice and the UCLA Phonetic Corpus are currently supported. Predictions are either printed to stdout or saved to the file given by --output/-o. Gzip compression is either inferred from a ".jsonl.gz" extension or can be forced with the --compress/-c flag. The --training-languages argument allows filtering utterances based on the languages that also occur in the training data, and should be set to "exclude" for zero-shot evaluation.
Using --feature-subset/-f, a comma-separated list of features or "phoneme", such as syllabic,round,phoneme, can be provided to predict only the given subset of classes. With the --fix-unicode option, predict attempts to resolve cases where phonemes from the test data are missing from the database due to differences in their Unicode binary representation.
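The kind of mismatch described above can be reproduced with the standard library. The sketch below shows two strings that render identically but differ in their code points, and how Unicode normalization makes them comparable. It is only an illustration of the problem; whether --fix-unicode uses this particular normalization is an assumption, and the helper name is hypothetical.

```python
import unicodedata

precomposed = "\u00e3"  # "ã" as a single code point
decomposed = "a\u0303"  # "a" followed by a combining tilde

# Both render the same but compare unequal as raw strings
print(precomposed == decomposed)


def normalize_phoneme(phoneme, form="NFD"):
    """Return the phoneme in a canonical Unicode normalization form."""
    return unicodedata.normalize(form, phoneme)


# After normalization, both spellings map to the same sequence of code points
print(normalize_phoneme(precomposed) == normalize_phoneme(decomposed))
```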
The batch sizes defined in the model configuration for training can be overridden with the --batch-size/-s and --batch-mode/-m flags. Note that if the batch mode is set to "utterance" either in the model configuration or by setting the --batch-mode/-m flag, utterance lengths have to be provided via the --lengths/-l argument. A beam size can be specified for CTC decoding with beam search (--ctc-beam/-b). We used a beam size of 1 for greedy decoding in our paper.
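Prediction files can be inspected with the standard library alone. The following minimal sketch reads a JSON Lines file and, mirroring the extension rule described above, switches to gzip when the path ends in ".gz". The function name is hypothetical, and no assumption is made about the fields inside each record.

```python
import gzip
import json


def read_predictions(path):
    """Yield one parsed JSON value per non-empty line of a (gzipped) JSONL file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as file:
        for line in file:
            if line.strip():
                yield json.loads(line)


# Example usage with a placeholder path:
# for record in read_predictions("path/to/prediction_file.jsonl.gz"):
#     print(record)
```

Note that a file compressed via --compress/-c without a ".gz" extension would need gzip.open regardless of its name.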
Error Rates
The evaluate sub-command computes edit statistics and phoneme and attribute error rates for each language of a given corpus or split.
allophant evaluate [--fix-unicode] [--no-remap] [--split-complex] [-j] [--no-progress]
[-o path/to/results.json] [-d] path/to/predictions.jsonl
Without --no-remap, transcriptions are mapped to language inventories using the same mapping scheme used during training. In our paper, all results were computed without this mapping, meaning that the transcriptions were directly compared to labels without an additional mapping step. If --fix-unicode was used during prediction, it should also be used in evaluate. Evaluation supports splitting any complex phoneme segments before computing error statistics with the --split-complex/-s flag.
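For readers unfamiliar with the metric, the sketch below computes a phoneme error rate in the usual way, as the Levenshtein edit distance between the reference and hypothesis sequences divided by the reference length. This is the standard definition and an assumption about the general approach; the exact statistics produced by evaluate may be aggregated differently. Function names and the example sequences are illustrative.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences of phoneme symbols."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_symbol in enumerate(reference, start=1):
        current = [i]
        for j, hyp_symbol in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_symbol != hyp_symbol)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def phoneme_error_rate(reference, hypothesis):
    """Edit distance normalized by the number of reference phonemes."""
    if not reference:
        raise ValueError("reference must contain at least one phoneme")
    return edit_distance(reference, hypothesis) / len(reference)


reference = ["p", "a", "t̠ʃ", "o"]
hypothesis = ["b", "a", "o"]
print(f"PER: {phoneme_error_rate(reference, hypothesis):.2%}")
```

Here one substitution and one deletion against four reference phonemes give an error rate of 50%. Operating on lists of symbols rather than raw strings keeps multi-character phonemes such as "t̠ʃ" as single units.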
For further analysis of evaluation results, JSON output should be enabled via the --json/-j flag. The JSON file can then be read using allophant.evaluation.EvaluationResults. For quick inspection of human-readable (average) error rates from evaluation results saved in JSON format, use allophant-error-rates:
allophant-error-rates path/to/results_file
Allophoible Inventories
Inventories and feature sets preprocessed from (a subset of) Allophoible for training can be extracted with the allophant-features command.
allophant-features [-p /path/to/allophoible.csv] [--remove-zero] [--prefer-allophant-dialects]
[-o /path/to/output.csv] [en,fr,ko,ar,...]
Citation
When using our work, please cite our paper as follows:
@inproceedings{glocker2023allophant,
    title={Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes},
    author={Glocker, Kevin and Herygers, Aaricia and Georges, Munir},
    year={2023},
    booktitle={{Proc. Interspeech 2023}},
    month={8}}