
A multilingual phoneme recognizer capable of generalizing zero-shot to unseen phoneme inventories.


Allophant

Allophant is a multilingual phoneme recognizer trained on spoken sentences in 34 languages, capable of generalizing zero-shot to unseen phoneme inventories.

This implementation was used in our INTERSPEECH 2023 paper "Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes" (see Citation).

Checkpoints

Pre-trained checkpoints for all evaluated models can be found on Hugging Face:

Model Name       | UCLA PER | UCLA AER | Common Voice PER | Common Voice AER
Multitask        | 45.62%   | 19.44%   | 34.34%           | 8.36%
Hierarchical     | 46.09%   | 19.18%   | 34.35%           | 8.56%
Multitask Shared | 46.05%   | 19.52%   | 41.20%           | 8.88%
Baseline Shared  | 48.25%   | -        | 45.35%           | -
Baseline         | 57.01%   | -        | 46.95%           | -

UCLA: UCLA Phonetic Corpus; PER: phoneme error rate; AER: attribute error rate.

Note that our baseline models were trained without phonetic feature classifiers and therefore only support phoneme recognition.

Result Files

JSON files containing detailed error rates and statistics for all languages can be found in the interspeech_results directory. Results on the UCLA Phonetic Corpus are stored in files ending in "ucla", while files containing results on the training subset of languages from Mozilla Common Voice end in "commonvoice". See Error Rates for more information.
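To pick out the results for one corpus programmatically, files can be grouped by how their names end. The sketch below assumes the result files have a .json extension and that "ucla" or "commonvoice" is at the end of the file name before the extension; group_result_files is a hypothetical helper, not part of allophant.

```python
from collections.abc import Iterable
from pathlib import Path

# Name endings that identify the corpus of a result file (see the description above)
CORPUS_SUFFIXES = ("ucla", "commonvoice")


def group_result_files(paths: Iterable[Path]) -> dict[str, list[Path]]:
    """Sort result files into corpora based on how their name (without extension) ends."""
    groups: dict[str, list[Path]] = {suffix: [] for suffix in CORPUS_SUFFIXES}
    for path in paths:
        for suffix in CORPUS_SUFFIXES:
            if path.stem.endswith(suffix):
                groups[suffix].append(path)
    return groups


# Example usage, assuming a local checkout that contains the interspeech_results directory:
# groups = group_result_files(sorted(Path("interspeech_results").glob("*.json")))
```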

Installation

System Dependencies

For most Linux and macOS systems, pre-built binaries are available via pip. For installation on other platforms or when building from source, a Rust compiler is required for building the native pyo3 extension. Rustup is recommended for managing Rust installations.

Optional

MP3 support in torchaudio requires ffmpeg to be installed on the system, e.g. on Debian-based Linux distributions:

sudo apt update && sudo apt install ffmpeg

Transcribing training and evaluation data with the eSpeak NG grapheme-to-phoneme (G2P) backend requires the espeak-ng package:

sudo apt install espeak-ng

For our paper, we transcribed Common Voice using eSpeak NG version 1.51.

Allophant Package

Allophant can be installed via pip:

pip install allophant

Note that the package currently requires Python >= 3.10 and was tested on 3.12. For GPU use, torch and torchaudio may need to be installed manually for your CUDA or ROCm version (see the PyTorch installation instructions).

For development, an editable package can be installed as follows:

git clone https://github.com/kgnlp/allophant
cd allophant
pip install -e .

Usage

Inference With Pre-trained Models

A pre-trained model can be loaded with the allophant package from a Hugging Face checkpoint or a local file:

from allophant.estimator import Estimator

device = "cpu"
model, attribute_indexer = Estimator.restore("kgnlp/allophant", device=device)
supported_features = attribute_indexer.feature_names
# The phonetic feature categories supported by the model, including "phonemes"
print(supported_features)

Allophant supports decoding custom phoneme inventories, which can be constructed in multiple ways:

# 1. For a single language:
inventory = attribute_indexer.phoneme_inventory("es")
# 2. For multiple languages, e.g. in code-switching scenarios
inventory = attribute_indexer.phoneme_inventory(["es", "it"])
# 3. Any custom selection of phones for which features are available in the Allophoible database
inventory = ['a', 'ai̯', 'au̯', 'b', 'e', 'eu̯', 'f', 'ɡ', 'l', 'ʎ', 'm', 'ɲ', 'o', 'p', 'ɾ', 's', 't̠ʃ']

Audio files can then be loaded, resampled and transcribed using the given inventory by first computing the log probabilities for each classifier:

import torch
import torchaudio
from allophant.dataset_processing import Batch

# Load an audio file and resample the first channel to the sample rate used by the model
audio, sample_rate = torchaudio.load("utterance.wav")
audio = torchaudio.functional.resample(audio[:1], sample_rate, model.sample_rate)

# Construct a batch of 0-padded single channel audio, lengths and language IDs
# Language ID can be 0 for inference
batch = Batch(audio, torch.tensor([audio.shape[1]]), torch.zeros(1))
model_outputs = model.predict(
  batch.to(device),
  attribute_indexer.composition_feature_matrix(inventory).to(device)
)

Finally, the log probabilities can be decoded into the recognized phonemes or phonetic features:

from allophant import predictions

# Create a feature mapping for your inventory and CTC decoders for the desired feature set
inventory_indexer = attribute_indexer.attributes.subset(inventory)
ctc_decoders = predictions.feature_decoders(inventory_indexer, feature_names=supported_features)

for feature_name, decoder in ctc_decoders.items():
    decoded = decoder(model_outputs.outputs[feature_name].transpose(1, 0), model_outputs.lengths)
    # Print the feature name and values for each utterance in the batch
    for [hypothesis] in decoded:
        # NOTE: token indices are offset by one due to the <BLANK> token used during decoding
        recognized = inventory_indexer.feature_values(feature_name, hypothesis.tokens - 1)
        print(feature_name, recognized)
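The example above transcribes a single utterance. Since Batch takes 0-padded single-channel audio together with per-utterance lengths, several files could presumably be transcribed at once by padding them to a common length first. This sketch assumes Batch accepts a (batch, samples) tensor, as the single-utterance example suggests; pad_waveforms is a hypothetical helper, not part of allophant.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pad_waveforms(waveforms: list[torch.Tensor]) -> tuple[torch.Tensor, torch.Tensor]:
    """Zero-pad 1-D mono waveforms to the length of the longest one.

    Returns a (batch, max_samples) tensor and the original length of each waveform.
    """
    lengths = torch.tensor([waveform.shape[0] for waveform in waveforms])
    padded = pad_sequence(waveforms, batch_first=True, padding_value=0.0)
    return padded, lengths


# Hypothetical usage with the objects from the examples above, where each entry of
# resampled is a (1, samples) tensor already resampled to model.sample_rate:
# padded, lengths = pad_waveforms([audio[0] for audio in resampled])
# batch = Batch(padded, lengths, torch.zeros(len(resampled)))
```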

Configuration

To specify options for preprocessing, training, and the model architecture, a configuration file in TOML format can be passed to most commands. For automation purposes, JSON configuration files can be used instead with the --config-json-data/-j flag. To start, a default configuration file with comments can be generated as follows:

allophant generate-config [path/to/config]

Preprocessing

The allophant-data command contains all functionality for corpus processing and management available in allophant. For training, corpora without phoneme-level transcriptions have to be transcribed beforehand with a grapheme-to-phoneme model.

Transcription

Phoneme transcriptions for a supported corpus format can be generated with transcribe. For instance, for transcribing the German and English subsets of a corpus with eSpeak NG and PHOIBLE features from Allophoible using a batch size of 512 and at most 15,000 utterances per language:

allophant-data transcribe -p -e espeak-ng -b 512 -l de,en -t 15000 -f phoible path/to/corpus -o transcribed_data

Note that no audio data is moved or copied in this process. All commands that load corpora also accept a path to the *.bin transcription file directly instead of a directory. This allows loading only specific splits, such as loading only the test split for evaluation.

Utterance Lengths

As an optional step, utterance lengths can be extracted from a transcribed corpus for more memory-efficient batching. If only a subset of the corpus was transcribed, lengths are only stored for the transcribed utterances.

allophant-data save-lengths [-c /path/to/config.toml] path/to/transcribed_corpus path/to/output
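Length-based batching typically groups utterances of similar duration so that padding, and therefore memory use, stays bounded. The following is a generic sketch of that idea, not allophant's batching code: it assumes lengths are given in frames and treats max_frames as a budget on the padded batch size (number of utterances times the longest utterance); batch_by_padded_length is a hypothetical name.

```python
def batch_by_padded_length(lengths: list[int], max_frames: int) -> list[list[int]]:
    """Group utterance indices into batches whose padded size stays within max_frames.

    An utterance that alone exceeds max_frames is placed in a batch of its own.
    """
    # Visit utterances from shortest to longest so that similar lengths share a batch
    order = sorted(range(len(lengths)), key=lengths.__getitem__)
    batches: list[list[int]] = []
    current: list[int] = []
    for index in order:
        # Because of the sort, the new utterance is the longest in the batch so far
        padded_size = (len(current) + 1) * lengths[index]
        if current and padded_size > max_frames:
            batches.append(current)
            current = []
        current.append(index)
    if current:
        batches.append(current)
    return batches
```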

Training

During training, the best checkpoint is saved after each evaluation step to the path provided via the --save-path/-s flag. To save every checkpoint instead, a directory needs to be passed to --save-path/-s and the --save-all/-a flag included. The number of worker threads is auto-detected from the number of available CPU threads but can be set manually with -w number. To train only on the CPU instead of using CUDA, the --cpu flag can be used. Finally, any progress logging to stderr can be disabled with --no-progress.

allophant train [-c /path/to/config.toml] [-w number] [--cpu] [--no-progress] [--save-all]
  [-s /path/to/checkpoint.pt] [-l /path/to/lengths] path/to/transcribed_corpus

Note that the --lengths/-l flag, pointing to previously computed utterance lengths, is required when the "frames" batching mode is enabled.

Evaluation

Test Data Inference

For evaluation, test data can be transcribed with the predict sub-command. The resulting file contains metadata, transcriptions for phonemes and features, and gold standard labels from the test data.

allophant predict [--cpu] [-w number] [-t {ucla-phonetic,common-voice}] [-f phonemes,feature1,feature2]
  [--fix-unicode] [--training-languages {include,exclude,only}] [-m {frames,utterances}] [-s number]
  [--language-phonemes] [--no-progress] [-c] [-o /path/to/prediction_file.jsonl] /path/to/dataset huggingface/model_id or /path/to/checkpoint

Use --dataset-type/-t to select the data set type. Note that only Common Voice and the UCLA Phonetic Corpus are currently supported. Predictions will either be printed to stdout or saved to a file given by --output/-o. Gzip compression is either inferred from a ".jsonl.gz" extension or can be forced with the --compress/-c flag. The --training-languages argument allows filtering utterances based on the languages that also occur in the training data, and should be set to "exclude" for zero-shot evaluation.
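For custom analysis outside of evaluate, a small reader can load a prediction file whether or not it is compressed. It only assumes one JSON value per line, as the .jsonl extension implies; the record layout is not described here, so the sketch does not interpret it. A file compressed via --compress/-c without a ".gz" extension would not be detected by this check. read_jsonl is a hypothetical helper.

```python
import gzip
import json
from collections.abc import Iterator
from typing import Any


def read_jsonl(path: str) -> Iterator[Any]:
    """Yield the parsed JSON value on each non-empty line of a JSON Lines file.

    Files ending in ".gz" are decompressed transparently with gzip.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as file:
        for line in file:
            if line.strip():  # skip blank lines
                yield json.loads(line)
```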

Using --feature-subset/-f, a comma-separated list of features and/or "phonemes", such as syllabic,round,phonemes, can be provided to predict only the given subset of classes. With the --fix-unicode option, predict attempts to resolve cases where phonemes from the test data are missing from the database because their Unicode encoding differs.
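To illustrate the kind of mismatch --fix-unicode addresses: the same symbol can be stored as different code point sequences that render identically. The sketch uses Python's unicodedata module; Unicode normalization is one standard remedy, but whether --fix-unicode works this way is not stated here, so treat this as an illustration of the problem rather than of allophant's implementation.

```python
import unicodedata

# A nasalized vowel can be encoded as one precomposed code point...
PRECOMPOSED = "\u00e3"  # ã, LATIN SMALL LETTER A WITH TILDE
# ...or as a base letter followed by a combining diacritic
DECOMPOSED = "a\u0303"  # a + COMBINING TILDE


def same_after_normalization(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after applying the same Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(PRECOMPOSED == DECOMPOSED)                          # False
print(same_after_normalization(PRECOMPOSED, DECOMPOSED))  # True
```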

The batch sizes defined in the model configuration for training can be overridden with the --batch-size/-s and --batch-mode/-m flags. Note that if the batch mode is set to "frames", either in the model configuration or via the --batch-mode/-m flag, utterance lengths have to be provided via the --lengths/-l argument. A beam size for CTC beam search decoding can be specified with --ctc-beam/-b. In our paper, we used a beam size of 1, i.e. greedy decoding.
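Beam search with a beam size of 1 essentially reduces to greedy (best-path) CTC decoding. The pure-Python sketch below shows that rule on per-frame scores (log probabilities or probabilities both work, since only the per-frame maximum matters). It is not allophant's decoder; it assumes the blank token has index 0, consistent with the offset noted in the decoding example.

```python
def greedy_ctc_decode(frame_scores: list[list[float]], blank: int = 0) -> list[int]:
    """Best-path CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # Highest-scoring class index for every frame
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    tokens: list[int] = []
    previous = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame and is not blank;
        # a blank between two identical labels therefore keeps both
        if label != previous and label != blank:
            tokens.append(label)
        previous = label
    return tokens


# With the blank at index 0, decoded tokens are shifted by one relative to the
# inventory, which is why the decoding example subtracts 1:
# inventory_indices = [token - 1 for token in greedy_ctc_decode(frame_scores)]
```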

Error Rates

The evaluate sub-command computes edit statistics and phoneme and attribute error rates for each language of a given corpus or split.

allophant evaluate [--fix-unicode] [--no-remap] [--split-complex] [-j] [--no-progress]
  [-o path/to/results.json] [-d] path/to/predictions.jsonl

Without --no-remap, transcriptions are mapped to language inventories using the same mapping scheme used during training. In our paper, all results were computed without this mapping, meaning that the transcriptions were directly compared to labels without an additional mapping step. If --fix-unicode was used during prediction, it should also be used in evaluate. Evaluation supports splitting any complex phoneme segments before computing error statistics with the --split-complex/-s flag.

For further analysis of evaluation results, JSON output should be enabled via the --json/-j flag. The JSON file can then be read using allophant.evaluation.EvaluationResults. For quick inspection of human-readable (average) error rates from evaluation results saved in JSON format, use allophant-error-rates:

allophant-error-rates path/to/results_file

Allophoible Inventories

Inventories and feature sets preprocessed from (a subset of) Allophoible for training can be extracted with the allophant-features command.

allophant-features [-p /path/to/allophoible.csv] [--remove-zero] [--prefer-allophant-dialects]
  [-o /path/to/output.csv] [en,fr,ko,ar,...]

Citation

When using our work, please cite our paper as follows:

@inproceedings{glocker2023allophant,
    title={Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes},
    author={Glocker, Kevin and Herygers, Aaricia and Georges, Munir},
    year={2023},
    booktitle={{Proc. Interspeech 2023}},
    month={8}}


