Plug-and-Play Multilingual Few-shot Spoken Words Recognition

Abstract

As technology advances and digital devices become prevalent, seamless human-machine communication is gaining significance. The growing adoption of mobile, wearable, and other Internet of Things (IoT) devices has changed how we interact with these smart devices, making accurate spoken word recognition a crucial component of effective interaction. However, building a robust spoken word detection system that can handle novel keywords remains challenging, especially for low-resource languages with limited training data. Here, we propose PLiX, a general-purpose, multilingual, and plug-and-play keyword spotting system that leverages few-shot learning to harness massive real-world data and enable the recognition of unseen spoken words at test time. Our few-shot deep models are trained on millions of one-second audio clips across 20 languages, achieving state-of-the-art performance while remaining highly efficient. Extensive evaluations show that PLiX generalizes to novel spoken words given as few as one support example and performs well on unseen languages out of the box. We release models and inference code to serve as a foundation for future research and voice-enabled user interface development for emerging devices.

Key Contributions

  • We develop PLiX, a general-purpose, multilingual, plug-and-play, few-shot keyword spotting system trained and evaluated on more than 12 million one-second audio clips sampled at 16 kHz.
  • We leverage state-of-the-art neural architectures to learn few-shot models that are highly performant while remaining efficient, with fewer learnable parameters.
  • We conduct a wide-ranging set of evaluations to systematically quantify the efficacy of our system across 20 languages and thousands of classes (i.e., words or terms), showcasing generalization to unseen words at test time given as few as one support example per class.
  • We demonstrate that our model generalizes exceptionally well in a one-shot setting on 5 unseen languages. Further, in a cross-task transfer evaluation on the challenging FLEURS benchmark, our model performs well for language identification without any retraining.
  • To serve as a building block for future research on spoken word detection with meta-learning and enable product development, we release model weights and inference code as a Python package.
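
To make the few-shot setting above concrete, here is a minimal sketch of one common way to classify a query from a handful of labeled support examples: average the support embeddings of each class into a prototype and pick the prototype most similar to the query. This is an illustration only; the source does not state that PLiX classifies this way, and the function names and toy 2-D embeddings below are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(support_embeddings, support_labels, query_embedding):
    """Return the label whose class prototype is most similar to the query."""
    by_label = {}
    for embedding, label in zip(support_embeddings, support_labels):
        by_label.setdefault(label, []).append(embedding)
    # One prototype per class: the mean of that class's support embeddings.
    prototypes = {label: mean_vector(embs) for label, embs in by_label.items()}
    return max(prototypes, key=lambda label: cosine(prototypes[label], query_embedding))


# Toy one-shot example: two classes, one made-up 2-D embedding each.
support_embeddings = [[1.0, 0.0], [0.0, 1.0]]
support_labels = [0, 1]
predicted = classify(support_embeddings, support_labels, [0.9, 0.2])
```

In a real system the embeddings would come from a trained audio encoder rather than being written by hand; with one support example per class, each prototype is simply that example's embedding.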

Quick Start

The PLiX model is distributed as a Python package:

pip install plixkws

Then follow the usage below or refer to test_model.py.

import torch
from plixkws import util
from plixkws import model

# Load a pretrained PLiX model on the CPU.
fws_model = model.load(encoder_name="base", language="en", device="cpu")

# Support set: one labeled example clip per keyword class (5 classes, 1 shot each).
support = {
    "paths": ["./test_clips/aandachtig.wav", "./test_clips/stroom.wav",
        "./test_clips/persbericht.wav", "./test_clips/klinkers.wav",
        "./test_clips/zinsbouw.wav"],
    "labels": torch.tensor([0,1,2,3,4]),
    "classes": ["aandachtig", "stroom", "persbericht", "klinkers", "zinsbouw"],
}
# Load each clip and stack the results into a single batch tensor.
support["audio"] = torch.stack([util.load_clip(path) for path in support["paths"]])
support = util.batch_device(support, device="cpu")

# Query clips to classify against the support set.
query = {
    "paths": ["./test_clips/query_klinkers.wav", "./test_clips/query_stroom.wav"]
}
query["audio"] = torch.stack([util.load_clip(path) for path in query["paths"]])
query = util.batch_device(query, device="cpu")

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    predictions = fws_model(support, query)
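
The support dictionary above is written out by hand. As a sketch of how the same "paths", "labels", and "classes" fields could be derived from a list of (path, keyword) pairs, the helper below assigns each distinct keyword an integer label in order of first appearance. `build_support_index` is a hypothetical helper, not part of plixkws, and it returns labels as a plain list.

```python
def build_support_index(examples):
    """Build the paths/labels/classes fields from (path, class_name) pairs.

    Hypothetical helper for illustration; it is not part of plixkws.
    """
    classes = []  # distinct class names, in order of first appearance
    paths = []
    labels = []   # integer label per clip, indexing into `classes`
    for path, name in examples:
        if name not in classes:
            classes.append(name)
        paths.append(path)
        labels.append(classes.index(name))
    return {"paths": paths, "labels": labels, "classes": classes}


examples = [
    ("./test_clips/aandachtig.wav", "aandachtig"),
    ("./test_clips/stroom.wav", "stroom"),
    ("./test_clips/klinkers.wav", "klinkers"),
]
index = build_support_index(examples)
```

To match the Quick Start usage, the labels would then be wrapped with `torch.tensor(index["labels"])` before the audio is loaded and the dictionary is passed to the model. Several clips may share a keyword, in which case they receive the same label.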

Release history

This version

1.0

Download files

Source Distribution

plixkws-1.0.tar.gz (9.2 kB)

Uploaded Source

Built Distribution

plixkws-1.0-py3-none-any.whl (9.9 kB)

Uploaded Python 3

File details

Details for the file plixkws-1.0.tar.gz.

File metadata

  • Download URL: plixkws-1.0.tar.gz
  • Size: 9.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for plixkws-1.0.tar.gz
Algorithm Hash digest
SHA256 8b3bab8e344b5a15b99503cd1cde9cb4eebff551fa0cb4d8dcb82b23447411d6
MD5 adbe9c4b89ec5984ac5103cc1c3af1ab
BLAKE2b-256 281d1d87e80f8a36c3f13217af3e2ec5e58df88709885c6f586cc34dcb0c17b9

File details

Details for the file plixkws-1.0-py3-none-any.whl.

File metadata

  • Download URL: plixkws-1.0-py3-none-any.whl
  • Size: 9.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for plixkws-1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 958f42ca5975627cc955f52e39d96d58a2d2b53de83a77b7fd45640cc5d65738
MD5 07b90ec0588cb678ec3228ed9b7c49d3
BLAKE2b-256 28b5dac50b18f4c7b9d65bbd8bc0f463ed1cf5a486b830d22e0b7bae19db3f87
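
The published digests can be used to check a downloaded file. Below is a minimal sketch using Python's standard hashlib module; the helper names are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Assumed location of the downloaded source distribution.
download_path = "plixkws-1.0.tar.gz"
published_sha256 = "8b3bab8e344b5a15b99503cd1cde9cb4eebff551fa0cb4d8dcb82b23447411d6"
if os.path.exists(download_path):
    print("hash matches:", matches_published_hash(download_path, published_sha256))
```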
