Speech to Phoneme, Bandwidth Extension and Speaker Verification using the Vibravox dataset.

Resources:

  • 📝: The open-access paper associated with this project is published in Speech Communication and is also available on arXiv.
  • 🤗: The dataset used in this project is hosted on Hugging Face: https://huggingface.co/datasets/Cnam-LMSSC/vibravox
  • 🌐: For more information about the project, visit our project page.
  • 🏆: Explore the leaderboards on Papers With Code.

Setup

pip install vibravox

Available sensors

  • 🟣:headset_microphone (not available for Bandwidth Extension, as it is the reference microphone)
  • 🟡:throat_microphone
  • 🟢:forehead_accelerometer
  • 🔵:rigid_in_ear_microphone
  • 🔴:soft_in_ear_microphone
  • 🧊:temple_vibration_pickup
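
The restriction noted in the list above (the headset microphone is the reference for Bandwidth Extension, so it cannot also be the input sensor) can be checked before launching a run. The following is a minimal sketch and not part of the vibravox package: `SENSORS`, `EXCLUDED` and `check_sensor` are hypothetical names introduced for illustration, the datamodule keys reuse the `lightning_datamodule` values that appear in the commands on this page, and applying the same exclusion to `noisybwe` is an assumption.

```python
# Hypothetical helper, not part of the vibravox package.
SENSORS = (
    "headset_microphone",
    "throat_microphone",
    "forehead_accelerometer",
    "rigid_in_ear_microphone",
    "soft_in_ear_microphone",
    "temple_vibration_pickup",
)

# Sensors that cannot be used as model input for a given datamodule.
# Assumption: the noisy Bandwidth Extension setup shares the "bwe" restriction.
EXCLUDED = {
    "bwe": {"headset_microphone"},
    "noisybwe": {"headset_microphone"},
}


def check_sensor(sensor: str, datamodule: str) -> str:
    """Return `sensor` unchanged if it is usable with `datamodule`, else raise."""
    if sensor not in SENSORS:
        raise ValueError(f"Unknown sensor {sensor!r}; expected one of {SENSORS}")
    if sensor in EXCLUDED.get(datamodule, set()):
        raise ValueError(
            f"{sensor!r} is the reference microphone and is not available "
            f"for {datamodule!r}"
        )
    return sensor
```

Calling `check_sensor("throat_microphone", "bwe")` returns the sensor name, while `check_sensor("headset_microphone", "bwe")` raises a `ValueError`.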

Run some models

  • EBEN for Bandwidth Extension

    • Train and test on speech_clean, for recordings in a quiet environment:
      python run.py \
        lightning_datamodule=bwe \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=eben \
        lightning_module.generator.p=2 \
        +callbacks=[bwe_checkpoint] \
        ++trainer.check_val_every_n_epoch=15 \
        ++trainer.max_epochs=500
      
    • Train on speech_clean mixed with speechless_noisy and test on speech_noisy, for recordings in a noisy environment (weights initialized from vibravox_EBEN_models):
      python run.py \
        lightning_datamodule=noisybwe \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=eben \
        lightning_module.description=from_pretrained-throat_microphone \
        ++lightning_module.generator=dummy \
        ++lightning_module.generator._target_=vibravox.torch_modules.dnn.eben_generator.EBENGenerator.from_pretrained \
        ++lightning_module.generator.pretrained_model_name_or_path=Cnam-LMSSC/EBEN_throat_microphone \
        ++lightning_module.discriminator=dummy \
        ++lightning_module.discriminator._target_=vibravox.torch_modules.dnn.eben_discriminator.DiscriminatorEBENMultiScales.from_pretrained \
        ++lightning_module.discriminator.pretrained_model_name_or_path=Cnam-LMSSC/DiscriminatorEBENMultiScales_throat_microphone \
        +callbacks=[bwe_checkpoint] \
        ++callbacks.checkpoint.monitor=validation/torchmetrics_stoi/synthetic \
        ++trainer.check_val_every_n_epoch=15 \
        ++trainer.max_epochs=200
      
  • wav2vec2 for Speech to Phoneme

    • Train and test on speech_clean, for recordings in a quiet environment:
    python run.py \
      lightning_datamodule=stp \
      lightning_datamodule.sensor=throat_microphone \
      lightning_module=wav2vec2_for_stp \
      lightning_module.optimizer.lr=1e-5 \
      ++trainer.max_epochs=10
    
    • Train and test on speech_noisy, for recordings in a noisy environment (weights initialized from vibravox_phonemizers):
    python run.py \
      lightning_datamodule=stp \
      lightning_datamodule.sensor=throat_microphone \
      lightning_datamodule.subset=speech_noisy \
      lightning_datamodule/data_augmentation=aggressive \
      lightning_module=wav2vec2_for_stp \
      lightning_module.wav2vec2_for_ctc.pretrained_model_name_or_path=Cnam-LMSSC/phonemizer_throat_microphone \
      lightning_module.optimizer.lr=1e-6 \
      ++trainer.max_epochs=30
    
  • ECAPA2 for Speaker Verification

    • Test the model on speech_clean:
    python run.py \
      lightning_datamodule=spkv \
      lightning_module=ecapa2 \
      logging=csv \
      ++trainer.limit_train_batches=0 \
      ++trainer.limit_val_batches=0
    
    • Test on speech_clean mixed with speechless_noisy. This mix is representative of speech_noisy and uses exactly the same pairs as the speech_clean test, allowing direct comparison of results:
    python run.py \
      lightning_datamodule=spkv \
      lightning_datamodule.dataset_name=Cnam-LMSSC/vibravox_mixed_for_spkv \
      lightning_datamodule.subset=speech_noisy_mixed \
      lightning_module=ecapa2 \
      logging=csv \
      ++trainer.limit_train_batches=0 \
      ++trainer.limit_val_batches=0
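
The commands above all follow the same pattern: a list of Hydra overrides passed to `run.py`. In Hydra's override grammar, `key=value` changes an existing config entry, `+key=value` appends a new one, and `++key=value` overrides the entry if it exists and adds it otherwise. As a rough sketch, the quiet-environment EBEN command could be generated for each body-conduction sensor as shown below. `eben_command` and `BODY_CONDUCTION_SENSORS` are hypothetical names that are not part of vibravox, only the override strings are copied from the first EBEN command, and reusing `generator.p=2` for every sensor is an assumption, not a recommended setting.

```python
import shlex

# Sensors that can serve as EBEN input; the headset microphone is the reference.
BODY_CONDUCTION_SENSORS = [
    "throat_microphone",
    "forehead_accelerometer",
    "rigid_in_ear_microphone",
    "soft_in_ear_microphone",
    "temple_vibration_pickup",
]


def eben_command(sensor: str, p: int = 2, max_epochs: int = 500) -> list[str]:
    """Build the argv list for a quiet-environment EBEN run (hypothetical helper)."""
    return [
        "python", "run.py",
        "lightning_datamodule=bwe",
        f"lightning_datamodule.sensor={sensor}",
        "lightning_module=eben",
        # Assumption: p=2 is taken from the throat_microphone example only.
        f"lightning_module.generator.p={p}",
        "+callbacks=[bwe_checkpoint]",
        "++trainer.check_val_every_n_epoch=15",
        f"++trainer.max_epochs={max_epochs}",
    ]


if __name__ == "__main__":
    for sensor in BODY_CONDUCTION_SENSORS:
        # shlex.join quotes the bracketed callback list so a shell passes it verbatim.
        print(shlex.join(eben_command(sensor)))
```

To launch the runs instead of printing them, each list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `run.py`.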
    

Cite our work

If you use code from this repository or the Vibravox dataset (either curated or non-curated versions) for research, please cite this paper:

@article{hauret2025vibravox,
      title={{Vibravox: A dataset of french speech captured with body-conduction audio sensors}},
      author={Hauret, Julien and Olivier, Malo and Joubaud, Thomas and Langrenne, Christophe and
        Poir{\'e}e, Sarah and Zimpfer, V{\'e}ronique and Bavu, {\'E}ric},
      journal={Speech Communication},
      pages={103238},
      year={2025},
      publisher={Elsevier}
}

and this Hugging Face repository, which is linked to a DOI:

@misc{cnamlmssc2024vibravoxdataset,
    author={Hauret, Julien and Olivier, Malo and Langrenne, Christophe and
        Poir{\'e}e, Sarah and Bavu, {\'E}ric},
    title        = { {Vibravox} (Revision 7990b7d) },
    year         = 2024,
    url          = { https://huggingface.co/datasets/Cnam-LMSSC/vibravox },
    doi          = { 10.57967/hf/2727 },
    publisher    = { Hugging Face }
}
