Speech to Phoneme, Bandwidth Extension and Speaker Verification using the Vibravox dataset.
Project description
Resources:
- 📝: The open-access paper related to this project, published in Speech Communication, is available on arXiv and on the journal's website.
- 🤗: The dataset used in this project is hosted on Hugging Face.
- 🌐: For more information about the project, visit the project page.
- 🏆: Explore the leaderboards on Papers With Code.
Setup

```bash
pip install vibravox
```
Available sensors
- 🟣: headset_microphone (not available for Bandwidth Extension, as it is the reference microphone)
- 🟡: throat_microphone
- 🟢: forehead_accelerometer
- 🔵: rigid_in_ear_microphone
- 🔴: soft_in_ear_microphone
- 🧊: temple_vibration_pickup
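Except for the reference headset microphone, these sensors capture body-conducted speech, which loses most of its high-frequency content; that loss is what the Bandwidth Extension task tries to restore. The toy sketch below (pure Python, with a simple moving-average filter standing in for a sensor's actual response, which it does not model) shows how a low-pass channel attenuates a high-frequency tone while keeping a low-frequency one:

```python
import math

fs = 16000  # sampling rate in Hz
n = 1024    # number of samples

def moving_average(x, w):
    # Simple FIR low-pass: keeps low frequencies, attenuates high ones.
    return [sum(x[max(0, i - w + 1): i + 1]) / min(w, i + 1) for i in range(len(x))]

def band_energy(x, f):
    # Magnitude of the correlation with a complex exponential at frequency f
    # (a single DFT bin), normalized by the signal length.
    re = sum(v * math.cos(2 * math.pi * f * t / fs) for t, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * f * t / fs) for t, v in enumerate(x))
    return math.hypot(re, im) / len(x)

low = [math.sin(2 * math.pi * 200 * t / fs) for t in range(n)]    # 200 Hz tone
high = [math.sin(2 * math.pi * 6000 * t / fs) for t in range(n)]  # 6 kHz tone
mix = [a + b for a, b in zip(low, high)]

filtered = moving_average(mix, 8)

print(round(band_energy(filtered, 200), 3))   # low band: largely preserved
print(round(band_energy(filtered, 6000), 3))  # high band: strongly attenuated
```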
Run some models
- EBEN for Bandwidth Extension
  - Train and test on `speech_clean`, for recordings in a quiet environment:

    ```bash
    python run.py \
        lightning_datamodule=bwe \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=eben \
        lightning_module.generator.p=2 \
        +callbacks=[bwe_checkpoint] \
        ++trainer.check_val_every_n_epoch=15 \
        ++trainer.max_epochs=500
    ```

  - Train on `speech_clean` mixed with `speechless_noisy` and test on `speech_noisy`, for recordings in a noisy environment (weights initialized from vibravox_EBEN_models):

    ```bash
    python run.py \
        lightning_datamodule=noisybwe \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=eben \
        lightning_module.description=from_pretrained-throat_microphone \
        ++lightning_module.generator=dummy \
        ++lightning_module.generator._target_=vibravox.torch_modules.dnn.eben_generator.EBENGenerator.from_pretrained \
        ++lightning_module.generator.pretrained_model_name_or_path=Cnam-LMSSC/EBEN_throat_microphone \
        ++lightning_module.discriminator=dummy \
        ++lightning_module.discriminator._target_=vibravox.torch_modules.dnn.eben_discriminator.DiscriminatorEBENMultiScales.from_pretrained \
        ++lightning_module.discriminator.pretrained_model_name_or_path=Cnam-LMSSC/DiscriminatorEBENMultiScales_throat_microphone \
        +callbacks=[bwe_checkpoint] \
        ++callbacks.checkpoint.monitor=validation/torchmetrics_stoi/synthetic \
        ++trainer.check_val_every_n_epoch=15 \
        ++trainer.max_epochs=200
    ```
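`run.py` is driven by Hydra, whose override grammar the commands above rely on: `key=value` overrides an existing config entry, `+key=value` adds a new one, and `++key=value` sets the key whether or not it exists, with dots denoting nesting. A toy sketch of that behavior (pure Python on plain dicts, values kept as strings; not Hydra's actual implementation):

```python
# Toy model of Hydra-style overrides: plain "key=value" must hit an existing
# key, "+" adds a new key, "++" adds-or-overrides. Dotted keys nest.
def apply_overrides(config, overrides):
    for item in overrides:
        force = item.startswith("++")
        add = item.startswith("+") and not force
        key, _, value = item.lstrip("+").partition("=")
        node = config
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        if add and leaf in node:
            raise KeyError(f"'{key}' already exists; use ++ to force-override")
        if not add and not force and leaf not in node:
            raise KeyError(f"'{key}' not found; use + to add it")
        node[leaf] = value
    return config

cfg = {
    "trainer": {"max_epochs": "100"},
    "lightning_datamodule": {"sensor": "headset_microphone"},
}
apply_overrides(cfg, [
    "lightning_datamodule.sensor=throat_microphone",  # plain override
    "++trainer.max_epochs=500",                       # set, existing or not
    "+callbacks=bwe_checkpoint",                      # add a new key
])
print(cfg["lightning_datamodule"]["sensor"])  # throat_microphone
```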
- wav2vec2 for Speech to Phoneme
  - Train and test on `speech_clean`, for recordings in a quiet environment (weights initialized from facebook/wav2vec2-base-fr-voxpopuli):

    ```bash
    python run.py \
        lightning_datamodule=stp \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=wav2vec2_for_stp \
        lightning_module.optimizer.lr=1e-5 \
        ++trainer.max_epochs=10
    ```

  - Train and test on `speech_noisy`, for recordings in a noisy environment (weights initialized from vibravox_phonemizers):

    ```bash
    python run.py \
        lightning_datamodule=stp \
        lightning_datamodule.sensor=throat_microphone \
        lightning_datamodule.subset=speech_noisy \
        lightning_datamodule/data_augmentation=aggressive \
        lightning_module=wav2vec2_for_stp \
        lightning_module.wav2vec2_for_ctc.pretrained_model_name_or_path=Cnam-LMSSC/phonemizer_throat_microphone \
        lightning_module.optimizer.lr=1e-6 \
        ++trainer.max_epochs=30
    ```
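The Speech to Phoneme models are wav2vec2 networks trained with a CTC head over phoneme tokens. A minimal sketch of CTC greedy decoding, assuming a per-frame argmax token sequence: collapse consecutive repeats, then drop the blank token. The frame sequence and blank symbol below are illustrative, not the repository's actual tokenizer:

```python
BLANK = "<blank>"  # illustrative blank symbol

def ctc_greedy_decode(frame_tokens):
    # Collapse consecutive repeats, then remove blanks.
    decoded, prev = [], None
    for tok in frame_tokens:
        if tok != prev and tok != BLANK:
            decoded.append(tok)
        prev = tok
    return decoded

# Per-frame argmax tokens for "bonjour" in a SAMPA-like French notation
frames = ["<blank>", "b", "b", "<blank>", "o~", "o~", "Z", "u", "R", "R"]
print(ctc_greedy_decode(frames))  # ['b', 'o~', 'Z', 'u', 'R']
```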
- ECAPA2 for Speaker Verification
  - Test the model on `speech_clean`:

    ```bash
    python run.py \
        lightning_datamodule=spkv \
        lightning_module=ecapa2 \
        logging=csv \
        ++trainer.limit_train_batches=0 \
        ++trainer.limit_val_batches=0
    ```

  - Test on `speech_clean` mixed with `speechless_noisy`, representative of `speech_noisy`, with the exact same pairs that were used on `speech_clean`, allowing direct comparison of results:

    ```bash
    python run.py \
        lightning_datamodule=spkv \
        lightning_datamodule.dataset_name=Cnam-LMSSC/vibravox_mixed_for_spkv \
        lightning_datamodule.subset=speech_noisy_mixed \
        lightning_module=ecapa2 \
        logging=csv \
        ++trainer.limit_train_batches=0 \
        ++trainer.limit_val_batches=0
    ```
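Under the hood, speaker verification scores a trial by comparing two fixed-dimensional speaker embeddings (here produced by ECAPA2) and thresholding their cosine similarity. A minimal sketch with made-up 4-dimensional embeddings and an illustrative threshold; a real system tunes its decision threshold, for example at the equal error rate:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def same_speaker(emb_a, emb_b, threshold=0.5):
    # Accept the trial if the similarity exceeds the decision threshold.
    return cosine_similarity(emb_a, emb_b) >= threshold

enroll = [0.9, 0.1, 0.3, 0.2]        # enrollment embedding (toy values)
trial_same = [0.8, 0.2, 0.25, 0.3]   # same speaker, slightly different
trial_diff = [-0.5, 0.9, -0.1, 0.4]  # different speaker

print(same_speaker(enroll, trial_same))  # True
print(same_speaker(enroll, trial_diff))  # False
```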
Cite our work
If you use code in this repository or the Vibravox dataset (either curated or non-curated versions) for research, please cite this paper:
@article{hauret2025vibravox,
  title={{Vibravox: A dataset of French speech captured with body-conduction audio sensors}},
  author={Hauret, Julien and Olivier, Malo and Joubaud, Thomas and Langrenne, Christophe and
          Poir{\'e}e, Sarah and Zimpfer, V{\'e}ronique and Bavu, {\'E}ric},
  journal={Speech Communication},
  pages={103238},
  year={2025},
  publisher={Elsevier}
}
and this Hugging Face repository, which is linked to a DOI:
@misc{cnamlmssc2024vibravoxdataset,
  author={Hauret, Julien and Olivier, Malo and Langrenne, Christophe and
          Poir{\'e}e, Sarah and Bavu, {\'E}ric},
  title={{Vibravox} (Revision 7990b7d)},
  year={2024},
  url={https://huggingface.co/datasets/Cnam-LMSSC/vibravox},
  doi={10.57967/hf/2727},
  publisher={Hugging Face}
}
Project details
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file vibravox-0.1.1.tar.gz.
File metadata
- Download URL: vibravox-0.1.1.tar.gz
- Upload date:
- Size: 38.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `4d9e694eaed0173c2cafaee1bb5cd487fd15edbf35c0d9a133a6a1db7be49150` |
| MD5 | `62e21aa9b8c6e0cf31b5567093dad980` |
| BLAKE2b-256 | `18ac73e019c65dee309172261e3c4b456a2b69139332bb6bee31ec7d04070090` |
File details
Details for the file vibravox-0.1.1-py3-none-any.whl.
File metadata
- Download URL: vibravox-0.1.1-py3-none-any.whl
- Upload date:
- Size: 52.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `dfad3758f009c410c3665bb6997c49430119f26e456885e031547a544bcc4c8b` |
| MD5 | `6485ce7e96a32c1d62950cb025e2d2fe` |
| BLAKE2b-256 | `7a7aa2d77b11bfd42fafa9af9784de99f5b35cacf59190c321f7856dcbdec151` |
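To check a downloaded file against the SHA256 digests published above, you can hash it locally. A small stdlib-only sketch (the path is whatever you saved the distribution as):

```python
import hashlib

def sha256_of_file(path, chunk_size=8192):
    # Hash the file in chunks so large files do not need to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "4d9e694eaed0173c2cafaee1bb5cd487fd15edbf35c0d9a133a6a1db7be49150"
# assert sha256_of_file("vibravox-0.1.1.tar.gz") == expected
```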