

Project description

SpeechMix

Explore different ways to combine speech models (wav2vec2, HuBERT) with NLP models (BART, T5, GPT).

Introduction

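All examples below use the same input: the first utterance (waveform and transcript) from the clean validation split of a small dummy LibriSpeech dataset.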

from datasets import load_dataset
import soundfile as sf


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds['text'][0]
speech = ds["speech"][0]
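Before the models see this waveform, model.processor(...) turns it into input_values. Assuming model.processor wraps a Hugging Face wav2vec2 processor whose feature extractor has normalization enabled (as many wav2vec2 checkpoints do), each utterance is rescaled to zero mean and unit variance. The pure-Python sketch below illustrates that step; zero_mean_unit_var is a hypothetical helper, not part of SpeechMix, and the small epsilon is an assumed guard against silent input.

```python
import math


def zero_mean_unit_var(samples, eps=1e-7):
    """Normalize one utterance to zero mean and unit variance.

    Hypothetical helper illustrating wav2vec2-style input normalization;
    eps (assumed value) avoids dividing by zero on silent audio.
    """
    n = len(samples)
    mean = sum(samples) / n
    # Population variance (divide by n), like NumPy's default var().
    var = sum((x - mean) ** 2 for x in samples) / n
    scale = math.sqrt(var + eps)
    return [(x - mean) / scale for x in samples]


# Toy waveform; a real LibriSpeech utterance has 16,000 samples per second.
toy_wave = [0.0, 0.5, -0.5, 1.0, -1.0]
normed = zero_mean_unit_var(toy_wave)
print([round(x, 4) for x in normed])
```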

Speech encoder + NLP decoder

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
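In this setup the text decoder cross-attends over speech frames instead of subword tokens. The sketch below is a rough illustration, not SpeechMix's own code: it computes how many frames wav2vec2-base emits for a waveform, using the convolutional kernel sizes and strides of the Hugging Face Wav2Vec2Config defaults, and shows a hypothetical linear projection from the speech hidden size (768 for wav2vec2-base) to the decoder width (1024 for bart-large). That SpeechMix's projection is a single linear layer is an assumption.

```python
# Convolutional feature extractor of wav2vec2-base (Wav2Vec2Config defaults).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)

SPEECH_HIDDEN = 768   # hidden_size of wav2vec2-base
DECODER_DIM = 1024    # d_model of bart-large


def num_speech_frames(num_samples):
    """Frames emitted by the conv stack for a waveform of num_samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


def project(frame, weight, bias):
    """Hypothetical linear projection: weight has DECODER_DIM rows of SPEECH_HIDDEN."""
    return [sum(w * x for w, x in zip(row, frame)) + b for row, b in zip(weight, bias)]


frames = num_speech_frames(16_000)  # one second of 16 kHz audio
print(frames)  # 49

# Toy weights: output unit i copies input unit i % SPEECH_HIDDEN.
weight = [[1.0 if j == i % SPEECH_HIDDEN else 0.0 for j in range(SPEECH_HIDDEN)]
          for i in range(DECODER_DIM)]
bias = [0.0] * DECODER_DIM
frame = [float(j) for j in range(SPEECH_HIDDEN)]
projected = project(frame, weight, bias)
print(len(projected))  # 1024
```

At this rate of roughly one frame per 20 ms, a 10-second clip (160,000 samples) yields 499 frames for the decoder to attend over.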

Speech encoder + NLP decoder, fine-tuning only the cross-attention, projection, and decoder embeddings (ftl=True)

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
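With ftl=True only a subset of parameters is updated. One common way to express such a policy is to match parameter names against a list of name fragments and mark only matching parameters as trainable, as in the sketch below. The fragments are assumptions: encoder_attn and embed_tokens follow Hugging Face's BART naming for decoder cross-attention and token embeddings, while enc_to_dec_proj and the module prefixes are hypothetical names; SpeechMix's actual parameter names and selection logic may differ.

```python
# Name fragments selecting the trainable subset under ftl=True (assumed).
FTL_TRAINABLE = ("encoder_attn", "embed_tokens", "enc_to_dec_proj")


def trainable_mask(param_names, fragments):
    """Map each parameter name to True if it should receive gradient updates."""
    return {name: any(f in name for f in fragments) for name in param_names}


# Illustrative parameter names (hypothetical module layout).
names = [
    "encoder.feature_extractor.conv_layers.0.conv.weight",        # speech encoder
    "decoder.model.decoder.layers.0.self_attn.q_proj.weight",     # decoder self-attention
    "decoder.model.decoder.layers.0.encoder_attn.q_proj.weight",  # cross-attention
    "decoder.model.decoder.embed_tokens.weight",                  # decoder embedding
    "enc_to_dec_proj.weight",                                     # speech-to-decoder projection
]
mask = trainable_mask(names, FTL_TRAINABLE)
for name, trainable in mask.items():
    print(f"{'train ' if trainable else 'frozen'}  {name}")
```

In PyTorch, such a mask would typically be applied by iterating over model.named_parameters() and setting each parameter's requires_grad flag accordingly.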

Speech encoder + NLP encoder-decoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder + NLP encoder-decoder, fine-tuning only the layer norms and attention (lna=True)

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder + NLP encoder-decoder, fine-tuning only the speech encoder (fne=True)

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
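The two SpeechMixEED flags restrict training in different directions: lna=True updates only layer-norm and attention parameters across the whole model, while fne=True updates only the speech encoder. The sketch below expresses both as name-based rules. The fragments assume Hugging Face naming (wav2vec2 uses attention and layer_norm; BART uses self_attn, encoder_attn, *_layer_norm and layernorm_embedding), and the speech_encoder. and nlp. prefixes are hypothetical; SpeechMix's own rules may differ.

```python
def is_trainable(name, mode):
    """Decide whether a parameter is updated under a given freezing mode."""
    if mode == "lna":
        # Layer norms and attention blocks anywhere in the model.
        fragments = ("layer_norm", "layernorm", "attn", "attention")
        return any(f in name for f in fragments)
    if mode == "fne":
        # Everything under the (hypothetical) speech encoder prefix.
        return name.startswith("speech_encoder.")
    raise ValueError(f"unknown mode: {mode!r}")


# Illustrative parameter names (hypothetical module layout).
names = [
    "speech_encoder.encoder.layers.0.attention.k_proj.weight",
    "speech_encoder.encoder.layers.0.feed_forward.intermediate_dense.weight",
    "nlp.model.encoder.layers.0.self_attn_layer_norm.weight",
    "nlp.model.encoder.layernorm_embedding.weight",
    "nlp.model.decoder.layers.0.fc1.weight",
]
for mode in ("lna", "fne"):
    chosen = [n for n in names if is_trainable(n, mode)]
    print(mode, chosen)
```

Substring rules like these are simple but brittle: a renamed module silently changes what is trained, so printing the selected names once, as above, is a cheap sanity check.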

Installation

pip install

pip install speechmix

Build from source

Clone this repository, cd into the project directory, and install it in editable mode:

pip install -e .

Example

Training examples with train.py:

SpeechMixEED with layer-norm and attention fine-tuning (--lna) on Common Voice zh-TW:

python train.py --speech_model_config voidful/wav2vec2-large-xlsr-53-tw-gpt --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixEED --lna --dataset common_voice --field zh-TW --train_split train --test_split test --batch 6 --grad_accum 4

SpeechMixEED with speech-encoder-only fine-tuning (--fne) on Common Voice zh-TW:

python train.py --speech_model_config voidful/wav2vec2-large-xlsr-53-tw-gpt --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixEED --fne --dataset common_voice --field zh-TW --train_split train --test_split test --batch 6 --grad_accum 4

SpeechMixSelf on the "other" configuration of LibriSpeech:

python train.py --speech_model_config patrickvonplaten/unispeech-large-1500h-cv-timit --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixSelf --dataset librispeech_asr --field other --train_split train.500 --test_split validation --batch 6 --grad_accum 4

SpeechMixEED with --lna on the small dummy LibriSpeech dataset:

python train.py --speech_model_config facebook/wav2vec2-base-960h --nlp_model_config facebook/bart-base --SpeechMixEED --lna --dataset patrickvonplaten/librispeech_asr_dummy --field clean --train_split validation --test_split test --batch 3 --grad_accum 4
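Assuming --batch sets the per-step batch size and --grad_accum the number of gradient-accumulation steps (the usual meaning, not confirmed on this page), each optimizer update on a single device sees 6 × 4 = 24 examples in the Common Voice and LibriSpeech runs and 3 × 4 = 12 in the dummy run. The pure-Python sketch below checks that reasoning on a toy one-parameter regression: summing four micro-batch gradients, each scaled by 1/grad_accum, gives the same gradient as one batch of 24. The function names are hypothetical.

```python
def mse_grad(w, batch):
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, batch, grad_accum):
    """Sum micro-batch gradients, each scaled by 1/grad_accum, as before one optimizer step."""
    total = 0.0
    for step in range(grad_accum):
        micro = data[step * batch:(step + 1) * batch]
        total += mse_grad(w, micro) / grad_accum
    return total


data = [(float(i), 2.0 * i + 1.0) for i in range(24)]  # toy data, 24 = 6 * 4
w = 0.5
full = mse_grad(w, data)
accum = accumulated_grad(w, data, batch=6, grad_accum=4)
print(full, accum)
```

The equality relies on every micro-batch having the same size; with a smaller final micro-batch, the scaling would have to use the actual example counts.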



Download files


Source Distribution

speechmix-0.0.16.tar.gz (7.7 kB)

Uploaded Source

Built Distributions

speechmix-0.0.16-py3.7.egg (7.0 kB)

Uploaded Source

speechmix-0.0.16-py3-none-any.whl (4.0 kB)

Uploaded Python 3

File details

Details for the file speechmix-0.0.16.tar.gz.

File metadata

  • Download URL: speechmix-0.0.16.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.16.tar.gz
Algorithm Hash digest
SHA256 dc4174bbaefde73a034c097a4947d8180a638a5fbd405f4dccef323ef11f301a
MD5 3f51787015f1c3a0e52379fc736e4495
BLAKE2b-256 c6240feb5485b1ce983f7f31b6eaadc078efa1f08a5acb5cf2ddcc57351bf832
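The digests above can be used to check a downloaded file before installing it. A minimal standard-library sketch, assuming the source archive has been saved to the current directory under its published name (the ARCHIVE path is an assumption):

```python
import hashlib
import os

EXPECTED_SHA256 = "dc4174bbaefde73a034c097a4947d8180a638a5fbd405f4dccef323ef11f301a"
ARCHIVE = "speechmix-0.0.16.tar.gz"  # assumed download location


def sha256_of(path, chunk_size=1 << 16):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if os.path.exists(ARCHIVE):
    ok = sha256_of(ARCHIVE) == EXPECTED_SHA256
    print("checksum OK" if ok else "checksum MISMATCH")
else:
    print(f"{ARCHIVE} not found; download it first")
```

pip can also perform this check itself in hash-checking mode, by adding --hash=sha256:<digest> to the speechmix==0.0.16 entry of a requirements file.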


File details

Details for the file speechmix-0.0.16-py3.7.egg.

File metadata

  • Download URL: speechmix-0.0.16-py3.7.egg
  • Upload date:
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.16-py3.7.egg
Algorithm Hash digest
SHA256 01855faaed84fef3fefb46b6024bbb7758d4a8327ac468b24bab52861c796f64
MD5 18f1585e3c363e95af209b97dfe02ec4
BLAKE2b-256 16a0c3253755c6ab222c58d6feea8e92ae407f5f998fdb496b9f8b78efe00b78


File details

Details for the file speechmix-0.0.16-py3-none-any.whl.

File metadata

  • Download URL: speechmix-0.0.16-py3-none-any.whl
  • Upload date:
  • Size: 4.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.16-py3-none-any.whl
Algorithm Hash digest
SHA256 72760b2dbd7755b4695d34cf1a165976a68238579c0d843362d2466e81d3c543
MD5 93df47eec6abc0d3063a83248fced979
BLAKE2b-256 8c38e19071ed011a1f7b36c5f3e3c3fbb96c01867c8a2efb33748b51174f2004

