SpeechMix

Explore different ways to combine speech models (wav2vec2, HuBERT) with NLP models (BART, T5, GPT).

Introduction

All of the examples below use the same input:

from datasets import load_dataset
import soundfile as sf


# Read each sound file into an array; the sample rate returned by sf.read is discarded
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds["text"][0]
speech = ds["speech"][0]

Speech encoder NLP decoder

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
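
To get a feel for the sequence length the NLP decoder attends over, the sketch below estimates how many frames a wav2vec2-style encoder emits for a given number of input samples. It assumes the standard wav2vec2 base convolutional feature extractor (kernel sizes 10, 3, 3, 3, 3, 2, 2 with strides 5, 2, 2, 2, 2, 2, 2) and 16 kHz audio. `encoder_frames` is a hypothetical helper for illustration, not part of SpeechMix.

```python
# Assumed wav2vec2-base feature-extractor layout: (kernel size, stride) per conv layer.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def encoder_frames(num_samples: int) -> int:
    """Estimate the number of output frames for a raw waveform of num_samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        # Unpadded 1-D convolution: floor((length - kernel) / stride) + 1
        length = (length - kernel) // stride + 1
    # Inputs shorter than the receptive field yield no frames.
    return max(length, 0)


if __name__ == "__main__":
    one_second = 16_000  # samples at 16 kHz
    print(encoder_frames(one_second))  # 49 frames, roughly one every 20 ms
```

Under these assumptions, a ten-second utterance becomes a few hundred encoder frames, which is far longer than a typical tokenized transcript.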

Speech encoder NLP decoder, fine-tuning only the cross-attention, projection, and decoder embedding

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder, fine-tuning only layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder, fine-tuning only the speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
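
The three partial fine-tuning options above (`ftl`, `lna`, `fne`) each train only a subset of the weights. One way to picture this is as a filter over parameter names. The sketch below illustrates that idea only; it is not SpeechMix's actual implementation. The keyword lists and the example parameter names (modeled on Hugging Face BART naming, with made-up `speech_encoder`, `nlp`, and `projection` prefixes) are assumptions.

```python
# Hypothetical mapping from flag to name fragments that stay trainable.
TRAINABLE_KEYWORDS = {
    "ftl": ("encoder_attn", "projection", "decoder.embed"),
    "lna": ("layer_norm", "layernorm", "attn"),
    "fne": ("speech_encoder",),
}


def trainable_names(param_names, flag):
    """Return the parameter names that remain trainable under the given flag."""
    keywords = TRAINABLE_KEYWORDS[flag]
    return [n for n in param_names if any(k in n for k in keywords)]


if __name__ == "__main__":
    names = [
        "speech_encoder.encoder.layers.0.attention.q_proj.weight",
        "projection.weight",
        "nlp.decoder.embed_tokens.weight",
        "nlp.decoder.layers.0.encoder_attn.k_proj.weight",
        "nlp.decoder.layers.0.self_attn_layer_norm.weight",
        "nlp.decoder.layers.0.fc1.weight",
    ]
    for flag in TRAINABLE_KEYWORDS:
        print(flag, trainable_names(names, flag))
```

In PyTorch, a filter like this is typically applied by setting `requires_grad = False` on every parameter whose name is not selected.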

Installation

pip install

pip install speechmix

Build from source

Clone the repository, cd into the project directory, and run:

pip install -e .
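
After either installation route, a quick way to confirm that the package is visible to the current interpreter is to query its installed version. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or later); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("speechmix")
    print(version if version else "speechmix is not installed")
```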

Example

Example training commands:
python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixEED --lna --dataset librispeech_asr --field clean --train_split train.100 --test_split validation --batch 3 --grad_accum 8

python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixEED --fne --dataset librispeech_asr --field clean --train_split train.100 --test_split validation --batch 3 --grad_accum 8

python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixED --dataset librispeech_asr --field other --train_split train.500 --test_split validation --batch 3 --grad_accum 8

python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixED --ftl --dataset librispeech_asr --field other --train_split train.500 --test_split validation --batch 3 --grad_accum 8

python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixSelf --dataset librispeech_asr --field clean --train_split train.100 --test_split validation --batch 3 --grad_accum 10

python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixGAN --dataset librispeech_asr --field clean --train_split train.100 --test_split validation --batch 3 --grad_accum 10

python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixSelf --dataset common_voice --field en --train_split train --test_split test --batch 5 --grad_accum 8

python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixEED --lna --dataset patrickvonplaten/librispeech_asr_dummy --field clean --train_split validation --test_split test --batch 3 --grad_accum 4
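
Each command above pairs a small `--batch` with a `--grad_accum` value. Assuming `--grad_accum` is the number of gradient-accumulation steps and training runs on a single device (neither is stated here), the number of examples contributing to each optimizer update is their product. `effective_batch` is a hypothetical helper that reads the two flags from a command string.

```python
import shlex


def effective_batch(command: str) -> int:
    """Multiply the --batch and --grad_accum values found in a command line."""
    tokens = shlex.split(command)
    values = {}
    for flag in ("--batch", "--grad_accum"):
        # The value is the token right after the flag.
        values[flag] = int(tokens[tokens.index(flag) + 1])
    return values["--batch"] * values["--grad_accum"]


if __name__ == "__main__":
    cmd = "python train.py --SpeechMixEED --lna --batch 3 --grad_accum 8"
    print(effective_batch(cmd))  # 24
```

Under the same assumptions, the commands listed here range from 12 examples per update (`--batch 3 --grad_accum 4`) to 40 (`--batch 5 --grad_accum 8`).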
