Skip to main content

Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together

Project description

SpeechMix

Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together.

Introduction

For the same input:

from datasets import load_dataset
import soundfile as sf


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds['text'][0]
speech = ds["speech"][0]

Speech encoder NLP decoder

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP decoder only fine-tune on cross attention/projection/decoder embedding

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder only fine-tune on layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder only fine-tune on speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Installation

pip install

pip install speechmix

Build from source

git clone and cd into this project.

pip install -e .

Example

usage:
python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixEED --lna --dataset librispeech_asr --field clean --train_split train.100 --test_split validation --batch 3 --grad_accum 8

python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixEED --fne --dataset librispeech_asr --field clean --train_split train.100 --test_split validation --batch 3 --grad_accum 8

python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixED --dataset librispeech_asr --field other --train_split train.500 --test_split validation --batch 3 --grad_accum 8

python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixED --ftl --dataset librispeech_asr --field other --train_split train.500 --test_split validation --batch 3 --grad_accum 8

python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixSelf --dataset common_voice --field en --train_split train --test_split test --batch 3 --grad_accum 8

python train.py --speech_model_config facebook/wav2vec2-large-robust-ft-libri-960h --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixEED --lna --dataset patrickvonplaten/librispeech_asr_dummy --field clean --train_split validation --test_split test --batch 3 --grad_accum 4

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

speechmix-0.0.18.tar.gz (8.3 kB view details)

Uploaded Source

Built Distributions

speechmix-0.0.18-py3.7.egg (7.9 kB view details)

Uploaded Source

speechmix-0.0.18-py3-none-any.whl (4.3 kB view details)

Uploaded Python 3

File details

Details for the file speechmix-0.0.18.tar.gz.

File metadata

  • Download URL: speechmix-0.0.18.tar.gz
  • Upload date:
  • Size: 8.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.18.tar.gz
Algorithm Hash digest
SHA256 1160294bbcb34380b3e6c1835d42a56904f2ce17d587404f065df7bb693c7b9e
MD5 3512cb48cf5ac6d42de8eca94b02b8b9
BLAKE2b-256 f6a8f70dfad59887c1eedba8c632fc0000b60a7e73e2fbcadf3ae1090d123cda

See more details on using hashes here.

File details

Details for the file speechmix-0.0.18-py3.7.egg.

File metadata

  • Download URL: speechmix-0.0.18-py3.7.egg
  • Upload date:
  • Size: 7.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.18-py3.7.egg
Algorithm Hash digest
SHA256 e7862e04289d8742730f9b885c1e1125d27b5ecc899f1b2aafc9ba2842d12125
MD5 2de3d27c764f9b0df1b2f46ece1b9051
BLAKE2b-256 7e516e3927eb715a1537b9106673f91d7495fd3be72e235dad2159a4b2eb4c11

See more details on using hashes here.

File details

Details for the file speechmix-0.0.18-py3-none-any.whl.

File metadata

  • Download URL: speechmix-0.0.18-py3-none-any.whl
  • Upload date:
  • Size: 4.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.18-py3-none-any.whl
Algorithm Hash digest
SHA256 e60d505b7f0f07958aee201cc44c5a86e16e71d356629e757ccd5c5fd43ed8b2
MD5 c6af6e493a2aae41ea6a5fe6c8cff791
BLAKE2b-256 9d7c1d3e4630b950303e7450562c1de181f55e550d8268f5489e7eda5343f23b

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page