

Project description

SpeechMix

Explore different ways to mix speech models (wav2vec2, HuBERT) with NLP models (BART, T5, GPT).

Introduction

All of the variants below share the same input preparation:

from datasets import load_dataset
import soundfile as sf


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds["text"][0]
speech = ds["speech"][0]
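Note that wav2vec2 checkpoints such as facebook/wav2vec2-base-960h are trained on 16 kHz mono audio. The dummy LibriSpeech split above is already 16 kHz, but audio from other sources may need resampling first. A minimal sketch using scipy (the helper `to_16k` is illustrative, not part of speechmix):

```python
import numpy as np
from scipy.signal import resample_poly

def to_16k(speech, orig_sr):
    """Resample a mono waveform to the 16 kHz rate wav2vec2 models expect."""
    if orig_sr == 16_000:
        return speech
    # resample_poly applies an efficient polyphase filter; up/down are the
    # target and source rates reduced by their greatest common divisor.
    g = np.gcd(orig_sr, 16_000)
    return resample_poly(speech, 16_000 // g, orig_sr // g)

# one second of a 440 Hz tone sampled at 44.1 kHz
sr = 44_100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
resampled = to_16k(tone, sr)
print(len(resampled))  # 16000
```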

Speech encoder + NLP decoder

# assuming SpeechMixED is exported at the package top level
from speechmix import SpeechMixED

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder + NLP decoder, fine-tuning only the cross-attention, projection, and decoder embedding

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder + NLP encoder-decoder

# assuming SpeechMixEED is exported at the package top level
from speechmix import SpeechMixEED

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder + NLP encoder-decoder, fine-tuning only layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder + NLP encoder-decoder, fine-tuning only the speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
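The ftl, lna, and fne switches above all amount to the same idea: freeze every parameter except a named subset before training. Independent of SpeechMix's internals, the pattern can be sketched in plain PyTorch by matching parameter names (the module layout and substrings below are illustrative, not the package's actual structure):

```python
import torch.nn as nn

def freeze_except(model: nn.Module, keep_substrings):
    """Freeze every parameter whose name matches none of keep_substrings."""
    kept = 0
    for name, param in model.named_parameters():
        trainable = any(s in name for s in keep_substrings)
        param.requires_grad = trainable
        kept += trainable
    return kept

# toy stand-in for an encoder-decoder; names mimic common HF conventions
model = nn.ModuleDict({
    "encoder": nn.ModuleDict({
        "layer_norm": nn.LayerNorm(8),
        "self_attn": nn.Linear(8, 8),
        "ffn": nn.Linear(8, 8),
    }),
    "decoder_embed": nn.Embedding(10, 8),
})

# an lna-style setting: train only layer norms and attention
n = freeze_except(model, ["layer_norm", "attn"])
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

With this, the optimizer only needs the surviving parameters, e.g. `torch.optim.Adam(p for p in model.parameters() if p.requires_grad)`.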

Installation

pip install

pip install speechmix

Build from source

Clone this repository, change into its directory, and run:

pip install -e .

Example

Usage:
python train.py --speech_model_config voidful/wav2vec2-large-xlsr-53-tw-gpt --nlp_model_config voidful/bart-base-chinese --SpeechMixEED --lna --dataset common_voice --field zh-TW --train_split train --test_split test --batch 6 --grad_accum 4

python train.py --speech_model_config voidful/wav2vec2-large-xlsr-53-tw-gpt --nlp_model_config voidful/bart-base-chinese --SpeechMixEED --fne --dataset common_voice --field zh-TW --train_split train --test_split test --batch 6 --grad_accum 4

python train.py --speech_model_config patrickvonplaten/unispeech-large-1500h-cv-timit --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixEED --fne --dataset librispeech_asr --field other --train_split train.500 --test_split validation --batch 3 --grad_accum 8

python train.py --speech_model_config patrickvonplaten/unispeech-large-1500h-cv-timit --nlp_model_config facebook/mbart-large-50-one-to-many-mmt --SpeechMixSelf --dataset librispeech_asr --field other --train_split train.500 --test_split validation --batch 6 --grad_accum 4

python train.py --speech_model_config facebook/wav2vec2-base-960h --nlp_model_config facebook/bart-base --SpeechMixEED --lna --dataset patrickvonplaten/librispeech_asr_dummy --field clean --train_split validation --test_split test --batch 3 --grad_accum 4

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

speechmix-0.0.17.tar.gz (8.2 kB view details)

Uploaded Source

Built Distributions

speechmix-0.0.17-py3.7.egg (8.0 kB view details)

Uploaded Source

speechmix-0.0.17-py3-none-any.whl (4.4 kB view details)

Uploaded Python 3

File details

Details for the file speechmix-0.0.17.tar.gz.

File metadata

  • Download URL: speechmix-0.0.17.tar.gz
  • Upload date:
  • Size: 8.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.17.tar.gz

  • SHA256: 8605ce6c97b86c09ca6813f38c60a240286ecc37d5540e9c9ee9c48ff9aa131e
  • MD5: 2526e0c1ffc2593f1cd6eb4888c099a8
  • BLAKE2b-256: 58ebf7ef370ddf27c774deb06441c0571549f896db4cb3416c9e5a6d508b6aae

See more details on using hashes here.
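The published digests let a downloaded file be checked before installation. A minimal verification sketch using Python's standard hashlib (adjust the path to wherever the file was saved):

```python
import hashlib

def sha256_of(path, chunk_size=8192):
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# compare against the digest published above, e.g. for the sdist:
# sha256_of("speechmix-0.0.17.tar.gz") == "8605ce6c...aa131e"

# self-contained demonstration on in-memory bytes:
demo = hashlib.sha256(b"hello").hexdigest()
print(demo)  # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```

Reading in fixed-size chunks keeps memory flat even for large archives; pip can also enforce this automatically via hash-checking mode (`--require-hashes`).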

File details

Details for the file speechmix-0.0.17-py3.7.egg.

File metadata

  • Download URL: speechmix-0.0.17-py3.7.egg
  • Upload date:
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.17-py3.7.egg

  • SHA256: 16751ac5d7b4d998082f7f601c11c2fdfdd6fa85576c9b834b3eb27e9ce92fdd
  • MD5: e192390a88b28a60f81d527405ae89e7
  • BLAKE2b-256: 19ac574b4e4919f0b3dc9fec62f423d22a6401d178a7780bc321c05501f99962

See more details on using hashes here.

File details

Details for the file speechmix-0.0.17-py3-none-any.whl.

File metadata

  • Download URL: speechmix-0.0.17-py3-none-any.whl
  • Upload date:
  • Size: 4.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.17-py3-none-any.whl

  • SHA256: 08f43899fd0a32b1caba91c6f7c9b89b049be4913eae6941b70e3751ded24650
  • MD5: ae981045503b8699865c3b07152f894d
  • BLAKE2b-256: ff96bcc8613f3816f80b402e848242327629ed10ea18da86bf9677764731d818

See more details on using hashes here.
