SpeechMix

Explore different ways to combine speech models (wav2vec2, HuBERT) with NLP models (BART, T5, GPT).

Introduction

All of the examples below use the same input:

from datasets import load_dataset
import soundfile as sf


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds['text'][0]
speech = ds["speech"][0]
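`sf.read` also returns the file's sample rate, which `map_to_array` discards. Since `facebook/wav2vec2-base-960h` was trained on 16 kHz audio, it can be worth checking the rate before passing a waveform to the model. A minimal sketch of such a check follows; `check_sample_rate` and `EXPECTED_RATE` are hypothetical names and not part of SpeechMix.

```python
EXPECTED_RATE = 16_000  # assumption: the speech model expects 16 kHz input


def check_sample_rate(num_samples, sample_rate, expected=EXPECTED_RATE):
    """Return the clip duration in seconds, or raise if the rate is unexpected."""
    if sample_rate != expected:
        raise ValueError(
            f"expected {expected} Hz audio, got {sample_rate} Hz; resample first"
        )
    return num_samples / sample_rate


# Example: a 2-second clip sampled at 16 kHz
duration = check_sample_rate(num_samples=32_000, sample_rate=16_000)
print(duration)  # 2.0
```

Inside `map_to_array`, the second value returned by `sf.read` could be passed as `sample_rate` and `len(speech)` as `num_samples`.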

Speech encoder NLP decoder

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
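The speech encoder and the NLP decoder generally have different hidden sizes: `facebook/wav2vec2-base-960h` produces 768-dimensional frames, while `facebook/bart-large` works with 1024-dimensional states. One common way to bridge the two is a learned linear projection. The sketch below illustrates that idea on random tensors; it is an assumption about the general approach, not SpeechMix's actual code, and `SpeechToTextBridge` is a hypothetical name.

```python
import torch
import torch.nn as nn


class SpeechToTextBridge(nn.Module):
    """Hypothetical projection from speech-encoder width to NLP-model width."""

    def __init__(self, speech_dim=768, nlp_dim=1024):
        super().__init__()
        self.proj = nn.Linear(speech_dim, nlp_dim)

    def forward(self, speech_hidden):
        # speech_hidden: (batch, frames, speech_dim)
        return self.proj(speech_hidden)  # (batch, frames, nlp_dim)


bridge = SpeechToTextBridge()
fake_encoder_output = torch.randn(2, 49, 768)  # stand-in for speech-encoder frames
projected = bridge(fake_encoder_output)
print(projected.shape)  # torch.Size([2, 49, 1024])
```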

Speech encoder NLP decoder, fine-tuning only cross-attention, projection, and decoder embeddings

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
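Fine-tuning only selected parts of a model usually comes down to setting `requires_grad` on its parameters. The sketch below shows one way to do that in PyTorch by matching parameter names against keywords. It uses a toy model with illustrative submodule names; how `ftl=True` is implemented inside SpeechMix is not shown here, and `freeze_except` is a hypothetical helper.

```python
import torch.nn as nn


class ToyModel(nn.Module):
    """Stand-in model; the submodule names are illustrative only."""

    def __init__(self):
        super().__init__()
        self.speech_encoder = nn.Linear(8, 8)
        self.projection = nn.Linear(8, 4)
        self.decoder_self_attn = nn.Linear(4, 4)
        self.decoder_cross_attn = nn.Linear(4, 4)


def freeze_except(model, keywords):
    """Keep parameters trainable only if their name contains one of `keywords`."""
    for name, param in model.named_parameters():
        param.requires_grad = any(key in name for key in keywords)


toy = ToyModel()
freeze_except(toy, keywords=("projection", "cross_attn"))

# List the parameters that would still receive gradient updates.
trainable = [name for name, p in toy.named_parameters() if p.requires_grad]
print(trainable)
```

Only the `projection` and `decoder_cross_attn` parameters remain trainable; everything else is frozen and can be left out of the optimizer.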

Speech encoder NLP encoder decoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder, fine-tuning only layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
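Restricting training to layer norm and attention can also be expressed by module type instead of by parameter name. The sketch below demonstrates this on a plain `nn.TransformerEncoderLayer`. It is only an illustration of the idea behind `lna=True`, not SpeechMix's implementation: Hugging Face models implement attention with their own classes, so a real version would have to match those classes or parameter names instead of `nn.MultiheadAttention`. `train_only_norm_and_attention` is a hypothetical helper.

```python
import torch.nn as nn


def train_only_norm_and_attention(model):
    """Freeze all parameters, then re-enable LayerNorm and attention modules."""
    for param in model.parameters():
        param.requires_grad = False
    for module in model.modules():
        if isinstance(module, (nn.LayerNorm, nn.MultiheadAttention)):
            for param in module.parameters():
                param.requires_grad = True


layer = nn.TransformerEncoderLayer(d_model=16, nhead=2, dim_feedforward=32)
train_only_norm_and_attention(layer)

# Report which parameters stay trainable and which are frozen.
for name, param in layer.named_parameters():
    print(f"{name}: {'train' if param.requires_grad else 'frozen'}")
```

The feed-forward weights (`linear1`, `linear2`) end up frozen, while the `norm1`, `norm2`, and `self_attn` parameters stay trainable.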

Speech encoder NLP encoder decoder, fine-tuning only the speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Installation

pip install

pip install speechmix

Build from source

Clone the repository and cd into the project directory, then run:

pip install -e .

Example

Usage:

python train.py --speech_model_config voidful/wav2vec2-large-xlsr-53-tw-gpt --nlp_model_config voidful/bart-base-chinese --SpeechMixEED --fne
