

SpeechMix

Explore different ways to combine speech models (wav2vec2, HuBERT) with NLP models (BART, T5, GPT).

Introduction

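All of the examples below use the same input: the first utterance of a small dummy LibriSpeech validation split, together with its reference transcript.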

from datasets import load_dataset
import soundfile as sf


# Read the audio file referenced by each example into a float array.
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# Load the dummy LibriSpeech split and decode every sound file.
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

# Use the first utterance and its reference transcript as the shared input.
transcript = ds["text"][0]
speech = ds["speech"][0]
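At this point `speech` is a 1-D array of floats sampled at 16 kHz (the LibriSpeech rate). Assuming `model.processor` is loaded from the `facebook/wav2vec2-base-960h` checkpoint, whose feature extractor is configured to normalize each waveform to zero mean and unit variance, the sketch below reproduces that normalization step in plain Python so the arithmetic is visible. The helper name is hypothetical, and the `1e-7` epsilon is an assumption meant to match the Hugging Face implementation.

```python
import math
from typing import List, Sequence


def zero_mean_unit_var_norm(waveform: Sequence[float], eps: float = 1e-7) -> List[float]:
    """Hypothetical helper: normalize one waveform to zero mean, ~unit variance.

    Sketch of the per-utterance normalization a wav2vec2 feature extractor
    applies when normalization is enabled (assumed, not SpeechMix code).
    """
    n = len(waveform)
    if n == 0:
        return []
    mean = sum(waveform) / n
    # Population variance (divide by n), like numpy's default var().
    var = sum((x - mean) ** 2 for x in waveform) / n
    # eps keeps silent (constant) inputs from dividing by zero.
    scale = math.sqrt(var + eps)
    return [(x - mean) / scale for x in waveform]
```

Because the statistics are computed per utterance, loud and quiet recordings reach the speech encoder on a comparable scale.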

Speech encoder + NLP decoder (SpeechMixED)

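from speechmix import SpeechMixED  # assumed import path

# wav2vec2 speech encoder feeding a BART decoder.
model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

# Tokenize the transcript and turn the raw waveform into model inputs.
transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

# Forward pass on the paired speech and text.
model(speech_tensor, transcript_tensor)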
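Joining the two models raises two practical questions: how many speech frames the decoder cross-attends over, and whether the hidden sizes match. The sketch below answers the first, assuming the standard wav2vec2 convolutional front end (kernel sizes 10, 3, 3, 3, 3, 2, 2 with strides 5, 2, 2, 2, 2, 2, 2, no padding). It is an illustration under those assumptions, not SpeechMix's own code, and the function name is hypothetical.

```python
from typing import Sequence

# Assumed wav2vec2 convolutional feature-encoder layout (kernel, stride per layer).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def encoder_frames(num_samples: int,
                   kernels: Sequence[int] = CONV_KERNELS,
                   strides: Sequence[int] = CONV_STRIDES) -> int:
    """Hypothetical helper: frames emitted for a raw waveform of num_samples."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        # Output length of an unpadded 1-D convolution: floor((L - k) / s) + 1,
        # clamped at zero for inputs shorter than the receptive field.
        length = max((length - kernel) // stride + 1, 0)
    return length
```

One second of 16 kHz audio yields 49 frames. Assuming wav2vec2-base's hidden size of 768 and bart-large's `d_model` of 1024, each frame also has to pass through a 768 → 1024 projection before the BART decoder can attend to it.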

Speech encoder + NLP decoder, fine-tuning only the cross-attention, projection and decoder embedding (ftl=True)

# ftl=True: freeze everything except cross-attention, projection and decoder embedding.
model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
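Partial fine-tuning like `ftl=True` is commonly implemented in PyTorch by toggling `requires_grad` per parameter based on its name. The sketch below performs only the name matching, so it runs without downloading a model. The substring `encoder_attn` follows Hugging Face BART's name for decoder cross-attention; `enc_to_dec_proj` and the embedding pattern are hypothetical, and how SpeechMix actually selects parameters is an assumption here.

```python
from typing import Iterable, List

# Substrings marking parameters that stay trainable (hypothetical ftl-style rule).
# "encoder_attn" follows Hugging Face BART naming for decoder cross-attention;
# "enc_to_dec_proj" is a made-up name for the speech-to-text projection.
FTL_TRAINABLE = ("encoder_attn", "enc_to_dec_proj", "decoder.embed_tokens")


def select_trainable(param_names: Iterable[str],
                     patterns: Iterable[str] = FTL_TRAINABLE) -> List[str]:
    """Return the parameter names that should keep requires_grad=True."""
    patterns = tuple(patterns)
    return [name for name in param_names if any(p in name for p in patterns)]


# With a PyTorch model, the selection would typically be applied as:
#   trainable = set(select_trainable(n for n, _ in model.named_parameters()))
#   for name, param in model.named_parameters():
#       param.requires_grad = name in trainable
```

Matching on substrings keeps the rule independent of the number of decoder layers, at the cost of depending on the checkpoint's naming scheme.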

Speech encoder + NLP encoder-decoder (SpeechMixEED)

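from speechmix import SpeechMixEED  # assumed import path

# wav2vec2 speech encoder feeding the full BART encoder-decoder.
model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

# Tokenize the transcript and turn the raw waveform into model inputs.
transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

# Forward pass on the paired speech and text.
model(speech_tensor, transcript_tensor)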
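In these examples `transcript_tensor` is passed next to the speech input. Assuming it serves as the target labels, as is usual for Hugging Face encoder-decoder models, the decoder is trained with teacher forcing: its inputs are the labels shifted one position to the right, starting with a decoder start token. The sketch below re-implements that shift on plain lists, mirroring the behaviour of `shift_tokens_right` in Hugging Face's BART code; it is an illustration, not the function SpeechMix calls.

```python
from typing import List

IGNORE_INDEX = -100  # label value ignored by the loss in Hugging Face models


def shift_tokens_right(labels: List[List[int]], pad_token_id: int,
                       decoder_start_token_id: int) -> List[List[int]]:
    """Build teacher-forcing decoder inputs from a batch of label sequences."""
    shifted = []
    for row in labels:
        if not row:
            shifted.append([])
            continue
        # Prepend the start token and drop the last label.
        new_row = [decoder_start_token_id] + row[:-1]
        # Ignored positions cannot be embedded, so replace them with padding.
        shifted.append([pad_token_id if t == IGNORE_INDEX else t for t in new_row])
    return shifted
```

For BART, `pad_token_id` is 1 and `decoder_start_token_id` is 2, so the decoder predicts token *t* while seeing only tokens before it.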

Speech encoder + NLP encoder-decoder, fine-tuning only layer norm and attention (lna=True)

# lna=True: fine-tune only layer norm and attention parameters.
model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
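A back-of-the-envelope count shows how much `lna=True` shrinks the trainable set. The sketch counts parameters in one standard Transformer encoder layer (four biased d×d attention projections, two LayerNorms with a scale and a shift each, and a two-layer feed-forward block) using BART-large sizes (`d_model` = 1024, FFN size 4096). It assumes `lna` unfreezes all of a layer's self-attention and LayerNorm weights; SpeechMix's exact selection may differ, and embeddings and cross-attention are not counted.

```python
from typing import Dict


def encoder_layer_params(d_model: int, ffn_dim: int) -> Dict[str, int]:
    """Parameter counts for one standard Transformer encoder layer."""
    attention = 4 * (d_model * d_model + d_model)  # q, k, v, out projections with biases
    layer_norm = 2 * (2 * d_model)                  # two LayerNorms, weight + bias each
    ffn = (d_model * ffn_dim + ffn_dim) + (ffn_dim * d_model + d_model)  # two linear layers
    return {"attention": attention, "layer_norm": layer_norm, "ffn": ffn}


def lna_fraction(d_model: int, ffn_dim: int) -> float:
    """Share of a layer's parameters left trainable under the assumed LNA rule."""
    counts = encoder_layer_params(d_model, ffn_dim)
    trainable = counts["attention"] + counts["layer_norm"]
    return trainable / sum(counts.values())
```

With BART-large sizes, about a third of each encoder layer stays trainable under these assumptions; the feed-forward blocks, which hold the other two thirds, stay frozen.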

Speech encoder + NLP encoder-decoder, fine-tuning only the speech encoder (fne=True)

# fne=True: fine-tune only the speech encoder.
model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Installation

Install from PyPI

pip install speechmix

Build from source

Clone the repository, cd into the project directory, and install it in editable mode:

pip install -e .


Download files


Source Distribution

speechmix-0.0.3.tar.gz (4.9 kB)

Built Distributions

speechmix-0.0.3-py3.7.egg (5.2 kB)

speechmix-0.0.3-py3-none-any.whl (3.1 kB)

File details

Details for the file speechmix-0.0.3.tar.gz.

File metadata

  • Download URL: speechmix-0.0.3.tar.gz
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.3.tar.gz
Algorithm Hash digest
SHA256 c92be83147db7a735a3680d5a6d47625011ee4c15fefce1fbe4033a6253cc6be
MD5 71f4bca01b8af86365858158bc4b2fb9
BLAKE2b-256 aee69330f2c84271467c14483a51bfaa33ce61f5c4b32eb96927a2785205c1a4


File details

Details for the file speechmix-0.0.3-py3.7.egg.

File metadata

  • Download URL: speechmix-0.0.3-py3.7.egg
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.3-py3.7.egg
Algorithm Hash digest
SHA256 804a4b100a3798f1d37f2b50fa5399313509d7ed4533d264caacdba77064efea
MD5 8d9eda57fde0523d21c75904207e70b3
BLAKE2b-256 df5ed34800d0d1bb072b707ca92efb5db8e36182c91264077a3864b5b0486d65


File details

Details for the file speechmix-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: speechmix-0.0.3-py3-none-any.whl
  • Size: 3.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 8e026eb5df85271679354aa41ebd109802b0917e86bc2bdb6da7d89c7eb1bbd1
MD5 94e96c391c4e2ce4eba6c516104e20bb
BLAKE2b-256 ff7a2f2a911bd176ce5caeefb6c2acd682682b7008c8e58d7fa02a5dcc0389a5

