SpeechMix

Explore different ways to combine speech models (wav2vec2, HuBERT) with NLP models (BART, T5, GPT).

Introduction

All examples below use the same input:

from datasets import load_dataset
import soundfile as sf


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds['text'][0]
speech = ds["speech"][0]
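The loader above discards the sampling rate returned by `sf.read`. Since `facebook/wav2vec2-base-960h` was trained on 16 kHz audio, it can help to check the rate and channel layout before building tensors. The following is a minimal sketch, not part of SpeechMix: `prepare_waveform` and `EXPECTED_RATE` are hypothetical names, and the synthetic stereo array only stands in for real `sf.read` output.

```python
import numpy as np

EXPECTED_RATE = 16_000  # assumption: the speech encoder expects 16 kHz input


def prepare_waveform(speech, sample_rate, expected_rate=EXPECTED_RATE):
    """Return a mono float32 waveform, or raise if the sampling rate differs.

    Hypothetical helper for illustration; it is not a SpeechMix function.
    """
    if sample_rate != expected_rate:
        raise ValueError(
            f"expected {expected_rate} Hz audio, got {sample_rate} Hz; resample first"
        )
    waveform = np.asarray(speech, dtype=np.float32)
    if waveform.ndim == 2:
        # soundfile returns (frames, channels) for multi-channel files,
        # so averaging over axis 1 downmixes to mono
        waveform = waveform.mean(axis=1)
    return waveform


# Synthetic stereo audio standing in for the output of sf.read
stereo = np.stack([np.ones(8), np.zeros(8)], axis=1)
mono = prepare_waveform(stereo, 16_000)
```

In `map_to_array`, this would mean keeping the second value returned by `sf.read` instead of assigning it to `_`.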

Speech encoder + NLP decoder

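# Assumed import path; adjust if the package exposes the class elsewhere
from speechmix import SpeechMixED

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")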

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder + NLP decoder, fine-tuning only the cross-attention, projection, and decoder embedding

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
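How `ftl=True` selects parameters internally is not shown here. One common way to get this kind of partial fine-tuning in PyTorch is to toggle `requires_grad` by parameter name. The sketch below illustrates that general technique only: `freeze_except` is a hypothetical helper, and the parameter names are stand-ins that merely mimic a BART-style layout so the example runs without downloading a model.

```python
from types import SimpleNamespace


def freeze_except(named_parameters, trainable_keywords):
    """Leave requires_grad=True only where the name contains a keyword.

    Accepts anything shaped like torch.nn.Module.named_parameters():
    an iterable of (name, parameter) pairs whose parameters expose a
    `requires_grad` attribute. Returns the names left trainable.
    """
    trainable = []
    for name, param in named_parameters:
        param.requires_grad = any(key in name for key in trainable_keywords)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Stand-in parameters (hypothetical names, not read from a real model)
params = [
    ("decoder.layers.0.self_attn.q_proj.weight", SimpleNamespace(requires_grad=True)),
    ("decoder.layers.0.encoder_attn.q_proj.weight", SimpleNamespace(requires_grad=True)),
    ("decoder.embed_tokens.weight", SimpleNamespace(requires_grad=True)),
    ("encoder.layers.0.fc1.weight", SimpleNamespace(requires_grad=True)),
]

# Keep only cross-attention and decoder embedding trainable
kept = freeze_except(params, ["encoder_attn", "embed_tokens"])
```

With a real model, the same function could be called as `freeze_except(model.named_parameters(), [...])`, using keywords that match the actual parameter names of the loaded checkpoint.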

Speech encoder + NLP encoder-decoder

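# Assumed import path; adjust if the package exposes the class elsewhere
from speechmix import SpeechMixEED

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")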

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder + NLP encoder-decoder, fine-tuning only the layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder + NLP encoder-decoder, fine-tuning only the speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Installation

pip install

pip install speechmix

Build from source

Clone the repository with git, cd into the project directory, then run:

pip install -e .
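After either install path, a quick check from Python can confirm that the package is visible to the interpreter. This is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+); `installed_version` is a hypothetical helper name, and the distribution name `speechmix` is taken from the pip command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("speechmix")
print("speechmix", version if version else "is not installed")
```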
