
Explore different ways to mix speech models (wav2vec2, HuBERT) and NLP models (BART, T5, GPT).

Project description

SpeechMix


Introduction

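All examples below use the same input: the first utterance of the validation split of a small dummy LibriSpeech dataset, together with its reference transcript.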

from datasets import load_dataset
import soundfile as sf


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds['text'][0]
speech = ds["speech"][0]
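Each example below converts speech with model.processor(...). For Hugging Face wav2vec2 checkpoints whose feature extractor has do_normalize enabled, that step standardizes each waveform to zero mean and unit variance before it reaches the model. The pure-Python sketch below only illustrates that computation; it is not the library's code, and the epsilon of 1e-7 is an assumption about the library default.

```python
import math
from typing import List, Sequence


def zero_mean_unit_var(samples: Sequence[float], eps: float = 1e-7) -> List[float]:
    """Standardize a waveform: subtract its mean, divide by sqrt(variance + eps).

    Illustrative sketch of wav2vec2-style input normalization; eps is an assumed default.
    """
    if not samples:
        raise ValueError("empty waveform")
    n = len(samples)
    mean = sum(samples) / n
    # population variance (divide by n), matching numpy's default var()
    var = sum((x - mean) ** 2 for x in samples) / n
    scale = math.sqrt(var + eps)
    return [(x - mean) / scale for x in samples]


# toy waveform standing in for the loaded `speech` array
normalized = zero_mean_unit_var([0.1, -0.2, 0.3, 0.0])
```

Applied to the loaded clip, zero_mean_unit_var(list(speech)) should give values close to model.processor(speech).input_values[0], assuming the processor normalizes its input.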

Speech encoder + NLP decoder

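from speechmix import SpeechMixED  # import path assumed; check the package's __init__.py

# wav2vec2 as the speech encoder, BART's decoder as the text decoder
model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")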

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
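In this setup the BART decoder cross-attends over the frame sequence produced by wav2vec2. The sketch below estimates the length of that sequence from the convolutional feature-encoder settings in the wav2vec2-base configuration (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2, no padding); at 16 kHz this gives about 49 frames per second of audio. Those frames are 768-dimensional for wav2vec2-base while bart-large uses a 1024-dimensional model size, so some projection between the two is needed; how SpeechMixED implements it is not shown here.

```python
# wav2vec2-base conv feature encoder: (kernel_size, stride) per layer, no padding
CONV_LAYERS = ((10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2))
SAMPLE_RATE = 16_000  # LibriSpeech audio is sampled at 16 kHz


def num_frames(num_samples: int) -> int:
    """Number of wav2vec2 output frames for a raw waveform of num_samples samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        if length < kernel:
            return 0  # too short to produce any output frame
        # standard output-length formula for an unpadded 1-D convolution
        length = (length - kernel) // stride + 1
    return length


one_second = num_frames(SAMPLE_RATE)  # 49 frames
```

The same formula shows the receptive field: 400 samples (25 ms) is the shortest input that yields a single frame.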

Speech encoder + NLP decoder, fine-tuning only the cross-attention, projection, and decoder embedding (ftl=True)

# ftl=True: train only cross-attention, the projection layer, and the decoder embedding
model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
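Partial fine-tuning like ftl=True usually comes down to deciding, by parameter name, which weights keep requires_grad=True. The sketch below shows that selection step in plain Python. "encoder_attn" is the name Hugging Face BART uses for decoder cross-attention and "embed_tokens" for embeddings, but "enc_to_dec_proj" is a hypothetical projection name, and the example names are made up; SpeechMix's real names come from model.named_parameters().

```python
from typing import Iterable, List, Tuple

# Hypothetical name fragments for the parts ftl=True is described as training.
# Note: "encoder_attn" also matches BART's "encoder_attn_layer_norm".
FTL_KEYWORDS = ("encoder_attn", "enc_to_dec_proj", "decoder.embed_tokens")


def split_trainable(
    names: Iterable[str], keywords: Tuple[str, ...] = FTL_KEYWORDS
) -> Tuple[List[str], List[str]]:
    """Split parameter names into (trainable, frozen) by substring match."""
    trainable: List[str] = []
    frozen: List[str] = []
    for name in names:
        (trainable if any(k in name for k in keywords) else frozen).append(name)
    return trainable, frozen


# made-up parameter names for illustration only
example_names = [
    "speech_encoder.encoder.layers.0.attention.q_proj.weight",
    "enc_to_dec_proj.weight",
    "decoder.model.decoder.embed_tokens.weight",
    "decoder.model.decoder.layers.0.self_attn.q_proj.weight",
    "decoder.model.decoder.layers.0.encoder_attn.q_proj.weight",
    "decoder.model.decoder.layers.0.fc1.weight",
]
trainable, frozen = split_trainable(example_names)
```

With PyTorch, the same predicate would typically be applied as: for name, p in model.named_parameters(): p.requires_grad = any(k in name for k in FTL_KEYWORDS).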

Speech encoder + NLP encoder-decoder

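from speechmix import SpeechMixEED  # import path assumed; check the package's __init__.py

# wav2vec2 as the speech encoder, followed by BART's full encoder-decoder
model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")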

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
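Assuming SpeechMixEED feeds (projected) wav2vec2 frames into BART's encoder in place of token embeddings, one position per frame, the encoder's learned position embeddings cap the input length: bart-large sets max_position_embeddings to 1024. The sketch below reuses the wav2vec2-base frame computation to check whether a clip fits, which works out to roughly 20 seconds of 16 kHz audio. Whether SpeechMix truncates, downsamples, or fails on longer input is not covered here.

```python
# wav2vec2-base conv feature encoder: (kernel_size, stride) per layer, no padding
CONV_LAYERS = ((10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2))
# max_position_embeddings in the facebook/bart-large config
BART_LARGE_MAX_POSITIONS = 1024
SAMPLE_RATE = 16_000


def num_frames(num_samples: int) -> int:
    """Number of wav2vec2 output frames for num_samples raw audio samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        if length < kernel:
            return 0
        length = (length - kernel) // stride + 1
    return length


def fits_bart_encoder(num_samples: int, max_positions: int = BART_LARGE_MAX_POSITIONS) -> bool:
    """True if the frame sequence fits BART's encoder (assumes one position per frame)."""
    return num_frames(num_samples) <= max_positions


twenty_seconds_ok = fits_bart_encoder(20 * SAMPLE_RATE)     # 999 frames -> True
twenty_one_seconds_ok = fits_bart_encoder(21 * SAMPLE_RATE)  # 1049 frames -> False
```

The speech-encoder + NLP-decoder setup has no such cap on the speech side, since cross-attention carries no position table over the encoder frames.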

Speech encoder + NLP encoder-decoder, fine-tuning only layer norm and attention (lna=True)

# lna=True: train only layer-norm and attention parameters
model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
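The lna flag appears to follow the LayerNorm-and-Attention (LNA) recipe, in which only layer-normalization and attention parameters are updated. The sketch below selects such parameters by name and reports what fraction of a model would be trained. The keyword list is an assumption based on Hugging Face naming for BART (self_attn, encoder_attn, *_layer_norm, layernorm_embedding) and wav2vec2 (attention, layer_norm); the sizes in the usage line are toy numbers, not real parameter counts.

```python
from typing import Dict

# Assumed name fragments for layer-norm and attention parameters.
LNA_KEYWORDS = ("layer_norm", "layernorm", "self_attn", "encoder_attn", "attention")


def is_lna_param(name: str) -> bool:
    """True if a parameter name looks like a layer-norm or attention weight."""
    lowered = name.lower()
    return any(k in lowered for k in LNA_KEYWORDS)


def trainable_fraction(param_sizes: Dict[str, int]) -> float:
    """Fraction of all parameters (by element count) selected for LNA fine-tuning."""
    total = sum(param_sizes.values())
    selected = sum(size for name, size in param_sizes.items() if is_lna_param(name))
    return selected / total if total else 0.0


# toy sizes for illustration only
toy_sizes = {
    "dec.self_attn.q_proj.weight": 100,
    "dec.fc1.weight": 300,
    "dec.final_layer_norm.weight": 10,
    "enc.attention.k_proj.weight": 90,
}
fraction = trainable_fraction(toy_sizes)
```

On a real model, the sizes would come from {name: p.numel() for name, p in model.named_parameters()}.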

Speech encoder + NLP encoder-decoder, fine-tuning only the speech encoder (fne=True)

# fne=True: train only the speech encoder; the NLP encoder-decoder stays frozen
model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
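Training only the speech encoder amounts to a freeze plan keyed on where each parameter lives. The sketch below maps parameter names to a requires_grad value using a name prefix; the prefix "speech_encoder." and the example names are hypothetical, not SpeechMix's actual module names.

```python
from typing import Dict, Iterable

# Hypothetical prefix under which the speech encoder's parameters are registered.
SPEECH_ENCODER_PREFIX = "speech_encoder."


def fne_freeze_plan(names: Iterable[str], prefix: str = SPEECH_ENCODER_PREFIX) -> Dict[str, bool]:
    """Map each parameter name to requires_grad: True only inside the speech encoder."""
    return {name: name.startswith(prefix) for name in names}


# made-up parameter names for illustration only
plan = fne_freeze_plan([
    "speech_encoder.feature_extractor.conv_layers.0.conv.weight",
    "speech_encoder.encoder.layers.11.attention.q_proj.weight",
    "nlp_model.model.encoder.layers.0.self_attn.q_proj.weight",
    "nlp_model.model.decoder.embed_tokens.weight",
])
```

Matching on a prefix rather than a substring avoids accidentally unfreezing NLP-model weights whose names also contain "encoder".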

Installation

Install from PyPI

pip install speechmix

Build from source

Clone the repository, cd into the project directory, and install it in editable mode:

pip install -e .



Download files


Source Distribution

speechmix-0.0.4.tar.gz (5.0 kB), uploaded as Source

Built Distributions

speechmix-0.0.4-py3.7.egg (5.5 kB), uploaded as Source
speechmix-0.0.4-py3-none-any.whl (3.3 kB), uploaded for Python 3

File details

Details for the file speechmix-0.0.4.tar.gz.

File metadata

  • Download URL: speechmix-0.0.4.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.4.tar.gz
Algorithm Hash digest
SHA256 0ec4f8328d904689d12261c5b34228250a99879ee24004997070b11a7d19b386
MD5 6894ece578a7f20912455ae0354c4721
BLAKE2b-256 9bb2b3c7f3ee61999021d532ed7e2824e9493e93701840c71af938d37cb1c9c7
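To check a downloaded archive against the listed SHA256 digest, compute the file's digest locally and compare. A minimal sketch using Python's standard hashlib module; the expected value is the SHA256 digest listed for speechmix-0.0.4.tar.gz, and the path is wherever the file was saved.

```python
import hashlib

# SHA256 digest listed for speechmix-0.0.4.tar.gz
EXPECTED_TARBALL_SHA256 = "0ec4f8328d904689d12261c5b34228250a99879ee24004997070b11a7d19b386"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str = EXPECTED_TARBALL_SHA256) -> bool:
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, matches("speechmix-0.0.4.tar.gz") returns True for an intact download. pip can also enforce this itself in hash-checking mode, using --hash=sha256:<digest> entries in a requirements file.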


File details

Details for the file speechmix-0.0.4-py3.7.egg.

File metadata

  • Download URL: speechmix-0.0.4-py3.7.egg
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.4-py3.7.egg
Algorithm Hash digest
SHA256 29df3bddade4075392883ecf1a3f3ffa7cf72f671e19eb4db07373bc7c97bb97
MD5 f05469e54b004deeae4dfc2659bdf082
BLAKE2b-256 d14380d6fde5b1f08ce754348bcbe7df20faa4c7d78e93dd6dce571c2531ad20


File details

Details for the file speechmix-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: speechmix-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 3.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 d15acd6c6f10cf91980e0ffcfff436b2db7ae90dfc06bd9fc59bb5afc5511e30
MD5 ebb876e82dfeff98768d764c14c81404
BLAKE2b-256 a3940d2504a42c6b5015860ffe246b060beba59df2fd7a5fb2db6dac6b9f18e8

