SpeechMix

Explore different ways to combine speech models (wav2vec2, HuBERT) with NLP models (BART, T5, GPT).

Introduction

All of the examples below use the same input:

from datasets import load_dataset
import soundfile as sf


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds['text'][0]
speech = ds["speech"][0]
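
The loader above discards the sampling rate that `sf.read` returns. The wav2vec2 checkpoint used in the examples below was trained on 16 kHz audio, so checking the rate before building tensors can catch mismatched files early. The following is a minimal sketch of such a check, not part of SpeechMix: `EXPECTED_RATE`, `write_tone`, and `read_checked` are hypothetical names, and the standard-library `wave` module stands in for `soundfile` so that the snippet runs without extra dependencies.

```python
import math
import os
import struct
import tempfile
import wave

EXPECTED_RATE = 16000  # assumption: the rate wav2vec2-base-960h expects


def write_tone(path, rate, n_samples=1600, freq=440.0):
    """Write a mono 16-bit sine tone so the example has a file to read."""
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_samples)
    )
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(frames)


def read_checked(path, expected_rate=EXPECTED_RATE):
    """Read a mono 16-bit WAV file as floats in [-1, 1], rejecting other rates."""
    with wave.open(path, "rb") as f:
        rate = f.getframerate()
        if rate != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz audio, got {rate} Hz")
        raw = f.readframes(f.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [s / 32768.0 for s in samples]


# Synthetic stand-in for a dataset file (hypothetical path).
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "tone_16k.wav")
write_tone(good_path, 16000)
speech_example = read_checked(good_path)
print(len(speech_example))
```

With `soundfile`, the equivalent check would compare the second value returned by `sf.read` against the expected rate instead of discarding it.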

Speech encoder + NLP decoder

# Assumed import path; adjust if the package layout differs.
from speechmix import SpeechMixED

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
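
When a text decoder consumes the output of a speech encoder, it sees encoder frames rather than raw samples, so the sequence it works with is far shorter than the waveform. A rough sketch of that length calculation follows, assuming the convolutional feature-extractor settings of the wav2vec2 base configuration (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2, no padding). `CONV_LAYERS` and `encoder_frames` are hypothetical names, not SpeechMix functions.

```python
# Assumed wav2vec2-base feature-extractor settings: (kernel size, stride) per conv layer.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def encoder_frames(num_samples, layers=CONV_LAYERS):
    """Return the number of output frames for a waveform of num_samples samples."""
    length = num_samples
    for kernel, stride in layers:
        # Unpadded 1-D convolution: floor((L - kernel) / stride) + 1
        length = (length - kernel) // stride + 1
    return length


one_second = encoder_frames(16000)     # one second of 16 kHz audio
ten_seconds = encoder_frames(160000)   # ten seconds of 16 kHz audio
print(one_second, ten_seconds)
```

Under these assumptions one second of audio maps to 49 frames, roughly one frame per 20 ms.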

Speech encoder + NLP decoder, fine-tuning only the cross-attention, projection, and decoder embedding

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
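
Restricting fine-tuning to a few components usually comes down to freezing every parameter whose name does not match a chosen set of keywords. The sketch below shows only that name-based selection in plain Python. It is not SpeechMix's implementation of `ftl`, and the parameter names and keywords are hypothetical examples.

```python
def split_trainable(param_names, keywords):
    """Partition parameter names into (trainable, frozen) by substring match."""
    trainable, frozen = [], []
    for name in param_names:
        if any(k in name for k in keywords):
            trainable.append(name)
        else:
            frozen.append(name)
    return trainable, frozen


# Hypothetical parameter names, loosely styled after encoder-decoder checkpoints.
names = [
    "encoder.layers.0.attention.q_proj.weight",
    "projection.weight",
    "decoder.embed_tokens.weight",
    "decoder.layers.0.self_attn.q_proj.weight",
    "decoder.layers.0.encoder_attn.q_proj.weight",
]
# Hypothetical keywords for cross-attention, projection, and decoder embedding.
keywords = ["encoder_attn", "projection", "embed_tokens"]

trainable, frozen = split_trainable(names, keywords)
print(trainable)
print(frozen)
```

With a PyTorch model, the same test would typically run inside a loop over `model.named_parameters()`, setting `requires_grad` to `True` for matches and `False` otherwise.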

Speech encoder + NLP encoder-decoder

# Assumed import path; adjust if the package layout differs.
from speechmix import SpeechMixEED

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder + NLP encoder-decoder, fine-tuning only layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder + NLP encoder-decoder, fine-tuning only the speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Installation

Install from PyPI

pip install speechmix

Build from source

Clone the repository, cd into the project directory, then install it in editable mode:

pip install -e .
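
After either installation route, a quick way to confirm that the package is visible to the current interpreter is to query its installed metadata. This is a small sketch using the standard-library `importlib.metadata` module (Python 3.8+). `installed_version` is a hypothetical helper, and the distribution name `speechmix` is taken from the pip command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("speechmix")
print("speechmix version:", version if version is not None else "not installed")
```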
