SpeechMix

Explore different ways to combine speech models (wav2vec2, HuBERT) with NLP models (BART, T5, GPT).

Introduction

For the same input:

from datasets import load_dataset
import soundfile as sf


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds["text"][0]
speech = ds["speech"][0]

Speech encoder + NLP decoder

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
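
As a rough aid to reading the call above, the sketch below estimates how many frames the speech encoder would hand to the NLP decoder for a given amount of audio. It assumes the convolution kernel sizes and strides from the published wav2vec2-base configuration; `encoder_frames` is a hypothetical helper, not part of SpeechMix. Note also that wav2vec2-base uses a hidden size of 768 while BART-large uses 1024, which is presumably why a projection sits between the two models.

```python
# Kernel sizes and strides of the wav2vec2-base convolutional feature
# extractor (taken from the published model configuration).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def encoder_frames(num_samples: int) -> int:
    """Hypothetical helper: frames remaining after the unpadded conv stack."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Standard output-length formula for an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


one_second = encoder_frames(16_000)  # one second of 16 kHz audio
print(one_second)  # 49
```

Under these assumptions the encoder emits roughly one frame every 20 ms, so the decoder attends over about 49 positions per second of speech.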

Speech encoder + NLP decoder, fine-tuning only the cross-attention, projection, and decoder embedding

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
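
Partial fine-tuning of this kind is commonly done by selecting parameters by name and freezing the rest. The following is a minimal sketch of that selection step only, not SpeechMix's actual implementation: `select_trainable`, the keyword list, and the example parameter names are all hypothetical and chosen for illustration.

```python
from typing import Iterable, List

# Hypothetical name fragments; real parameter names depend on the wrapped models.
TRAINABLE_KEYWORDS = ("encoder_attn", "projection", "embed")


def select_trainable(names: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Return the parameter names that contain at least one keyword."""
    keywords = tuple(keywords)
    return [name for name in names if any(key in name for key in keywords)]


# Made-up parameter names, for illustration only.
example_names = [
    "encoder.layers.0.attention.q_proj.weight",
    "projection.weight",
    "decoder.layers.0.encoder_attn.k_proj.weight",
    "decoder.layers.0.self_attn.k_proj.weight",
    "decoder.embed_tokens.weight",
]
selected = select_trainable(example_names, TRAINABLE_KEYWORDS)
print(selected)

# With a PyTorch module, the same selection could drive freezing:
# for name, param in model.named_parameters():
#     param.requires_grad = name in selected
```

Only the projection, cross-attention, and embedding entries are selected here; the speech encoder and decoder self-attention entries would stay frozen.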

Speech encoder + NLP encoder-decoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder + NLP encoder-decoder, fine-tuning only the layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder + NLP encoder-decoder, fine-tuning only the speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Installation

Install with pip

pip install speechmix

Build from source

Clone the repository with git and cd into the project directory, then run:

pip install -e .
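
To confirm that either install route worked, one option is to query the installed distribution's version from Python. This is a small sketch using the standard library's `importlib.metadata` (Python 3.8 or newer); `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if speechmix is installed, otherwise None.
print(installed_version("speechmix"))
```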
