SpeechMix

Explore different ways to combine speech models (wav2vec2, HuBERT) with NLP models (BART, T5, GPT).

Introduction

The examples below all use the same input, loaded as follows:

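from datasets import load_dataset
import soundfile as sf

# assumption: both classes are exported at the top level of the speechmix package
from speechmix import SpeechMixED, SpeechMixEED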


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds["text"][0]
speech = ds["speech"][0]

Speech encoder NLP decoder

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
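
As a rough guide to how long a sequence the NLP decoder attends over, the sketch below estimates how many frames a wav2vec2-style convolutional feature encoder emits for a raw waveform. The kernel and stride tuples are the values published for the wav2vec2 base configuration and are assumed to apply to the checkpoint above. The helper `num_encoder_frames` is a made-up name for illustration, not part of SpeechMix.

```python
# Kernel widths and strides of the wav2vec2 base feature encoder
# (assumed defaults; other checkpoints may use different values).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_encoder_frames(num_samples, kernels=CONV_KERNELS, strides=CONV_STRIDES):
    """Estimate the number of output frames for `num_samples` raw audio samples.

    Hypothetical helper: applies the unpadded 1-D convolution length formula
    floor((length - kernel) / stride) + 1 once per layer.
    """
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        if length < kernel:
            return 0  # input too short to produce a single frame
        length = (length - kernel) // stride + 1
    return length


# One second of 16 kHz audio
print(num_encoder_frames(16000))  # 49
```

Under these assumptions, one second of 16 kHz audio becomes 49 encoder frames, so the speech side is usually much longer than the tokenized transcript.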

Speech encoder NLP decoder, fine-tuning only the cross-attention, projection, and decoder embedding

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
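
The general idea of fine-tuning only a subset of parameters can be sketched as a name-based filter over `(name, parameter)` pairs. This is not SpeechMix's actual implementation. The helper `keep_trainable`, the stand-in parameter names, and the keyword list are all hypothetical and chosen only to illustrate the technique.

```python
from types import SimpleNamespace


def keep_trainable(named_params, keywords):
    """Leave only parameters whose name contains one of `keywords` trainable.

    `named_params` is an iterable of (name, param) pairs, the same shape that
    torch's `Module.named_parameters()` yields; each param only needs a
    `requires_grad` attribute. Returns the names that stay trainable.
    """
    trainable = []
    for name, param in named_params:
        param.requires_grad = any(key in name for key in keywords)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Stand-in parameters so the sketch runs without loading a model.
# The names mimic a BART-style layout and are illustrative only.
params = {
    "encoder.layers.0.attention.q_proj.weight": SimpleNamespace(requires_grad=True),
    "projection.weight": SimpleNamespace(requires_grad=True),
    "decoder.embed_tokens.weight": SimpleNamespace(requires_grad=True),
    "decoder.layers.0.self_attn.q_proj.weight": SimpleNamespace(requires_grad=True),
    "decoder.layers.0.encoder_attn.q_proj.weight": SimpleNamespace(requires_grad=True),
}

# Hypothetical keywords for cross-attention, projection, and decoder embedding
kept = keep_trainable(params.items(), ("encoder_attn", "projection", "embed_tokens"))
print(kept)
```

Assuming the model is a torch module, `model.named_parameters()` could be passed in place of `params.items()`, provided the keywords match that model's real parameter names.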

Speech encoder NLP encoder decoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder, fine-tuning only layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder, fine-tuning only the speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Installation

Install from PyPI

pip install speechmix

Build from source

Clone the repository and cd into the project directory, then run:

pip install -e .
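
To confirm that either install route worked, one option is to query the installed distribution metadata from Python. This is a minimal sketch that assumes Python 3.8 or newer (where `importlib.metadata` is in the standard library). The helper name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if speechmix is installed, otherwise None
print(installed_version("speechmix"))
```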
