Skip to main content

Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together

Project description

SpeechMix

Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together.

Introduction

For the same input:

from datasets import load_dataset
import soundfile as sf


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds['text'][0]
speech = ds["speech"][0]

Speech encoder NLP decoder

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP decoder only fine-tune on cross attention/projection/decoder embedding

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder only fine-tune on layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder only fine-tune on speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Installation

pip install

pip install speechmix

Build from source

git clone and cd into this project.

pip install -e .

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

speechmix-0.0.6.tar.gz (5.1 kB view details)

Uploaded Source

Built Distributions

speechmix-0.0.6-py3.7.egg (5.6 kB view details)

Uploaded Source

speechmix-0.0.6-py3-none-any.whl (3.3 kB view details)

Uploaded Python 3

File details

Details for the file speechmix-0.0.6.tar.gz.

File metadata

  • Download URL: speechmix-0.0.6.tar.gz
  • Upload date:
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.6.tar.gz
Algorithm Hash digest
SHA256 371bb6244f5053bc93f58683c4c6f60cf689b5c8ebce66a76e2495bff92f7d9f
MD5 84bb44fafb08bf3189ce0b452bc9b4f9
BLAKE2b-256 54be3acdc28bcae9c07c7be702f63265af822e8a819428bd2a88a93de81cb37f

See more details on using hashes here.

File details

Details for the file speechmix-0.0.6-py3.7.egg.

File metadata

  • Download URL: speechmix-0.0.6-py3.7.egg
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.6-py3.7.egg
Algorithm Hash digest
SHA256 8bdf0096dafe4373d3b9dc879a4e306001aba456f03181abfb3b3d4fc974168c
MD5 15ad68f6247837cf9c8ca15a4ddb6d19
BLAKE2b-256 3f201b5b54edee1f8d8d6ed011471b860942285f74f356aa700636454d6e59d2

See more details on using hashes here.

File details

Details for the file speechmix-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: speechmix-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 3.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 e814eb28fc6fcb839d202ebed17c4aee91cc085537b0911d2f3694c186bf0a85
MD5 b79b6a7c834b713543769d14f8805aff
BLAKE2b-256 4d04ff5ec289cbea5e18d3f46aeca3ed125ec6ddb3fd948a2f730f48fde1aa26

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page