Skip to main content

Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together

Project description

SpeechMix

Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together.

Introduction

For the same input:

from datasets import load_dataset
import soundfile as sf


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds['text'][0]
speech = ds["speech"][0]

Speech encoder NLP decoder

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP decoder only fine-tune on cross attention/projection/decoder embedding

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder only fine-tune on layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder only fine-tune on speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Installation

pip install

pip install speechmix

Build from source

git clone and cd into this project.

pip install -e .

Example

usage python train.py --speech_model_config voidful/wav2vec2-large-xlsr-53-tw-gpt --nlp_model_config voidful/bart-base-chinese --SpeechMixEED --fne

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

speechmix-0.0.10.tar.gz (7.3 kB view details)

Uploaded Source

Built Distributions

speechmix-0.0.10-py3.7.egg (7.1 kB view details)

Uploaded Source

speechmix-0.0.10-py3-none-any.whl (3.7 kB view details)

Uploaded Python 3

File details

Details for the file speechmix-0.0.10.tar.gz.

File metadata

  • Download URL: speechmix-0.0.10.tar.gz
  • Upload date:
  • Size: 7.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.10.tar.gz
Algorithm Hash digest
SHA256 5a900e426d85e67ee4cafb71f595dc030cab425ea432971895f45297ede6e7d3
MD5 5f33e18917202ddeaaffe9f6e3747c5c
BLAKE2b-256 f0077b0ba66f1de2be697674efce0d9b4eadc3683a3175178cab48854779e755

See more details on using hashes here.

File details

Details for the file speechmix-0.0.10-py3.7.egg.

File metadata

  • Download URL: speechmix-0.0.10-py3.7.egg
  • Upload date:
  • Size: 7.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.10-py3.7.egg
Algorithm Hash digest
SHA256 125d1c38f3ce0e7bba25814a9a6ebb17bf2a2152a3eb564f523b96c43e240441
MD5 647f222240ec0ed3c3fcc48eb82e29ef
BLAKE2b-256 87bc532fa28aaebb2799c580eb2d8fe479ec96c4de5e8339804d7a8e8fd73b7a

See more details on using hashes here.

File details

Details for the file speechmix-0.0.10-py3-none-any.whl.

File metadata

  • Download URL: speechmix-0.0.10-py3-none-any.whl
  • Upload date:
  • Size: 3.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.10-py3-none-any.whl
Algorithm Hash digest
SHA256 1d36d9707035015efeebdaa85d29d8382251682ac5a2c40f6a6b6ec103a34efb
MD5 b26952c1911b2197388e06e7f0fe10ed
BLAKE2b-256 53b3916aa70072bf258f2d769450d6d0dded61baa27590369874eb1aa3d02112

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page