Skip to main content

Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together

Project description

SpeechMix

Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together.

Introduction

For the same input:

from datasets import load_dataset
import soundfile as sf


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds['text'][0]
speech = ds["speech"][0]

Speech encoder NLP decoder

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP decoder only fine-tune on cross attention/projection/decoder embedding

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder only fine-tune on layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder only fine-tune on speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Installation

pip install

pip install speechmix

Build from source

git clone and cd into this project.

pip install -e .

Example

usage python train.py --speech_model_config voidful/wav2vec2-large-xlsr-53-tw-gpt --nlp_model_config voidful/bart-base-chinese --SpeechMixEED --fne

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

speechmix-0.0.9.tar.gz (7.3 kB view details)

Uploaded Source

Built Distributions

speechmix-0.0.9-py3.7.egg (7.0 kB view details)

Uploaded Source

speechmix-0.0.9-py3-none-any.whl (3.7 kB view details)

Uploaded Python 3

File details

Details for the file speechmix-0.0.9.tar.gz.

File metadata

  • Download URL: speechmix-0.0.9.tar.gz
  • Upload date:
  • Size: 7.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.9.tar.gz
Algorithm Hash digest
SHA256 34211b49b664042df7b82264fdbfb0cebe02e5acccf5f776b42cb1a9369fd10b
MD5 4a34659ef7fc3c8e9e99797529fbc568
BLAKE2b-256 2d4c8636007e1f665b13befd541bbf2cab35cebce4d7c912104763cd4d1a78fe

See more details on using hashes here.

File details

Details for the file speechmix-0.0.9-py3.7.egg.

File metadata

  • Download URL: speechmix-0.0.9-py3.7.egg
  • Upload date:
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.9-py3.7.egg
Algorithm Hash digest
SHA256 c3d131d10150c37d4257587b00c9645b1b482731ce3f6e8a47a61352bc85a2a2
MD5 4de3b5b6a87db01caea864b1dd21c4ef
BLAKE2b-256 57c4c223915dcea96fffecbd78d2ebd8cca820655bfac29b1c5175f401d27f95

See more details on using hashes here.

File details

Details for the file speechmix-0.0.9-py3-none-any.whl.

File metadata

  • Download URL: speechmix-0.0.9-py3-none-any.whl
  • Upload date:
  • Size: 3.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.8

File hashes

Hashes for speechmix-0.0.9-py3-none-any.whl
Algorithm Hash digest
SHA256 fae8c6b4fbe0dc7ccc1ea962b685999865fbaca316773ecd2a64c8c7d1e7a4d5
MD5 020b93d4997e6749d2d5362e29e4e024
BLAKE2b-256 ce2674eaa07a1aced457a2505224b303f91dedd23450ec0550a3184c5c31c549

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page