ESPnet Model Zoo

Utilities for managing the pretrained models created by ESPnet. This functionality is inspired by the pretrained model feature of Asteroid.

Install

pip install torch
pip install espnet_model_zoo

Python API for inference

In the following sections, model_name should be a huggingface_id or one of the tags in table.csv. Alternatively, you can provide a Zenodo URL directly (e.g., https://zenodo.org/record/xxxxxxx/files/hogehoge.zip?download=1).

ASR

import soundfile
from espnet2.bin.asr_inference import Speech2Text
speech2text = Speech2Text.from_pretrained(
    "model_name",
    # Decoding parameters are not included in the model file
    maxlenratio=0.0,
    minlenratio=0.0,
    beam_size=20,
    ctc_weight=0.3,
    lm_weight=0.5,
    penalty=0.0,
    nbest=1
)
# Confirm the sampling rate is equal to that of the training corpus.
# If not, you need to resample the audio data before inputting to speech2text
speech, rate = soundfile.read("speech.wav")
nbests = speech2text(speech)

text, *_ = nbests[0]
print(text)
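
The resampling step mentioned in the comments above can be sketched as follows. This is a minimal pure-Python illustration using linear interpolation; `resample_linear` and `match_rate` are hypothetical helpers, not part of ESPnet or espnet_model_zoo. Linear interpolation applies no anti-aliasing filter, so for real audio a dedicated resampler (for example `scipy.signal.resample_poly`) is likely a better choice.

```python
def resample_linear(samples, orig_rate, target_rate):
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if orig_rate <= 0 or target_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if orig_rate == target_rate or len(samples) < 2:
        return list(samples)
    # Number of output samples covering the same duration.
    n_out = max(1, round(len(samples) * target_rate / orig_rate))
    step = orig_rate / target_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def match_rate(samples, rate, expected_rate):
    """Return samples at expected_rate, resampling only when the rates differ."""
    if rate == expected_rate:
        return samples
    return resample_linear(samples, rate, expected_rate)
```

With such a helper, the signal read by `soundfile.read` would be passed through `match_rate` (with the sampling rate of the training corpus as `expected_rate`) before being given to `speech2text`.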

TTS

import soundfile
from espnet2.bin.tts_inference import Text2Speech
text2speech = Text2Speech.from_pretrained("model_name")
speech = text2speech("foobar")["wav"]
soundfile.write("out.wav", speech.numpy(), text2speech.fs, "PCM_16")

Speech separation

import soundfile
from espnet2.bin.enh_inference import SeparateSpeech
separate_speech = SeparateSpeech.from_pretrained(
    "model_name",
    # for segment-wise process on long speech
    segment_size=2.4,
    hop_size=0.8,
    normalize_segment_scale=False,
    show_progressbar=True,
    ref_channel=None,
    normalize_output_wav=True,
)
# Confirm the sampling rate is equal to that of the training corpus.
# If not, you need to resample the audio data before inputting to separate_speech
speech, rate = soundfile.read("long_speech.wav")
waves = separate_speech(speech[None, ...], fs=rate)

This API can process both short and long audio samples. For long audio, set segment_size and hop_size (and optionally normalize_segment_scale and show_progressbar) to perform segment-wise speech enhancement/separation on the input speech. Note that segment-wise processing is disabled by default.
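
To give an intuition for how segment_size and hop_size interact, the sketch below computes the sample ranges of overlapping segments for a long signal. It assumes both values are given in seconds, and `segment_bounds` is a hypothetical helper written for illustration; it is not the routine ESPnet uses internally, which may treat the final segment and the overlap differently.

```python
def segment_bounds(num_samples, fs, segment_size, hop_size):
    """Return (start, end) sample indices of overlapping segments.

    segment_size and hop_size are assumed to be in seconds.
    """
    seg = round(segment_size * fs)  # segment length in samples
    hop = round(hop_size * fs)      # shift between segment starts in samples
    if seg <= 0 or hop <= 0:
        raise ValueError("segment_size and hop_size must be positive")
    if num_samples <= seg:
        # Short input: a single segment covers everything.
        return [(0, num_samples)]
    bounds = []
    start = 0
    while start + seg < num_samples:
        bounds.append((start, start + seg))
        start += hop
    # Final segment, possibly shorter than seg.
    bounds.append((start, num_samples))
    return bounds


# 4 seconds of audio at 16 kHz with the settings used above.
print(segment_bounds(64000, 16000, segment_size=2.4, hop_size=0.8))
```

With segment_size=2.4 and hop_size=0.8, consecutive segments overlap by 1.6 seconds, so each region of the signal is covered by several segments.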

For old ESPnet (<=0.10.1)

ASR

import soundfile
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.asr_inference import Speech2Text
d = ModelDownloader()
speech2text = Speech2Text(
    **d.download_and_unpack("model_name"),
    # Decoding parameters are not included in the model file
    maxlenratio=0.0,
    minlenratio=0.0,
    beam_size=20,
    ctc_weight=0.3,
    lm_weight=0.5,
    penalty=0.0,
    nbest=1
)

TTS

import soundfile
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.tts_inference import Text2Speech
d = ModelDownloader()
text2speech = Text2Speech(**d.download_and_unpack("model_name"))

Speech separation

import soundfile
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.enh_inference import SeparateSpeech
d = ModelDownloader()
separate_speech = SeparateSpeech(
    **d.download_and_unpack("model_name"),
    # for segment-wise process on long speech
    segment_size=2.4,
    hop_size=0.8,
    normalize_segment_scale=False,
    show_progressbar=True,
    ref_channel=None,
    normalize_output_wav=True,
)

Instructions for ModelDownloader

from espnet_model_zoo.downloader import ModelDownloader
d = ModelDownloader("~/.cache/espnet")  # Specify cachedir
d = ModelDownloader()  # <module_dir> is used as cachedir by default

To obtain a model, give a huggingface_id or a tag listed in table.csv.

>>> d.download_and_unpack("kamo-naoyuki/mini_an4_asr_train_raw_bpe_valid.acc.best")
{"asr_train_config": <config path>, "asr_model_file": <model path>, ...}
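
The returned dictionary is keyed by constructor argument names, which is why it can be unpacked with `**` alongside the decoding parameters. The sketch below illustrates that pattern with a hypothetical stand-in function, `build_recognizer`, and placeholder paths; it does not call ESPnet.

```python
def build_recognizer(asr_train_config, asr_model_file, beam_size=20, ctc_weight=0.3):
    """Hypothetical stand-in for an inference class constructor."""
    return {
        "asr_train_config": asr_train_config,
        "asr_model_file": asr_model_file,
        "beam_size": beam_size,
        "ctc_weight": ctc_weight,
    }


# Stand-in for the dictionary returned by download_and_unpack();
# the values are placeholders, not real paths.
unpacked = {"asr_train_config": "<config path>", "asr_model_file": "<model path>"}

# Model files come from the dictionary, decoding parameters are passed explicitly.
recognizer = build_recognizer(**unpacked, beam_size=10)
print(recognizer)
```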

If the name is a huggingface_id, you can specify a revision by appending it after @:

>>> d.download_and_unpack("kamo-naoyuki/mini_an4_asr_train_raw_bpe_valid.acc.best@<revision>")
{"asr_train_config": <config path>, "asr_model_file": <model path>, ...}
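
The `<huggingface_id>@<revision>` convention can be pictured as a simple split on the `@` character. The helper below, `split_revision`, is a hypothetical illustration of that convention and is not the parser used by ModelDownloader; the revision name `main` is only an example value.

```python
def split_revision(model_name):
    """Split "<huggingface_id>@<revision>" into (id, revision or None)."""
    name, sep, revision = model_name.partition("@")
    if sep and not revision:
        raise ValueError("empty revision after '@'")
    return name, (revision or None)


print(split_revision("kamo-naoyuki/mini_an4_asr_train_raw_bpe_valid.acc.best@main"))
print(split_revision("kamo-naoyuki/mini_an4_asr_train_raw_bpe_valid.acc.best"))
```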

Note that if the model already exists in the cache, downloading and unpacking are skipped.

You can also get a model with certain conditions.

d.download_and_unpack(task="asr", corpus="wsj")

If multiple models match the conditions, the last one is selected. You can also choose among the matches using the "version" option.

d.download_and_unpack(task="asr", corpus="wsj", version=-1)  # Get the last model
d.download_and_unpack(task="asr", corpus="wsj", version=-2)  # Get previous model
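
The version option behaves like a Python list index over the matching models, with -1 meaning the last match. A minimal sketch of that selection, assuming the matches are held in an ordered list, is shown below; `select_version` and the model names are hypothetical and used only for illustration.

```python
def select_version(candidates, version=-1):
    """Pick one model from an ordered list of matches by index."""
    if not candidates:
        raise RuntimeError("no model matches the given conditions")
    try:
        return candidates[version]
    except IndexError:
        raise RuntimeError(
            f"version={version} is out of range for {len(candidates)} matches"
        ) from None


# Made-up names standing in for models that match task="asr", corpus="wsj".
matches = ["model_a", "model_b", "model_c"]
print(select_version(matches))              # last match
print(select_version(matches, version=-2))  # previous match
```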

You can also obtain it from the URL directly.

d.download_and_unpack("https://zenodo.org/record/...")

If you want to use a local model file through this API, you can give its path instead.

d.download_and_unpack("./some/where/model.zip")

In this case, the contents are also expanded in the cache directory. However, the model is identified by its file path, so if you move the file elsewhere and unpack it again, it is treated as a different model and its contents are expanded again in a separate location.
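
One way to picture this path-based identification is a cache directory whose name is derived from a hash of the path string. The sketch below assumes such a scheme; `cache_dir_for` and the choice of SHA-256 are assumptions made for illustration and may not match what ModelDownloader actually does.

```python
import hashlib
from pathlib import Path


def cache_dir_for(source, cachedir="~/.cache/espnet"):
    """Map a model source (path or URL string) to a cache subdirectory."""
    # The key depends only on the string, not on the file contents.
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return Path(cachedir).expanduser() / key


first = cache_dir_for("./some/where/model.zip")
moved = cache_dir_for("./another/place/model.zip")
print(first == moved)  # the same archive at a new path gets a new directory
```

Under this assumption, moving the archive changes the key even though the contents are identical, which is why the archive is unpacked a second time.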

Query model names

You can view the model names in our Zenodo community, https://zenodo.org/communities/espnet/, or by using query(). All information is recorded in table.csv.

d.query("name")

You can also filter them by specifying conditions.

d.query("name", task="asr")
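
Conceptually, a query selects one column of table.csv from the rows that satisfy every condition. The sketch below reproduces that idea on a small in-memory table; the `query` function here is a hypothetical stand-in, and the rows and URLs are made up for illustration rather than taken from the real table.csv.

```python
import csv
import io

# Made-up rows; the real table.csv has its own entries and may have more columns.
TABLE = """\
name,task,corpus,url
example/asr_wsj_a,asr,wsj,https://example.com/a.zip
example/tts_ljspeech,tts,ljspeech,https://example.com/b.zip
example/asr_wsj_b,asr,wsj,https://example.com/c.zip
"""


def query(table_text, key="name", **conditions):
    """Return the `key` column of every row matching all conditions."""
    rows = csv.DictReader(io.StringIO(table_text))
    return [
        row[key]
        for row in rows
        if all(row.get(col) == value for col, value in conditions.items())
    ]


print(query(TABLE, "name"))
print(query(TABLE, "name", task="asr"))
print(query(TABLE, "url", task="asr", corpus="wsj"))
```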

Command line tools

  • espnet_model_zoo_query

    # Query model names
    espnet_model_zoo_query task=asr corpus=wsj
    # Show all model names
    espnet_model_zoo_query
    # Query another key
    espnet_model_zoo_query --key url task=asr corpus=wsj
    
  • espnet_model_zoo_download

    espnet_model_zoo_download <model_name>  # Print the path of the downloaded file
    espnet_model_zoo_download --unpack true <model_name>   # Print the path of unpacked files
    
  • espnet_model_zoo_upload

    export ACCESS_TOKEN=<access_token>
    espnet_model_zoo_upload \
        --file <packed_model> \
        --title <title> \
        --description <description> \
        --creator_name <your-git-account>
    

Use pretrained model in ESPnet recipe

# e.g. ASR WSJ task
git clone https://github.com/espnet/espnet
cd espnet
pip install -e .
cd egs2/wsj/asr1
./run.sh --skip_data_prep false --skip_train true --download_model kamo-naoyuki/wsj

Register your model

Huggingface

  1. Upload your model using the Hugging Face API

    Coming soon...

  2. Create a Pull Request to modify table.csv

    Models registered in table.csv are tested in the CI. Note that a model can be downloaded even without modifying table.csv.

  3. (Done by an administrator) Increment the third version number in setup.py, e.g. 0.0.3 -> 0.0.4

  4. (Done by an administrator) Release the new version

Zenodo (Obsolete)

  1. Upload your model to Zenodo

    You need to sign up for Zenodo and create an access token to upload models. You are free to upload your own model with the espnet_model_zoo_upload command, but we normally upload models using recipes.

  2. Create a Pull Request to modify table.csv

    You need to append your record at the last line.

  3. (Done by an administrator) Increment the third version number in setup.py, e.g. 0.0.3 -> 0.0.4

  4. (Done by an administrator) Release the new version
