Skip to main content

Self-Supervised Speech Pre-training and Representation Learning Toolkit

Project description



Apache License 2.0 CC_BY_NC License Build Codecov Bitbucket open issues

Announcement

If you wish to submit to SUPERB benchmark while you find superbbenchmark.org is down temporarily, please try to use 140.112.21.28 as an alternative. They share the same backend. We will make the official domain work as soon as possible.

What's New

Introduction and Usages

This is an open source toolkit called s3prl, which stands for Self-Supervised Speech Pre-training and Representation Learning. Self-supervised speech pre-trained models are called upstream in this toolkit, and are utilized in various downstream tasks.

The toolkit has three major usages:

Pretrain

  • Pretrain upstream models, including Mockingjay, Audio ALBERT and TERA.
  • Document: pretrain/README.md

Upstream

  • Easily load most of the existing upstream models with pretrained weights in a unified I/O interface.
  • Pretrained models are registered through torch.hub, which means you can use these models in your own project by one-line plug-and-play without depending on this toolkit's coding style.
  • Document: upstream/README.md

Downstream

Below is an intuitive illustration on how this toolkit may help you:



Feel free to use or modify our toolkit in your research. Here is a list of papers using our toolkit. Any question, bug report or improvement suggestion is welcome through opening up a new issue.

If you find this toolkit helpful to your research, please do consider citing our papers, thanks!

Installation

  1. Python >= 3.6
  2. Install sox on your OS
  3. Install s3prl
pip install -e ./
  1. Some upstream models require special dependencies. If you encounter error with a specific upstream model, you can look into the README.md under each upstream folder. E.g., upstream/pase/README.md

Development pattern for contributors

  1. Create a personal fork of the main S3PRL repository in GitHub.
  2. Make your changes in a named branch different from master, e.g. you create a branch new-awesome-feature.
  3. Contact us if you have any questions during development.
  4. Generate a pull request through the Web interface of GitHub.
  5. Please verify that your code is free of basic mistakes, we appreciate any contribution!

Reference Repositories

License

The majority of S3PRL Toolkit is licensed under the Apache License version 2.0, however all the files authored by Facebook, Inc. (which have explicit copyright statement on the top) are licensed under CC-BY-NC.

Used by

List of papers that used our toolkit (Feel free to add your own paper by making a pull request)

Self-Supervised Pretraining

Explanability

Adversarial Attack

Voice Conversion

Benchmark and Evaluation

  • SUPERB: Speech processing Universal PERformance Benchmark (Yang et al., 2021)

    @misc{superb,
          title={SUPERB: Speech processing Universal PERformance Benchmark}, 
          author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
          year={2021},
          eprint={2105.01051},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }
    
  • Utilizing Self-supervised Representations for MOS Prediction (Tseng et al., 2021)

    @misc{ssr_mos,
        title={Utilizing Self-supervised Representations for MOS Prediction}, 
        author={Wei-Cheng Tseng and Chien-yu Huang and Wei-Tsung Kao and Yist Y. Lin and Hung-yi Lee},
        year={2021},
        eprint={2104.03017},
        archivePrefix={arXiv},
        primaryClass={eess.AS}
    }
    

}

Citation

If you find this toolkit useful, please consider citing following papers.

  • If you use our pre-training scripts, or the downstream tasks considered in TERA and Mockingjay, please consider citing the following:
@misc{tera,
  title={TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech},
  author={Andy T. Liu and Shang-Wen Li and Hung-yi Lee},
  year={2020},
  eprint={2007.06028},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
@article{mockingjay,
   title={Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders},
   ISBN={9781509066315},
   url={http://dx.doi.org/10.1109/ICASSP40776.2020.9054458},
   DOI={10.1109/icassp40776.2020.9054458},
   journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
   publisher={IEEE},
   author={Liu, Andy T. and Yang, Shu-wen and Chi, Po-Han and Hsu, Po-chun and Lee, Hung-yi},
   year={2020},
   month={May}
}
  • If you use our organized upstream interface and features, or the SUPERB downstream benchmark, please consider citing the following:
@inproceedings{yang21c_interspeech,
  author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
  title={{SUPERB: Speech Processing Universal PERformance Benchmark}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1194--1198},
  doi={10.21437/Interspeech.2021-1775}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

s3prl-0.3.4.tar.gz (291.7 kB view details)

Uploaded Source

File details

Details for the file s3prl-0.3.4.tar.gz.

File metadata

  • Download URL: s3prl-0.3.4.tar.gz
  • Upload date:
  • Size: 291.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.12

File hashes

Hashes for s3prl-0.3.4.tar.gz
Algorithm Hash digest
SHA256 006b86e99fd5c3bc5e265a343ab98b1610a628db92129a068913e3b369099f42
MD5 348d116eebf05063d9efc77530cec92c
BLAKE2b-256 d6dd9b71715e8543f0c656831f6499e9dad38d31425273fe6e6e388109b3a8dd

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page