Speech-Toolkit for bahasa Malaysia, powered by Tensorflow and PyTorch.

Project description

Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Tensorflow and PyTorch.

Documentation

Documentation for the stable release is available at https://malaya-speech.readthedocs.io/

Installing from PyPI

$ pip install malaya-speech

This installs all dependencies automatically except Tensorflow and PyTorch, so you can choose your own CPU or GPU build of each.

Only Python >= 3.6.0, Tensorflow >= 1.15.0, and PyTorch >= 1.10 are supported.
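
Because Tensorflow and PyTorch are not installed automatically, it can help to check the environment before importing the library. The following is a minimal sketch, not part of Malaya-Speech: the helper names are hypothetical, the distribution names `tensorflow` and `torch` are assumptions (some builds are published under other names), and the script itself needs Python 3.8 or newer for `importlib.metadata`.

```python
import re
import sys
from importlib import metadata

# Minimum versions quoted above. The distribution names are assumptions:
# some CPU, GPU, or nightly builds are published under different names.
MINIMUMS = {"tensorflow": "1.15.0", "torch": "1.10"}


def version_tuple(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare numeric release segments, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return a dict of name -> status string for Python and both frameworks."""
    report = {"python": "ok" if sys.version_info >= (3, 6) else "too old"}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = f"{installed} ok" if ok else f"{installed} < {minimum}"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

The comparison looks only at the numeric release segments, so pre-release and local version suffixes are ignored.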

Development Release

Install from the master branch:

$ pip install git+https://github.com/huseinzol05/malaya-speech.git

We recommend using virtualenv for development.

Documentation for the development release is available at https://malaya-speech.readthedocs.io/en/latest/

Features

  • Age Detection, detect age in speech using Finetuned Speaker Vector.

  • Speaker Diarization, diarizing speakers using Pretrained Speaker Vector.

  • Emotion Detection, detect emotions in speech using Finetuned Speaker Vector.

  • Force Alignment, generate a time-aligned transcription of an audio file using RNNT and CTC.

  • Gender Detection, detect genders in speech using Finetuned Speaker Vector.

  • Language Detection, detect hyperlocal languages in speech using Finetuned Speaker Vector.

  • Language Model, use KenLM, masked language models (BERT and RoBERTa), and GPT2 for ASR decoder scoring.

  • Multispeaker Separation, separate multiple speakers using FastSep on 8k Wav.

  • Noise Reduction, reduce multilevel noises using STFT UNET.

  • Speaker Change, detect changing speakers using Finetuned Speaker Vector.

  • Speaker Overlap, detect overlapping speakers using Finetuned Speaker Vector.

  • Speaker Vector, calculate similarity between speakers using Pretrained Speaker Vector.

  • Speech Enhancement, enhance voice activities using Waveform UNET.

  • SpeechSplit Conversion, detailed speaking style conversion by disentangling speech into content, timbre, rhythm and pitch using PyWorld and PySPTK.

  • Speech-to-Text, End-to-End Speech to Text for Malay, Mixed (Malay, Singlish and Mandarin) and Singlish using RNNT, Wav2Vec2, HuBERT and BEST-RQ CTC.

  • Super Resolution, 4x super resolution for waveforms using ResNet UNET and Neural Vocoder.

  • Text-to-Speech, Text to Speech for Malay and Singlish using Tacotron2, FastSpeech2, FastPitch, GlowTTS, LightSpeech and VITS.

  • Vocoder, convert Mel to Waveform using MelGAN, Multiband MelGAN and Universal MelGAN Vocoder.

  • Voice Activity Detection, detect voice activities using Finetuned Speaker Vector.

  • Voice Conversion, Many-to-One, One-to-Many, Many-to-Many, and Zero-shot Voice Conversion.

  • Hybrid 8-bit Quantization, provides hybrid 8-bit quantization for all models, reducing inference time by up to 2x and model size by up to 4x.
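
The "up to 4x" model size figure in the last item is consistent with storing each 32-bit float weight as an 8-bit integer. The sketch below illustrates that arithmetic with a generic symmetric linear quantizer written against the standard library only. It is an assumption-laden illustration, not the quantization scheme Malaya-Speech actually uses, and the function names are hypothetical.

```python
from array import array


def quantize_int8(weights):
    """Map floats to int8 codes with one shared scale (symmetric linear quantization)."""
    peak = max(abs(w) for w in weights) or 1.0  # avoid a zero scale for all-zero input
    scale = peak / 127.0
    return array("b", (round(w / scale) for w in weights)), scale


def dequantize(codes, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in codes]


# Illustrative weights only; a real model holds millions of values.
weights = array("f", [0.5, -1.0, 0.25, 0.0, 0.75, -0.125])
codes, scale = quantize_int8(weights)

float_bytes = len(weights) * weights.itemsize  # 4 bytes per float32 value
int8_bytes = len(codes) * codes.itemsize       # 1 byte per int8 value
ratio = float_bytes / int8_bytes

# Rounding to the nearest code bounds the round-trip error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, dequantize(codes, scale)))
print(f"size ratio: {ratio:.0f}x, max round-trip error: {max_error:.4f}")
```

The saving applies only to the quantized tensors. Anything kept in float, along with file-format overhead, lowers the overall ratio, which fits the "up to" wording.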

Pretrained Models

Malaya-Speech also releases its pretrained models; see malaya-speech/pretrained-model.

References

If you use our software for research, please cite:

@misc{Malaya,
  note = {Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow},
  author = {Husein, Zolkepli},
  title = {Malaya-Speech},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya-speech}}
}

Acknowledgement

Thanks to KeyReply for the private V100 cloud and Mesolitica for the private RTX cloud used to train Malaya-Speech models.

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

malaya_speech-1.3.0.2-py3-none-any.whl (1.1 MB)

Uploaded Python 3

File details

Details for the file malaya_speech-1.3.0.2-py3-none-any.whl.

File hashes

Hashes for malaya_speech-1.3.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c66fd5b21a1ab94a3f73d8b920691511957e99e9f3191c9cfaa5511bdc00ffa4
MD5 6e0a20f472a0b5bad551c69ef0d6ec29
BLAKE2b-256 c7430a50cc833b9e76764543ff9eaf5a0fe791ab86fc86b5391fb11b6b8abce9
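
A downloaded wheel can be checked against the SHA256 digest listed above. This is a minimal sketch using Python's `hashlib`. The helper names are hypothetical, and the path passed to `verify` is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest listed above for malaya_speech-1.3.0.2-py3-none-any.whl.
EXPECTED_SHA256 = "c66fd5b21a1ab94a3f73d8b920691511957e99e9f3191c9cfaa5511bdc00ffa4"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("malaya_speech-1.3.0.2-py3-none-any.whl")` returns `True` only if the local file matches the published digest.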
