malaya-speech

Speech-Toolkit for bahasa Malaysia, powered by Deep Learning Tensorflow.

Reason this release was yanked: bugs on STT

Project description

Malaya-Speech is a speech toolkit library for Bahasa Malaysia, built on TensorFlow deep learning models.

Documentation

Proper documentation is available at https://malaya-speech.readthedocs.io/

Installing from PyPI

CPU version

$ pip install malaya-speech

GPU version

$ pip install malaya-speech-gpu

Only Python 3.6.0 or above and TensorFlow 1.15.0 or above are supported.

We recommend using virtualenv for development. All examples were tested on TensorFlow 1.15.4 and 2.4.1.
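
The version requirements above can be checked before installing. The following is a minimal sketch, not part of Malaya-Speech: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and the only facts taken from this page are the two minimum versions.

```python
import re
import sys

# Minimums taken from the requirement stated above.
MIN_PYTHON = (3, 6, 0)
MIN_TENSORFLOW = (1, 15, 0)


def parse_version(text):
    """Turn '1.15.4' or '2.4.1-rc0' into a tuple of ints such as (1, 15, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError("not a version string: %r" % (text,))
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(version), len(minimum))
    padded_version = tuple(version) + (0,) * (width - len(version))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_version >= padded_minimum


def check_environment(tensorflow_version=None):
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if not meets_minimum(tuple(sys.version_info[:3]), MIN_PYTHON):
        problems.append("Python 3.6.0 or above is required")
    if tensorflow_version is not None:
        if not meets_minimum(parse_version(tensorflow_version), MIN_TENSORFLOW):
            problems.append("TensorFlow 1.15.0 or above is required")
    return problems


if __name__ == "__main__":
    try:
        import tensorflow as tf  # only read for its version string
        tf_version = tf.__version__
    except ImportError:
        tf_version = None  # TensorFlow not installed yet; check Python only
    print(check_environment(tf_version) or "environment looks OK")
```

Passing the TensorFlow version in as a string keeps the check usable even when TensorFlow is not yet installed.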

Features

Age Detection, detect age in speech using Finetuned Speaker Vector Malaya-Speech models.
Speaker Diarization, diarizing speakers using Pretrained Speaker Vector Malaya-Speech models.
Emotion Detection, detect emotions in speech using Finetuned Speaker Vector Malaya-Speech models.
Gender Detection, detect genders in speech using Finetuned Speaker Vector Malaya-Speech models.
Language Detection, detect hyperlocal languages in speech using Finetuned Speaker Vector Malaya-Speech models.
Noise Reduction, reduce multilevel noises using Pretrained STFT UNET Malaya-Speech models.
Speaker Change, detect changing speakers using Finetuned Speaker Vector Malaya-Speech models.
Speaker Overlap, detect overlapping speakers using Finetuned Speaker Vector Malaya-Speech models.
Speaker Vector, calculate similarity between speakers using Pretrained Malaya-Speech models.
Speech Enhancement, enhance voice activities using Pretrained Waveform UNET Malaya-Speech models.
Speech-to-Text, End-to-End Speech to Text using RNN-Transducer Malaya-Speech models.
Super Resolution, Super Resolution 4x using Pretrained Super Resolution Malaya-Speech models.
Text-to-Speech, using Pretrained Tacotron2 and FastSpeech2 Malaya-Speech models.
Vocoder, convert Mel to Waveform using Pretrained MelGAN, Multiband MelGAN and Universal MelGAN Vocoder Malaya-Speech models.
Voice Activity Detection, detect voice activities using Finetuned Speaker Vector Malaya-Speech models.
Voice Conversion, Many-to-One, One-to-Many, Many-to-Many, and Zero-shot Voice Conversion.
Hybrid 8-bit Quantization, provides hybrid 8-bit quantization for all models, reducing inference time by up to 2x and model size by up to 4x.
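
The "up to 4x" size figure in the last item follows from storing each weight in 8 bits instead of a 32-bit float. The sketch below illustrates that arithmetic with a generic affine 8-bit quantizer in plain Python. It is an illustration only, not the Malaya-Speech implementation: the function names are hypothetical, and it assumes "hybrid" means that weights are stored as 8-bit integers while computation stays in floating point.

```python
import struct


def quantize_uint8(weights):
    """Map floats onto integer codes in [0, 255]; returns (codes, scale, lowest)."""
    lowest, highest = min(weights), max(weights)
    # Fall back to a scale of 1.0 when all weights are equal (range is zero).
    scale = (highest - lowest) / 255 or 1.0
    codes = bytes(round((w - lowest) / scale) for w in weights)
    return codes, scale, lowest


def dequantize(codes, scale, lowest):
    """Recover approximate float weights from the 8-bit codes."""
    return [lowest + code * scale for code in codes]


weights = [-0.50, -0.12, 0.0, 0.07, 0.31, 0.50]
codes, scale, lowest = quantize_uint8(weights)
restored = dequantize(codes, scale, lowest)

# Storage cost: 4 bytes per float32 weight versus 1 byte per 8-bit code.
float32_size = len(struct.pack("<%df" % len(weights), *weights))
uint8_size = len(codes)
print(float32_size, uint8_size)  # 24 6

# Rounding moves each weight by at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(max_error <= scale / 2 + 1e-12)  # True
```

The scale and lowest value must be stored alongside the codes, which is one reason the real saving is "up to" 4x rather than exactly 4x.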

Pretrained Models

Malaya-Speech also releases pretrained models; see malaya-speech/pretrained-model. The models are based on the following architectures:

Wave UNET, Multi-Scale Neural Network for End-to-End Audio Source Separation, https://arxiv.org/abs/1806.03185
Wave ResNet UNET, added ResNet style into Wave UNET, no paper produced.
Deep Speaker, An End-to-End Neural Speaker Embedding System, https://arxiv.org/pdf/1705.02304.pdf
SpeakerNet, 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification, https://arxiv.org/abs/2010.12653
VGGVox, VoxCeleb: a large-scale speaker identification dataset, https://arxiv.org/pdf/1706.08612.pdf
GhostVLAD, Utterance-level Aggregation For Speaker Recognition In The Wild, https://arxiv.org/abs/1902.10107
Conformer, Convolution-augmented Transformer for Speech Recognition, https://arxiv.org/abs/2005.08100
ALConformer, A lite Conformer, no paper produced.
Jasper, An End-to-End Convolutional Neural Acoustic Model, https://arxiv.org/abs/1904.03288
Tacotron2, Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, https://arxiv.org/abs/1712.05884
FastSpeech2, Fast and High-Quality End-to-End Text to Speech, https://arxiv.org/abs/2006.04558
MelGAN, Generative Adversarial Networks for Conditional Waveform Synthesis, https://arxiv.org/abs/1910.06711
Multi-band MelGAN, Faster Waveform Generation for High-Quality Text-to-Speech, https://arxiv.org/abs/2005.05106
SRGAN, Modified version of SRGAN to do 1D Convolution, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, https://arxiv.org/abs/1609.04802
Speech Enhancement UNET, https://github.com/haoxiangsnr/Wave-U-Net-for-Speech-Enhancement
Universal MelGAN, Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains, https://arxiv.org/abs/2011.09631
FastVC, Faster and Accurate Voice Conversion using Transformer, no paper produced.

References

If you use our software for research, please cite:

@misc{Malaya,
  note = {Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow},
  author = {Husein, Zolkepli},
  title = {Malaya-Speech},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya-speech}}
}

Acknowledgement

Thanks to Mesolitica and KeyReply for sponsoring the GCP and private cloud resources used to train Malaya models.

Release history

1.4.0rc2 (pre-release): Jan 19, 2024
1.4.0rc1 (pre-release): Mar 25, 2023
1.3.0.2: Sep 22, 2022
1.3.0.1 (yanked: import error): Sep 22, 2022
1.3.0 (yanked: import bug): Sep 18, 2022
1.2.7: Jun 13, 2022
1.2.6.1: May 7, 2022
1.2.6 (yanked: bugs): May 6, 2022
1.2.5: Mar 20, 2022
1.2.4: Mar 1, 2022
1.2.3: Jan 6, 2022
1.2.2: Dec 23, 2021
1.2.1: Dec 2, 2021
1.2.0.1: Nov 10, 2021
1.2: Oct 2, 2021
1.1.3: Jul 16, 2021
1.1.2: Jul 11, 2021
1.1.1.1: Jul 3, 2021
1.1.1: Jun 29, 2021
1.1: Jun 1, 2021
1.0.3: May 28, 2021
1.0.2: May 8, 2021
1.0.1: Apr 24, 2021
1.0: Apr 18, 2021
0.0.1.8: Apr 4, 2021
0.0.1.7 (this version; yanked: bugs on STT): Apr 4, 2021
0.0.1.6: Jan 31, 2021
0.0.1.5.1: Jan 27, 2021
0.0.1.5: Jan 21, 2021
0.0.1.4.1: Jan 14, 2021
0.0.1.4: Dec 30, 2020
0.0.1.3.1: Dec 10, 2020
0.0.1.3: Dec 8, 2020
0.0.1.2: Nov 27, 2020
0.0.1.1: Nov 6, 2020
0.0.1: Oct 30, 2020
0.0.0.2: Oct 27, 2020
0.0.0.1: Sep 27, 2020

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

malaya_speech-0.0.1.7-py3-none-any.whl (370.8 kB)

Uploaded Apr 4, 2021, Python 3

Hashes for malaya_speech-0.0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e31b6ca6154fef739ea8e14105984b6a08cbc4419b55cfa024bf39d1cc9909d4`
MD5	`c2aa7d1f31bcc197a49baffe34d291b5`
BLAKE2b-256	`147c8c894aaa7e500bc3c6c6bfa95c49d747f8e29350eb7a111622eb21f89087`
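
A downloaded copy of the wheel can be compared against the SHA256 digest listed above. The sketch below uses only the Python standard library; the function names and the command-line usage are hypothetical, and the expected digest is copied from the table.

```python
import hashlib
import hmac
import sys

# SHA256 digest listed above for malaya_speech-0.0.1.7-py3-none-any.whl.
EXPECTED_SHA256 = "e31b6ca6154fef739ea8e14105984b6a08cbc4419b55cfa024bf39d1cc9909d4"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected.lower())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python verify_wheel.py path/to/malaya_speech-0.0.1.7-py3-none-any.whl
    print("match" if matches_expected(sys.argv[1]) else "MISMATCH")
```

pip can perform the same comparison itself when a requirements file pins the digest with `--hash=sha256:...` and is installed with `--require-hashes`.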