EasyMMS
A simple Python package to easily use Meta's Massively Multilingual Speech (MMS) project.
Installation
- You will need `ffmpeg` for audio processing.
- If you want to use the Alignment model:
  - You will need `perl` to use uroman. Check the Perl website for installation instructions on different platforms.
  - You will need a nightly version of `torchaudio`:

    ```shell
    pip install --pre torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118
    ```

  - You might need `sox` as well.
- Install `easymms` from PyPI:

  ```shell
  pip install easymms
  ```
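Before installing, it can help to verify that the external tools listed above are available on your `PATH`. The small helper below is a sketch of such a check; the `check_prerequisites` function name is ours, not part of `easymms`:

```python
import shutil

def check_prerequisites(tools=("ffmpeg", "perl", "sox")):
    """Report which of the required external tools are on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

for tool, found in check_prerequisites().items():
    print(f"{tool}: {'found' if found else 'MISSING'}")
```

Any tool reported as `MISSING` should be installed through your platform's package manager before using the corresponding `easymms` features.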
Quickstart
ASR
You will first need to download the model weights; you can find and download all the supported models from here.
```python
from easymms.models.asr import ASRModel

asr = ASRModel(model='/path/to/mms/model')
files = ['path/to/media_file_1', 'path/to/media_file_2']
transcriptions = asr.transcribe(files, lang='eng', align=False)
for i, transcription in enumerate(transcriptions):
    print(f">>> file {files[i]}")
    print(transcription)
```
ASR with Alignment
```python
from easymms.models.asr import ASRModel

asr = ASRModel(model='/path/to/mms/model')
files = ['path/to/media_file_1', 'path/to/media_file_2']
transcriptions = asr.transcribe(files, lang='eng', align=True)
for i, transcription in enumerate(transcriptions):
    print(f">>> file {files[i]}")
    for segment in transcription:
        print(f"{segment['start_time']} -> {segment['end_time']}: {segment['text']}")
    print("----")
```
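Each aligned segment is a dict with `start_time`, `end_time`, and `text` keys. Assuming the times are floats in seconds (the unit is not stated here, so treat that as an assumption), the segments can be turned into SRT subtitles with a small helper like this sketch; the `segments_to_srt` name is ours:

```python
def segments_to_srt(segments):
    """Format aligned segments as an SRT subtitle string.

    Assumes each segment dict has float 'start_time'/'end_time'
    values in seconds plus a 'text' field.
    """
    def ts(seconds):
        # SRT timestamps look like HH:MM:SS,mmm
        ms = int(round(seconds * 1000))
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1_000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{ts(seg['start_time'])} --> {ts(seg['end_time'])}\n{seg['text']}\n"
        )
    return "\n".join(blocks)
```

Writing the returned string to a `.srt` file gives subtitles you can load in most media players alongside the original media file.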
Alignment model only
```python
from easymms.models.alignment import AlignmentModel

align_model = AlignmentModel()
transcriptions = align_model.align('path/to/wav_file.wav',
                                   transcript=["segment 1", "segment 2"],
                                   lang='eng')
for transcription in transcriptions:
    for segment in transcription:
        print(f"{segment['start_time']} -> {segment['end_time']}: {segment['text']}")
```
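As shown above, `align` takes the transcript as a list of segments. If you only have a raw transcript string, a naive punctuation-based splitter can produce such a list; this is a heuristic sketch of ours, not part of `easymms`:

```python
import re

def split_transcript(text):
    """Split a raw transcript into sentence-like segments.

    Naive heuristic: split on whitespace that follows sentence-ending
    punctuation. Good enough for clean prose; real transcripts may
    need a proper sentence segmenter.
    """
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

segments = split_transcript("Hello there. How are you? Fine!")
print(segments)  # ['Hello there.', 'How are you?', 'Fine!']
```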
TTS
Coming Soon
LID
Coming Soon
API reference
You can check the API reference documentation for more details.
License
Since the models are released under the CC-BY-NC 4.0 license, this project follows the same license.
Disclaimer & Credits
This project is not endorsed or certified by Meta AI; it merely simplifies the use of the MMS project.
All credit goes to the authors and to Meta for open sourcing the models.
Please check their paper Scaling Speech Technology to 1000+ languages and their blog post.