mstt: A command-line tool and Python library for running speech-to-text against multiple models.
Modern Speech-to-Text (MSTT)
Modern Speech-to-Text (MSTT) is a Python library that provides a unified, extensible interface to multiple Speech-to-Text (STT) models. It simplifies integrating different STT services and models into your applications by offering a consistent API regardless of the underlying STT engine.
Features
- Unified API: Interact with multiple STT models through a single, consistent interface.
- Extensible Design: Easily add support for new STT models and services.
- Local and Cloud Support: Seamlessly switch between local models and cloud-based STT APIs.
- Plugin System: Integrate custom STT models or enhance existing functionalities via a flexible plugin system.
Installation
To install MSTT, you can use pip:
pip install mstt
If you want to install with specific STT model backends, you can specify them as extras. For example, to install with funasr support:
pip install "mstt[funasr]"
Usage
Basic Transcription
Here's a basic example of how to use MSTT to transcribe an audio file:
from mstt import MSTT
# Initialize MSTT with a specific model (e.g., 'funasr')
# Ensure the 'mstt-funasr' package is installed if you use 'funasr'
mstt = MSTT(model_id="funasr")
# Transcribe an audio file
audio_file_path = "path/to/your/audio.wav"
result = mstt.transcribe(audio_file_path)
print(f"Transcription: {result.text}")
print(f"Segments: {result.segments}")
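Timestamped segments are handy for producing subtitles. The following is a minimal sketch, assuming each segment is a dict with start and end (in seconds) and text keys, as in the custom plugin example later in this document; actual backends may return a different shape. format_timestamp and segments_to_srt are hypothetical helpers, not part of MSTT.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render a list of segment dicts as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}")
    return "\n\n".join(blocks) + "\n"


# Assumed segment shape, for illustration only; in practice this would
# come from result.segments if the backend uses the same keys.
example_segments = [
    {"start": 0.0, "end": 2.0, "text": "This is a custom"},
    {"start": 2.1, "end": 4.0, "text": "transcription result."},
]
srt_text = segments_to_srt(example_segments)
print(srt_text)
```

If a backend returns segment objects rather than dicts, the key lookups would need to become attribute accesses.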
Available Models
MSTT supports various models through its plugin system. You can list available models:
from mstt import MSTT
available_models = MSTT.list_models()
print("Available STT Models:")
for model_id, description in available_models.items():
    print(f"- {model_id}: {description}")
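When an application should work with whichever backend happens to be installed, the mapping of available models can drive a simple fallback. This is a minimal sketch, assuming the mapping is a dict of model id to description, as the loop above suggests. pick_model is a hypothetical helper, and the sample dict only stands in for real MSTT.list_models() output.

```python
def pick_model(available: dict, preferred: list) -> str:
    """Return the first preferred model id that is present in `available`."""
    for model_id in preferred:
        if model_id in available:
            return model_id
    raise LookupError(
        f"None of {preferred} is installed; available: {sorted(available)}"
    )


# Stand-in for MSTT.list_models(); real ids and descriptions will differ.
available_models = {"funasr": "placeholder description"}
chosen = pick_model(available_models, ["mycustom", "funasr"])
print(chosen)
```

The chosen id could then be passed as model_id when constructing MSTT.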
Command Line Interface (CLI)
MSTT also provides a command-line interface for quick transcriptions:
mstt transcribe --model funasr --audio path/to/your/audio.wav
Run mstt --help for more CLI options.
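For batch jobs, the CLI can be driven from a script. The following is a minimal sketch that uses only the transcribe subcommand and the --model and --audio options shown above. It assumes the transcription is written to standard output, which this document does not confirm, so check mstt --help first. build_command and transcribe_directory are hypothetical helpers.

```python
import subprocess
from pathlib import Path


def build_command(audio_path, model: str = "funasr") -> list:
    """Build the argument list for one `mstt transcribe` invocation."""
    return ["mstt", "transcribe", "--model", model, "--audio", str(audio_path)]


def transcribe_directory(directory, model: str = "funasr") -> dict:
    """Run the CLI on every .wav file in `directory`.

    Returns a dict mapping file name to captured standard output
    (assumed, not confirmed, to contain the transcription).
    """
    outputs = {}
    for wav in sorted(Path(directory).glob("*.wav")):
        completed = subprocess.run(
            build_command(wav, model),
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        outputs[wav.name] = completed.stdout
    return outputs


print(build_command("path/to/your/audio.wav"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.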
Creating Custom Plugins
MSTT is designed to be extensible through a plugin system. You can create your own STT model plugins and register them with MSTT.
Plugin Structure
A plugin typically consists of:
- A Model Implementation: A Python class that inherits from mstt.models.STTModel and implements the transcribe method.
- A Registration Module: A Python module that registers your model with MSTT using the mstt.register_model decorator.
Example: A Simple Custom Plugin
Let's say you want to create a plugin for a hypothetical MyCustomSTT model. You would create a Python package (e.g., mstt_mycustom):
mstt_mycustom/
├── pyproject.toml
└── src/
    └── mstt_mycustom/
        ├── __init__.py
        ├── models.py
        └── register.py
src/mstt_mycustom/models.py:
from mstt.models import STTModel, TranscriptionResult


class MyCustomSTTModel(STTModel):
    def __init__(self, model_id: str, device: str = "cpu"):
        super().__init__(model_id, device)
        # Initialize your custom model here
        print(f"Initializing MyCustomSTTModel with ID: {model_id} on device: {device}")

    def transcribe(self, audio_file_path: str) -> TranscriptionResult:
        # Implement your transcription logic here
        # This is a placeholder for demonstration
        print(f"Transcribing {audio_file_path} using MyCustomSTTModel")
        dummy_text = "This is a custom transcription result."
        dummy_segments = [
            {"start": 0.0, "end": 2.0, "text": "This is a custom"},
            {"start": 2.1, "end": 4.0, "text": "transcription result."},
        ]
        return TranscriptionResult(text=dummy_text, segments=dummy_segments)
src/mstt_mycustom/register.py:
from mstt import register_model

from .models import MyCustomSTTModel


@register_model("mycustom")
def register_mycustom_model():
    return MyCustomSTTModel
pyproject.toml (important for plugin discovery):
[project.entry-points.mstt]
mycustom = "mstt_mycustom.register"
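To check which plugins are visible in the current environment, the standard library can list the entry points declared in the mstt group. This is a minimal sketch using importlib.metadata; how MSTT itself loads these entry points is not shown in this document, and list_mstt_plugins is a hypothetical helper.

```python
from importlib.metadata import entry_points


def list_mstt_plugins(group: str = "mstt") -> list:
    """Return sorted (name, target) pairs for entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points support selection by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a plain dict keyed by group.
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


for name, target in list_mstt_plugins():
    print(f"{name} -> {target}")
```

With the example package installed, the output should include a line for mycustom pointing at mstt_mycustom.register. An empty result suggests the package is not installed in the active environment or the entry point table is misspelled.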
Installing Your Plugin
After setting up your plugin package, you can install it in editable mode for development:
pip install -e /path/to/your/mstt_mycustom
Or, if you package it, install it like any other Python package:
pip install mstt-mycustom
Once installed, MSTT will automatically discover and load your mycustom model, and you can use it like any other built-in model:
from mstt import MSTT
mstt = MSTT(model_id="mycustom")
result = mstt.transcribe("path/to/your/audio.wav")
print(result.text)
Contributing
We welcome contributions to MSTT! If you'd like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature/your-feature-name).
- Make your changes.
- Write and run tests (pytest).
- Commit your changes (git commit -am 'Add new feature').
- Push to the branch (git push origin feature/your-feature-name).
- Create a new Pull Request.
License
This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.