fmus-vox
A speech processing library for Python.
About
fmus-vox is a Python library that provides a rich set of tools for audio processing, speech-to-text (STT), text-to-speech (TTS), voice cloning, wake word detection, and conversational AI.
Features
- Audio Processing: Load, manipulate, and analyze audio with an intuitive interface
- Speech-to-Text: Transcribe speech with support for multiple models (Whisper, Wav2Vec, etc.)
- Text-to-Speech: Synthesize natural-sounding speech with various voices and styles
- Voice Cloning: Create synthetic speech that mimics a specific voice
- Wake Word Detection: Detect custom wake words in audio streams
- Conversational AI: Build voice-driven conversational agents
- Streaming: Real-time audio processing with low latency
- API: An optional API server for integrating with web applications
Installation
Basic Installation
pip install fmus-vox
With Specific Features
# Quote the extras so that shells such as zsh do not treat the brackets as a glob pattern
# For speech-to-text capabilities
pip install "fmus-vox[stt]"
# For text-to-speech capabilities
pip install "fmus-vox[tts]"
# For voice cloning
pip install "fmus-vox[voice]"
# For API server
pip install "fmus-vox[api]"
# For all features
pip install "fmus-vox[full]"
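After installing, it can be useful to confirm that the distribution is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library. The helper names `installed_version` and `is_importable` are illustrative and not part of fmus-vox. The module name `fmus_vox` is taken from the import statements in the usage examples.

```python
from importlib import metadata
from importlib.util import find_spec


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module_name):
    """Return True if a top-level module can be located on the import path."""
    return find_spec(module_name) is not None


# Report on the package itself (prints None / False if it is not installed).
print("fmus-vox version:", installed_version("fmus-vox"))
print("fmus_vox importable:", is_importable("fmus_vox"))
```

This check only covers the base package. Which third-party dependencies each extra pulls in is not listed here, so it does not verify them.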
Usage Examples
Audio Processing
from fmus_vox import Audio
# Load and process audio
audio = Audio.load("recording.wav")
processed = audio.normalize().denoise().resample(target_sr=16000)
processed.save("processed.wav")
# Record audio
audio = Audio.record(seconds=5)
audio.save("recording.wav")
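To make the chained steps above more concrete, here is a rough sketch of what normalization and resampling can mean for a list of float samples. This is not fmus-vox's implementation: peak normalization and linear interpolation are assumptions chosen for brevity, and the function names `peak_normalize` and `resample_linear` are hypothetical.

```python
def peak_normalize(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]


def resample_linear(samples, source_sr, target_sr):
    """Resample by linear interpolation (no anti-aliasing filter applied)."""
    if not samples:
        return []
    n_out = max(1, round(len(samples) * target_sr / source_sr))
    step = source_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# One second of audio at 44.1 kHz becomes 16000 samples at 16 kHz.
one_second = [0.0] * 44100
print(len(resample_linear(one_second, 44100, 16000)))
```

A production resampler would low-pass filter before downsampling to avoid aliasing. The sketch skips that step to keep the idea visible.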
Speech-to-Text
from fmus_vox import transcribe
# Simple transcription
text = transcribe("recording.wav")
print(f"Transcription: {text}")
# With specific model and language
text = transcribe("recording.wav", model="whisper-large", language="en")
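For more than one file, the single-call form above can be wrapped in a small loop. The sketch below assumes only what the example shows: a callable that takes a path string and returns text. The helper name `transcribe_folder` is hypothetical. The transcription function is passed in as an argument so the helper can be exercised with a stand-in.

```python
from pathlib import Path


def transcribe_folder(folder, transcribe_fn, pattern="*.wav"):
    """Return {filename: text} for every file in folder matching pattern.

    transcribe_fn is any callable that takes a path string and returns
    text, for example the transcribe function imported from fmus_vox.
    """
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = transcribe_fn(str(path))
    return results
```

With the library installed, a call such as `transcribe_folder("recordings", transcribe)` would process every `.wav` file in a directory named `recordings` (a placeholder name).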
Text-to-Speech
from fmus_vox import speak
# Simple synthesis
speak("Hello, welcome to fmus-vox!", output="welcome.wav")
# With voice styling
from fmus_vox import Speaker
speaker = Speaker(voice="en-female-1")
speaker.set_style("happy").set_speed(1.2).speak("This is exciting!")
Voice Cloning
from fmus_vox import clone_voice
# Clone voice and synthesize speech
clone_voice("my_voice.wav", "Hello with my voice", output="cloned.wav")
# Advanced usage
from fmus_vox import VoiceCloner
cloner = VoiceCloner()
voice_id = cloner.add_reference("my_voice.wav")
audio = cloner.synthesize("Hello with my voice", voice_id)
audio.save("cloned.wav")
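Before passing a reference recording to the cloner, it may help to check its basic properties. The requirements fmus-vox places on reference audio are not stated here, so the sketch below only reports what the file contains, using the standard-library `wave` module (which reads uncompressed PCM WAV files). The helper name `describe_wav` is hypothetical.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate, sample width, and duration of a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }
```

Calling `describe_wav("my_voice.wav")` on the reference file from the example above would show, for instance, whether the clip is mono and how long it is.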
Real-time Voice Application
from fmus_vox import VoiceApp
app = VoiceApp()
@app.on_wake("hey assistant")
def wake_handler():
    print("Wake word detected!")
    return True  # Start listening

@app.on_transcribe
def transcribe_handler(text):
    print(f"User said: {text}")
    if "weather" in text.lower():
        return "Today's weather is sunny."
    return "I didn't understand that command."

app.run()
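The keyword check in `transcribe_handler` can grow into a small routing table as more commands are added. The following is one possible sketch in plain Python that is independent of fmus-vox. The name `make_router` and the `"hello"` route are hypothetical additions for illustration.

```python
def make_router(routes, fallback):
    """Build a function mapping transcribed text to a reply.

    routes maps a lowercase keyword to a reply. The first keyword found
    in the text (in insertion order) wins; otherwise fallback is returned.
    """
    def respond(text):
        lowered = text.lower()
        for keyword, reply in routes.items():
            if keyword in lowered:
                return reply
        return fallback

    return respond


respond = make_router(
    {
        "weather": "Today's weather is sunny.",
        "hello": "Hello!",  # hypothetical extra command
    },
    fallback="I didn't understand that command.",
)
```

A handler like `transcribe_handler` could then simply return `respond(text)`. Because the routing logic is a plain function, it can be tested without a microphone or a running `VoiceApp`.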
License
MIT
Contributing
Contributions are welcome! Please check out our contributing guidelines.
File details
Details for the file fmus_vox-0.0.1.tar.gz.
File metadata
- Download URL: fmus_vox-0.0.1.tar.gz
- Upload date:
- Size: 103.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 07608df365830b36141c4fcf19d080de3ae9d8963a57a479b4722ea5df4107af |
| MD5 | 9b4022c6fff2f9ab11d89803abe6292b |
| BLAKE2b-256 | 6a00810b1765270faae3d01a9a6173887060b125cadcd92a8d1c914d2ef261ed |
File details
Details for the file fmus_vox-0.0.1-py3-none-any.whl.
File metadata
- Download URL: fmus_vox-0.0.1-py3-none-any.whl
- Upload date:
- Size: 121.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a776ed268b79dd678a70f326c732c133712d71fd29ba69a0d51b2180286c33f |
| MD5 | d3e68b253077c6ccb2e0118cede35f0b |
| BLAKE2b-256 | e8a03069d3c3a1740adeb449c7c8022a5a5bcac59f1d5d39857c07678e5d1179 |