Speech recording application for creating high-quality speech datasets
Revoxx - Record Voices
This repository provides Revoxx, a graphical recording application for recording raw speech and generating datasets.
Overview
Revoxx has been created by Grammatek ehf and is part of the Icelandic Language Technology Programme.
- Category: TTS
- Domain: Laptop/Workstation
- Languages: Python
- Language Version/Dialect:
  - Python: 3.9, 3.10, 3.11
- Audience: Developers, Researchers
- Origins: Icelandic EmoSpeech scripts
System Requirements
- Operating System: Linux or macOS (OS X); should also work on Windows
- Recording: an audio interface, a good voice microphone, and headphones
Description
Revoxx is a graphical speech recorder specialized in recording TTS datasets quickly and reliably.
You can use it to create emotional and non-emotional voice recordings on a workstation or laptop with suitable audio equipment.
It includes built-in support for turning raw recordings into datasets for training TTS voice models.
The tool is especially useful for recording many short utterances, up to approximately 30-45 seconds each.
For longer texts, split your input into chunks small enough to fit on the speaker's screen.
Revoxx was inspired by the Icelandic EmoSpeech scripts but has been rewritten from scratch and substantially improved.
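Splitting longer texts can be scripted. The sketch below is not part of Revoxx: it groups sentences greedily into chunks of at most `max_chars` characters and writes them as Festival-style script lines (`( <unique id> "<utterance>" )`, the format described under "Prepare recordings"). The function names, the 200-character default, the `utt_0001` id scheme, and the use of character count as a stand-in for "fits on the speaker screen" are all hypothetical choices; tune them to your screen size and speaking rate.

```python
import re
from typing import Iterator, List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Greedily group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own chunk rather
    than being cut mid-sentence.
    """
    # Split after sentence-final punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk: start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def to_script_lines(chunks: List[str], prefix: str = "utt") -> Iterator[str]:
    """Yield Festival-style lines of the form ( <unique id> "<utterance>" )."""
    for i, chunk in enumerate(chunks, start=1):
        # Assumption: utterances must not contain double quotes, since they
        # would terminate the quoted utterance early.
        if '"' in chunk:
            raise ValueError(f"chunk {i} contains a double quote: {chunk!r}")
        yield f'( {prefix}_{i:04d} "{chunk}" )'


if __name__ == "__main__":
    text = "First sentence here. Second one follows! A third, longer sentence ends it?"
    for line in to_script_lines(split_into_chunks(text, max_chars=40)):
        print(line)
```

With `max_chars=40`, the first two sentences (exactly 40 characters together) share a chunk and the third sentence gets its own line.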
We condensed our experience from recording Talrómur 3, the Icelandic emotional speech dataset, into this tool to minimize hassle and save valuable recording and post-processing time.
- Revoxx makes recording of speech fast, reliable and convenient for the recording engineer and the voice talent
- Integrates all necessary tools to check if recordings & equipment meet your expected requirements
- Automatically analyzes and validates audio equipment compatibility, including Sample Rate, Bit Depth, and I/O channel configurations
- Supports unlimited re-recording while maintaining a complete archive of raw recordings, even for deleted content
- Text size is automatically adjusted to the available screen real estate
- Intuitive keyboard shortcuts for accessing core functionalities
- Recordings are organized into Recording Sessions
- Record emotional sessions for each speaker or record more traditional LJSpeech-style sessions
- Seamless transitions between recording sessions with automatic progress tracking: continue where you left off
- Offers advanced search and navigation capabilities for utterances, with flexible sorting by label, emotion, text content, and recorded takes
- Consistent audio settings & metadata for all recordings
- Real-time monitoring, including toggleable recording levels, mel spectrograms, maximum frequency detection, and more
- Customizable industry-standard presets for Peak/RMS levels
- Dedicated Monitoring mode for precise input calibration
- Multi-Screen Support
  - Use multiple monitors to separate the recording view from the speaker view
  - Apple's "Continuity" feature is supported for a convenient dual-screen setup with an external iPad
  - Each screen's appearance can be configured individually
  - All screen layouts, placements, and configurations are preserved on exit
- Export Dataset
  - Batch export of multiple sessions into the T3 (Talrómur 3) dataset format
  - Recording sessions of the same speaker are grouped into a common dataset
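To illustrate the kind of checks listed above (sample rate, bit depth, channel configuration, peak/RMS levels), here is a standalone sketch using only the Python standard library. It is not Revoxx's implementation: it reads a 16-bit PCM WAV file with the `wave` module, computes peak and RMS level in dBFS, and compares the file against target values. `analyze_wav`, `check_requirements`, `write_test_tone`, and the default targets (48 kHz, 16-bit, mono, peak at or below -1 dBFS) are hypothetical examples, not Revoxx's presets.

```python
import math
import struct
import tempfile
import wave
from dataclasses import dataclass
from typing import List


@dataclass
class WavReport:
    sample_rate: int
    bit_depth: int
    channels: int
    peak_dbfs: float
    rms_dbfs: float


def to_dbfs(value: float) -> float:
    """Convert a linear level (1.0 = full scale) to dBFS; silence maps to -inf."""
    return 20.0 * math.log10(value) if value > 0 else float("-inf")


def analyze_wav(path: str) -> WavReport:
    """Report format and peak/RMS level of a 16-bit PCM WAV file.

    For multi-channel files the interleaved samples of all channels are
    pooled, so levels are not reported per channel.
    """
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()
        sample_width = wf.getsampwidth()  # bytes per sample
        channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())
    if sample_width != 2:
        raise ValueError("this sketch only decodes 16-bit PCM")
    count = len(frames) // 2
    samples = struct.unpack(f"<{count}h", frames)  # little-endian int16
    full_scale = 32768.0
    peak = max((abs(s) for s in samples), default=0) / full_scale
    rms = math.sqrt(sum(s * s for s in samples) / count) / full_scale if count else 0.0
    return WavReport(sample_rate, sample_width * 8, channels, to_dbfs(peak), to_dbfs(rms))


def check_requirements(report: WavReport, sample_rate: int = 48000, bit_depth: int = 16,
                       channels: int = 1, max_peak_dbfs: float = -1.0) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed.

    The default targets are hypothetical examples, not Revoxx's presets.
    """
    problems = []
    if report.sample_rate != sample_rate:
        problems.append(f"sample rate is {report.sample_rate} Hz, expected {sample_rate} Hz")
    if report.bit_depth != bit_depth:
        problems.append(f"bit depth is {report.bit_depth}, expected {bit_depth}")
    if report.channels != channels:
        problems.append(f"{report.channels} channel(s), expected {channels}")
    if report.peak_dbfs > max_peak_dbfs:
        problems.append(f"peak {report.peak_dbfs:.1f} dBFS exceeds {max_peak_dbfs} dBFS")
    return problems


def write_test_tone(path: str, freq: float = 1000.0, amplitude: float = 0.5,
                    sample_rate: int = 48000, seconds: float = 1.0) -> None:
    """Write a mono 16-bit sine tone, handy for trying out the checks."""
    n = int(sample_rate * seconds)
    data = b"".join(
        struct.pack("<h", round(amplitude * 32767 * math.sin(2 * math.pi * freq * i / sample_rate)))
        for i in range(n)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(data)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        path = tmp.name
    write_test_tone(path)
    report = analyze_wav(path)
    print(report)
    print(check_requirements(report) or "all checks passed")
```

A half-scale sine tone measures about -6.0 dBFS peak and -9.0 dBFS RMS, since the RMS of a sine is its peak divided by the square root of 2. Real recordings would typically be 24-bit, which this sketch deliberately does not decode.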
Installation
Basic Installation
Using uv
uv is a fast Python package installer and resolver:
uv pip install revoxx # From PyPI
uv pip install . # From source
uv pip install "revoxx[vad]" # With VAD support (quotes stop shells such as zsh from expanding the brackets)
Using pip
pip install revoxx # From PyPI
pip install . # From source
pip install "revoxx[vad]" # With VAD support
From source
git clone https://github.com/icelandic-lt/revoxx.git
cd revoxx
# Then use either uv or pip as shown above
With Voice Activity Detection (VAD)
The VAD functionality requires PyTorch (approx. 2 GB), so it is an optional extra. Install it only if you need it:
uv pip install "revoxx[vad]" # Using uv
# or
pip install "revoxx[vad]" # Using pip
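If you are unsure whether the optional VAD dependencies are present in the current environment, a quick check with `importlib.util.find_spec` avoids importing the (large) package. This is a hypothetical helper, not part of Revoxx, and it assumes that checking for `torch` is a good enough proxy for the `[vad]` extra; the extra may pull in further packages.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found.

    Only pass top-level names such as "torch": for a dotted name whose
    parent package is missing, find_spec raises instead of returning None.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: "torch" stands in for the [vad] extra.
    missing = missing_modules(["torch"])
    if missing:
        print(f'Missing {", ".join(missing)}: run pip install "revoxx[vad]"')
    else:
        print("PyTorch is available; revoxx-vadiate should be able to start.")
```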
Development Setup
For development
Using uv (recommended - faster)
git clone https://github.com/icelandic-lt/revoxx.git
cd revoxx
# Create and activate virtual environment
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install in editable mode with dev dependencies
uv pip install -e ".[dev]"
# With VAD support:
uv pip install -e ".[dev,vad]"
Using pip (traditional)
git clone https://github.com/icelandic-lt/revoxx.git
cd revoxx
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in editable mode with dev dependencies
pip install -e ".[dev]"
# With VAD support:
pip install -e ".[dev,vad]"
Development dependencies include:
- black: Code formatter
- isort: Import statement organizer
- flake8: Code linter
- pytest: Testing framework
- pytest-cov: Code coverage reporting
Running code quality checks
# Format code
black revoxx/ scripts_module/ tests/
# Check code style
flake8 revoxx/ scripts_module/ tests/
# Run tests
pytest tests/
# Run tests with coverage
pytest tests/ --cov=revoxx --cov-report=html
Running Revoxx
After installation
Once installed, you can run Revoxx using:
revoxx
During development (without installation)
Run as a Python module:
python -m revoxx
In PyCharm or other IDEs
Configure your run configuration with:
- Module name: revoxx (not a script path)
- Working directory: project root directory
Command-line tools
The package includes additional utilities:
revoxx-export # Export sessions to dataset format
revoxx-vadiate # Voice Activity Detection tool (requires [vad] option)
Note: The revoxx-vadiate tool requires the VAD dependencies. To use it, install with pip install "revoxx[vad]" or pip install ".[vad]".
Command-line arguments
revoxx --help # Show all available options
revoxx --show-devices # List available audio devices
revoxx --session path/to/session # Open specific session
Prepare recordings
Before you start recording, you should prepare a script with the utterances you want to record. The script should be a simple text file with one utterance per line. The utterances can be in any language you want.
Script files use a Festival-style format and come in two variants:
For a script with emotion levels:
( <unique id> "<emotion-level>: <utterance>" )
For a script without emotion levels (this format was used for recording our non-emotional "addenda"):
( <unique id> "<utterance>" )
Examples of both formats are in the scripts directory.
Emotion levels can use any ordered numeric range you like. To follow the Talrómur 3 dataset conventions, use emotion levels 0-5 with six emotions: neutral, happy, sad, angry, surprised, and helpful. The emotion level, combined with the specific emotion, controls the intensity with which the emotion is expressed. Neutral speech corresponds to emotion level 0.
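To make the two line formats concrete, the sketch below parses both variants and checks that ids are unique. It is an illustration only, not Revoxx's own parser: `ScriptEntry`, `parse_script_line`, and `parse_script` are hypothetical names, emotion levels are parsed as floats to allow any numeric range, and the sample ids (`emo_0001`, `add_0001`) are made up. One caveat of the format as written: an utterance without an emotion level that itself starts with something like `10: ` would be read as having a level.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# ( <unique id> "<body>" ), where <body> is "<emotion-level>: <utterance>" or "<utterance>".
LINE_RE = re.compile(r'^\(\s*(?P<id>\S+)\s+"(?P<body>.*)"\s*\)$')
LEVEL_RE = re.compile(r"^(?P<level>-?\d+(?:\.\d+)?):\s*(?P<text>.*)$")


@dataclass
class ScriptEntry:
    utt_id: str
    text: str
    emotion_level: Optional[float]  # None for scripts without emotion levels


def parse_script_line(line: str) -> Optional[ScriptEntry]:
    """Parse one script line; blank lines return None."""
    line = line.strip()
    if not line:
        return None
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not a Festival-style script line: {line!r}")
    body = m.group("body")
    level = LEVEL_RE.match(body)
    if level:
        return ScriptEntry(m.group("id"), level.group("text"), float(level.group("level")))
    return ScriptEntry(m.group("id"), body, None)


def parse_script(text: str) -> List[ScriptEntry]:
    """Parse a whole script and reject duplicate utterance ids."""
    entries = [e for e in (parse_script_line(l) for l in text.splitlines()) if e is not None]
    ids = [e.utt_id for e in entries]
    if len(ids) != len(set(ids)):
        raise ValueError("utterance ids must be unique")
    return entries


if __name__ == "__main__":
    example = '( emo_0001 "3: I can hardly believe it." )\n( add_0001 "Plain utterance." )\n'
    for entry in parse_script(example):
        print(entry)
```

The first example line yields `emotion_level=3.0`, the second `emotion_level=None`. To parse a real script file, pass its contents, e.g. `parse_script(Path("my_script.txt").read_text(encoding="utf-8"))`, assuming the file is UTF-8 encoded (needed for Icelandic characters).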
Record dataset
to be defined
Acknowledgements
This project is part of the program Language Technology for Icelandic. The program was funded by the Icelandic Ministry of Culture and Business Affairs.
Download files
File details
Details for the file revoxx-1.0.0.dev5.tar.gz.
File metadata
- Download URL: revoxx-1.0.0.dev5.tar.gz
- Upload date:
- Size: 361.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 13134531e5a298d6b928e37978e459d42675d848c52d9524f71e1e3cb14d391b |
| MD5 | ef3ddae15a05369476d3cffa85f5633c |
| BLAKE2b-256 | 4c1e5d029228a7656851b8b2f260b3a479416188c32c09fc5a04f593c68a431d |
File details
Details for the file revoxx-1.0.0.dev5-py3-none-any.whl.
File metadata
- Download URL: revoxx-1.0.0.dev5-py3-none-any.whl
- Upload date:
- Size: 360.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d3dc0660c523a542c41963895c560fe7b3c47caa92a68769b31d2e1ffa994a3c |
| MD5 | deee6a212223ecd2979c1ff80a541f0c |
| BLAKE2b-256 | 3b82fcc19a6ac90ff814a2f42bc3f6ef6cef30c2a0988dfa9873db953920f8f3 |