Allow Sparv to import audio as text with KB Whisper
sparv-sbx-whisper-import
This Sparv plugin makes it possible to use audio files as input to Sparv. The audio is transcribed to text using transformers and the KB Whisper models.
Prerequisites
Install
Install in a virtual environment:
pip install sparv-sbx-whisper-import
or, if you have installed Sparv with pipx:
pipx inject sparv sparv-sbx-whisper-import
or, if you have installed Sparv with uvpipx:
uvpipx install sparv-sbx-whisper-import --inject sparv
Usage
To use audio files as input to Sparv, first create a corpus and a Sparv configuration file. For more information about creating a corpus, see the Sparv documentation. Possible configuration options are described below.
Once your corpus and configuration file are set up, run Sparv as usual:
sparv run
Supported audio formats
Note: Only one file type and one importer can be used within a corpus. If you want to process multiple file types, create a separate corpus for each.
The following audio formats are supported:
| Audio format | Importer (in config) |
|---|---|
| MP3 | sbx_whisper_import:parse_mp3 |
| OGG | sbx_whisper_import:parse_ogg |
| WAV | sbx_whisper_import:parse_wav |
Is an audio format you need missing? Check the tracking issue, or open a new issue to request support for additional formats.
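Because a corpus can only use one importer, it can help to check the source files before writing the configuration. The following is a minimal sketch, not part of the plugin: `pick_importer` is a hypothetical helper, and it assumes that the file extension reflects the audio format and that matching extensions case-insensitively is acceptable.

```python
from pathlib import Path

# Importer names as listed in the table above.
IMPORTERS = {
    ".mp3": "sbx_whisper_import:parse_mp3",
    ".ogg": "sbx_whisper_import:parse_ogg",
    ".wav": "sbx_whisper_import:parse_wav",
}


def pick_importer(filenames: list[str]) -> str:
    """Return the importer for a set of source files.

    Raises ValueError if a file type is unsupported or if the files
    do not all share exactly one file type.
    """
    # Assumption: the extension (compared in lowercase) identifies the format.
    suffixes = {Path(name).suffix.lower() for name in filenames}
    unsupported = suffixes - set(IMPORTERS)
    if unsupported:
        raise ValueError(f"unsupported file types: {sorted(unsupported)}")
    if len(suffixes) != 1:
        raise ValueError(f"expected exactly one file type, found: {sorted(suffixes)}")
    return IMPORTERS[suffixes.pop()]
```

The returned string is the value to use for `importer` in the Sparv configuration file.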
Command-line interface
You can also use this plugin from the command line:
# Activate virtual environment
> sbx-whisper-import --help
usage: sbx-whisper-import [-h] [--model-size MODEL_SIZE] [--verbosity VERBOSITY] INPUT

Transcribe audio file with KB-Whisper. Output is in JSON.

positional arguments:
  INPUT                 audio input to transcribe in one of the formats MP3, OGG or WAV

options:
  -h, --help            show this help message and exit
  --model-size MODEL_SIZE
                        set the size of the model
  --verbosity VERBOSITY
                        set the verbosity of the model
Configuration
To use this plugin, specify the appropriate importer for your audio files in the Sparv configuration file (config.yaml).
The default model size is small and the default verbosity is standard. You can change these settings as described below.
import:
  text_annotation: text
  # needed to use sbx_whisper_import, use one of the lines below
  importer: sbx_whisper_import:parse_mp3
  # importer: sbx_whisper_import:parse_ogg
  # importer: sbx_whisper_import:parse_wav

sbx_whisper_import:
  # One of "tiny", "base", "small", "medium" or "large"
  model_size: small
  # One of "subtitle", "standard" or "strict" (low verbosity to high verbosity)
  # NOTE: model size "medium" does not support the verbosity "subtitle"
  model_verbosity: standard

export:
  annotations:
    - text
    - <token>
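The allowed values can be checked before a long transcription run starts. This is a minimal sketch under stated assumptions: `check_settings` is a hypothetical helper, not part of the plugin, and it takes the `sbx_whisper_import` section of the configuration as an already parsed dictionary. The one excluded combination, `medium` with `subtitle`, follows the Metadata table, which lists no subtitle revision for kb-whisper-medium.

```python
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")
VERBOSITIES = ("subtitle", "standard", "strict")
# Combination for which the Metadata table lists no model revision.
UNSUPPORTED = {("medium", "subtitle")}


def check_settings(section: dict) -> tuple[str, str]:
    """Validate the sbx_whisper_import section and return (size, verbosity).

    Missing keys fall back to the documented defaults.
    """
    size = section.get("model_size", "small")
    verbosity = section.get("model_verbosity", "standard")
    if size not in MODEL_SIZES:
        raise ValueError(f"model_size must be one of {MODEL_SIZES}, got {size!r}")
    if verbosity not in VERBOSITIES:
        raise ValueError(f"model_verbosity must be one of {VERBOSITIES}, got {verbosity!r}")
    if (size, verbosity) in UNSUPPORTED:
        raise ValueError(f"model_size {size!r} has no {verbosity!r} variant")
    return size, verbosity
```

For example, `check_settings({"model_size": "large"})` returns `("large", "standard")`, because the verbosity falls back to its default.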
Annotations
The following annotations are created by the plugin:
- text with the attribute source_filename, which indicates the name of the audio file from which the text was transcribed.
- utterance with the attributes start and end, which indicate the timestamps (in seconds) of the utterance within the audio file.
Sample output:
<?xml version='1.0' encoding='utf-8'?>
<text source_filename="example.mp3">
  <utterance end="6.0" start="0.0">
    <token>Världsförklaring</token>
    <token>.</token>
  </utterance>
</text>
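To work with the timestamps downstream, the exported XML can be read with the Python standard library. The sketch below assumes the element and attribute names shown in the sample output. `read_utterances` is a hypothetical helper, and the sample string omits the XML declaration because it is parsed from a `str`. For files on disk, `xml.etree.ElementTree.parse` can be used instead.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<text source_filename="example.mp3">
  <utterance end="6.0" start="0.0">
    <token>Världsförklaring</token>
    <token>.</token>
  </utterance>
</text>"""


def read_utterances(xml_text: str):
    """Return the source filename and a list of (start, end, tokens) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for utterance in root.iter("utterance"):
        tokens = [token.text or "" for token in utterance.iter("token")]
        # start and end are timestamps in seconds within the audio file.
        rows.append((float(utterance.get("start")), float(utterance.get("end")), tokens))
    return root.get("source_filename"), rows


source, utterances = read_utterances(SAMPLE)
for start, end, tokens in utterances:
    print(f"{source} [{start:.1f}s to {end:.1f}s]: {' '.join(tokens)}")
```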
Metadata
The following table lists the exact models and revisions used for each combination of model size and model verbosity.
| Model Size | Model Verbosity | Model used | Revision used |
|---|---|---|---|
| tiny | subtitle | KBLab/kb-whisper-tiny | 238d279d9821c32b905fcaff6ce9dad38ad00ab7 |
| tiny | standard | KBLab/kb-whisper-tiny | e2bca57c3eee6144b9fefd07749580034cfa9686 |
| tiny | strict | KBLab/kb-whisper-tiny | ea2a872f41f543aaadea23e185e974d1ab29ba2b |
| base | subtitle | KBLab/kb-whisper-base | 7a57b541ccf4aebef73ecfdc064ef4b5cab3b02e |
| base | standard | KBLab/kb-whisper-base | 1ee0facc30bb1f26492bb1360a99d552e25a31c2 |
| base | strict | KBLab/kb-whisper-base | be19431a3fb78b71ac1525bcafe792220b314c9e |
| small | subtitle | KBLab/kb-whisper-small | 8d49820338edb72829d1c44fa70a2ba94a4a20fa |
| small | standard | KBLab/kb-whisper-small | 728c681653e2732ff64618e7f607f509ec87472a |
| small | strict | KBLab/kb-whisper-small | 066ef166dd25b4b27039517ca77af30c1c10688a |
| medium | subtitle | NOTE: subtitle not present for kb-whisper-medium | - |
| medium | standard | KBLab/kb-whisper-medium | 32529a74c6662479625746edce7f16fe743fe011 |
| medium | strict | KBLab/kb-whisper-medium | 51990d2cd5d0cf120b3eceb812bc5407a171a220 |
| large | subtitle | KBLab/kb-whisper-large | 50b62f493fa513926007d388f76cce9659bce123 |
| large | standard | KBLab/kb-whisper-large | 9e03cd21c14d02c57c33ae90b5803b54995ff241 |
| large | strict | KBLab/kb-whisper-large | ea0a8ac1cda8eab8777bf8d74440eb7606825d8f |
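To reproduce a transcription outside Sparv with the same pinned model, the table can be expressed as a lookup. This is a sketch, not the plugin's own loading code: `resolve` and `load_pipeline` are hypothetical helpers, and `load_pipeline` assumes the third-party `transformers` package is installed. It passes the revision hash to `transformers.pipeline` through its `revision` argument.

```python
# (model size, model verbosity) -> pinned revision, copied from the table above.
REVISIONS = {
    ("tiny", "subtitle"): "238d279d9821c32b905fcaff6ce9dad38ad00ab7",
    ("tiny", "standard"): "e2bca57c3eee6144b9fefd07749580034cfa9686",
    ("tiny", "strict"): "ea2a872f41f543aaadea23e185e974d1ab29ba2b",
    ("base", "subtitle"): "7a57b541ccf4aebef73ecfdc064ef4b5cab3b02e",
    ("base", "standard"): "1ee0facc30bb1f26492bb1360a99d552e25a31c2",
    ("base", "strict"): "be19431a3fb78b71ac1525bcafe792220b314c9e",
    ("small", "subtitle"): "8d49820338edb72829d1c44fa70a2ba94a4a20fa",
    ("small", "standard"): "728c681653e2732ff64618e7f607f509ec87472a",
    ("small", "strict"): "066ef166dd25b4b27039517ca77af30c1c10688a",
    ("medium", "standard"): "32529a74c6662479625746edce7f16fe743fe011",
    ("medium", "strict"): "51990d2cd5d0cf120b3eceb812bc5407a171a220",
    ("large", "subtitle"): "50b62f493fa513926007d388f76cce9659bce123",
    ("large", "standard"): "9e03cd21c14d02c57c33ae90b5803b54995ff241",
    ("large", "strict"): "ea0a8ac1cda8eab8777bf8d74440eb7606825d8f",
}


def resolve(model_size: str, model_verbosity: str) -> tuple[str, str]:
    """Return (model name, revision) for a size and verbosity combination."""
    key = (model_size, model_verbosity)
    if key not in REVISIONS:
        raise KeyError(f"no pinned revision for {model_size!r} with {model_verbosity!r}")
    return f"KBLab/kb-whisper-{model_size}", REVISIONS[key]


def load_pipeline(model_size: str = "small", model_verbosity: str = "standard"):
    """Load a speech recognition pipeline pinned to the listed revision."""
    # Imported lazily so that resolve() works without transformers installed.
    from transformers import pipeline

    model, revision = resolve(model_size, model_verbosity)
    return pipeline("automatic-speech-recognition", model=model, revision=revision)
```

Pinning by commit hash rather than by branch name keeps results reproducible even if the model repository is updated later.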
Changelog
This project keeps a changelog.
Minimum supported Python version
This library tries to support as many Python versions as possible. When a Python version is added or dropped, this library's minor version is bumped.
- v0.1.0: Python 3.11