Allow Sparv to import audio as text with KB Whisper

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- POSIX
- Unix
Programming Language
Topic
- Utilities

Project description

sparv-sbx-whisper-import

This Sparv plugin lets you use audio files as input to Sparv. The audio is transcribed to text with the transformers library and the KB Whisper models.

Prerequisites

Python 3.11 or higher
Sparv
ffmpeg installed and available in your PATH

Install

Install in a virtual environment:

pip install sparv-sbx-whisper-import

or if you have installed sparv with pipx:

pipx inject sparv sparv-sbx-whisper-import

or if you have installed sparv with uv-pipx:

uvpipx install sparv-sbx-whisper-import --inject sparv

Usage

To use audio files as input to Sparv, first create a corpus and a Sparv configuration file. For more information about creating a corpus, see the Sparv documentation. Possible configuration options are described below.

Once your corpus and configuration file are set up, run Sparv as usual:

sparv run

Supported audio formats

Note: Only one file type and one importer can be used within a corpus. To process multiple file types, create separate corpora.

The following audio formats are supported:

Audio format	Importer (in config)
MP3	`sbx_whisper_import:parse_mp3`
OGG	`sbx_whisper_import:parse_ogg`
WAV	`sbx_whisper_import:parse_wav`
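
The one-importer-per-corpus rule and the table above can be checked before running Sparv. The following is a minimal sketch, not part of the plugin: the helper name `pick_importer` is hypothetical, and it assumes the file suffix reliably identifies the audio format.

```python
from pathlib import Path

# Suffix-to-importer mapping copied from the table above.
IMPORTERS = {
    ".mp3": "sbx_whisper_import:parse_mp3",
    ".ogg": "sbx_whisper_import:parse_ogg",
    ".wav": "sbx_whisper_import:parse_wav",
}


def pick_importer(filenames):
    """Return the one importer that fits every file, or raise ValueError.

    Hypothetical helper: it only inspects file suffixes.
    """
    suffixes = {Path(name).suffix.lower() for name in filenames}
    unsupported = suffixes - set(IMPORTERS)
    if unsupported:
        raise ValueError(f"unsupported audio formats: {sorted(unsupported)}")
    if len(suffixes) != 1:
        # Covers both an empty corpus and a corpus with mixed file types.
        raise ValueError(f"expected exactly one file type, found: {sorted(suffixes)}")
    return IMPORTERS[suffixes.pop()]
```

For example, `pick_importer(["a.mp3", "b.mp3"])` returns the MP3 importer, while a mix of MP3 and WAV files raises an error, which signals that the files belong in separate corpora.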

Missing an audio format? Check the tracking issue, or open a new issue to request support for it.

Command-line interface

You can also run the plugin from the command line:

# Activate virtual environment
> sbx-whisper-import --help
usage: sbx-whisper-import [-h] [--model-size MODEL_SIZE] [--verbosity VERBOSITY] INPUT

Transcribe audio file with KB-Whisper. Output is in JSON.

positional arguments:
  INPUT                 audio input to transcribe in one of the formats MP3, OGG or WAV

options:
  -h, --help            show this help message and exit
  --model-size MODEL_SIZE
                        set the size of the model
  --verbosity VERBOSITY
                        set the verbosity of the model
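
The command-line interface can also be driven from a script. The sketch below only uses the flags shown in the help text above. The function names `build_command` and `transcribe` are hypothetical, and it assumes that the command is on your PATH and that standard output contains nothing but the JSON document. The structure of that JSON is not documented here, so the sketch returns it unmodified.

```python
import json
import subprocess


def build_command(audio_path, model_size=None, verbosity=None):
    """Build the argument list for sbx-whisper-import (hypothetical helper)."""
    cmd = ["sbx-whisper-import"]
    if model_size is not None:
        cmd += ["--model-size", model_size]
    if verbosity is not None:
        cmd += ["--verbosity", verbosity]
    cmd.append(str(audio_path))
    return cmd


def transcribe(audio_path, **options):
    """Run the CLI and parse its output.

    Assumes stdout holds only JSON; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        build_command(audio_path, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Calling `transcribe("example.mp3", model_size="tiny")` would then run the same command you would type in the activated virtual environment.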

Configuration

To use this plugin, specify the appropriate importer for your audio files in the Sparv configuration file (config.yaml).

The default model size is small and the default verbosity is standard. You can change these settings as described below.

import:
  text_annotation: text
  # needed to use sbx_whisper_import, use one of the lines below
  importer: sbx_whisper_import:parse_mp3
  # importer: sbx_whisper_import:parse_ogg
  # importer: sbx_whisper_import:parse_wav

sbx_whisper_import:
  # One of "tiny", "base", "small", "medium" or "large"
  model_size: small
  # One of "subtitle", "standard" or "strict" (low verbosity to high verbosity)
  # NOTE: model size "medium" does not support the verbosity "subtitle"
  model_verbosity: standard

export:
  annotations:
    - text
    - <token>
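
The allowed values can be checked before starting a long transcription run. The following is a minimal sketch, assuming the configuration has already been loaded into a plain dictionary. The function name `check_whisper_settings` is hypothetical and not part of the plugin. The defaults and the unsupported medium/subtitle combination follow this README.

```python
MODEL_SIZES = {"tiny", "base", "small", "medium", "large"}
VERBOSITIES = {"subtitle", "standard", "strict"}


def check_whisper_settings(config):
    """Return (model_size, model_verbosity) or raise ValueError.

    Hypothetical helper; missing keys fall back to the documented defaults.
    """
    settings = config.get("sbx_whisper_import") or {}
    size = settings.get("model_size", "small")
    verbosity = settings.get("model_verbosity", "standard")
    if size not in MODEL_SIZES:
        raise ValueError(f"unknown model_size: {size!r}")
    if verbosity not in VERBOSITIES:
        raise ValueError(f"unknown model_verbosity: {verbosity!r}")
    if (size, verbosity) == ("medium", "subtitle"):
        # No subtitle revision is listed for kb-whisper-medium.
        raise ValueError('model_size "medium" has no "subtitle" verbosity')
    return size, verbosity
```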

Annotations

The following annotations are created by the plugin:

text with the attribute source_filename, which indicates the name of the audio file from which the text was transcribed.
utterance with the attributes start and end, which indicate the timestamps (in seconds) of the utterance within the audio file.

Sample output:

<?xml version='1.0' encoding='utf-8'?>
<text source_filename="example.mp3">
  <utterance end="6.0" start="0.0">
    <token>Världsförklaring</token>
    <token>.</token>
  </utterance>
</text>
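
The timestamps can be read back from an export like the sample above with the standard library. This is a sketch under the assumption that the export has the same shape as the sample, with a `text` root element containing `utterance` and `token` elements. The function name `read_utterances` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample output from above, encoded to bytes so the XML declaration is honoured.
SAMPLE = """<?xml version='1.0' encoding='utf-8'?>
<text source_filename="example.mp3">
  <utterance end="6.0" start="0.0">
    <token>Världsförklaring</token>
    <token>.</token>
  </utterance>
</text>
""".encode("utf-8")


def read_utterances(xml_bytes):
    """Return one dict per utterance with source file, timestamps and tokens."""
    root = ET.fromstring(xml_bytes)
    source = root.get("source_filename")
    rows = []
    for utterance in root.iter("utterance"):
        rows.append(
            {
                "source": source,
                "start": float(utterance.get("start")),  # seconds
                "end": float(utterance.get("end")),  # seconds
                "tokens": [tok.text or "" for tok in utterance.iter("token")],
            }
        )
    return rows


if __name__ == "__main__":
    for row in read_utterances(SAMPLE):
        print(row["source"], row["start"], row["end"], " ".join(row["tokens"]))
```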

Metadata

The following table lists the exact models and revisions used for each combination of model size and model verbosity.

Model Size	Model Verbosity	Model used	Revision used
`tiny`	`subtitle`	KBLab/kb-whisper-tiny	`238d279d9821c32b905fcaff6ce9dad38ad00ab7`
`tiny`	`standard`	KBLab/kb-whisper-tiny	`e2bca57c3eee6144b9fefd07749580034cfa9686`
`tiny`	`strict`	KBLab/kb-whisper-tiny	`ea2a872f41f543aaadea23e185e974d1ab29ba2b`
`base`	`subtitle`	KBLab/kb-whisper-base	`7a57b541ccf4aebef73ecfdc064ef4b5cab3b02e`
`base`	`standard`	KBLab/kb-whisper-base	`1ee0facc30bb1f26492bb1360a99d552e25a31c2`
`base`	`strict`	KBLab/kb-whisper-base	`be19431a3fb78b71ac1525bcafe792220b314c9e`
`small`	`subtitle`	KBLab/kb-whisper-small	`8d49820338edb72829d1c44fa70a2ba94a4a20fa`
`small`	`standard`	KBLab/kb-whisper-small	`728c681653e2732ff64618e7f607f509ec87472a`
`small`	`strict`	KBLab/kb-whisper-small	`066ef166dd25b4b27039517ca77af30c1c10688a`
`medium`	`subtitle`	NOTE: subtitle not present for kb-whisper-medium	-
`medium`	`standard`	KBLab/kb-whisper-medium	`32529a74c6662479625746edce7f16fe743fe011`
`medium`	`strict`	KBLab/kb-whisper-medium	`51990d2cd5d0cf120b3eceb812bc5407a171a220`
`large`	`subtitle`	KBLab/kb-whisper-large	`50b62f493fa513926007d388f76cce9659bce123`
`large`	`standard`	KBLab/kb-whisper-large	`9e03cd21c14d02c57c33ae90b5803b54995ff241`
`large`	`strict`	KBLab/kb-whisper-large	`ea0a8ac1cda8eab8777bf8d74440eb7606825d8f`
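
The table can be expressed as a lookup from model size and verbosity to a pinned model and revision. The sketch below copies the revisions from the table. The function name `resolve_model` is hypothetical, and how the plugin itself stores or loads these values is not stated in this README.

```python
# (model_size, model_verbosity) -> revision, copied from the table above.
REVISIONS = {
    ("tiny", "subtitle"): "238d279d9821c32b905fcaff6ce9dad38ad00ab7",
    ("tiny", "standard"): "e2bca57c3eee6144b9fefd07749580034cfa9686",
    ("tiny", "strict"): "ea2a872f41f543aaadea23e185e974d1ab29ba2b",
    ("base", "subtitle"): "7a57b541ccf4aebef73ecfdc064ef4b5cab3b02e",
    ("base", "standard"): "1ee0facc30bb1f26492bb1360a99d552e25a31c2",
    ("base", "strict"): "be19431a3fb78b71ac1525bcafe792220b314c9e",
    ("small", "subtitle"): "8d49820338edb72829d1c44fa70a2ba94a4a20fa",
    ("small", "standard"): "728c681653e2732ff64618e7f607f509ec87472a",
    ("small", "strict"): "066ef166dd25b4b27039517ca77af30c1c10688a",
    # ("medium", "subtitle") is intentionally absent: no such revision exists.
    ("medium", "standard"): "32529a74c6662479625746edce7f16fe743fe011",
    ("medium", "strict"): "51990d2cd5d0cf120b3eceb812bc5407a171a220",
    ("large", "subtitle"): "50b62f493fa513926007d388f76cce9659bce123",
    ("large", "standard"): "9e03cd21c14d02c57c33ae90b5803b54995ff241",
    ("large", "strict"): "ea0a8ac1cda8eab8777bf8d74440eb7606825d8f",
}


def resolve_model(model_size, model_verbosity):
    """Return (model name, revision) for a size/verbosity pair, or raise ValueError."""
    try:
        revision = REVISIONS[(model_size, model_verbosity)]
    except KeyError:
        raise ValueError(
            f"no model for size {model_size!r} with verbosity {model_verbosity!r}"
        ) from None
    return f"KBLab/kb-whisper-{model_size}", revision
```

Pinning the revision in this way means the same size and verbosity settings refer to the same model weights, even if the upstream model repository is updated later.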

Changelog

This project keeps a changelog.

Minimum supported Python version

This library tries to support as many Python versions as possible. When a Python version is added or dropped, this library's minor version is bumped.

v0.1.0: Python 3.11

Release history

0.2.0

Jan 21, 2026

0.1.1

Nov 20, 2025

This version

0.1.0

Sep 10, 2025

Download files

Source Distribution

sparv_sbx_whisper_import-0.1.0.tar.gz (7.5 kB)

Uploaded Sep 10, 2025 Source

Built Distribution

sparv_sbx_whisper_import-0.1.0-py3-none-any.whl (10.3 kB)

Uploaded Sep 10, 2025 Python 3

File details

Details for the file sparv_sbx_whisper_import-0.1.0.tar.gz.

File metadata

Download URL: sparv_sbx_whisper_import-0.1.0.tar.gz
Upload date: Sep 10, 2025
Size: 7.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.15

File hashes

Hashes for sparv_sbx_whisper_import-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3b624b862a286487c37092e86de89ff30fed57c0208c07b04f8aaf7688ced211`
MD5	`3e516c3ad0895bc848f15b147f7a1ca2`
BLAKE2b-256	`d1056d99735fede3e906ed6bc229cf3ef82a4f014d0375c3d26c80ea0514264e`
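
A downloaded file can be compared against the SHA256 digest listed above. This is a minimal sketch using only the standard library. The function names `sha256_hex` and `file_matches` are hypothetical, and the file is read into memory in one go, which is fine for an archive of a few kilobytes.

```python
import hashlib

# SHA256 digest of the source distribution, copied from the table above.
SDIST_SHA256 = "3b624b862a286487c37092e86de89ff30fed57c0208c07b04f8aaf7688ced211"


def sha256_hex(data):
    """Return the hexadecimal SHA256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def file_matches(path, expected_hex):
    """Return True if the file at `path` has the expected SHA256 digest."""
    with open(path, "rb") as handle:
        return sha256_hex(handle.read()) == expected_hex.lower()
```

For example, `file_matches("sparv_sbx_whisper_import-0.1.0.tar.gz", SDIST_SHA256)` should return True for an intact download of the source distribution.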

File details

Details for the file sparv_sbx_whisper_import-0.1.0-py3-none-any.whl.

File metadata

Download URL: sparv_sbx_whisper_import-0.1.0-py3-none-any.whl
Upload date: Sep 10, 2025
Size: 10.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.15

File hashes

Hashes for sparv_sbx_whisper_import-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`066db4f52a279ca57f28d67fbc8aced2e63f523d9e1c962f3cc5d90863cedf5e`
MD5	`77de7afaee01f171f7fe85f81180be2e`
BLAKE2b-256	`6696e8127bba97d1ad573a3c88b54fa465a70e7d940f64c1795b87e87b3da986`
