Allow Sparv to import audio as text with KB Whisper

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- POSIX
- Unix
Programming Language
Topic
- Utilities

Project description

sparv-sbx-whisper-import

This Sparv plugin makes it possible to use audio files as input to Sparv. The audio is transcribed to text using transformers and the KB Whisper models.

Prerequisites

Python 3.11 or higher
Sparv
ffmpeg installed and available in your PATH
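
The Python and ffmpeg prerequisites can be checked before installing. The following is a minimal sketch and not part of the plugin: the function name `check_prerequisites` and its parameters are made up for illustration, and it does not check whether Sparv itself is installed.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in the prerequisites


def check_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} is older than 3.11")
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found in PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```

The `version` and `which` parameters exist only so the check can be exercised without changing the real environment.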

Install

Install in a virtual environment:

pip install sparv-sbx-whisper-import

Or, if you installed Sparv with pipx:

pipx inject sparv sparv-sbx-whisper-import

Or, if you installed Sparv with uv-pipx:

uvpipx install sparv-sbx-whisper-import --inject sparv

Usage

To use audio files as input to Sparv, first create a corpus and a Sparv configuration file. For more information about creating a corpus, see the Sparv documentation. Possible configuration options are described below.

Once your corpus and configuration file are set up, run Sparv as usual:

sparv run

Supported audio formats

Note: Only one file type and one importer can be used within a corpus. If you want to process multiple file types, please create separate corpora.

The following audio formats are supported:

Audio format	Importer (in config)
MP3	`sbx_whisper_import:parse_mp3`
OGG	`sbx_whisper_import:parse_ogg`
WAV	`sbx_whisper_import:parse_wav`
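
Because a corpus can only use one file type, it can help to pick the importer from the source files and fail early on a mix. This is a hedged sketch: the importer names come from the table above, while `pick_importer` and the assumption that files are recognised by their extension are illustrative only.

```python
from pathlib import Path

# Extension-to-importer mapping, copied from the table above.
IMPORTERS = {
    ".mp3": "sbx_whisper_import:parse_mp3",
    ".ogg": "sbx_whisper_import:parse_ogg",
    ".wav": "sbx_whisper_import:parse_wav",
}


def pick_importer(filenames):
    """Return the importer for a set of source files, or raise ValueError."""
    extensions = {Path(name).suffix.lower() for name in filenames}
    # A corpus may contain exactly one file type.
    if len(extensions) != 1:
        raise ValueError(f"expected exactly one file type, found {sorted(extensions)}")
    (extension,) = extensions
    if extension not in IMPORTERS:
        raise ValueError(f"unsupported audio format: {extension!r}")
    return IMPORTERS[extension]
```

For example, `pick_importer(["a.mp3", "b.mp3"])` returns the MP3 importer, while a list mixing `.mp3` and `.wav` raises `ValueError`.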

Is an audio format missing? Please check the tracking issue or open a new issue to request support for additional formats.

Command-line interface

You can also use this plugin from the command line:

# Activate virtual environment
> sbx-whisper-import --help
usage: sbx-whisper-import [-h] [--model-size MODEL_SIZE] [--verbosity VERBOSITY] INPUT

Transcribe audio file with KB-Whisper. Output is in JSON.

positional arguments:
  INPUT                 audio input to trancribe in one of the formats MP3, OGG or WAV

options:
  -h, --help            show this help message and exit
  --model-size MODEL_SIZE
                        set the size of the model
  --verbosity VERBOSITY
                        set the verbosity of the model
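
The command can also be driven from a script. The sketch below uses only the flags shown in the help text. It assumes that the JSON is written to standard output and makes no assumption about the structure of that JSON, which is not documented here. The helper names `build_command` and `transcribe` are illustrative.

```python
import json
import subprocess


def build_command(audio_path, model_size=None, verbosity=None):
    """Build the argument list for sbx-whisper-import from the documented flags."""
    command = ["sbx-whisper-import"]
    if model_size is not None:
        command += ["--model-size", model_size]
    if verbosity is not None:
        command += ["--verbosity", verbosity]
    command.append(str(audio_path))
    return command


def transcribe(audio_path, **options):
    """Run the CLI and parse its standard output as JSON (assumed location)."""
    result = subprocess.run(
        build_command(audio_path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Calling `transcribe("talk.mp3", model_size="tiny")` requires the virtual environment with the plugin to be active, so that `sbx-whisper-import` is found in PATH.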

Configuration

To use this plugin, specify the appropriate importer for your audio files in the Sparv configuration file (config.yaml).

The default model size is small and the default verbosity is standard. You can change these settings as described below.

import:
  text_annotation: text
  # needed to use sbx_whisper_import, use one of the lines below
  importer: sbx_whisper_import:parse_mp3
  # importer: sbx_whisper_import:parse_ogg
  # importer: sbx_whisper_import:parse_wav

sbx_whisper_import:
  # One of "tiny", "base", "small", "medium" or "large"
  model_size: small
  # One of "subtitle", "standard" or "strict" (low verbosity to high verbosity)
  # NOTE: model size "medium" does not support the verbosity "subtitle"
  model_verbosity: standard

export:
  annotations:
    - text
    - <token>
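
When configuration files are generated by a script, the option values can be validated first. This is a sketch under stated assumptions: the allowed values are the ones listed in the example above, the exclusion of "medium" with "subtitle" follows the Metadata table in this README, and the function names `validate` and `render_config` are hypothetical.

```python
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")
VERBOSITIES = ("subtitle", "standard", "strict")
# The Metadata table lists no "subtitle" model for size "medium".
UNSUPPORTED = {("medium", "subtitle")}


def validate(model_size="small", model_verbosity="standard"):
    """Raise ValueError for combinations this README does not list as supported."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unknown model_size: {model_size!r}")
    if model_verbosity not in VERBOSITIES:
        raise ValueError(f"unknown model_verbosity: {model_verbosity!r}")
    if (model_size, model_verbosity) in UNSUPPORTED:
        raise ValueError(f"{model_size!r} has no {model_verbosity!r} model")


def render_config(importer, model_size="small", model_verbosity="standard"):
    """Return config.yaml text mirroring the example above."""
    validate(model_size, model_verbosity)
    return (
        "import:\n"
        "  text_annotation: text\n"
        f"  importer: {importer}\n"
        "\n"
        "sbx_whisper_import:\n"
        f"  model_size: {model_size}\n"
        f"  model_verbosity: {model_verbosity}\n"
        "\n"
        "export:\n"
        "  annotations:\n"
        "    - text\n"
        "    - <token>\n"
    )
```

The defaults of `validate` match the plugin defaults stated above (small and standard).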

Annotations

The following annotations are created by the plugin:

text with the attribute source_filename, which indicates the name of the audio file from which the text was transcribed.
utterance with the attributes start and end, which indicate the timestamps (in seconds) of the utterance within the audio file.

Sample output:

<?xml version='1.0' encoding='utf-8'?>
<text source_filename="example.mp3">
  <utterance end="6.0" start="0.0">
    <token>Världsförklaring</token>
    <token>.</token>
  </utterance>
</text>
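
The exported XML can be read back with the Python standard library. The sketch below parses the sample output shown above. The function name `read_utterances` is illustrative, and the XML declaration is left out of the inline string for brevity.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<text source_filename="example.mp3">
  <utterance end="6.0" start="0.0">
    <token>Världsförklaring</token>
    <token>.</token>
  </utterance>
</text>
"""


def read_utterances(xml_text):
    """Return (source filename, [(start, end, tokens), ...]) from an export."""
    root = ET.fromstring(xml_text)
    utterances = [
        (
            float(utterance.get("start")),  # timestamps are given in seconds
            float(utterance.get("end")),
            [token.text for token in utterance.iter("token")],
        )
        for utterance in root.iter("utterance")
    ]
    return root.get("source_filename"), utterances


if __name__ == "__main__":
    print(read_utterances(SAMPLE))
```

For the sample, this yields the file name `example.mp3` and one utterance from 0.0 to 6.0 seconds with two tokens.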

Metadata

The following table lists the exact models and revisions used for each combination of model size and model verbosity.

Model Size	Model Verbosity	Model used	Revision used
`tiny`	`subtitle`	KBLab/kb-whisper-tiny	`238d279d9821c32b905fcaff6ce9dad38ad00ab7`
`tiny`	`standard`	KBLab/kb-whisper-tiny	`e2bca57c3eee6144b9fefd07749580034cfa9686`
`tiny`	`strict`	KBLab/kb-whisper-tiny	`ea2a872f41f543aaadea23e185e974d1ab29ba2b`
`base`	`subtitle`	KBLab/kb-whisper-base	`7a57b541ccf4aebef73ecfdc064ef4b5cab3b02e`
`base`	`standard`	KBLab/kb-whisper-base	`1ee0facc30bb1f26492bb1360a99d552e25a31c2`
`base`	`strict`	KBLab/kb-whisper-base	`be19431a3fb78b71ac1525bcafe792220b314c9e`
`small`	`subtitle`	KBLab/kb-whisper-small	`8d49820338edb72829d1c44fa70a2ba94a4a20fa`
`small`	`standard`	KBLab/kb-whisper-small	`728c681653e2732ff64618e7f607f509ec87472a`
`small`	`strict`	KBLab/kb-whisper-small	`066ef166dd25b4b27039517ca77af30c1c10688a`
`medium`	`subtitle`	NOTE: subtitle not present for kb-whisper-medium	-
`medium`	`standard`	KBLab/kb-whisper-medium	`32529a74c6662479625746edce7f16fe743fe011`
`medium`	`strict`	KBLab/kb-whisper-medium	`51990d2cd5d0cf120b3eceb812bc5407a171a220`
`large`	`subtitle`	KBLab/kb-whisper-large	`50b62f493fa513926007d388f76cce9659bce123`
`large`	`standard`	KBLab/kb-whisper-large	`9e03cd21c14d02c57c33ae90b5803b54995ff241`
`large`	`strict`	KBLab/kb-whisper-large	`ea0a8ac1cda8eab8777bf8d74440eb7606825d8f`
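
To reproduce a transcription outside Sparv, a model can be loaded at one of the pinned revisions. The following sketch uses the `pipeline` function from transformers with its `revision` argument and covers only the `small` rows of the table. It is an illustration of pinning a revision and not necessarily how the plugin itself loads the model. The names `SMALL_REVISIONS` and `load_small_pipeline` are hypothetical.

```python
# Pinned revisions for the "small" model, copied from the table above.
SMALL_REVISIONS = {
    "subtitle": "8d49820338edb72829d1c44fa70a2ba94a4a20fa",
    "standard": "728c681653e2732ff64618e7f607f509ec87472a",
    "strict": "066ef166dd25b4b27039517ca77af30c1c10688a",
}


def load_small_pipeline(verbosity="standard"):
    """Load KBLab/kb-whisper-small at the revision pinned for a verbosity."""
    # Imported lazily so the lookup table is usable without transformers.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model="KBLab/kb-whisper-small",
        revision=SMALL_REVISIONS[verbosity],
    )
```

Calling `load_small_pipeline("strict")` downloads the model on first use, so it needs network access and the transformers package.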

Changelog

This project keeps a changelog.

Minimum supported Python version

This library tries to support as many Python versions as possible. When a Python version is added or dropped, this library's minor version is bumped.

v0.1.0: Python 3.11

Development

Development prerequisites

To start developing in this repository:

Clone the repo: git clone https://github.com/spraakbanken/sparv-sbx-whisper-import.git
Set up the environment: make dev
Install pre-commit hooks: pre-commit install

Do your work.

Tasks to do:

Test the code with make test or make test-w-coverage.
Test the examples with make test-examples.
Lint the code with make lint.
Check formatting with make check-fmt.
Format the code with make fmt.
Type-check the code with make type-check.

This repo uses conventional commits.

Release a new version

Prepare the CHANGELOG: make prepare-release and then edit CHANGELOG.md.
Add to git: git add CHANGELOG.md
Commit with git commit -m 'chore(release): prepare release' or cog commit chore 'prepare release' release.
Bump version (depends on `bump-my-version`)
- Major: make bumpversion part=major
- Minor: make bumpversion part=minor
- Patch: make bumpversion part=patch or make bumpversion
Push main and tags to GitHub: git push origin main --tags or make publish
- GitHub Actions will build, test and publish the package to PyPI.
Add metadata for Språkbanken's resource
- Generate metadata: make generate-metadata
- Upload the files from examples/metadata/export/sbx_metadata/utility to https://github.com/spraakbanken/metadata/tree/main/yaml/utility.

License

This repository is licensed under the MIT license.

Release history

0.2.0: Jan 21, 2026
0.1.1: Nov 20, 2025 (this version)
0.1.0: Sep 10, 2025

Download files

Source Distribution

sparv_sbx_whisper_import-0.1.1.tar.gz (8.3 kB)

Uploaded Nov 20, 2025 Source

Built Distribution

sparv_sbx_whisper_import-0.1.1-py3-none-any.whl (11.0 kB)

Uploaded Nov 20, 2025 Python 3

File details

Details for the file sparv_sbx_whisper_import-0.1.1.tar.gz.

File metadata

Download URL: sparv_sbx_whisper_import-0.1.1.tar.gz
Upload date: Nov 20, 2025
Size: 8.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sparv_sbx_whisper_import-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`78d444b61760434ac5d9301ba808ec738df7b3ba0c3cb0d3b6a72142b9635dc7`
MD5	`fd9c3423cd754fc9c0935c58f8705e35`
BLAKE2b-256	`a4128d5f756a292dde3d90cb96ec08c971d987399deb62db13ada75f6f4d2627`
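
A downloaded file can be checked against the published digest with `hashlib`. This is a minimal sketch: the expected value is the SHA256 digest listed above for the source distribution, and the helper names `sha256_of` and `verify` are illustrative.

```python
import hashlib

# SHA256 digest published above for sparv_sbx_whisper_import-0.1.1.tar.gz.
EXPECTED_SHA256 = "78d444b61760434ac5d9301ba808ec738df7b3ba0c3cb0d3b6a72142b9635dc7"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest matches the expected one."""
    return sha256_of(path) == expected.lower()
```

For the wheel, pass its SHA256 digest as `expected` instead of relying on the default.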

Provenance

The following attestation bundles were made for sparv_sbx_whisper_import-0.1.1.tar.gz:

Publisher: release.yml on spraakbanken/sparv-sbx-whisper-import

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sparv_sbx_whisper_import-0.1.1.tar.gz
- Subject digest: 78d444b61760434ac5d9301ba808ec738df7b3ba0c3cb0d3b6a72142b9635dc7
- Sigstore transparency entry: 709832322
- Sigstore integration time: Nov 20, 2025
Source repository:
- Permalink: spraakbanken/sparv-sbx-whisper-import@604bdc0d1fd2a32fd721a85297936a6b79a2ad17
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/spraakbanken
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@604bdc0d1fd2a32fd721a85297936a6b79a2ad17
- Trigger Event: push

File details

Details for the file sparv_sbx_whisper_import-0.1.1-py3-none-any.whl.

File metadata

Download URL: sparv_sbx_whisper_import-0.1.1-py3-none-any.whl
Upload date: Nov 20, 2025
Size: 11.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sparv_sbx_whisper_import-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b8983200e21ef08533fae7afa01f4f2fdf5f21174621cb4663255529c9d90bd1`
MD5	`b99f91dae290ed4521d92041f09a7ebd`
BLAKE2b-256	`d2e2f7decb4844cfd79c3d52a7e016ecd41ab0f5574b3934b146324454331043`

Provenance

The following attestation bundles were made for sparv_sbx_whisper_import-0.1.1-py3-none-any.whl:

Publisher: release.yml on spraakbanken/sparv-sbx-whisper-import

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sparv_sbx_whisper_import-0.1.1-py3-none-any.whl
- Subject digest: b8983200e21ef08533fae7afa01f4f2fdf5f21174621cb4663255529c9d90bd1
- Sigstore transparency entry: 709832325
- Sigstore integration time: Nov 20, 2025
Source repository:
- Permalink: spraakbanken/sparv-sbx-whisper-import@604bdc0d1fd2a32fd721a85297936a6b79a2ad17
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/spraakbanken
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@604bdc0d1fd2a32fd721a85297936a6b79a2ad17
- Trigger Event: push
