
Allow Sparv to import audio as text with KB Whisper


sparv-sbx-whisper-import


This Sparv plugin makes it possible to use audio files as input to Sparv. The audio is transcribed to text using transformers and the KB Whisper models.

Prerequisites

  • Python 3.11 or higher
  • Sparv
  • ffmpeg installed and available in your PATH
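
The prerequisites above can be checked before installing. The following is a minimal sketch using only the standard library; the function name `missing_prerequisites` is hypothetical, and it assumes that the Sparv executable is called `sparv` and is reachable on your PATH (which may not hold if Sparv lives in a virtual environment that is not activated).

```python
from __future__ import annotations

import shutil
import sys
from collections.abc import Callable

MIN_PYTHON = (3, 11)  # minimum version listed in the prerequisites


def missing_prerequisites(
    python_version: tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], str | None] = shutil.which,
) -> list[str]:
    """Return a list of unmet prerequisites (empty if everything is in place)."""
    problems = []
    if python_version < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or higher is required")
    # shutil.which returns None when an executable is not found on PATH.
    for tool in ("sparv", "ffmpeg"):
        if which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The `python_version` and `which` parameters exist only so the check can be exercised without touching the real environment.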

Install

Install in a virtual environment:

pip install sparv-sbx-whisper-import

or, if you have installed Sparv with pipx:

pipx inject sparv sparv-sbx-whisper-import

or, if you have installed Sparv with uv-pipx:

uvpipx install sparv-sbx-whisper-import --inject sparv

Usage

To use audio files as input to Sparv, first create a corpus and a Sparv configuration file. For more information about creating a corpus, see the Sparv documentation. Possible configuration options are described below.

Once your corpus and configuration file are set up, run Sparv as usual:

sparv run

Supported audio formats

Note: Only one file type and one importer can be used within a corpus. If you want to process multiple file types, please create separate corpora.

The following audio formats are supported:

| Audio format | Importer (in config)         |
| ------------ | ---------------------------- |
| MP3          | sbx_whisper_import:parse_mp3 |
| OGG          | sbx_whisper_import:parse_ogg |
| WAV          | sbx_whisper_import:parse_wav |

Is an audio format missing? Please check the tracking issue or open a new issue to request support for additional formats.
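
Because a corpus may contain only one file type, it can help to check the source files before writing the configuration. The sketch below is an illustration, not part of the plugin: the function name `importer_for` is hypothetical, and it assumes that the file extension reliably reflects the audio format.

```python
from __future__ import annotations

from pathlib import Path

# Extension-to-importer mapping, copied from the list of supported formats.
IMPORTERS = {
    ".mp3": "sbx_whisper_import:parse_mp3",
    ".ogg": "sbx_whisper_import:parse_ogg",
    ".wav": "sbx_whisper_import:parse_wav",
}


def importer_for(filenames: list[str]) -> str:
    """Return the importer for a corpus, or raise if files are mixed or unsupported."""
    extensions = {Path(name).suffix.lower() for name in filenames}
    if len(extensions) != 1:
        raise ValueError(f"expected exactly one file type, found: {sorted(extensions)}")
    (extension,) = extensions
    if extension not in IMPORTERS:
        raise ValueError(f"unsupported audio format: {extension!r}")
    return IMPORTERS[extension]
```

For example, `importer_for(["a.mp3", "b.mp3"])` yields the value to put under `import.importer`, while a mix of `.mp3` and `.wav` files raises a `ValueError`.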

Command-line interface

You can also run the plugin directly from the command line:

# Activate virtual environment
> sbx-whisper-import --help
usage: sbx-whisper-import [-h] [--model-size MODEL_SIZE] [--verbosity VERBOSITY] INPUT

Transcribe audio file with KB-Whisper. Output is in JSON.

positional arguments:
  INPUT                 audio input to transcribe in one of the formats MP3, OGG or WAV

options:
  -h, --help            show this help message and exit
  --model-size MODEL_SIZE
                        set the size of the model
  --verbosity VERBOSITY
                        set the verbosity of the model
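
The command-line interface can also be driven from a script. The following sketch builds the argument list from the flags shown in the help text and parses the result with `json`. The helper names `build_command` and `transcribe` are hypothetical, and it is an assumption that the JSON is written to standard output; the structure of the JSON is not documented here, so the parsed value is returned as-is.

```python
from __future__ import annotations

import json
import subprocess


def build_command(
    audio_path: str,
    model_size: str | None = None,
    verbosity: str | None = None,
) -> list[str]:
    """Build the argument list for sbx-whisper-import from the documented flags."""
    command = ["sbx-whisper-import"]
    if model_size is not None:
        command += ["--model-size", model_size]
    if verbosity is not None:
        command += ["--verbosity", verbosity]
    command.append(audio_path)
    return command


def transcribe(audio_path: str, **options: str):
    """Run the CLI and parse its output (assumes the JSON arrives on stdout)."""
    result = subprocess.run(
        build_command(audio_path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Calling `transcribe("example.mp3", model_size="tiny")` requires the plugin to be installed in the active environment.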

Configuration

To use this plugin, specify the appropriate importer for your audio files in the Sparv configuration file (config.yaml).

The default model size is small and the default verbosity is standard. You can change these settings as described below.

import:
  text_annotation: text
  # needed to use sbx_whisper_import, use one of the lines below
  importer: sbx_whisper_import:parse_mp3
  # importer: sbx_whisper_import:parse_ogg
  # importer: sbx_whisper_import:parse_wav

sbx_whisper_import:
  # One of "tiny", "base", "small", "medium" or "large"
  model_size: small
  # One of "subtitle", "standard" or "strict" (low verbosity to high verbosity)
  # NOTE: model size "medium" does not support the verbosity "subtitle"
  model_verbosity: standard

export:
  annotations:
    - text
    - <token>
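
The allowed values in the `sbx_whisper_import` section can be checked before a long transcription run. This is a minimal sketch, not the plugin's own validation: the function name `check_settings` is hypothetical, and it takes the section as a plain dictionary (for example, as loaded from config.yaml by a YAML parser of your choice).

```python
from __future__ import annotations

MODEL_SIZES = {"tiny", "base", "small", "medium", "large"}
VERBOSITIES = {"subtitle", "standard", "strict"}


def check_settings(settings: dict[str, str]) -> tuple[str, str]:
    """Validate the sbx_whisper_import section; return (model_size, model_verbosity)."""
    # Fall back to the documented defaults when a key is absent.
    size = settings.get("model_size", "small")
    verbosity = settings.get("model_verbosity", "standard")
    if size not in MODEL_SIZES:
        raise ValueError(f"unknown model_size: {size!r}")
    if verbosity not in VERBOSITIES:
        raise ValueError(f"unknown model_verbosity: {verbosity!r}")
    # The Metadata table lists no "subtitle" variant for the medium model.
    if (size, verbosity) == ("medium", "subtitle"):
        raise ValueError('model_size "medium" has no "subtitle" verbosity')
    return size, verbosity
```

An empty dictionary resolves to the defaults, `("small", "standard")`.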

Annotations

The following annotations are created by the plugin:

  • text with the attribute source_filename, which indicates the name of the audio file from which the text was transcribed.
  • utterance with the attributes start and end, which indicate the timestamps (in seconds) of the utterance within the audio file.

Sample output:

<?xml version='1.0' encoding='utf-8'?>
<text source_filename="example.mp3">
  <utterance end="6.0" start="0.0">
    <token>Världsförklaring</token>
    <token>.</token>
  </utterance>
</text>
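
The utterance timestamps in the export can be read back with the standard library. The sketch below parses a copy of the sample above (without the XML declaration) and assumes the export has the same shape: `utterance` elements with `start` and `end` attributes that contain `token` elements. The function name `utterances` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Copy of the sample output, minus the XML declaration.
SAMPLE = """<text source_filename="example.mp3">
  <utterance end="6.0" start="0.0">
    <token>Världsförklaring</token>
    <token>.</token>
  </utterance>
</text>"""


def utterances(xml_text: str) -> list[tuple[float, float, list[str]]]:
    """Return (start, end, tokens) for every utterance in the document."""
    root = ET.fromstring(xml_text)
    rows = []
    for utterance in root.iter("utterance"):
        tokens = [token.text or "" for token in utterance.iter("token")]
        rows.append((float(utterance.get("start")), float(utterance.get("end")), tokens))
    return rows


if __name__ == "__main__":
    for start, end, tokens in utterances(SAMPLE):
        print(f"{start:.1f}-{end:.1f}s: {' '.join(tokens)}")
```

For an exported file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`.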

Metadata

The following table lists the exact models and revisions used for each combination of model size and model verbosity.

| Model size | Model verbosity | Model used | Revision used |
| ---------- | --------------- | ---------- | ------------- |
| tiny | subtitle | KBLab/kb-whisper-tiny | 238d279d9821c32b905fcaff6ce9dad38ad00ab7 |
| tiny | standard | KBLab/kb-whisper-tiny | e2bca57c3eee6144b9fefd07749580034cfa9686 |
| tiny | strict | KBLab/kb-whisper-tiny | ea2a872f41f543aaadea23e185e974d1ab29ba2b |
| base | subtitle | KBLab/kb-whisper-base | 7a57b541ccf4aebef73ecfdc064ef4b5cab3b02e |
| base | standard | KBLab/kb-whisper-base | 1ee0facc30bb1f26492bb1360a99d552e25a31c2 |
| base | strict | KBLab/kb-whisper-base | be19431a3fb78b71ac1525bcafe792220b314c9e |
| small | subtitle | KBLab/kb-whisper-small | 8d49820338edb72829d1c44fa70a2ba94a4a20fa |
| small | standard | KBLab/kb-whisper-small | 728c681653e2732ff64618e7f607f509ec87472a |
| small | strict | KBLab/kb-whisper-small | 066ef166dd25b4b27039517ca77af30c1c10688a |
| medium | subtitle | NOTE: subtitle not present for kb-whisper-medium | - |
| medium | standard | KBLab/kb-whisper-medium | 32529a74c6662479625746edce7f16fe743fe011 |
| medium | strict | KBLab/kb-whisper-medium | 51990d2cd5d0cf120b3eceb812bc5407a171a220 |
| large | subtitle | KBLab/kb-whisper-large | 50b62f493fa513926007d388f76cce9659bce123 |
| large | standard | KBLab/kb-whisper-large | 9e03cd21c14d02c57c33ae90b5803b54995ff241 |
| large | strict | KBLab/kb-whisper-large | ea0a8ac1cda8eab8777bf8d74440eb7606825d8f |
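
To reproduce a transcription outside Sparv, the same model and revision can be looked up from the table. The sketch below copies the table into a dictionary. `resolve_model` and `load_pipeline` are hypothetical names, and loading through the `pipeline` function of Hugging Face transformers is an assumption about one reasonable way to pin a revision, not a description of how the plugin loads its models.

```python
from __future__ import annotations

# (model size, verbosity) -> pinned revision, copied from the table above.
REVISIONS = {
    ("tiny", "subtitle"): "238d279d9821c32b905fcaff6ce9dad38ad00ab7",
    ("tiny", "standard"): "e2bca57c3eee6144b9fefd07749580034cfa9686",
    ("tiny", "strict"): "ea2a872f41f543aaadea23e185e974d1ab29ba2b",
    ("base", "subtitle"): "7a57b541ccf4aebef73ecfdc064ef4b5cab3b02e",
    ("base", "standard"): "1ee0facc30bb1f26492bb1360a99d552e25a31c2",
    ("base", "strict"): "be19431a3fb78b71ac1525bcafe792220b314c9e",
    ("small", "subtitle"): "8d49820338edb72829d1c44fa70a2ba94a4a20fa",
    ("small", "standard"): "728c681653e2732ff64618e7f607f509ec87472a",
    ("small", "strict"): "066ef166dd25b4b27039517ca77af30c1c10688a",
    ("medium", "standard"): "32529a74c6662479625746edce7f16fe743fe011",
    ("medium", "strict"): "51990d2cd5d0cf120b3eceb812bc5407a171a220",
    ("large", "subtitle"): "50b62f493fa513926007d388f76cce9659bce123",
    ("large", "standard"): "9e03cd21c14d02c57c33ae90b5803b54995ff241",
    ("large", "strict"): "ea0a8ac1cda8eab8777bf8d74440eb7606825d8f",
}


def resolve_model(size: str, verbosity: str) -> tuple[str, str]:
    """Return (model name, revision) for a size/verbosity pair listed in the table."""
    try:
        revision = REVISIONS[(size, verbosity)]
    except KeyError:
        raise ValueError(f"no model listed for size={size!r}, verbosity={verbosity!r}") from None
    return f"KBLab/kb-whisper-{size}", revision


def load_pipeline(size: str = "small", verbosity: str = "standard"):
    """Assumption: load the pinned checkpoint with Hugging Face transformers."""
    from transformers import pipeline  # imported lazily; needs transformers installed

    model, revision = resolve_model(size, verbosity)
    return pipeline("automatic-speech-recognition", model=model, revision=revision)
```

`resolve_model("medium", "subtitle")` raises a `ValueError`, matching the missing entry in the table.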

Changelog

This project keeps a changelog.

Minimum supported Python version

This library tries to support as many Python versions as possible. When a Python version is added or dropped, this library's minor version is bumped.

  • v0.1.0: Python 3.11

Development

Development prerequisites

To start developing on this repository:

  • Clone the repo: git clone https://github.com/spraakbanken/sparv-sbx-whisper-import.git
  • Set up the environment: make dev
  • Install pre-commit hooks: pre-commit install

Do your work.

Tasks to do:

  • Test the code with make test or make test-w-coverage.
  • Test the examples with make test-examples.
  • Lint the code with make lint.
  • Check formatting with make check-fmt.
  • Format the code with make fmt.
  • Type-check the code with make type-check.

This repo uses conventional commits.

Release a new version

  • Prepare the CHANGELOG: make prepare-release and then edit CHANGELOG.md.
  • Add to git: git add CHANGELOG.md
  • Commit with git commit -m 'chore(release): prepare release' or cog commit chore 'prepare release' release.
  • Bump the version (requires bump-my-version):
    • Major: make bumpversion part=major
    • Minor: make bumpversion part=minor
    • Patch: make bumpversion part=patch or make bumpversion
  • Push main and tags to GitHub: git push origin main --tags (assuming the remote is named origin) or make publish
    • GitHub Actions will build, test and publish the package to PyPI.
  • Add metadata for Språkbanken's resource

License

This repository is licensed under the MIT license.
