
ez audio transcription tool with flexible processing and post-processing options

Project description

Table of Contents

  1. wscribe
    1. Getting started
      1. Installation
      2. Usage
    2. Numbers
    3. Roadmap
      1. Processing Backends
      2. Transcription Features
      3. Inference interfaces
      4. Audio sources
      5. Distribution
    4. Contributing
      1. Testing

wscribe

Getting started

wscribe is yet another easy-to-use front-end for whisper, focused on transcription. It aims to be modular so that it can support multiple audio sources, processing backends, and inference interfaces. It can run on both CPU and GPU depending on the processing backend. Once a transcript is generated, it can be manually edited, corrected, and visualized with the wscribe-editor.

It was created at sochara because we have a large volume of audio recordings that need to be transcribed and eventually archived. Another important requirement was being able to verify and manually edit the generated transcript, and I could not find any open-source tool that checked all the boxes. The suggested workflow is to generate a word-level transcript (only supported in the JSON export) and then edit it with the wscribe-editor.

Currently, it supports the following. Check the roadmap for upcoming support.

  • Processing backend: faster-whisper
  • Audio sources: Local files (Audio/Video)
  • Inference interfaces: Python CLI
  • File exports: JSON, SRT, WebVTT
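The SRT and WebVTT exports differ mainly in the timestamp separator (comma vs. period) and WebVTT's leading header. As a minimal sketch, here is how a timed segment maps onto an SRT cue (the segment fields here are illustrative, not wscribe's actual schema):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Render one numbered SRT cue block."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

print(srt_cue(1, 0.0, 2.5, "hello world"))
```

A WebVTT cue looks the same except the timestamps use `.` instead of `,` and the file starts with a `WEBVTT` line.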

Installation

These instructions were tested on NixOS:Python3.10 and ArchLinux:Python3.10 but should work on any other OS. If you face any installation issues, please feel free to create an issue. I’ll try to put out a Docker image sometime.

    1. Set the required env var

    • WSCRIBE_MODELS_DIR : Path to the directory where whisper models should be downloaded to.

      export WSCRIBE_MODELS_DIR=$XDG_DATA_HOME/whisper-models # example

    2. Download the models

      • Recommended: use the helper script; it downloads the models to WSCRIBE_MODELS_DIR.

        cd /tmp # temporary script, only needed to download the models
        curl -O https://raw.githubusercontent.com/geekodour/wscribe/main/scripts/fw_dw_hf_wo_lfs.sh
        chmod u+x fw_dw_hf_wo_lfs.sh
        ./fw_dw_hf_wo_lfs.sh tiny # other models: tiny, small, medium and large-v2

      • Manual: you can download the models directly from here using git lfs; make sure you download/copy them to WSCRIBE_MODELS_DIR.

    3. Install wscribe

      Assuming you already have a working python>=3.10 setup:

        pip install wscribe
    

Usage

# wscribe transcribe [OPTIONS] SOURCE DESTINATION

# cpu
wscribe transcribe audio.mp3 transcription.json
# use gpu
wscribe transcribe video.mp4 transcription.json --gpu
# use gpu, srt format
wscribe transcribe video.mp4 transcription.srt -g -f srt
# use gpu, vtt format, tiny model
wscribe transcribe video.mp4 transcription.vtt -g -f vtt -m tiny
wscribe transcribe --help # all help info
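The JSON export carries the word-level timing needed by the wscribe-editor. The exact schema isn't documented here, so this sketch assumes a hypothetical list of word objects with `word`, `start`, and `end` fields; check an actual export before relying on it:

```python
import json

# Hypothetical structure; wscribe's real JSON schema may differ.
sample = json.loads("""
[
  {"word": "hello", "start": 0.0, "end": 0.4},
  {"word": "world", "start": 0.5, "end": 0.9}
]
""")

# Print each word with its time span, then the reconstructed text.
for w in sample:
    print(f"{w['start']:6.2f}-{w['end']:6.2f}  {w['word']}")
print(" ".join(w["word"] for w in sample))
```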

Numbers

device  quant    model     original playback  transcription  playback/transcription
cuda    float16  tiny      6.3m               0.1m           68x
cuda    float16  small     6.3m               0.2m           29x
cuda    float16  medium    6.3m               0.4m           14x
cuda    float16  large-v2  6.3m               0.8m           7x
cpu     int8     tiny      6.3m               0.2m           25x
cpu     int8     small     6.3m               1.3m           4x
cpu     int8     medium    6.3m               3.6m           ~1.7x
cpu     int8     large-v2  6.3m               3.6m           ~0.9x
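The last column is simply original playback time divided by transcription wall-clock time. A quick sketch of that arithmetic (`speedup` is a hypothetical helper; recomputing from the rounded minutes shown in the table gives slightly different figures than the listed ones, which were presumably derived from unrounded timings):

```python
def speedup(playback_min: float, transcription_min: float) -> float:
    """Ratio of audio duration to transcription wall-clock time."""
    return playback_min / transcription_min

# e.g. cuda / float16 / large-v2 from the table above
print(f"{speedup(6.3, 0.8):.1f}x")  # ≈ 7.9x from rounded minutes; the table lists 7x
```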

Roadmap

Processing Backends

Transcription Features

  • Add support for diarization
  • Add translation
  • Add VAD/other de-noising stuff etc.
  • Add local LLM integration with llama.cpp or something similar for summaries and other possible things. It could also be used to generate a more accurate transcript. Whisper mostly generates something like subtitles; to convert subtitles into a transcript, we need to group them. This can be done in various ways, e.g. by speaker if diarization is supported, by time chunks, etc. Using LLMs or other NLP techniques, we could also group on things like breaks in dialogue. Have to explore.
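One of the grouping strategies mentioned above, splitting on pauses between segments, needs no LLM at all. A minimal sketch, assuming a hypothetical segment format of `{"start", "end", "text"}` dicts:

```python
def group_segments(segments, max_gap=1.5):
    """Group subtitle-style segments into paragraphs, starting a new
    paragraph whenever the silence between consecutive segments
    exceeds max_gap seconds. Segment format is hypothetical."""
    paragraphs, current = [], []
    prev_end = None
    for seg in segments:
        if prev_end is not None and seg["start"] - prev_end > max_gap:
            paragraphs.append(" ".join(current))
            current = []
        current.append(seg["text"])
        prev_end = seg["end"]
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs

segs = [
    {"start": 0.0, "end": 2.0, "text": "Hello there."},
    {"start": 2.3, "end": 4.0, "text": "How are you?"},
    {"start": 8.0, "end": 9.5, "text": "New topic now."},
]
print(group_segments(segs))  # two paragraphs: the dialogue, then the new topic
```

Diarization- or LLM-based grouping would replace the gap test with a speaker or topic boundary, but the surrounding accumulation logic stays the same.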

Inference interfaces

  • [-] Python CLI
    • Basic CLI
    • Improve summary statistics
  • REST endpoint
    • Basic server to run wscribe via an API.
    • Possibly add glue code to expose it via CFtunnels or something similar

Audio sources

  • Local files
  • YouTube
  • Google Drive

Distribution

  • Python packaging
  • Docker/Podman
  • Package for Nix
  • Package for Arch(AUR)

Contributing

All contributions happen through PRs, and any contribution is greatly appreciated: bug fixes, features, tests, suggestions, and criticism are all welcome.

Testing

  • make test
  • See other helper commands in Makefile

