
ez audio transcription tool with flexible processing and post-processing options

Project description

Table of Contents

  1. wscribe
    1. Getting started
      1. Installation
      2. Usage
    2. Numbers
    3. Roadmap
      1. Processing Backends
      2. Transcription Features
      3. Inference interfaces
      4. Audio sources
      5. Distribution
    4. Contributing
      1. Testing

wscribe

Getting started

wscribe is yet another easy-to-use front-end for whisper, focused on transcription. It aims to be modular so that it can support multiple audio sources, processing backends, and inference interfaces. It can run on both CPU and GPU depending on the processing backend. Once a transcript is generated, it can be manually edited, corrected, and visualized with the wscribe-editor.

It was created at sochara because we have a large volume of audio recordings that need to be transcribed and eventually archived. Another important requirement was being able to verify and manually edit the generated transcript, and I could not find any open-source tool that checked all the boxes. The suggested workflow is to generate a word-level transcript (only supported in the JSON export) and then edit it with the wscribe-editor.

Currently, it supports the following. Check the roadmap for upcoming support.

  • Processing backend: faster-whisper
  • Audio sources: Local files (Audio/Video)
  • Inference interfaces: Python CLI
  • File exports: JSON, SRT, WebVTT
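The SRT and WebVTT exports differ mainly in the timestamp separator (comma vs. period) and WebVTT's leading header. As a minimal sketch, here is how a timed segment maps onto an SRT cue (the segment fields here are illustrative, not wscribe's actual schema):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Render one numbered SRT cue block."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

print(srt_cue(1, 0.0, 2.5, "hello world"))
```

A WebVTT cue looks the same except the timestamps use `.` instead of `,` and the file starts with a `WEBVTT` line.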

Installation

These instructions were tested on NixOS:Python3.10 and ArchLinux:Python3.10 but should work on any other OS. If you face any installation issues, please feel free to create an issue. I’ll try to put out a Docker image sometime.

    1. Set the required env var

    • WSCRIBE_MODELS_DIR : Path to the directory where whisper models should be downloaded to.

      export WSCRIBE_MODELS_DIR=$XDG_DATA_HOME/whisper-models # example

    2. Download the models

      • Recommended: use the helper script; it downloads the models to WSCRIBE_MODELS_DIR.

        cd /tmp # temporary script, only needed to download the models
        curl -O https://raw.githubusercontent.com/geekodour/wscribe/main/scripts/fw_dw_hf_wo_lfs.sh
        chmod u+x fw_dw_hf_wo_lfs.sh
        ./fw_dw_hf_wo_lfs.sh tiny # other models: tiny, small, medium and large-v2

      • Manual: you can download the models directly from here using git lfs; make sure you download/copy them to WSCRIBE_MODELS_DIR.

    3. Install wscribe

      Assuming you already have a working python>=3.10 setup:

        pip install wscribe
    

Usage

# wscribe transcribe [OPTIONS] SOURCE DESTINATION

# cpu
wscribe transcribe audio.mp3 transcription.json
# use gpu
wscribe transcribe video.mp4 transcription.json --gpu
# use gpu, srt format
wscribe transcribe video.mp4 transcription.srt -g -f srt
# use gpu, vtt format, tiny model
wscribe transcribe video.mp4 transcription.vtt -g -f vtt -m tiny
wscribe transcribe --help # all help info
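The JSON export carries the word-level timing needed by the wscribe-editor. The exact schema isn't documented here, so this sketch assumes a hypothetical list of word objects with `word`, `start`, and `end` fields; check an actual export before relying on it:

```python
import json

# Hypothetical structure; wscribe's real JSON schema may differ.
sample = json.loads("""
[
  {"word": "hello", "start": 0.0, "end": 0.4},
  {"word": "world", "start": 0.5, "end": 0.9}
]
""")

# Print each word with its time span, then the reconstructed text.
for w in sample:
    print(f"{w['start']:6.2f}-{w['end']:6.2f}  {w['word']}")
print(" ".join(w["word"] for w in sample))
```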

Numbers

device  quant    model     original playback  transcription  playback/transcription
cuda    float16  tiny      6.3m               0.1m           68x
cuda    float16  small     6.3m               0.2m           29x
cuda    float16  medium    6.3m               0.4m           14x
cuda    float16  large-v2  6.3m               0.8m           7x
cpu     int8     tiny      6.3m               0.2m           25x
cpu     int8     small     6.3m               1.3m           4x
cpu     int8     medium    6.3m               3.6m           ~1.7x
cpu     int8     large-v2  6.3m               3.6m           ~0.9x
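The last column is simply original playback time divided by transcription wall-clock time. A quick sketch of that arithmetic (`speedup` is a hypothetical helper; recomputing from the rounded minutes shown in the table gives slightly different figures than the listed ones, which were presumably derived from unrounded timings):

```python
def speedup(playback_min: float, transcription_min: float) -> float:
    """Ratio of audio duration to transcription wall-clock time."""
    return playback_min / transcription_min

# e.g. cuda / float16 / large-v2 from the table above
print(f"{speedup(6.3, 0.8):.1f}x")  # ≈ 7.9x from rounded minutes; the table lists 7x
```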

Roadmap

Processing Backends

Transcription Features

  • Add support for diarization
  • Add translation
  • Add VAD/other de-noising stuff etc.
  • Add local LLM integration with llama.cpp or something similar for summaries and other possible things. It could also be used to generate a more accurate transcript. Whisper mostly generates something like subtitles; to convert subtitles into a transcript, we need to group them. This can be done in various ways, e.g. by speaker if diarization is supported, by time chunks, etc. Using LLMs or other NLP techniques, we could also group on things like breaks in dialogue. Have to explore.
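One of the grouping strategies mentioned above, splitting on pauses between segments, needs no LLM at all. A minimal sketch, assuming a hypothetical segment format of `{"start", "end", "text"}` dicts:

```python
def group_segments(segments, max_gap=1.5):
    """Group subtitle-style segments into paragraphs, starting a new
    paragraph whenever the silence between consecutive segments
    exceeds max_gap seconds. Segment format is hypothetical."""
    paragraphs, current = [], []
    prev_end = None
    for seg in segments:
        if prev_end is not None and seg["start"] - prev_end > max_gap:
            paragraphs.append(" ".join(current))
            current = []
        current.append(seg["text"])
        prev_end = seg["end"]
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs

segs = [
    {"start": 0.0, "end": 2.0, "text": "Hello there."},
    {"start": 2.3, "end": 4.0, "text": "How are you?"},
    {"start": 8.0, "end": 9.5, "text": "New topic now."},
]
print(group_segments(segs))  # two paragraphs: the dialogue, then the new topic
```

Diarization- or LLM-based grouping would replace the gap test with a speaker or topic boundary, but the surrounding accumulation logic stays the same.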

Inference interfaces

  • [-] Python CLI
    • Basic CLI
    • Improve summary statistics
  • REST endpoint
    • Basic server to run wscribe via an API.
    • Possibly add glue code to expose it via CFtunnels or something similar

Audio sources

  • Local files
  • YouTube
  • Google Drive

Distribution

  • Python packaging
  • Docker/Podman
  • Package for Nix
  • Package for Arch(AUR)

Contributing

All contributions happen through PRs, and any contribution is greatly appreciated: bug fixes, features, tests, suggestions, and criticism are all welcome.

Testing

  • make test
  • See other helper commands in Makefile

