Simple audio transcription tool using Whisper

* wscribe
** Getting started
*** Installation
Currently only tested on Linux. If you face any installation issues, please feel free to [[https://github.com/geekodour/wscribe/issues][create an issue]].
**** Set the required environment variables
- Set ~WSCRIBE_MODELS_DIR~: path to the directory where Whisper models should be downloaded
#+begin_src bash
export WSCRIBE_MODELS_DIR=$XDG_DATA_HOME/whisper-models # example
#+end_src
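If you want the variable to persist and the directory to exist before downloading models, a small follow-up like the sketch below works (the ~/.bashrc path is just an example, use whichever shell profile you actually have):
#+begin_src bash
# persist the variable for future shells (adjust the profile file to your shell)
echo 'export WSCRIBE_MODELS_DIR=$XDG_DATA_HOME/whisper-models' >> ~/.bashrc
# make sure the models directory exists
mkdir -p "$WSCRIBE_MODELS_DIR"
#+end_src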
**** Download the models
- You can download the models directly [[https://huggingface.co/guillaumekln][from here]] using ~git lfs~; make sure you download/copy them to ~WSCRIBE_MODELS_DIR~ (see the example below).
- Alternatively, use the helper script at [[https://github.com/geekodour/wscribe/blob/main/scripts/fw_dw_hf_wo_lfs.sh][scripts/fw_dw_hf_wo_lfs.sh]]: download it and execute it as per its instructions.
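For example, fetching the ~medium~ model with ~git lfs~ could look roughly like this; the target directory layout is an assumption here, the helper script is the authoritative reference for what wscribe expects:
#+begin_src bash
# one-time git-lfs setup
git lfs install
# clone one of the faster-whisper model repos into WSCRIBE_MODELS_DIR
# (the target directory name "medium" is an assumption, check the helper script)
git clone https://huggingface.co/guillaumekln/faster-whisper-medium "$WSCRIBE_MODELS_DIR/medium"
#+end_src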
**** Install package
Assuming you already have a working Python setup:
#+begin_src shell
pip install wscribe
#+end_src
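To verify the install, you can print the CLI help (it should match the usage shown below):
#+begin_src shell
wscribe transcribe --help
#+end_src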
** Usage
#+begin_src text
Usage: wscribe transcribe [OPTIONS] SOURCE DESTINATION

  Transcribes SOURCE to DESTINATION, where SOURCE can be a local path to an
  audio file and DESTINATION needs to be a local path to a non-existing file.

Options:
  -f, --format [json]            destination file format, currently only json
                                 is supported  [default: json]
  -m, --model [small|medium|large-v2]
                                 model should already be downloaded
                                 [default: medium]
  -g, --gpu                      enable gpu, disabled by default
  -d, --debug                    show debug logs
  --help                         Show this message and exit.
#+end_src
#+begin_src shell
wscribe transcribe audio.mp3 transcription.json # cpu
wscribe transcribe video.mp4 transcription.json --gpu # use gpu
#+end_src
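The options can be combined; for example, to transcribe with the ~large-v2~ model on the GPU with debug logs:
#+begin_src shell
wscribe transcribe -m large-v2 --gpu --debug audio.mp3 transcription.json
#+end_src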
** Contributing
All contributions happen through PRs and are greatly appreciated: bug fixes, features, tests, suggestions, and criticism are all welcome.
** Roadmap
- [-] Backends/Features
  - [X] faster-whisper
  - [ ] whisper.cpp
  - [ ] Add support for [[https://github.com/guillaumekln/faster-whisper/issues/303][diarization]]
  - [ ] Add VAD/other de-noising stuff etc.
  - [ ] Other GPU backends other than CUDA?
- [-] Inference UI
  - [X] CLI
  - [ ] statistics summary? time taken, playback speed vs transcription speed etc.
  - [ ] REST API
  - [ ] Streamlit UI
  - [ ] Would be nice to compare output of multiple models next to each other
- [ ] Editor UI
  - [ ] Web based offline editor
  - [ ] SRT and JSON editor
  - [ ] Play audio(2x/3x) and it'll highlight current text which can be edited
  - [ ] With wscribe JSON export, you'd also have the confidence score for each word color coded
- [-] Audio Source
  - [X] Local files
  - [ ] Youtube link
  - [ ] Google drive link
- [-] Distribution
  - [X] Python packaging
  - [ ] Windows support(?)
  - [ ] Package for Nix
  - [ ] Package for Arch(AUR)
