high quality multi-lingual speech to text

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

gaspardpetit

These details have not been verified by PyPI

Project description

Verbatim

For high quality multilingual speech to text.

Verbatim uses VTTM (YAML with embedded RTTM) as the primary diarization handoff. If you provide an RTTM, it will be wrapped into VTTM internally. Pyannote-based diarization/separation is optional; install with pip install verbatim[diarization] when you need those backends. Senko-based diarization is also optional and is particularly interesting on Apple Silicon when the server keeps the diarizer warm between requests.

Olympus .dss and .ds2 dictation files are supported through the optional pydsscodec backend. Install pip install "verbatim[dss]" when you need those formats.

For a broader side-by-side benchmark on the Air France bilingual sample, including Whisper, Qwen, and Verbatim variants and VibeVoice-ASR, see Air France Comparison.

Installation

Prerequisites

Python version 3.10 to 3.12
Astral UV for development

Portaudio

Portaudio is used on macOS and Linux for accessing the microphone when doing live transcription. To install:

Install on Linux:

 sudo apt install portaudio19-dev

Install on macOS:

 brew install portaudio

Installing

Install from PyPI:

pip install verbatim

Install the latest from git:

pip install git+https://github.com/gaspardpetit/verbatim.git

Optional backend extras:

# Qwen ASR backend
pip install "verbatim[qwen]"

# MMS language identification backend
pip install "verbatim[mms_lid]"

# Qwen ASR + MMS language identification
pip install "verbatim[qwen,mms_lid]"

# MLX Whisper backend on Apple Silicon
pip install "verbatim[mlx]"

# Pyannote diarization / separation
pip install "verbatim[diarization]"

# Olympus DSS / DS2 dictation files
pip install "verbatim[dss]"

# AST-backed non-speech classification for long skipped regions
pip install "verbatim[mms_lid]"

With uv against the local project environment:

uv pip install -e ".[qwen,mms_lid]"

Install Senko separately when you want the optional Senko diarization backend:

uv pip install "git+https://github.com/narcotic-sh/senko.git"

Encrypted DS2 files can be opened with password: in a YAML/JSON config file or the VERBATIM_DSS_PASSWORD environment variable. A --password switch is also available, but it is less safe because it can leak secrets via shell history and process listings.

Recommended:

VERBATIM_DSS_PASSWORD=1234 verbatim recording.ds2 --txt

When using a Senko build that supports in-memory diarization, Verbatim can also run Senko without a working directory by feeding cached 16kHz mono samples directly into the diarizer. This is useful for server deployments that must avoid writing intermediate files.

Install the AST audio classification model dependencies when you want richer labels for long skipped non-speech regions:

uv pip install "verbatim[mms_lid]"

Torch with Cuda Support

If the tool falls back to CPU instead of GPU, you may need to reinstall the torch dependency with Cuda support. Refer to the following instructions: https://pytorch.org/get-started/locally/

HuggingFace Token (optional for diarization)

Verbatim uses VTTM as the diarization handoff. If you opt into pyannote-based diarization/separation (install the diarization extra), the models are gated and require a Hugging Face token:

Create an account on Hugging Face
Request access to the model at https://huggingface.co/pyannote/speaker-diarization-community-1
Request access to the model at https://huggingface.co/pyannote/segmentation-3.0
From your Settings > Access Tokens, generate an access token
Set the HUGGINGFACE_TOKEN environment variable before running diarization. Once models are cached, the token is no longer needed.

Instead of setting HUGGINGFACE_TOKEN in the environment, you may prefer to set the value using a .env file in the current directory like this:

.env

HUGGINGFACE_TOKEN=hf_******

Usage (from terminal)

Simple usage

verbatim audio_file.mp3

Verbose

verbatim audio_file.mp3 -v

Very Verbose

verbatim audio_file.mp3 -vv

Force CPU only

verbatim audio_file.mp3 --cpu

Save file in a specific directory

verbatim audio_file.mp3 -o ./output/

Batch transcription

verbatim-batch --batch-dir ./audio --match "*.wav" "*.mp3" --recursive --skip-existing --txt

Batch transcription including Olympus dictation files

verbatim-batch --batch-dir ./audio --match "*.wav" "*.mp3" "*.dss" "*.ds2" --recursive --skip-existing --txt

Backend selection examples

# Run Qwen ASR explicitly
verbatim audio_file.wav --transcriber-backend qwen --languages en fr

# Run Qwen ASR with MMS language identification
verbatim audio_file.wav --transcriber-backend qwen --language-identifier-backend mms --languages en fr

# On Apple Silicon, Qwen will use MPS automatically unless you force --cpu
verbatim audio_file.wav --transcriber-backend qwen --language-identifier-backend mms --languages en fr -v

# Label long skipped non-speech regions with the optional AST classifier
verbatim audio_file.wav --transcriber-backend qwen --language-identifier-backend mms --languages en fr --non-speech-backend ast -vv

Long-skip review markers

# Default energy-based labels for long skipped regions (for example [SILENCE] or [ENVIRONMENT NOISE])
verbatim audio_file.wav --transcriber-backend qwen --language-identifier-backend mms --languages en fr

# Optional AST-based labels for long skipped regions (for example [MUSIC] when the classifier is confident)
verbatim audio_file.wav --transcriber-backend qwen --language-identifier-backend mms --languages en fr --non-speech-backend ast

Diarization policy examples

# Downmix all channels and run pyannote
verbatim audio_file.wav --diarize "=pyannote"

# Channels 1-2 energy diarization together; channel 3 pyannote; others per-channel labels (auto numbering)
verbatim audio_file.wav --diarize "1,2=energy;3=pyannote;*=channel?speaker=HOST"

# Run Senko diarization
verbatim audio_file.wav --diarize senko

For see the detailed terminal documentation for additional examples and options.

Usage (from Docker)

The tool can also be used within a docker container. This can be particularly convenient, in the context where the audio and transcription is confidential, to ensure that the tool is completely offline since docker using --network none

With GPU support

docker run --network none --shm-size 8G --gpus all \
    -v "/local/path/to/out/:/data/out/" \
    -v "/local/path/to/audio.mp3:/data/audio.mp3" ghcr.io/gaspardpetit/verbatim:latest \
    verbatim /data/audio.mp3 -o /data/out --languages en fr

Without GPU support

docker run --network none \
    -v "/local/path/to/out/:/data/out/" \
    -v "/local/path/to/audio.mp3:/data/audio.mp3" ghcr.io/gaspardpetit/verbatim:latest \
    verbatim /data/audio.mp3 -o /data/out --languages en fr

Usage (from python)

from verbatim import Context, Pipeline
context: Context = Context(
    languages=["en", "fr"],
    nb_speakers=2,
    source_file="audio.mp3",
    out_dir="out")
pipeline: Pipeline = Pipeline(context=context)
pipeline.execute()

The project is organized to be modular, such that individual components can be used outside the full pipeline, and the pipeline can be customized to use custom stages. For example, to use a custom diarization stage:

from verbatim_audio.sources.sourceconfig import SourceConfig
from verbatim_audio.sources.factory import create_audio_source
from verbatim.config import Config

config = Config(lang=["en", "fr"], output_dir="out")
source = create_audio_source(
    input_source="ext/samples/audio/1ch_2spk_en-fr_AirFrance_00h03m54s.wav",
    device="cuda",
    cache=config.cache,
    source_config=SourceConfig(diarize=2),
)

from verbatim.verbatim import Verbatim
verbatim = Verbatim(config=config)

with source.open() as stream:
    for utterance, _unack_utterance, _unconfirmed_word in verbatim.transcribe(audio_stream=stream):
        print(utterance.text)

Contributing

This project aims at finding the best implementation for each stage and glue them together. Contributions with new implementations are welcome.

Refer to the build instructions to learn how to modify and test this project before submitting a pull request.

Architecture

Refer to the architecture details for further information on how Verbatim works.

Objectives

High Quality

Most design decisions in this project favour higher confidence over performance, including multiple passes in several parts to improve analysis. The main motivation for this project was to provide a robust transcription solution that would handle conversations in multiple languages. Most commercial and open-source solutions either expect the user to set the language to be used for the translation or will rely on a short audio sample (ex. the first 10-30 seconds) to detect the language and expect the entire conversation to remain in this language.

In most solution, a change in language during a conversation either results in gibberish or missing text.

By contrast, Verbatim will continuously test for language switching during the conversation. Although the primary focus was on multi-language support, it turns out that the iterative architecture that was developped for multi-language also provides improvements on single-language conversations. Utterances are considered in short segments and analyzed multiple time until Verbatim has built confidence that the text is accurate.

Language support

Languages supported by openai/whisper using the whisper-large-v3 model should also work, including: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh

Mixed language support

Speeches may comprise multiple languages. This includes different languages spoken one after the other (ex. two speakers alternating two languages) or multiple languages being mixed, such as the use of English expressions within a French speech.

Speaker Identification

The speech recognition distinguishes between speakers using diarization. Verbatim currently supports pyannote and Senko as optional diarization backends.

On macOS / Apple Silicon, Senko is worth considering for server deployments that reuse warmed models. On the Air France bilingual sample in this repository, warmed Senko runs completed in about 0.15s versus about 3.46s for warmed pyannote on the same machine. Pyannote still produced finer-grained speaker turns on that sample, so the best choice depends on whether your priority is latency or segmentation detail.

Word-Level Confidence

The output provides word-level confidence, with poorly recognized words clearly identified to guide manual editing.

Time Tracking

The output text is associated with timestamps to facilitate source audio navigation when manually editing.

Voice Isolation

Verbatim will work on unclean audio sources, for example where there might be music, keystrokes from keyboards, background noise, etc. Voices are isolated from other sounds using adefossez/demucs.

For audit purposes, the audio that was removed because it was considered background noise is saved so it can be manually reviewed if necessary.

In addition, when the rolling-window VAD skips non-speech spans of roughly 2.5 seconds or more, Verbatim can emit timeline markers to draw attention to those gaps during review. By default these are coarse markers such as [SILENCE] or [ENVIRONMENT NOISE]. When the optional AST non-speech backend is enabled, the same path can emit richer labels such as [MUSIC].

Optional GPU Acceleration (on a 12GB VRAM Budget)

The current objective is to limit the VRAM requirements to 12GB, allowing cards such as NVidia RTX 4070 to accelerate the processing.

Verbatim will also run on CPU, but processing should be expected to be slow.

Long Audio Support (2h+)

The main use case for Verbatim is transcription of meeting. Consequently, it is designed to work with files containing at least 2 hours of audio.

Audio Conversion

A variety of audio formats is support as input, including raw, compressed audio or even video files containing audio tracks. Any format supported by ffmpeg is accepted.

Streaming

Verbatim can also be used with streaming audio. For this purpose, a low-latency mode can be enabled, at the cost of suboptimal quality.

Offline processing

100% offline to ensure confidentiality. The docker image may be executed with --network none to ensure that nothing reaches out.

Output designed for auditing

The output includes

a subtitle track rendered over the original audio to review the results.
a Word document identifying low-confidence words, speaker and timestamps to quickly jump to relevant sections and ensure no part has been omitted

Sample

Consider the following audio file obtained from universal-soundbank including a mixture of French and English:

https://github.com/gaspardpetit/verbatim/assets/9883156/23bc86d2-567e-4be3-8d79-ba625be8c614

First, we extract the background audio and remove it from the analysis:

Background noise:

https://github.com/gaspardpetit/verbatim/assets/9883156/42fad911-3c15-45c2-a40a-7f923fdd4533

Then we perform diarization and language detection. We correctly detect one speaker speaking in French and another one speaking in English:

Speaker 0 | English:

https://github.com/gaspardpetit/verbatim/assets/9883156/cecec5aa-cb09-473e-bf9b-c5fd82352dab

Speaker 1 | French:

https://github.com/gaspardpetit/verbatim/assets/9883156/8074c064-f4d2-4ec4-8fc0-c985f7c276e8

The output consists of a Word document highlighting words with low certainty (low certainty are underlined and highlighted in yellow, while medium certainty are simply underlined):

Microsoft Word Output

A subtitle file is also provided and can be attached to the original audio:

https://github.com/gaspardpetit/verbatim/assets/9883156/9bcc2553-f183-4def-a9c4-bb0c337d4c82

A direct use of whisper on an audio clip like this one results in many errors. Several utterances end up being translated instead of being transcribed, and others are simply unrecognized and missing:

	Naive Whisper Transcription	Verbatim Transcription
✅	Madame, Monsieur, bonjour et bienvenue à bord.	Madame, Monsieur, bonjour et bienvenue à bord.
❌	Bienvenue à bord, Mesdames et Messieurs.	Welcome aboard, ladies and gentlemen.
❌	Pour votre sécurité et votre confort, prenez un moment pour regarder la vidéo de sécurité suivante.	For your safety and comfort, please take a moment to watch the following safety video.
✅	Ce film concerne votre sécurité à bord. Merci de nous accorder votre attention.	Ce film concerne votre sécurité à bord. Merci de nous accorder votre attention.
✅	Chaque fois que ce signal est allumé, vous devez attacher votre ceinture pour votre sécurité.	Chaque fois que ce signal est allumé, vous devez attacher votre ceinture pour votre sécurité.
✅	Nous vous recommandons de la maintenir attachée de façon visible lorsque vous êtes à votre siège.	Nous vous recommandons de la maintenir attachée, de façon visible, lorsque vous êtes à votre siège.
❌	Lorsque le signe de la selle est en place, votre selle doit être assise en sécurité. Pour votre sécurité, nous recommandons que vous gardiez votre selle assise et visible à tous les temps en selle.	Whenever the seatbelt sign is on, your seatbelt must be securely fastened. For your safety, we recommend that you keep your seatbelt fastened and visible at all times while seated.
❌	Pour détacher votre selleure, soulevez la partie supérieure de la boucle.	To release the seatbelt, just lift the buckle.
❌		Pour détacher votre ceinture, soulevez la partie supérieure de la boucle.
✅	Il est strictement interdit de fumer dans l'avion, y compris dans les toilettes.	Il est strictement interdit de fumer dans l'avion, y compris dans les toilettes.
❌		This is a no-smoking flight, and it is strictly prohibited to smoke in the toilets.
✅	En cas de dépressurisation, un masque à oxygène tombera automatiquement à votre portée.	En cas de dépressurisation, un masque à oxygène tombera automatiquement à votre portée.
❌		If there is a sudden decrease in cabin pressure, your oxygen mask will drop automatically in front of you.
✅	Tirez sur le masque pour libérer l'oxygène, placez-le sur votre visage.	Tirer sur le masque pour libérer l'oxygène, placez-le sur votre visage.
❌		Pull the mask toward you to start the flow of oxygen. Place the mask over your nose and mouth. Make sure your own mask is well-adjusted before helping others.
✅	Une fois votre masque ajusté, il vous sera possible d'aider d'autres personnes. En cas d'évacuation, des panneaux lumineux EXIT vous permettent de localiser les issues de secours. Repérez maintenant le panneau EXIT le plus proche de votre siège. Il peut se trouver derrière vous.	Une fois votre masque ajusté, il vous sera possible d'aider d'autres personnes. En cas d'évacuation, des panneaux lumineux EXIT vous permettent de localiser les issues de secours. Repérez maintenant le panneau EXIT le plus proche de votre siège. Il peut se trouver derrière vous.
❌	En cas d'urgence, les signes d'exit illuminés vous aideront à locater les portes d'exit.	In case of an emergency, the illuminated exit signs will help you locate the exit doors.
❌	S'il vous plaît, prenez un moment pour locater l'exit le plus proche de vous. L'exit le plus proche peut être derrière vous.	Please take a moment now to locate the exit nearest you. The nearest exit may be behind you.
❌	Les issues de secours sont situées de chaque côté de la cabine, à l'avant, au centre, à l'arrière. à l'avant, au centre, à l'arrière.	Les issues de secours sont situées de chaque côté de la cabine, à l'avant, au centre, à l'arrière.
❌		Emergency exits on each side of the cabin are located at the front, in the center, and at the rear.
✅	Pour évacuer l'avion, suivez le marquage lumineux.	Pour évacuer l'avion, suivez le marquage lumineux.
❌		In the event of an evacuation, pathway lighting on the floor will guide you to the exits.
✅	Les portes seront ouvertes par l'équipage.	Les portes seront ouvertes par l'équipage.
❌		Doors will be opened by the cabin crew.
✅	Les toboggans se déploient automatiquement.	Les toboggans se déploient automatiquement.
❌		The emergency slides will automatically inflate.
✅	Le gilet de sauvetage est situé sous votre siège ou dans la coudoir centrale.	Le gilet de sauvetage est situé sous votre siège ou dans l'accoudoir central.
❌		Your life jacket is under your seat or in the central armrest.
✅	Passez la tête dans l'encolure, attachez et serrez les sangles.	Passez la tête dans l'encolure, attachez et serrez les sangles.
❌		Place it over your head and pull the straps tightly around your waist. Inflate your life jacket by pulling the red toggles.
✅	Une fois à l'extérieur de l'avion, gonflez votre gilet en tirant sur les poignées rouges.	Une fois à l'extérieur de l'avion, gonflez votre gilet en tirant sur les poignées rouges.
❌	Faites-le seulement quand vous êtes à l'extérieur de l'avion.	Do this only when you are outside the aircraft.
✅	Nous allons bientôt décoller. La tablette doit être rangée et votre dossier redressé.	Nous allons bientôt décoller. La tablette doit être rangée et votre dossier redressé.
❌		In preparation for takeoff, please make sure your tray table is stowed and secure and that your seat back is in the upright position.
✅	L'usage des appareils électroniques est interite pendant le décollage et l'atterrissage.	L'usage des appareils électroniques est interdit pendant le décollage et l'atterrissage.
❌		The use of electronic devices is prohibited during takeoff and landing.
✅	Les téléphones portables doivent rester éteints pendant tout le vol.	Les téléphones portables doivent rester éteints pendant tout le vol.
❌		Mobile phones must remain switched off for the duration of the flight.
✅	Une notice de sécurité placée devant vous est à votre disposition.	Une notice de sécurité placée devant vous est à votre disposition.
❌	Merci encourage everyone to read the safety information leaflet located in the seat back pocket.	We encourage everyone to read the safety information leaflet located in the seat back pocket.
✅	Merci pour votre attention. Nous vous souhaitons un bon vol.	Merci pour votre attention. Nous vous souhaitons un bon vol.
✅	Thank you for your attention. We wish you a very pleasant flight.	Thank you for your attention. We wish you a very pleasant flight.

## Model Cache and Offline Mode

Verbatim can prefetch and reuse models from a deterministic cache directory, and can run 100% offline once the cache is warmed.

--model-cache <dir>: sets a shared cache directory used by Hugging Face and faster-whisper.
--offline: prevents any network access and model downloads. All models must already be present in the cache; otherwise a clear error is raised.
--install: prefetches commonly used models into the selected cache and exits.

Examples

Prefetch models (first run, with network):

HUGGINGFACE_TOKEN=hf_... verbatim --install --model-cache /models

Fully offline run reusing the cache:

verbatim input.mp3 -o out --model-cache /models --offline

Default cache location (when --model-cache is not specified):

Local project directory ./.verbatim/ is used as the root cache.
Subdirectories are created inside: ./.verbatim/hf, ./.verbatim/whisper, etc.
If ./.verbatim/ cannot be created or is not writable, the app gracefully falls back to library defaults.

Voice isolation (MDX): the audio-separator backend loads a checkpoint (e.g., MDX23C-8KFFT-InstVoc_HQ_2.ckpt). For offline use with --model-cache, place the file under <cache>/audio-separator/.

Project details

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

gaspardpetit

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

1.3.5

Apr 19, 2026

1.3.4

Apr 9, 2026

1.3.3

Apr 8, 2026

1.3.2

Mar 25, 2026

1.3.0

Mar 23, 2026

1.2.6

Feb 2, 2026

1.2.5

Feb 2, 2026

1.2.4

Feb 1, 2026

1.2.3

Feb 1, 2026

1.2.2

Jan 31, 2026

1.2.1

Dec 1, 2025

1.2.0

Dec 1, 2025

1.1.0

Feb 14, 2025

1.0.2

Jan 4, 2025

1.0.1

Jan 2, 2025

0.1.6

Feb 2, 2024

0.1.5

Jan 17, 2024

0.1.3

Jan 15, 2024

0.1.2

Jan 14, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

verbatim-1.3.5.tar.gz (128.2 kB view details)

Uploaded Apr 19, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

verbatim-1.3.5-py3-none-any.whl (163.2 kB view details)

Uploaded Apr 19, 2026 Python 3

File details

Details for the file verbatim-1.3.5.tar.gz.

File metadata

Download URL: verbatim-1.3.5.tar.gz
Upload date: Apr 19, 2026
Size: 128.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for verbatim-1.3.5.tar.gz
Algorithm	Hash digest
SHA256	`a33fdd877bf384a6c812813b036c4518fdeac60c84be51ea40f5d7f7c3f60b80`
MD5	`51416f9a5f747971dd2648d8c81d1ae2`
BLAKE2b-256	`1ef22ce26fbc203f1925047867e3a49af3821afd88bdfe3438f1009b79317e38`

See more details on using hashes here.

Provenance

The following attestation bundles were made for verbatim-1.3.5.tar.gz:

Publisher: python-publish.yml on gaspardpetit/verbatim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: verbatim-1.3.5.tar.gz
- Subject digest: a33fdd877bf384a6c812813b036c4518fdeac60c84be51ea40f5d7f7c3f60b80
- Sigstore transparency entry: 1340836004
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: gaspardpetit/verbatim@083442eb40294f305371c37781f53b7a6790b334
- Branch / Tag: refs/tags/v1.3.5
- Owner: https://github.com/gaspardpetit
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@083442eb40294f305371c37781f53b7a6790b334
- Trigger Event: release

File details

Details for the file verbatim-1.3.5-py3-none-any.whl.

File metadata

Download URL: verbatim-1.3.5-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 163.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for verbatim-1.3.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f2aafd2ba65de6ffb655255d02e1be8a2c6cd2715db8b01a64d9ab23274be86f`
MD5	`2b295655dedb6abc320500e8cabd8b5e`
BLAKE2b-256	`1cacbcefc4f7500f38fe9589d7f0b58d19d79c3775ed6fbff17a29d042c3706d`

See more details on using hashes here.

Provenance

The following attestation bundles were made for verbatim-1.3.5-py3-none-any.whl:

Publisher: python-publish.yml on gaspardpetit/verbatim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: verbatim-1.3.5-py3-none-any.whl
- Subject digest: f2aafd2ba65de6ffb655255d02e1be8a2c6cd2715db8b01a64d9ab23274be86f
- Sigstore transparency entry: 1340836008
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: gaspardpetit/verbatim@083442eb40294f305371c37781f53b7a6790b334
- Branch / Tag: refs/tags/v1.3.5
- Owner: https://github.com/gaspardpetit
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@083442eb40294f305371c37781f53b7a6790b334
- Trigger Event: release

verbatim 1.3.5

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

Verbatim

Installation

Prerequisites

Portaudio

Installing

Torch with Cuda Support

HuggingFace Token (optional for diarization)

.env

Usage (from terminal)

Usage (from Docker)

Usage (from python)

Contributing

Architecture

Objectives

High Quality

Language support

Mixed language support

Speaker Identification

Word-Level Confidence

Time Tracking

Voice Isolation

Optional GPU Acceleration (on a 12GB VRAM Budget)

Long Audio Support (2h+)

Audio Conversion

Streaming

Offline processing

Output designed for auditing

Sample

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance