Fully local meeting transcription with speaker diarization, AI summaries, and PDF output

meetscribe

Fully local meeting transcription with speaker diarization, AI-generated summaries, and professional PDF output.

Records dual-channel audio (your mic + system audio) from any meeting app and produces diarized transcripts using WhisperX + pyannote-audio. Everything runs on your machine -- no cloud APIs, no data leaves your computer.

Works with any meeting app

Because meetscribe captures system audio at the OS level, it works with every voice/video call application:

Zoom
Google Meet
Microsoft Teams
Slack (huddles and calls)
Discord
Signal (voice and video calls)
Telegram (voice and video calls)
WhatsApp (desktop voice and video calls)
Keet (P2P calls)
Jitsi Meet
Webex
Skype
FaceTime (via browser)
GoTo Meeting
RingCentral
Amazon Chime
BlueJeans

Any app that plays audio through your system speakers will work -- including browser-based meetings and standalone desktop clients.

Features

Dual-channel audio capture -- records your mic (left channel) and remote participants (right channel) simultaneously via PipeWire/PulseAudio + ffmpeg
WhisperX transcription -- fast batched inference with openai/whisper-large-v3-turbo, word-level timestamps via wav2vec2 alignment
Multilingual -- auto-detects the language or lets you set it manually; supports English, German, Turkish, French, Spanish, Farsi, and 90+ other languages
Speaker diarization -- pyannote-audio identifies who said what, with automatic YOU/REMOTE labeling from the dual-channel signal
AI meeting summaries -- local LLMs via Ollama extract key topics, action items, decisions, and follow-ups
Professional PDF output -- summary + full transcript in a clean, page-numbered PDF with full Unicode support (DejaVu Sans) and RTL for Farsi
Multiple output formats -- .txt, .srt, .json, .summary.md, .pdf
GTK3 GUI widget -- small always-on-top window with record/stop, timer, and one-click access to results
CLI -- meet record, meet transcribe, meet run, meet gui, meet devices, meet check
Per-session folders -- each recording gets its own organized directory
Offline -- after initial model download, everything works without internet

Quick start

# Install from PyPI
pip install meetscribe-offline

# Set your HuggingFace token (required for speaker diarization)
export HF_TOKEN=hf_your_token_here

# Record a meeting, then auto-transcribe + summarize when you stop
meet run
# Press Ctrl+C when the meeting ends

Requirements

Linux with PipeWire or PulseAudio
NVIDIA GPU with CUDA (8 GB+ VRAM recommended; CPU mode available but slower)
Python 3.10+
ffmpeg
HuggingFace token (free) for the diarization model
Ollama (optional) for AI meeting summaries

See REQUIREMENTS.md for full hardware/software details.

Installation

1. System dependencies

# Ubuntu / Pop!_OS / Debian
sudo apt install ffmpeg pulseaudio-utils

# Fedora
sudo dnf install ffmpeg pulseaudio-utils

2. Install meetscribe

# From PyPI (recommended)
pip install meetscribe-offline

# From source
git clone https://github.com/pretyflaco/meetscribe
cd meetscribe
pip install -e .

This creates the meet command in your PATH.

3. HuggingFace token (for speaker diarization)

Create a free account at https://huggingface.co
Accept the model terms at https://huggingface.co/pyannote/speaker-diarization-community-1
Create a read token at https://huggingface.co/settings/tokens
Set it:

export HF_TOKEN=hf_your_token_here
# Add to ~/.bashrc for persistence:
echo 'export HF_TOKEN=hf_your_token_here' >> ~/.bashrc

4. Ollama (optional, for AI summaries)

Install from https://ollama.com, then pull the default summary model:

ollama pull qwen3.5:9b

5. Verify setup

meet check
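
For orientation, here is a minimal Python sketch of the kind of prerequisite check described in the steps above. It is not the implementation of meet check; the function name preflight and the set of checks are assumptions chosen to mirror the Requirements list.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return {check name: bool} for the prerequisites listed in this README.

    Hypothetical helper: `meet check` may test more (or different) things.
    """
    env = os.environ if env is None else env
    return {
        "ffmpeg": which("ffmpeg") is not None,
        "pactl": which("pactl") is not None,
        # Ollama is optional and only needed for AI summaries.
        "ollama (optional)": which("ollama") is not None,
        # Only checks that a token is set, not that it is valid.
        "HF_TOKEN": bool(env.get("HF_TOKEN", "").strip()),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'ok' if ok else 'MISSING':8} {name}")
```

The env and which parameters exist only so the function can be exercised without touching the real system.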

Usage

Check audio devices

meet devices
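
If you want to inspect sources yourself, the sketch below parses the output of pactl list short sources, which prints one tab-separated line per source. It assumes the usual PulseAudio/PipeWire convention that monitor sources have names ending in .monitor; it is an illustration, not how meet devices is implemented, and the function names are hypothetical.

```python
import subprocess


def parse_sources(pactl_output):
    """Split `pactl list short sources` output into (mics, monitors)."""
    mics, monitors = [], []
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name = fields[1]  # columns: index, name, driver, sample spec, state
        if name.endswith(".monitor"):
            monitors.append(name)
        else:
            mics.append(name)
    return mics, monitors


def list_sources():
    """Run pactl and return (mics, monitors); requires pactl on PATH."""
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        capture_output=True, text=True, check=True,
    )
    return parse_sources(result.stdout)
```

The names returned here are the kind of values that --mic and --monitor expect.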

Record a meeting

Start recording before or during your meeting:

meet record

Press Ctrl+C when the meeting ends. A 10-second drain buffer ensures all audio is captured. Recordings are saved to ~/meet-recordings/.

Options:

-o /path -- save recordings elsewhere
--virtual-sink -- create isolated virtual sink (avoids capturing notification sounds)
--mic <source> -- specify mic source (use meet devices to find names)
--monitor <source> -- specify monitor source
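
To make the dual-channel idea concrete, the following sketch builds one possible ffmpeg command that records a mic source and a monitor source as the left and right channels of a single WAV file. The exact flags meetscribe passes to ffmpeg are not documented here, so treat the filter graph and the function name build_capture_command as assumptions.

```python
def build_capture_command(mic, monitor, out_path, sample_rate=16000):
    """Return an ffmpeg argument list for a stereo mic+monitor recording.

    Assumed layout: input 0 (mic) becomes the left channel and input 1
    (system audio monitor) becomes the right channel.
    """
    return [
        "ffmpeg", "-nostdin",
        # Each PulseAudio source is opened as a mono stream.
        "-f", "pulse", "-ac", "1", "-i", mic,
        "-f", "pulse", "-ac", "1", "-i", monitor,
        # amerge combines the two mono streams into one 2-channel stream.
        "-filter_complex", "[0:a][1:a]amerge=inputs=2[stereo]",
        "-map", "[stereo]",
        "-ar", str(sample_rate),
        out_path,
    ]
```

The list can be passed to subprocess.Popen; stopping the recording then means terminating that process.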

Transcribe a recording

meet transcribe ~/meet-recordings/meeting-20260312-140000/meeting-20260312-140000.wav

Options:

-m large-v3-turbo -- Whisper model (default: large-v3-turbo; also: base, medium, large-v2)
-l auto -- language code or auto to auto-detect (default: auto; e.g. en, de, tr, fa)
--device cuda -- cuda or cpu (default: cuda)
--compute-type float16 -- float16 or int8 for lower VRAM (default: float16)
-b 16 -- batch size, reduce if running low on VRAM (default: 16)
--min-speakers 2 / --max-speakers 6 -- hint for number of speakers
--no-diarize -- skip speaker diarization
--no-summarize -- skip AI summary generation
--summary-model <model> -- Ollama model for summary (default: qwen3.5:9b)

Record + transcribe in one shot

meet run

Records until Ctrl+C, then automatically transcribes, generates a summary, and produces a PDF. Accepts all options from both record and transcribe.

Launch the GUI widget

meet gui

A small always-on-top window with:

Record / Stop button
Live timer and file size
Status indicator (Recording, Flushing, Transcribing, Summarizing, Done)
"Open PDF" and "Open Folder" buttons after completion

When 2 or more speakers are detected, a speaker labeling dialog appears before the results are saved. Each speaker is shown with their channel and a sample line of text. Enter a real name or leave blank to keep the auto-assigned label (YOU, REMOTE_1, etc.).

Label speakers after the fact

meet label ~/meet-recordings/meeting-20260313-214133

meet label does the following:

Shows a table of all speakers (label, channel, segment count, sample text)
For each speaker, plays a short audio clip from that speaker's channel (requires ffplay)
Prompts you to enter a real name for that speaker (press Enter to keep the existing label)
Regenerates all outputs (.txt, .srt, .json, .summary.md, .pdf) with the new names

Options:

--no-audio -- skip audio playback, just show text samples
--no-summary -- update names in the existing summary with find-and-replace instead of re-running Ollama
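
As an illustration of the find-and-replace approach, the sketch below swaps speaker labels for real names in a block of text. It is an assumption about how such a replacement could be done safely, not meetscribe's own code: labels are matched as whole tokens so that REMOTE_1 does not also rewrite REMOTE_10, and all replacements happen in a single pass.

```python
import re


def rename_speakers(text, names):
    """Replace whole-token speaker labels, e.g. {"REMOTE_1": "Alice"}."""
    if not names:
        return text
    # Try longer labels first so a short label never shadows a longer one.
    labels = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(label) for label in labels) + r")(?!\w)"
    )
    # One pass over the text, so swapping two names cannot chain.
    return pattern.sub(lambda m: names[m.group(1)], text)
```

One caveat of this approach: a label such as YOU is also replaced where it happens to appear as an ordinary capitalized word in the text.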

Output

Each recording gets its own session directory:

~/meet-recordings/meeting-20260312-140000/
    meeting-20260312-140000.wav            # Stereo audio (16 kHz)
    meeting-20260312-140000.session.json   # Recording metadata
    meeting-20260312-140000.ffmpeg.log     # ffmpeg capture log
    meeting-20260312-140000.txt            # Plain text transcript
    meeting-20260312-140000.srt            # Subtitle format
    meeting-20260312-140000.json           # Full detail (word-level timestamps)
    meeting-20260312-140000.summary.md     # AI meeting summary (Markdown)
    meeting-20260312-140000.pdf            # Professional PDF (summary + transcript)
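
For scripts that post-process results, the layout above can be reproduced with a small helper. This sketch assumes the folder name is meeting- followed by the recording start time as YYYYMMDD-HHMMSS, which matches the example but is not otherwise specified; session_paths is a hypothetical name.

```python
from datetime import datetime
from pathlib import Path

# File suffixes as listed in the session directory above.
SUFFIXES = (
    ".wav", ".session.json", ".ffmpeg.log", ".txt",
    ".srt", ".json", ".summary.md", ".pdf",
)


def session_paths(root, started):
    """Map each suffix to its expected path inside the session folder."""
    name = started.strftime("meeting-%Y%m%d-%H%M%S")
    folder = Path(root) / name
    return {suffix: folder / f"{name}{suffix}" for suffix in SUFFIXES}
```

For example, session_paths(Path.home() / "meet-recordings", datetime(2026, 3, 12, 14, 0, 0))[".pdf"] points at the PDF from the listing.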

Example .txt output:

[00:00:12 --> 00:00:18] YOU: So the main issue we're seeing is with the API rate limiting.
[00:00:19 --> 00:00:25] REMOTE_1: Right, I think we should implement exponential backoff.
[00:00:26 --> 00:00:31] YOU: Agreed. Can you also look at caching the responses?
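
Lines in this format are easy to parse. The sketch below reads one .txt line into a small record and renders it as a subtitle cue in the common SRT layout. The regular expression is derived only from the example above, and the cue layout (speaker prefix, zero milliseconds) is an assumption; the .srt file meetscribe writes may differ.

```python
import re
from typing import NamedTuple

LINE_RE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2}) --> (\d{2}):(\d{2}):(\d{2})\] ([^:]+): (.*)$"
)


class Segment(NamedTuple):
    start: int  # seconds from the start of the recording
    end: int
    speaker: str
    text: str


def parse_line(line):
    """Parse one transcript line; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    h1, m1, s1, h2, m2, s2 = (int(g) for g in m.groups()[:6])
    return Segment(h1 * 3600 + m1 * 60 + s1, h2 * 3600 + m2 * 60 + s2,
                   m.group(7), m.group(8))


def to_srt_cue(index, seg):
    """Render a segment as one SRT cue (index, time range, text)."""
    def stamp(t):
        return f"{t // 3600:02d}:{t % 3600 // 60:02d}:{t % 60:02d},000"
    return f"{index}\n{stamp(seg.start)} --> {stamp(seg.end)}\n{seg.speaker}: {seg.text}\n"
```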

AI summary

When Ollama is running, meetscribe generates a structured meeting summary with:

Overview
Key topics discussed
Action items (with owners when mentioned)
Decisions made
Open questions / follow-ups
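
The sketch below shows how a transcript could be sent to a local Ollama server through its /api/generate HTTP endpoint to get a summary with the sections listed above. The prompt wording and the function names are assumptions for illustration; meetscribe's real prompt is not shown in this document.

```python
import json
import urllib.request

SECTIONS = [
    "Overview",
    "Key topics discussed",
    "Action items (with owners when mentioned)",
    "Decisions made",
    "Open questions / follow-ups",
]


def build_request(transcript, model="qwen3.5:9b", host="http://localhost:11434"):
    """Build the HTTP request for a non-streaming Ollama generation."""
    prompt = (
        "Summarize the meeting transcript below in Markdown with these sections:\n"
        + "\n".join(f"- {section}" for section in SECTIONS)
        + "\n\nTranscript:\n" + transcript
    )
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(transcript, **kwargs):
    """Send the request; needs a running Ollama server with the model pulled."""
    with urllib.request.urlopen(build_request(transcript, **kwargs), timeout=600) as resp:
        return json.loads(resp.read())["response"]
```

Keeping request construction separate from the network call makes the prompt easy to inspect without a running server.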

Supported models

Model	Size	Speed	Notes
`qwen3.5:9b`	6.6 GB	~18-35s	Default -- best balance of quality and speed
`gemma3:12b`	8.1 GB	~15s	Fastest
`qwen3:14b`	9.3 GB	~39s	Good quality
`glm-4.7-flash`	19 GB	~37s	Must use thinking-off mode (handled automatically)

Change the model:

meet run --summary-model gemma3:12b

Disable summaries:

meet run --no-summarize

Multilingual support

meetscribe auto-detects the spoken language by default (Whisper large-v3-turbo supports 99 languages). You can also set it explicitly:

meet run --language de       # German
meet run --language tr       # Turkish
meet run --language fr       # French
meet run --language es       # Spanish
meet run --language fa       # Farsi (Persian)
meet run --language auto     # Auto-detect (default)

How it works

Transcription: The same Whisper model handles all languages -- no extra download or VRAM cost. When set to auto, the detected language is used for alignment and all downstream steps.
Speaker diarization: Completely language-agnostic (based on voice characteristics, not speech content).
AI summary: When a non-English language is detected, the summary prompt instructs the LLM to write the summary in the same language as the transcript.
PDF output: Uses DejaVu Sans for full Unicode coverage (Latin, Cyrillic, Greek, Turkish special characters, etc.). Farsi uses Noto Naskh Arabic with RTL text reshaping.

Tested languages

Language	Code	Alignment model	PDF font	Notes
English	`en`	wav2vec2 (torchaudio)	DejaVu Sans
German	`de`	VoxPopuli (torchaudio)	DejaVu Sans
French	`fr`	VoxPopuli (torchaudio)	DejaVu Sans
Spanish	`es`	VoxPopuli (torchaudio)	DejaVu Sans
Turkish	`tr`	wav2vec2 (HuggingFace)	DejaVu Sans	~1.2 GB alignment model download
Farsi	`fa`	wav2vec2 (HuggingFace)	Noto Naskh Arabic	~1.2 GB alignment model download, RTL

Farsi RTL requirements

Farsi uses right-to-left text. For proper PDF rendering, install the optional RTL dependencies:

pip install arabic-reshaper python-bidi
# Or with the optional extra:
pip install "meetscribe-offline[rtl]"

Without these libraries, Farsi text will appear in the PDF but glyphs may not be joined correctly and reading order may be wrong.
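
The two libraries are typically combined as in the sketch below: arabic-reshaper joins the glyphs and python-bidi reorders the text for display. The fallback when the packages are missing mirrors the behavior described above, but the function name prepare_rtl and the exact fallback are assumptions, not meetscribe's code.

```python
def prepare_rtl(text):
    """Return text shaped and reordered for RTL rendering when possible."""
    try:
        import arabic_reshaper
        from bidi.algorithm import get_display
    except ImportError:
        # Optional dependencies missing: text is returned unshaped.
        return text
    # First join letters into their contextual forms, then apply the
    # bidirectional algorithm to get visual order.
    return get_display(arabic_reshaper.reshape(text))
```

Plain left-to-right text passes through unchanged, so the helper can be applied to every line before it is written to the PDF.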

Virtual sink mode

By default, meet record captures all system audio (including notification sounds, music, etc.). For cleaner recordings, use --virtual-sink:

meet record --virtual-sink

This creates an isolated audio sink. Route your meeting app's audio to it:

Open pavucontrol (PulseAudio Volume Control)
Go to the "Playback" tab
Find your browser or meeting app
Change its output to "Meet-Capture"

You'll still hear the meeting through your normal speakers via automatic loopback.
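
For readers curious what such a setup involves, here is a sketch of the pactl commands that create a null sink and loop its monitor back to the default output. This uses standard PulseAudio modules (module-null-sink and module-loopback, also available through pipewire-pulse) and is an assumption about the general technique, not a copy of what --virtual-sink runs.

```python
def virtual_sink_commands(sink_name="Meet-Capture"):
    """Return pactl argument lists that set up an isolated sink with loopback."""
    return [
        # A null sink accepts audio but plays nothing by itself.
        ["pactl", "load-module", "module-null-sink",
         f"sink_name={sink_name}",
         f"sink_properties=device.description={sink_name}"],
        # Loop the sink's monitor to the default output so you still hear it.
        ["pactl", "load-module", "module-loopback",
         f"source={sink_name}.monitor"],
    ]
```

Each pactl load-module call prints a module index, which can later be passed to pactl unload-module to remove the sink again.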

VRAM usage

With an NVIDIA GPU (12 GB VRAM):

Model	Transcription	+ Diarization	Recommended batch_size
large-v3-turbo	~4 GB	~7 GB total	16
medium	~3 GB	~6 GB total	16
base	~1 GB	~4 GB total	16

If you hit OOM errors:

Reduce --batch-size to 4 or 8
Use --compute-type int8
Use a smaller model (--model medium or --model base)
Use --device cpu as a last resort
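
The fallback order above can be written down as a sequence of settings to try. This is a sketch for scripting your own retries; the Settings class and fallback_ladder are hypothetical names, and meetscribe itself is not stated to retry automatically.

```python
from dataclasses import dataclass, replace

# Models from the VRAM table, largest first.
MODELS = ["large-v3-turbo", "medium", "base"]


@dataclass(frozen=True)
class Settings:
    model: str = "large-v3-turbo"
    batch_size: int = 16
    compute_type: str = "float16"
    device: str = "cuda"


def fallback_ladder(start=Settings()):
    """Yield settings in the order suggested above, starting with `start`."""
    current = start
    yield current
    for batch in (8, 4):  # 1. reduce the batch size
        if batch < current.batch_size:
            current = replace(current, batch_size=batch)
            yield current
    if current.compute_type != "int8":  # 2. switch to int8
        current = replace(current, compute_type="int8")
        yield current
    if current.model in MODELS:  # 3. step down to smaller models
        for model in MODELS[MODELS.index(current.model) + 1:]:
            current = replace(current, model=model)
            yield current
    if current.device != "cpu":  # 4. CPU as a last resort
        yield replace(current, device="cpu")
```

Each yielded value maps directly onto the --model, --batch-size, --compute-type, and --device options.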

How it works

[Meeting App] --> [PipeWire/PulseAudio] --> [ffmpeg dual-channel capture] --> meeting.wav
                                                                                  |
                  [WhisperX: faster-whisper + wav2vec2 alignment + pyannote diarization]
                                                                                  |
                                      [Ollama LLM summary]     [Diarized transcript]
                                              |                         |
                                        .summary.md          .txt / .srt / .json
                                              |                         |
                                              +--------> .pdf <---------+

Capture: Records your mic (left channel) and system audio (right channel) simultaneously into a single stereo WAV file at 16 kHz.

Transcribe: Runs the WhisperX pipeline -- batched Whisper transcription, wav2vec2 forced alignment for word-level timestamps, and pyannote speaker diarization. Dual-channel energy analysis maps speakers to YOU or REMOTE.

Summarize: Sends the transcript to a local Ollama model that extracts a structured summary.

PDF: Combines the summary and full transcript into a professional page-numbered PDF document.
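
The dual-channel energy analysis in the Transcribe step can be pictured with a small sketch: for the samples covered by one speech segment, compare the RMS energy of the left (mic) channel with the right (system audio) channel. The margin threshold and the UNCERTAIN label are assumptions added for illustration; the source only states that energy analysis maps speakers to YOU or REMOTE.

```python
import math


def rms(samples):
    """Root-mean-square energy of a sequence of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def label_segment(left, right, margin=1.5):
    """Label a segment from its mic (left) and system audio (right) samples."""
    mic, remote = rms(left), rms(right)
    if mic > remote * margin:
        return "YOU"
    if remote > mic * margin:
        return "REMOTE"
    # Energies too close to call, e.g. silence or overlapping speech.
    return "UNCERTAIN"
```

Remote participants would still need diarization to be told apart from each other, since they all share the right channel.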

CUDA NVRTC note

The pyannote diarization model requires CUDA NVRTC for JIT compilation. If your CUDA driver version doesn't match the installed libnvrtc-builtins version, meetscribe automatically creates a compatibility symlink. This happens transparently on first use.

If you still see NVRTC errors:

export LD_LIBRARY_PATH="$HOME/.local/lib/cuda${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

Limitations

Overlapping speech is not handled well (Whisper limitation)
Speaker labels default to role-based (YOU, REMOTE_1, REMOTE_2) -- use meet label or the GUI dialog to assign real names
Diarization accuracy varies with audio quality and number of speakers
Linux only (PulseAudio/PipeWire dependency)

Contributing

git clone https://github.com/pretyflaco/meetscribe
cd meetscribe
pip install -e ".[dev]"
python -m pytest tests/

Pull requests welcome. Please run the test suite before submitting.

Changelog

See CHANGELOG.md for release history.

License

GPL-3.0

Release history

0.4.1 -- Apr 13, 2026
0.4.0 -- Apr 13, 2026
0.3.3 -- Apr 12, 2026
0.3.2 -- Apr 10, 2026
0.2.0 -- Mar 14, 2026 (this version)