Real-time AI speech-to-text for meetings with GPT-4o Transcribe and GPU speaker diarization

🎙️🧠 Real-time AI speech-to-text for meetings with GPT-4o Transcribe and GPU speaker diarization v2026.4.120

🌌 Bugs are evident™ - MATRIX️
🚧 This project is under active development!
📢 We welcome your feedback and contributions.

NodeJS LTS is supported

🛠️ Built on NodeJs version

v24.14.1

📝 Description

Meet Assistant

Real-time AI speech-to-text for meetings and conversations. Captures speaker audio, transcribes it live using OpenAI GPT-4o Transcribe, and auto-labels each utterance by voice (Speaker 1, Speaker 2, ...). Supports 10 European languages out of the box.

Quickstart — install from PyPI

pip install p3x-meet-assistant
export OPENAI_API_KEY=sk-...
p3x-meet-assistant

Open http://localhost:8088. That's the whole thing.

The bundled wheel ships the full web UI — no Node.js, no git clone, no build step needed at runtime.

Package on PyPI: https://pypi.org/project/p3x-meet-assistant/

What you get

  • Live transcription via OpenAI GPT-4o Transcribe, a high-accuracy cloud speech-to-text model
  • Auto speaker diarization — colored Speaker 1 / 2 / 3 ... labels based on voice fingerprint
  • 10 languages: English, Hungarian, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Czech
  • Browser-based UI — Dark / Light theme, adjustable font size, one-click transcript export
  • Captures system audio on Linux (PulseAudio / PipeWire) or any audio in a browser tab via MIC / TAB buttons
  • Distributed as a pip-installable Python package: p3x-meet-assistant on PyPI

Why one language at a time?

Auto-detect mode (trying two languages and picking the best) produces far more hallucinations than explicitly selecting a single language. Pick the language you're actually hearing, and accuracy jumps dramatically.

Platforms — works on Linux, macOS, and Windows

Meet Assistant runs on any OS with Python 3.10+ and a modern browser. The only feature that's Linux-specific is the optional server-side system-audio capture. Everywhere else you use the browser's built-in audio capture — same APIs Google Meet uses.

| Platform | Install | Microphone | System / tab audio | GPU diarization |
|---|---|---|---|---|
| Linux (desktop) | pip install 'p3x-meet-assistant[linux-capture]' + sudo apt install portaudio19-dev | MIC button | ✓ Auto-captured via PulseAudio / PipeWire (plus TAB button) | ✓ NVIDIA CUDA |
| macOS | pip install p3x-meet-assistant | MIC button | TAB button (Chrome/Edge; share a tab with audio) | ✓ CPU or Apple Silicon GPU (Metal) |
| Windows | pip install p3x-meet-assistant | MIC button | TAB button | ✓ NVIDIA CUDA or CPU |
| Cloud server | pip install p3x-meet-assistant | n/a (no local audio) | Browser capture from the user's machine | Optional CPU diarization |

macOS specifics

  • The standard pip install p3x-meet-assistant works. No homebrew needed for the default setup.
  • For meetings in Google Meet / Zoom / Teams, use the TAB button in the browser — it works identically to how you'd share audio in a Meet call.
  • To capture system audio outside a browser tab (e.g. a desktop Zoom app), install BlackHole or Loopback to create a virtual audio device, then select it as the browser's microphone input.
  • For GPU speaker diarization on Apple Silicon, the [gpu] extra installs torch; it runs on the Metal backend automatically.

Windows specifics

  • The standard pip install p3x-meet-assistant works on Windows 10/11.
  • Open PowerShell or Command Prompt and run p3x-meet-assistant.
  • TAB button captures any browser-tab audio (the same permission flow as Meet's "Share a tab" with the "Share audio" checkbox).
  • If you want system-wide capture, tools like VB-Audio Cable or Voicemeeter expose a virtual microphone that routes all system audio into the browser.
  • NVIDIA GPU diarization works out of the box via the [gpu] extra.

Requirements — bare minimum

  • Python 3.10+
  • A modern browser (Chrome, Firefox, Edge)
  • An OpenAI API key

Node.js is not required when installing from PyPI — the wheel ships the pre-built frontend.

Requirements — optional extras

  • Linux system-audio capture: portaudio19-dev package + the [linux-capture] pip extra
  • Speaker diarization: any NVIDIA GPU with ~500 MB VRAM (GTX 1650 / RTX 2060 and up) + the [gpu] pip extra. CPU fallback works but is slower.

No GPU is fine: the app degrades gracefully. Diarization falls back to the CPU (slower), and if you skip the [gpu] extra entirely you lose speaker labels but everything else works.

Install from PyPI

The recommended path for anyone who just wants to use Meet Assistant. The Python wheel bundles the pre-built frontend, so there's no Node.js, no build step, no git clone — just pip install and go.

1. One-time setup — create a virtual environment

Skip straight to step 2 if you already use a venv or a managed environment like pipx, poetry, or uv.

python3 -m venv ~/.venvs/meet-assistant
source ~/.venvs/meet-assistant/bin/activate

Installing into the system Python works too, but a venv keeps dependencies isolated. On some modern Linux distros (Ubuntu 24.04+, Debian 12+) system-wide pip install is blocked by PEP 668 — a venv (or pipx) is required.

2. Install the package

Pick the variant that matches your hardware. All four commands install the same core package; the optional extras pull in additional wheels for features you want.

| Command | What you get | Wheel size | Recommended for |
|---|---|---|---|
| pip install p3x-meet-assistant | Cloud transcription (GPT-4o) + browser audio capture | ~300 kB + deps (~40 MB) | Laptops, macOS/Windows, cloud servers |
| pip install 'p3x-meet-assistant[gpu]' | Base install + GPU speaker diarization (resemblyzer + torch) | ~700 MB total | Workstations with any NVIDIA GPU |
| pip install 'p3x-meet-assistant[linux-capture]' | Base install + server-side PulseAudio / PipeWire capture (SpeechRecognition + PyAudio) | ~40 MB + system portaudio | Linux desktops that want system-audio capture |
| pip install 'p3x-meet-assistant[all]' | Everything together | ~700 MB | Full local workstation install |

Linux users with [linux-capture] or [all] need the PortAudio dev headers before pip install:

sudo apt install portaudio19-dev

3. Provide your OpenAI API key

Get a key at https://platform.openai.com/api-keys, then either:

Option A — environment variable (quickest):

export OPENAI_API_KEY=sk-...

Add it to ~/.bashrc / ~/.zshrc if you want it permanent.

Option B — .env file in your working directory:

cd ~/my-meetings                  # wherever you run the command from
echo "OPENAI_API_KEY=sk-..." > .env

Meet Assistant automatically loads .env from the current working directory on startup.
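
For readers curious what "loads .env from the current working directory" amounts to, here is a minimal standard-library sketch. It is not the package's actual loader, and the helper names load_env_file and apply_env are hypothetical. It assumes simple KEY=VALUE lines and that variables already exported in the shell take precedence.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file; skip blanks and # comments."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes: KEY="value" or KEY='value'
        values[key.strip()] = value.strip().strip("\"'")
    return values


def apply_env(values: dict[str, str]) -> None:
    """Copy parsed values into os.environ without overriding existing ones."""
    for key, value in values.items():
        os.environ.setdefault(key, value)


# Assumed startup behaviour: look for .env next to where the command was run.
apply_env(load_env_file(Path.cwd() / ".env"))
```

Because setdefault is used, a key exported in the shell (Option A) would win over the same key in .env. Whether the real loader resolves conflicts this way is an assumption.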

4. Run it

p3x-meet-assistant

Open http://localhost:8088 in your browser. Pick a language from the top dropdown, then either:

  • Click MIC to transcribe your microphone
  • Click TAB to share a browser tab with audio (Google Meet, YouTube, a Facebook stream — anything with "Share audio" enabled)
  • On Linux with [linux-capture] installed, the server auto-detects the system speaker monitor and starts transcribing immediately

Every transcript is appended to sessions/YYYY-MM-DD-HH-MM.txt in your working directory automatically.

Upgrade or uninstall

pip install --upgrade p3x-meet-assistant            # latest stable
pip install 'p3x-meet-assistant==2026.4.109'        # pin to a specific release
pip uninstall p3x-meet-assistant                    # remove

Release notes for every version: https://github.com/patrikx3/meet-assistant/releases.

What gets installed

The wheel contains:

  • meet_assistant/ — the Python package (FastAPI server, OpenAI client, diarizer, state manager)
  • meet_assistant/dist/ — the pre-built Vite frontend (HTML, JS, CSS, Font Awesome fonts)
  • Entry point p3x-meet-assistant → meet_assistant.cli:main

What's not in the wheel (excluded by MANIFEST.in): secure/, agents/, .claude/, .vscode/, AGENTS.md, CLAUDE.md, source-only configs, the dev launcher, and any tokens. Safe to install from PyPI.

Troubleshooting a pip install

| Symptom | Fix |
|---|---|
| externally-managed-environment (PEP 668) | Use a venv (python3 -m venv) or pipx install p3x-meet-assistant |
| Could not build wheels for PyAudio on Linux | Install portaudio19-dev: sudo apt install portaudio19-dev |
| Could not build wheels for PyAudio on macOS | brew install portaudio then retry |
| No module named 'torch' at runtime | Install the [gpu] extra or skip diarization |
| Port 8088 already in use | Run with p3x-meet-assistant --port 9000 (or any free port) |
| No OpenAI API key found | Set OPENAI_API_KEY in your shell or .env in the working directory |
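
For the "Port 8088 already in use" case, a small helper can find a free port before you launch. This is a sketch using only the standard library; is_port_free and first_free_port are hypothetical names and are not part of p3x-meet-assistant.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start: int = 8088, attempts: int = 20) -> int:
    """Scan upward from `start` and return the first port that binds."""
    for port in range(start, start + attempts):
        if is_port_free(port):
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")
```

Pass the result to the documented --port flag. The check is only a snapshot: another process could still claim the port between the check and the server starting.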

Install from source (development workflow)

Only needed if you want to hack on the code itself.

git clone https://github.com/patrikx3/meet-assistant.git
cd meet-assistant

# Linux only — for PulseAudio capture
sudo apt install portaudio19-dev

# Python venv
python3 -m venv venv

# Pick ONE based on your hardware:
./venv/bin/pip install -r requirements.txt        # full local with GPU
./venv/bin/pip install -r requirements-cloud.txt  # cloud-only, no GPU

# Frontend build
yarn install
yarn build:web

# Dev launcher (auto-reload)
./meet-assistant-web.py --dev

Want diarization later on an already-installed source checkout? Just add:

./venv/bin/pip install resemblyzer

API key

You need one OpenAI API key. Get it at https://platform.openai.com/api-keys.

Option A — .env file (recommended)

cp .env.example .env
# Edit .env and paste your key:  OPENAI_API_KEY=sk-...

The .env file is gitignored and auto-loaded on startup.

Option B — shell environment variable

export OPENAI_API_KEY=sk-...
./meet-assistant-web.py

For a permanent setup, add the export line to ~/.bashrc or ~/.zshrc.

Run

./meet-assistant-web.py

Open http://localhost:8088 in your browser. Pick a language from the dropdown, start talking (or open a Google Meet), watch the transcript flow in.

Command-line options

| Flag | Default | Description |
|---|---|---|
| --port PORT | 8088 | Web server port |
| --host HOST | 0.0.0.0 | Web server host |
| --dev | off | Auto-reload on Python file changes |
| --device INDEX | 11 | PyAudio device index for the speaker monitor (Linux only) |
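
The flags above map naturally onto an argparse parser. The sketch below mirrors the documented flags and defaults. It is an illustration only: build_parser is a hypothetical name, and the real CLI may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the four flags and defaults listed in the table."""
    parser = argparse.ArgumentParser(prog="p3x-meet-assistant")
    parser.add_argument("--port", type=int, default=8088, help="Web server port")
    parser.add_argument("--host", default="0.0.0.0", help="Web server host")
    parser.add_argument(
        "--dev", action="store_true", help="Auto-reload on Python file changes"
    )
    parser.add_argument(
        "--device",
        type=int,
        default=11,
        metavar="INDEX",
        help="PyAudio device index for the speaker monitor (Linux only)",
    )
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--port", "9000", "--host", "127.0.0.1"])
print(args.port, args.host, args.dev, args.device)
```

Note that the default host 0.0.0.0 listens on all network interfaces. Passing --host 127.0.0.1 restricts the server to the local machine.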

Speaker diarization

If the diarizer loaded successfully (check the startup console output), every transcribed line is prefixed with a Speaker N label, color-coded in the UI. Clusters live in memory for the session — click the Clear button to wipe them and start fresh.

  • Runs on GPU (CUDA) automatically, falls back to CPU if no GPU is available
  • Adds ~20 ms per chunk on a modern NVIDIA card — imperceptible
  • Language-independent (voice fingerprint, not words)
  • Tuning knob: SIMILARITY_THRESHOLD in web/diarizer.py. Lower = more merging, higher = more splitting. Default: 0.75.

Troubleshooting the clustering:

  • Same person gets split across multiple speakers → lower the threshold to ~0.65
  • Different people collapse into one speaker → raise the threshold to ~0.82
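
To make the threshold behaviour concrete, here is a minimal sketch of online speaker clustering by cosine similarity. It is an assumption about how such a diarizer can work, not the code in web/diarizer.py. The SpeakerClusters class and the running-mean centroid update are hypothetical, and real voice embeddings (for example from resemblyzer) have many more dimensions than the toy vectors shown in the comments.

```python
import math

SIMILARITY_THRESHOLD = 0.75  # default value quoted in the text above


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SpeakerClusters:
    """Online clustering: one running-mean centroid per speaker."""

    def __init__(self, threshold: float = SIMILARITY_THRESHOLD) -> None:
        self.threshold = threshold
        self.centroids: list[list[float]] = []
        self.counts: list[int] = []

    def assign(self, embedding: list[float]) -> int:
        """Return a 1-based speaker number for one utterance embedding."""
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(embedding, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            # Close enough: fold the utterance into that speaker's running mean.
            n = self.counts[best]
            self.centroids[best] = [
                (c * n + e) / (n + 1)
                for c, e in zip(self.centroids[best], embedding)
            ]
            self.counts[best] = n + 1
            return best + 1
        # Too different from every known speaker: open a new cluster.
        self.centroids.append(list(embedding))
        self.counts.append(1)
        return len(self.centroids)

    def clear(self) -> None:
        """Forget all speakers, as the Clear button is described to do."""
        self.centroids.clear()
        self.counts.clear()
```

In this sketch a lower threshold lets more utterances pass the "close enough" test, so clusters merge. A higher threshold sends more utterances to the "new cluster" branch, so speakers split. That matches the tuning advice above.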

Session auto-save

Every transcript is appended to sessions/YYYY-MM-DD-HH-MM.txt as it arrives — nothing to click. A new file is started each time you hit Clear, or each time you restart the server. The folder is gitignored by default.
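
The naming scheme and append-as-you-go behaviour can be sketched in a few lines. This is an illustration under the assumption that the file name comes from the session's start time. session_path and append_line are hypothetical helpers, not the project's functions.

```python
from datetime import datetime
from pathlib import Path


def session_path(started: datetime, root: Path = Path("sessions")) -> Path:
    """Build sessions/YYYY-MM-DD-HH-MM.txt from a session start time."""
    return root / started.strftime("%Y-%m-%d-%H-%M.txt")


def append_line(path: Path, text: str) -> None:
    """Append one transcript line, creating the folder on first use."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(text.rstrip("\n") + "\n")
```

Opening in append mode for every line means a crash loses at most the line being written. Starting a new session (Clear or a restart) would simply compute a fresh path from the new start time.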

Rolling prompt context

Each transcription call passes the last ~400 characters of the session as its OpenAI prompt hint. This keeps proper nouns, jargon, and acronyms spelled consistently across chunks instead of being re-invented every 5 seconds. Costs nothing extra.
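
A rolling prompt of this kind can be sketched as a tail slice over the running transcript. The function below is an assumption about the approach, not the project's code. rolling_prompt is a hypothetical name, and trimming to a word boundary is a design choice added for illustration.

```python
PROMPT_CHARS = 400  # "~400 characters" per the text above


def rolling_prompt(transcript: str, limit: int = PROMPT_CHARS) -> str:
    """Return the tail of the transcript, starting on a word boundary."""
    text = " ".join(transcript.split())  # collapse newlines and repeated spaces
    if len(text) <= limit:
        return text
    tail = text[-limit:]
    # If the cut landed mid-word (or on a space), drop up to the first space.
    if text[-limit - 1] != " " and " " in tail:
        tail = tail.split(" ", 1)[1]
    return tail
```

The returned string would then be sent as the prompt hint alongside the next audio chunk, so names and acronyms that already appeared in the session stay spelled the same way.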

How audio is captured

Linux with PulseAudio / PipeWire (default on a local workstation): The app auto-detects your speakers' monitor source and records everything that plays on them — meeting audio, video calls, YouTube, Facebook streams, anything audible. No routing setup needed.

Any OS (or cloud deployment): Click the MIC button to transcribe your microphone, or the TAB button to share a browser tab with audio (identical to Google Meet's "Share tab with audio" feature). Uses the standard getUserMedia / getDisplayMedia browser APIs.

Use with Google Meet / Zoom / Teams / Facebook Live

Start Meet Assistant, then join your call or open your stream as normal. On Linux, system audio is captured automatically. On other platforms, click TAB and select the meeting tab with "Share audio" enabled.

Development

# Frontend dev server with HMR (port 5173, proxies /ws to :8088)
npm run dev

# Production build
npm run build:web

# Backend with auto-reload on file changes
./meet-assistant-web.py --dev

VS Code: open the project and press F5 — preset launch configs are wired up.

Project structure

meet-assistant-web.py         # Entry point
web/
  __init__.py                 # Package init, .env auto-load, audio bootstrap
  audio.py                    # PulseAudio source detection
  engines.py                  # OpenAI GPT-4o Transcribe wrapper + hallucination filter
  diarizer.py                 # Speaker diarization (resemblyzer on CUDA)
  state.py                    # App state, WebSocket broadcast, capture loop
  server.py                   # FastAPI app, routes, WebSocket handler
  src/                        # Frontend source (Vite)
    index.html
    main.js
    style.css
  dist/                       # Built frontend (gitignored)
requirements.txt              # Full deps with GPU diarization
requirements-cloud.txt        # Lean deps, cloud-only (no diarization)
.env.example                  # Template for your API key

Troubleshooting

| Symptom | Fix |
|---|---|
| No OpenAI API key found | Set OPENAI_API_KEY in .env or export it in your shell |
| No monitor source found | You're not on PulseAudio/PipeWire. Use the MIC or TAB browser buttons |
| Diarizer unavailable | Install resemblyzer: ./venv/bin/pip install resemblyzer (or ignore this if you don't want speaker labels) |
| One person tagged as multiple speakers | Lower SIMILARITY_THRESHOLD in web/diarizer.py to ~0.65 |
| Multiple people collapsed into one speaker | Raise SIMILARITY_THRESHOLD to ~0.82 |
| Too many hallucinations on silent audio | Already filtered; see _is_hallucination in web/engines.py |

License

MIT


Corifeus Network

AI-powered network & email toolkit — free, no signup.

Web · network.corifeus.com MCP · npm i -g p3x-network-mcp

  • AI Network Assistant — ask in plain language, get a full domain health report
  • Network Audit — DNS, SSL, security headers, DNSBL, BGP, IPv6, geolocation in one call
  • Diagnostics — DNS lookup & global propagation, WHOIS, reverse DNS, HTTP check, my-IP
  • Mail Tester — live SPF/DKIM/DMARC + spam score + AI fix suggestions, results emailed (localized)
  • Monitoring — TCP / HTTP / Ping with alerts and public status pages
  • MCP server — 17 tools exposed to Claude Code, Codex, Cursor, any MCP client
  • Install: claude mcp add p3x-network -- npx p3x-network-mcp
  • Try: "audit example.com", "why do my emails land in spam? test me@example.com"
  • Source: patrikx3/network · patrikx3/network-mcp
  • Contact: patrikx3.com · donate

❤️ Support Our Open-Source Project

If you appreciate our work, consider ⭐ starring this repository or 💰 making a donation to support server maintenance and ongoing development. Your support means the world to us—thank you!


🌍 About My Domains

All my domains, including patrikx3.com, corifeus.eu, and corifeus.com, are developed in my spare time. While you may encounter minor errors, the sites are generally stable and fully functional.


📈 Versioning Policy

Version Structure: We follow a Major.Minor.Patch versioning scheme:

  • Major: 📅 Corresponds to the current year.
  • Minor: 🌓 Set as 4 for releases from January to June, and 10 for July to December.
  • Patch: 🔧 Incremental, updated with each build.
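
The scheme above reduces to a one-line rule for the minor number. The sketch below is illustrative only: release_version is a hypothetical helper, and the patch counter is assumed to be supplied by the build system.

```python
from datetime import date


def release_version(build_date: date, patch: int) -> str:
    """Format Major.Minor.Patch: year, then 4 (Jan-Jun) or 10 (Jul-Dec), then build."""
    minor = 4 if build_date.month <= 6 else 10
    return f"{build_date.year}.{minor}.{patch}"


# Hypothetical build dates, chosen only to show both halves of the year.
print(release_version(date(2026, 4, 15), 120))
print(release_version(date(2026, 9, 1), 3))
```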

🚨 Important Changes: Any breaking changes are prominently noted in the readme to keep you informed.

P3X-MEET-ASSISTANT Build v2026.4.120
