Local-first, AI-queryable meeting archive. Ingests from meeting tools, applies personal vocabulary corrections, and stores in portable markdown.

meeting-hive

CI · License: MIT · Python 3.11+ · Code style: ruff

A local-first pipeline for correcting speech-to-text errors in meeting transcripts, using a vocabulary you maintain.

Each meeting becomes a normalized pair of markdown files on your disk. Any tool with filesystem access — macOS Spotlight, grep, Obsidian, or whatever AI assistant you happen to use — can search, cross-reference, and reason across the whole hive.

Runs on macOS, Linux, and Windows via an adapter architecture: you pick which meeting tool feeds the pipeline and which vocabulary source corrects its transcripts. Swap either independently. Add new adapters without touching the core.

Platform support: macOS is first-class (dedicated installer under scripts/macos/). Linux and Windows are documented with manual setup recipes — see docs/scheduling.md. Dedicated installers for both are tracked as good first issues.


Why this exists

This is a stopgap. A reasonably well-built one, but still a stopgap.

Meeting-recording and speech-to-text tools have come a long way: live transcription, AI summaries, cross-meeting search, MCP integrations with Claude / ChatGPT / Cursor. For most people, the tool they already use does the whole job.

The specific gap this project addresses: proper-noun accuracy in multi-language meetings.

Speech-to-text engines handle proper nouns poorly by default. Company names, product names, tech jargon, and people's names rarely sit in training data, and accent variation makes it worse. The common fix across tools is a custom vocabulary feature: you tell the engine "when you hear 'Post Grass', write 'Postgres'". Where it breaks, consistently across the market, is multi-language speech: custom vocabularies are typically bound to a single transcription language.

If your meetings mix languages (technical calls across Portuguese / English / Spanish, for example, with teams spread across different accents), no per-language vocabulary actually applies to your day-to-day. The same ~50 terms — company names, tools, platforms — come out wrong, consistently, meeting after meeting. Waiting for vendors to resolve the trade-off is possible but open-ended.

meeting-hive is the workaround: pull meetings onto disk, apply a vocabulary you maintain (language-agnostic, because it lives outside the source tool), re-summarize against the corrected text, and leave plain markdown any tool can read.
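As an illustrative sketch of the correction step (not the project's actual code): apply misheard-to-canonical replacements on word boundaries, case-insensitively, longest phrases first so multi-word terms win.

```python
import re

def correct(text: str, vocab: dict[str, str]) -> str:
    """Replace misheard terms with canonical spellings (hypothetical helper)."""
    # Longest keys first so a longer phrase wins over a prefix of itself.
    for wrong, right in sorted(vocab.items(), key=lambda kv: -len(kv[0])):
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text, flags=re.IGNORECASE)
    return text

vocab = {"post grass": "Postgres", "clawed": "Claude"}
print(correct("The Post Grass migration clawed suggested is done.", vocab))
# -> The Postgres migration Claude suggested is done.
```

Because the vocabulary is plain string pairs rather than a per-language engine feature, the same table applies no matter which language the sentence was spoken in.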

This is not a replacement for your meeting tool. Source tools own recording, live transcription, live summarization, cross-meeting search, MCP integrations — things this project doesn't do and shouldn't try. meeting-hive addresses one narrow gap and nothing more.

The goal is for this project to retire. When source tools ship multi-language mode with user-editable custom vocabulary, this becomes redundant and should die. It's open source on the off-chance someone else has the same specific pain. It's not trying to grow.

A side benefit: the archive self-organizes

Once meetings are on disk, meeting-hive also tags each one with a category (client: Acme, vendor: Stripe, internal: founders, …) drawn from rules you maintain in config.yaml.

The classifier doesn't guess. If no rule matches a meeting, the meeting isn't written — it's surfaced in the daily notification with a copy-paste YAML snippet suggesting where to add the rule (by domain, title pattern, or specific email). You decide the canonical name and category; the next run picks up the meeting. A 7-day lookback guarantees nothing is lost, only deferred.
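For example, for an unmatched meeting whose attendees come from a hypothetical newprospect.io domain, the suggested rule would follow the same schema as config.yaml's rule sections:

```yaml
domain_rules:
  newprospect.io:
    type: client        # you pick the category...
    entity: NewProspect # ...and the canonical name
```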

The rule list grows incrementally as you work:

  • Zero placeholders. The archive never fills up with UNKNOWN / TBD entries that nobody ever goes back to fix.
  • No silent misclassification. Either a human-defined rule matched, or nothing got written. Always auditable.
  • A living map of your professional graph. Over weeks, config.yaml becomes an accurate directory of the clients, vendors, teams, and recurring meetings you actually deal with. Any other tool or script can read that same file for its own purposes.

This falls out of the architecture regardless of whether the vocabulary problem gets solved upstream — so if meeting-hive ever does retire, the classification scheme is portable to whatever replaces it.

The name

The name borrows from the beehive metaphor, because each piece maps cleanly to what the pipeline does:

  • A cell — one meeting. Paired markdown files (transcript + summary, both with frontmatter) stored on disk.
  • The hive — your whole archive. All cells organized by scope and year, readable as plain text.
  • The bees — whatever works the hive. Spotlight, grep, Obsidian, any AI agent with filesystem access — they forage across cells to extract patterns, find prior decisions, surface context you didn't know to ask for.
  • The flower — the source tool that produces the raw meeting. It's where the nectar (audio + speech-to-text) is generated. The bees never visit the flower directly; they only work the hive.

meeting-hive is the step between the flower and the hive: harvest the meeting from the flower, clean up the data with the vocabulary, store the cell. That's the whole job.
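To make the cell concrete, a summary file might open like this (the frontmatter keys below are hypothetical; inspect a generated file for the real schema — only the pairing and the presence of frontmatter are described above):

```markdown
---
date: 2025-06-03    # hypothetical keys
scope: work
type: client
entity: Acme
---

# Weekly sync with Acme

- Decided to move reporting to Postgres
```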

How it works

The pipeline has three adapter axes:

┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
│ Source adapter   │   │ Vocabulary       │   │ Summarizer       │
│ (where meetings  │   │ adapter          │   │ adapter          │
│  come from)      │   │ (how to spell    │   │ (which AI        │
└────────┬─────────┘   │  things)         │   │  regenerates)    │
         │             └────────┬─────────┘   └────────┬─────────┘
         │                      │                      │
         ▼                      ▼                      ▼
    ┌──────────────────── meeting-hive ────────────────────┐
    │  classify → correct transcript → re-summarize →       │
    │  write paired markdown                                 │
    └────────────────────────────┬─────────────────────────┘
                                 │
                                 ▼
                   ~/.meeting-notes/<scope>/<year>/
                      YYYY-MM-DD-slug-summary.md
                      YYYY-MM-DD-slug-transcript.md
                                 │
                                 ▼
                   Any tool with filesystem access
                   (Spotlight, grep, Obsidian, any AI, etc.)

Built-in adapters:

Axis        Adapter    Notes
Source      granola    Granola local cache + REST fallback. macOS today; Windows when Granola ships a Windows client.
Source      fathom     Fathom via its public REST API. Cross-platform. Requires FATHOM_API_KEY.
Source      markdown   Generic — reads meetings from a directory of YAML-frontmatter .md files. Works with any tool that exports markdown. Cross-platform.
Vocabulary  wispr      Read-only wrapper over the Wispr Flow dictation dictionary (macOS / Windows). Use if you already maintain a vocab there.
Vocabulary  sqlite     Local SQLite DB managed by meeting-hive. Cross-platform, mutable via the meeting-hive vocab CLI.
Summarizer  anthropic  Claude family (Sonnet / Opus / Haiku) via the Anthropic API — or any Anthropic-compatible endpoint via base_url.
Summarizer  openai     GPT / o-series via OpenAI — or any OpenAI-compatible server via base_url (LM Studio, llama.cpp, Jan.ai, vLLM, LocalAI, OpenRouter, LiteLLM).
Summarizer  ollama     Local LLM via Ollama. No API key; requires the server running locally.

Adding a new adapter is a single file + one line in the built-in registry. Adapters are config-driven, so the core pipeline doesn't know which one is in use.
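Since adapters are config-driven, swapping a backend is a config edit only. For example, pointing the openai adapter at LM Studio's local server (the base_url key is named in the adapter notes above; any other config key is an assumption — see docs/configuration.md for the real schema):

```yaml
summarizer:
  adapter: openai
  config:
    base_url: http://localhost:1234/v1   # LM Studio's default local endpoint
```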

Prerequisites

  • Python 3.11+
  • A summarizer backend — an Anthropic or OpenAI API key, OR a local Ollama server with at least one model pulled
  • A source — one of the supported adapters set up (Granola signed in, a Fathom API key, or any tool that can write markdown into a directory)
  • A vocabulary source (optional) — leave sqlite as the default and populate it via CLI, or point to Wispr Flow

Install

macOS (launchd)

git clone https://github.com/plribeiro3000/meeting-hive.git ~/Projects/meeting-hive
cd ~/Projects/meeting-hive
./scripts/macos/install.sh --summarizer anthropic   # or openai / ollama

The installer:

  • Creates a Python venv inside the repo (.venv/)
  • Symlinks the meeting-hive CLI into ~/bin/
  • Asks for adapter choices (summarizer, source, vocabulary), scope, and internal email domains
  • Runs meeting-hive init with your answers to generate config.yaml at the standard path (first run only)
  • Prompts for the summarizer's API key if it needs one (Anthropic / OpenAI) and writes it to secrets.env (chmod 600). Ollama needs no key.
  • Renders the launchd plist and loads it

Everything the installer creates is idempotent. Re-running it is safe.

Unattended install (for AI agents / CI)

The installer works non-interactively if every decision is passed as a flag. It auto-detects when stdin is not a TTY; in that mode a missing --summarizer is a hard error, and everything else falls back to sensible defaults. API keys are never requested programmatically — pass --skip-secrets and the installer prints the exact command the user needs to run afterwards.

./scripts/macos/install.sh \
    --summarizer anthropic \
    --source granola \
    --vocabulary sqlite \
    --scope work \
    --internal-domains mycompany.com,myco.co.uk \
    --skip-secrets

After the install finishes, the user (not the agent) runs:

echo 'ANTHROPIC_API_KEY=sk-...' >> ~/.config/meeting-hive/secrets.env
chmod 600 ~/.config/meeting-hive/secrets.env

Verify with a dry-run that doesn't hit the network or LLM:

meeting-hive sync --since 1 --dry-run --verbose

./scripts/macos/install.sh --help lists every flag.

Schedule

Default: Monday-Friday at 00:00 local time. Override via installer flags:

./scripts/macos/install.sh --hour 4 --minute 30 --days 1-5
./scripts/macos/install.sh --hour 3 --minute 0 --days 0,2,4,6   # Sun/Tue/Thu/Sat
./scripts/macos/install.sh --days 0-6                           # every day

Weekdays follow launchd's convention: 0 or 7 = Sunday, 1 = Monday, ..., 6 = Saturday. Re-running the installer with different flags regenerates the plist and reloads the agent.
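For reference, launchd encodes this kind of schedule as one StartCalendarInterval dict per selected day; --hour 4 --minute 30 --days 1-5 would render a fragment along these lines (illustrative, not the installer's exact plist):

```xml
<key>StartCalendarInterval</key>
<array>
  <dict>
    <key>Weekday</key><integer>1</integer>  <!-- Monday -->
    <key>Hour</key><integer>4</integer>
    <key>Minute</key><integer>30</integer>
  </dict>
  <!-- ...one dict per remaining weekday (2-5) -->
</array>
```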

Uninstall

./scripts/macos/uninstall.sh              # removes launchd agent, plist, and CLI symlink
./scripts/macos/uninstall.sh --purge      # also removes config, vocabulary DB, logs
./scripts/macos/uninstall.sh --purge --nuke-notes  # also removes ~/.meeting-notes/ (data loss)

--nuke-notes requires you to type the archive path to confirm (or pass --i-really-mean-it in non-TTY mode). The meeting archive is user data — the script won't delete it by accident. See uninstall.sh --help for every flag.

Linux & Windows

No dedicated installer yet. Install the package manually.

Linux:

git clone https://github.com/plribeiro3000/meeting-hive.git
cd meeting-hive
python3 -m venv .venv
.venv/bin/pip install -e .
ln -s "$PWD/bin/meeting-hive" ~/.local/bin/meeting-hive   # or wherever your PATH points

Windows (PowerShell):

git clone https://github.com/plribeiro3000/meeting-hive.git
cd meeting-hive
python -m venv .venv
.venv\Scripts\pip install -e .
# Add <repo>\bin to PATH, or copy bin\meeting-hive.cmd into a PATH dir.

The bin\meeting-hive.cmd wrapper is the Windows equivalent of the Linux/macOS bash shim — it loads secrets.env, bootstraps the venv on first run if needed, and invokes the Python module with your arguments.

Generate config.yaml with interactive prompts:

meeting-hive init

Or pass everything non-interactively:

meeting-hive init --summarizer ollama --source granola --vocabulary sqlite \
  --internal-domains mycompany.com

If your summarizer needs an API key, add it to the secrets file:

  • Linux: $XDG_CONFIG_HOME/meeting-hive/secrets.env (or ~/.config/meeting-hive/secrets.env)
  • Windows: %APPDATA%\meeting-hive\secrets.env

ANTHROPIC_API_KEY=sk-...      # or OPENAI_API_KEY=...

Then schedule meeting-hive sync — see docs/scheduling.md for systemd / cron / Task Scheduler recipes.
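For instance, a crontab entry mirroring the macOS default schedule (weekdays at 00:00) might look like this; adjust the binary path to wherever you symlinked the CLI:

```
0 0 * * 1-5 "$HOME/.local/bin/meeting-hive" sync --since 7
```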

To uninstall on Linux / Windows there's no dedicated script yet. Manual steps:

Linux (systemd user timer):

systemctl --user disable --now meeting-hive.timer
rm ~/.config/systemd/user/meeting-hive.{service,timer}
rm ~/.local/bin/meeting-hive
rm -rf ~/.config/meeting-hive ~/.local/share/meeting-hive ~/.local/state/meeting-hive
# ~/.meeting-notes/ is your archive — delete separately only if you really want to.

Windows (Task Scheduler):

Unregister-ScheduledTask -TaskName "meeting-hive sync" -Confirm:$false
Remove-Item $env:APPDATA\meeting-hive -Recurse
Remove-Item $env:LOCALAPPDATA\meeting-hive -Recurse
# %USERPROFILE%\.meeting-notes is your archive — delete separately only if you really want to.

CLI

# Run the ingestion pipeline.
meeting-hive sync --since 7
meeting-hive sync --since 1 --dry-run --verbose

# Manage the vocabulary (when `vocabulary.adapter: sqlite`).
meeting-hive vocab list
meeting-hive vocab add "Post Grass" "Postgres"
meeting-hive vocab remove "Post Grass"
meeting-hive vocab clear

# Bulk operations.
meeting-hive vocab import wispr              # seed from Wispr Flow
meeting-hive vocab import yaml file.yaml     # seed from a YAML file
meeting-hive vocab export yaml backup.yaml   # dump to YAML

If your configured vocabulary adapter is read-only (e.g., wispr), mutating commands print a clear error telling you where to edit entries.

Configuration

After install, edit your config to pick adapters and teach the classifier about your entities:

source:
  adapter: granola        # or: markdown
  config: {}

vocabulary:
  adapter: sqlite         # or: wispr
  config: {}

summarizer:
  adapter: anthropic      # or: openai | ollama (no default — set by installer)
  config: {}

internal_domains:
  - mycompany.com

domain_rules:
  acme.com:
    type: client
    entity: Acme

title_patterns:
  - match: "(?i)founders friday"
    type: internal
    entity: founders

See docs/configuration.md for the full schema including all adapter config keys.

Daily operation

  • Record meetings as usual in your source tool.
  • Next run, a desktop notification appears:
    • ✅ "3 meeting(s) synced." — nothing for you to do
    • ⚠️ "2 ok, 1 need classification: Call with New Prospect" — copy the suggested YAML snippet from the log into config.yaml. Next run picks it up.
    • ❌ "config.yaml invalid (line 47)" / "source auth expired" — fix the problem and the next run recovers.
  • Manual run anytime: meeting-hive sync --since 1 --verbose
  • Dry-run (no writes, no notifications): meeting-hive sync --dry-run --verbose

Local history (git)

On macOS, the scheduled run uses a wrapper (meeting-hive-autocommit) that commits any archive changes to a local-only git repository under ~/.meeting-notes/.git/ after every sync. Empty syncs produce no commit. Nothing is pushed anywhere — the history lives on your disk.

What this gives you:

  • Daily diff: cd ~/.meeting-notes && git log --oneline shows one entry per sync with changes, git show <commit> displays exactly what got written that day.
  • Rollback: git reset --hard <commit> reverts the archive to any prior sync. If a run writes something wrong, you can undo it without touching the source tool.
  • Audit: every change to every file is attributable to a specific sync, so "when did this line appear?" is always answerable via git blame.

Linux and Windows users can replicate this with a small shell / PowerShell wrapper around meeting-hive sync — see docs/scheduling.md.
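A minimal sketch of such a wrapper, assuming the archive is already a git repository (run git init in ~/.meeting-notes once); the function name and commit message format are illustrative:

```shell
#!/bin/sh
set -eu

commit_archive() {
    # Stage everything; commit only if something actually changed,
    # so an empty sync produces no commit.
    git -C "$1" add -A
    if git -C "$1" diff --cached --quiet; then
        echo "no changes"
    else
        git -C "$1" -c user.name=meeting-hive -c user.email=sync@localhost \
            commit -q -m "sync $(date +%F)"
        echo "committed"
    fi
}

# Usage (scheduled via cron or a systemd timer):
#   meeting-hive sync --since 7 && commit_archive "$HOME/.meeting-notes"
```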

This is versioning, not backup. The history lives on the same disk as the archive. If the disk dies, both are gone. See below.

Backup

meeting-hive does not back up your archive, and neither does the local git history described above. If you care about your meeting notes surviving disk failure, theft, ransomware, or accidental deletion, you are responsible for configuring a backup mechanism. The markdown files in ~/.meeting-notes/ are plain text; any standard tool works:

  • Time Machine (macOS) — set it up with an external drive and make sure ~/.meeting-notes/ is not excluded.
  • Backblaze — continuous off-site backup of your whole Mac; versioned; the archive is covered automatically.
  • restic / borg / kopia — client-side encrypted incremental backups to any backend (local disk, S3, Backblaze B2, SFTP). You hold the encryption key; the storage provider only sees opaque blobs.
  • Cloud sync (iCloud Drive, Dropbox, OneDrive) — convenient, but note two caveats: (a) these services only sync folders placed inside their sync root, which would require moving the archive out of $HOME; (b) sync is not backup — if the pipeline writes a corrupted file, it propagates to every synced device.

Pick one and set it up before your first unattended run.

What's NOT in scope

  • Being a replacement for your meeting-recording tool. This project doesn't record, transcribe, summarize live, or offer cross-meeting search as a service. That's what your source tool is for.
  • Writing back to the source tool. Source tool transcripts are read-only from our side; vocabulary is yours to maintain. meeting-hive is one-way: source → local archive.
  • Re-processing already-written files. If you add a new vocabulary rule today, old files don't get re-corrected. Forward-only.
  • Growing beyond fixing transcriptions. Features that don't directly serve the "correct proper nouns in multi-language transcripts" goal won't be added.

Contributing

Narrow-scope project with an explicit goal of retiring once source tools ship multi-language vocabulary. If you have the same specific pain — multi-language technical meetings where per-language vocab doesn't apply — contributions are welcome.

Good PR fits:

  • Adapters for source tools I don't use (Meetily, Hyprnote, OpenWhispr, Zoom AI Companion, Fathom, Otter, …)
  • Adapters for vocabulary tools beyond Wispr Flow (Handy, Voxtype, Nerd Dictation, …)
  • Dedicated installers for Linux (systemd user service + timer) and Windows (Task Scheduler)

Adding an adapter is a single file in meeting_hive/sources/ or meeting_hive/vocabs/ implementing the Protocol, plus one line in the module's _BUILTINS registry. No other code changes needed.
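The seam looks roughly like this (the Protocol methods and constructor signature are assumptions; only the Protocol-plus-_BUILTINS pattern is described above):

```python
from typing import Protocol

class Source(Protocol):
    def fetch_meetings(self, since_days: int) -> list[dict]: ...

class MarkdownSource:
    """Hypothetical built-in: reads frontmatter .md files from a directory."""
    def __init__(self, config: dict):
        self.directory = config.get("directory", ".")

    def fetch_meetings(self, since_days: int) -> list[dict]:
        return []  # walk self.directory and parse frontmatter here

_BUILTINS: dict[str, type] = {
    "markdown": MarkdownSource,
    # a new adapter is one more entry: "mytool": MyToolSource,
}

def load_source(name: str, config: dict) -> Source:
    # The core pipeline only ever sees the Protocol, never the concrete class.
    return _BUILTINS[name](config)

src = load_source("markdown", {"directory": "/tmp/meetings"})
print(type(src).__name__, src.directory)
```

Because lookup happens by name at runtime, the core pipeline stays unaware of which adapter is configured.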

If your workflow already works with what exists — you don't need this project. That's fine; you're not the audience this is for.

License

MIT — see LICENSE.
