100% offline, Whisper-powered voice notes from your terminal

hark 😇

PyPI version | Python 3.11+ | License: AGPL v3 | Code style: ruff

Use Cases

  • Voice-to-LLM pipelines - hark | llm turns speech into AI prompts instantly
  • Meeting minutes - Transcribe calls with speaker identification (--diarize)
  • System audio capture - Record what you hear, not just what you say (--input speaker)
  • Private by design - No cloud, no API keys, no data leaves your machine

Features

  • 🎙️ Record - Press space to start, Ctrl+C to stop
  • 🔊 Multi-source - Capture microphone, system audio, or both
  • Transcribe - Powered by faster-whisper
  • 🗣️ Diarize - Identify who said what with WhisperX
  • 🔒 Local - 100% offline, no cloud required
  • 📄 Flexible - Output as plain text, markdown, or SRT subtitles

Installation

pipx install hark-cli

System Dependencies

Ubuntu/Debian:

sudo apt install portaudio19-dev

macOS:

brew install portaudio

Optional: Vulkan Acceleration

For GPU-accelerated transcription via Vulkan (AMD/Intel GPUs):

Ubuntu/Debian:

sudo apt install libvulkan1 vulkan-tools mesa-vulkan-drivers

Then set whisper.device to vulkan in your config, or pass --device vulkan.

Quick Start

# Record and print to stdout
hark

# Save to file
hark notes.txt

# Use larger model for better accuracy
hark --model large-v3 meeting.md

# Transcribe in German
hark --lang de notes.txt

# Output as SRT subtitles
hark --format srt captions.srt

# Capture system audio (e.g., online meetings)
hark --input speaker meeting.txt

# Capture both microphone and system audio (stereo: L=mic, R=speaker)
hark --input both conversation.txt
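
For readers unfamiliar with the SRT format selected by --format srt, the following is a minimal sketch of how one subtitle cue is laid out: a sequence number, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, then the text. The helper names srt_timestamp and srt_cue are hypothetical and illustrate the file format only; they are not taken from hark's code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT cue: index line, time range line, then the text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


# Example cue with illustrative timings (not real hark output).
print(srt_cue(1, 2.0, 5.0, "Hello everyone, let's get started."))
```

Cues in a file are separated by a blank line, so joining the results of srt_cue with "\n" yields a complete subtitle file.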

Configuration

Hark uses a YAML config file at ~/.config/hark/config.yaml. CLI flags override config file settings.

# ~/.config/hark/config.yaml
recording:
  sample_rate: 16000
  channels: 1 # Use 2 for --input both
  max_duration: 600
  input_source: mic # mic, speaker, or both

whisper:
  model: base # tiny, base, small, medium, large, large-v2, large-v3
  language: auto # auto, en, de, fr, es, ...
  device: auto # auto, cpu, cuda, vulkan

preprocessing:
  noise_reduction:
    enabled: true
    strength: 0.5 # 0.0-1.0
  normalization:
    enabled: true
  silence_trimming:
    enabled: true

output:
  format: plain # plain, markdown, srt
  timestamps: false

diarization:
  hf_token: null # HuggingFace token (required for --diarize)
  local_speaker_name: null # Your name in stereo mode, or null for SPEAKER_00
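
The precedence rule (CLI flags override config file settings) can be pictured as a recursive merge. This is only a sketch of the idea, not hark's actual loader: the function merge_overrides and the mapping of flags onto config keys are assumptions made for illustration.

```python
from copy import deepcopy


def merge_overrides(config: dict, overrides: dict) -> dict:
    """Return a copy of config with non-None overrides applied recursively."""
    merged = deepcopy(config)
    for key, value in overrides.items():
        if value is None:
            continue  # flag not given on the command line: keep the file value
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Values as they appear in the example config file above.
file_config = {
    "whisper": {"model": "base", "language": "auto", "device": "auto"},
    "output": {"format": "plain", "timestamps": False},
}
# Hypothetical mapping of `hark --model large-v3 --format srt` onto config keys.
cli_overrides = {
    "whisper": {"model": "large-v3", "language": None},
    "output": {"format": "srt"},
}
effective = merge_overrides(file_config, cli_overrides)
print(effective)
```

Settings not mentioned on the command line (here language, device, and timestamps) keep their file values, and the original file_config is left unmodified.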

Audio Input Sources

Hark supports three input modes via --input or recording.input_source:

Mode      Description
mic       Microphone only (default)
speaker   System audio only (loopback capture)
both      Microphone + system audio as stereo (L=mic, R=speaker)
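
To make the stereo layout of the both mode concrete, here is a minimal sketch that separates interleaved stereo samples into a left (mic) and right (speaker) channel. It assumes raw 16-bit signed PCM in native byte order, which the documentation above does not specify; split_stereo is a hypothetical helper, not part of hark.

```python
from array import array


def split_stereo(frames: bytes) -> tuple[array, array]:
    """Split interleaved 16-bit stereo PCM into (left, right) sample arrays."""
    samples = array("h")  # "h" = signed 16-bit integers, native byte order
    samples.frombytes(frames)
    # Interleaved layout is L0 R0 L1 R1 ..., so even indices are left.
    return samples[0::2], samples[1::2]


# Three stereo frames: left carries positive values, right negative ones.
raw = array("h", [1, -1, 2, -2, 3, -3]).tobytes()
mic, speaker = split_stereo(raw)
print(list(mic), list(speaker))
```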

System Audio Capture (Linux)

System audio capture uses PulseAudio/PipeWire monitor sources. To verify your system supports it:

pactl list sources | grep -i monitor

You should see output like:

Name: alsa_output.pci-0000_00_1f.3.analog-stereo.monitor
Description: Monitor of Built-in Audio
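
The same check can be scripted. The sketch below parses the output of pactl list sources and collects source names ending in .monitor, which is the naming pattern shown in the sample output above; treating that suffix as the only marker of a monitor source is an assumption. The function names are hypothetical.

```python
import subprocess


def monitor_sources(pactl_output: str) -> list[str]:
    """Return the names of sources whose name ends in '.monitor'."""
    names = []
    for line in pactl_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "Name" and value.strip().endswith(".monitor"):
            names.append(value.strip())
    return names


def list_monitor_sources() -> list[str]:
    """Run `pactl list sources` and extract monitor source names."""
    result = subprocess.run(
        ["pactl", "list", "sources"], capture_output=True, text=True, check=True
    )
    return monitor_sources(result.stdout)
```

An empty list from list_monitor_sources() suggests that loopback capture is not available on the current audio setup.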

Speaker Diarization

Identify who said what in multi-speaker recordings using WhisperX.

Setup

  1. Install diarization dependencies:

    pipx inject hark-cli whisperx
    # Or with pip (quote the extra so shells such as zsh don't expand the brackets):
    pip install "hark-cli[diarization]"
    
  2. Get a HuggingFace token (required for pyannote models):

  3. Add token to config:

    # ~/.config/hark/config.yaml
    diarization:
      hf_token: "hf_xxxxxxxxxxxxx"
    

Usage

The --diarize flag enables speaker identification. It requires --input speaker or --input both.

# Transcribe a meeting with speaker identification
hark --diarize --input speaker meeting.txt

# Specify expected number of speakers (improves accuracy)
hark --diarize --speakers 3 --input speaker meeting.md

# Skip interactive speaker naming for batch processing
hark --diarize --no-interactive --input speaker meeting.txt

# Stereo mode: separate local user from remote speakers
hark --diarize --input both conversation.md

# Combine with other options
hark --diarize --input speaker --format markdown --model large-v3 meeting.md

Flag              Description
--diarize         Enable speaker identification
--speakers N      Hint for expected speaker count (improves clustering)
--no-interactive  Skip post-transcription speaker naming prompt

Note: Diarization adds processing time. For a 5-minute recording, expect ~1-2 minutes on GPU or ~5-10 minutes on CPU.
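
As a rough planning aid, the figures in the note can be turned into ratios of processing time to recording length. The sketch below assumes that diarization time scales linearly with recording length, which the note does not claim; estimate_minutes is a hypothetical helper and the results are ballpark values only.

```python
# Ratios of diarization time to recording length, derived from the note above:
# 1-2 minutes per 5 minutes on GPU, 5-10 minutes per 5 minutes on CPU.
RATIOS = {"gpu": (1 / 5, 2 / 5), "cpu": (5 / 5, 10 / 5)}


def estimate_minutes(recording_minutes: float, device: str) -> tuple[float, float]:
    """Return a (low, high) estimate in minutes, assuming linear scaling."""
    low, high = RATIOS[device]
    return recording_minutes * low, recording_minutes * high


# A 30-minute meeting on CPU, under the linear-scaling assumption.
print(estimate_minutes(30, "cpu"))
```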

Output Format

With diarization enabled, output includes speaker labels and timestamps:

Plain text:

[00:02] [SPEAKER_01] Hello everyone, let's get started.
[00:05] [SPEAKER_02] Thanks for joining. Let me share my screen.

Markdown:

# Meeting Transcript

**SPEAKER_01** (00:02)
Hello everyone, let's get started.

**SPEAKER_02** (00:05)
Thanks for joining. Let me share my screen.

---

_2 speakers detected • Duration: 5:23 • Language: en (98% confidence)_
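
For downstream scripting, the plain-text format shown above is straightforward to parse. This is a minimal sketch that assumes every line follows the [MM:SS] [SPEAKER] text pattern from the example; how hark writes timestamps for recordings longer than an hour is not documented here, so that case is not handled. The names Utterance and parse_plain are hypothetical.

```python
import re
from typing import NamedTuple

# Matches lines such as: [00:02] [SPEAKER_01] Hello everyone, let's get started.
LINE_RE = re.compile(r"^\[(\d+):(\d{2})\] \[([^\]]+)\] (.*)$")


class Utterance(NamedTuple):
    seconds: int
    speaker: str
    text: str


def parse_plain(transcript: str) -> list[Utterance]:
    """Parse diarized plain-text output, skipping lines that do not match."""
    utterances = []
    for line in transcript.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            minutes, secs, speaker, text = match.groups()
            utterances.append(
                Utterance(int(minutes) * 60 + int(secs), speaker, text)
            )
    return utterances
```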

Interactive Naming

After transcription, hark will prompt you to identify speakers:

Detected 2 speaker(s) to identify.

SPEAKER_01 said: "Hello everyone, let's get started."
Who is this? [name/skip/done]: Alice

SPEAKER_02 said: "Thanks for joining. Let me share my screen."
Who is this? [name/skip/done]: Bob

Use --no-interactive to skip this prompt.
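
When the prompt is skipped, for example in batch runs, names can still be applied to a saved transcript afterwards. The sketch below is one way to do that outside hark; it assumes labels always have the form SPEAKER_ followed by digits, as in the examples above, and rename_speakers is a hypothetical helper.

```python
import re


def rename_speakers(transcript: str, names: dict[str, str]) -> str:
    """Replace SPEAKER_NN labels with names; unmapped labels are left as-is."""
    return re.sub(
        r"SPEAKER_\d+",
        lambda m: names.get(m.group(0), m.group(0)),
        transcript,
    )


text = "[00:02] [SPEAKER_01] Hello everyone.\n[00:05] [SPEAKER_02] Thanks."
print(rename_speakers(text, {"SPEAKER_01": "Alice"}))
```

Because the substitution is purely textual, it works on both the plain-text and the markdown output, but it would also rewrite a label that happens to appear inside spoken text.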

Known Issues

Slow diarization? The pyannote models may default to CPU inference. For GPU acceleration:

pip install --force-reinstall onnxruntime-gpu

See WhisperX #499 for details.

Development

git clone https://github.com/FPurchess/hark.git
cd hark
uv sync --extra test
uv run pre-commit install
uv run pytest

License

AGPLv3
