
◉ Percept

Give your AI agent ears.

Open-source ambient voice intelligence for AI agents

Quick Start · Getting Started · API · Architecture · CLI · Protocol

MIT License · Python 3.10+ · OpenClaw Compatible · NVIDIA Inception


🎙️ Ambient Voice Pipeline

https://github.com/GetPercept/percept/raw/main/demo.mp4

🤖 MCP Integration — Claude Desktop

https://github.com/GetPercept/percept/raw/main/demo-mcp.mov


Percept is an open-source ambient voice pipeline that connects wearable microphones to AI agents. Wear a pendant, speak naturally, and your agent executes voice commands, summarizes meetings, identifies speakers, and builds a searchable knowledge graph — all processed locally on your machine.

What makes Percept different: It's not just transcription. The Context Intelligence Layer (CIL) transforms raw speech into structured, actionable context — entity extraction, relationship graphs, speaker resolution, and semantic search — so your agent actually understands what's being said.

Quick Start

# Install
pip install getpercept

# Start the server (receiver on :8900, dashboard on :8960)
percept serve

# Point your Omi webhook to:
#   https://your-host:8900/webhook/transcript

Say "Hey Jarvis, remind me to check email" and watch it work.
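If you want to test the webhook without a device, you can POST a transcript payload yourself. A minimal sketch of building such a payload — the field names here are illustrative assumptions, not Percept's documented schema:

```python
import json

# Illustrative transcript payload -- field names are assumptions,
# not Percept's exact webhook schema.
payload = {
    "session_id": "demo-session",
    "segments": [
        {
            "speaker": "SPEAKER_0",
            "text": "Hey Jarvis, remind me to check email",
            "start": 0.0,
            "end": 2.4,
        }
    ],
}

body = json.dumps(payload)
# POST `body` to https://your-host:8900/webhook/transcript?token=<token>
print(body)
```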

✨ Features

Voice Pipeline

  • 🎙️ Wake Word Detection — "Hey Jarvis" (configurable via DB settings) triggers voice commands
  • ⚡ 7 Action Types — Email, text, reminders, search, calendar, notes, orders — by voice
  • 📝 Auto Summaries — Meeting summaries sent via iMessage after 60s of silence
  • 🗣️ Speaker Identification — Say "that was Sarah" to teach it who's talking
  • 👂 Ambient Logging — Full transcript history with timestamps and speaker labels
  • 🔒 Local-First — faster-whisper runs on your machine. Audio never leaves your hardware

Context Intelligence Layer (CIL)

  • 🧠 Entity Extraction — Two-pass pipeline: fast regex + LLM semantic extraction
  • 🔗 Relationship Graph — Auto-builds entity relationships (mentioned_with, works_on, client_of)
  • 🎯 Entity Resolution — 5-tier cascade: exact → fuzzy → contextual → recency → semantic
  • 🔍 Semantic Search — NVIDIA NIM embeddings + LanceDB vector store
  • 💾 SQLite Persistence — Conversations, utterances, speakers, contacts, actions, relationships
  • 📊 FTS5 Full-Text Search — Porter-stemmed search across all utterances
  • ⏳ TTL Auto-Purge — Configurable retention: utterances 30d, summaries 90d, relationships 180d
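The resolution cascade is mostly cheap string logic with expensive tiers as fallbacks. A simplified sketch — the contextual and semantic (embedding) tiers are elided here, and all names and thresholds are illustrative assumptions:

```python
import difflib

def resolve_entity(mention, known_entities, recent_mentions):
    """Simplified resolution cascade: exact -> fuzzy -> recency.
    The full pipeline also has contextual and semantic (embedding) tiers,
    elided here; names and thresholds are illustrative assumptions."""
    m = mention.lower()

    # Tier 1: exact match on a canonical name
    for name in known_entities:
        if name.lower() == m:
            return name

    # Tier 2: fuzzy string match (catches transcription variants)
    lowered = {name.lower(): name for name in known_entities}
    close = difflib.get_close_matches(m, lowered, n=1, cutoff=0.8)
    if close:
        return lowered[close[0]]

    # Tier 3 (recency): most recently mentioned entity whose full name
    # contains the mention, e.g. "sarah" -> "Sarah Chen"
    for name in reversed(recent_mentions):
        if m in name.lower():
            return name

    return None
```

Cheap tiers run first so most mentions never reach the vector store; only unresolved mentions would fall through to semantic search.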

Security

  • 🔐 Speaker Authorization — Allowlist of authorized speakers. Only approved voices trigger commands
  • 🔑 Webhook Authentication — Bearer token or URL token (?token=) on all webhook endpoints
  • 📋 Security Audit Log — All blocked attempts logged with timestamp, speaker, transcript snippet, and reason
  • 🛡️ Injection Resistant — Intent parser rejects prompt injection attempts (credential exfiltration, system commands, data leaks)
  • 🏠 Local-First — Audio and transcripts never leave your machine. No cloud dependency
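The allowlist gate and audit log amount to a small piece of logic. A hedged sketch — the class and field names are illustrative, not Percept's actual API:

```python
import time

class SpeakerGate:
    """Illustrative allowlist gate -- names are assumptions, not Percept's API."""

    def __init__(self, allowlist):
        self.allowlist = set(allowlist)
        self.audit_log = []

    def authorize(self, speaker, transcript):
        if speaker in self.allowlist:
            return True
        # Mirror the audit-log fields described above: timestamp,
        # speaker, transcript snippet, and reason.
        self.audit_log.append({
            "ts": time.time(),
            "speaker": speaker,
            "snippet": transcript[:80],
            "reason": "unauthorized_speaker",
        })
        return False
```

An unauthorized speaker never reaches the intent parser; the attempt is only recorded.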

Intent Parser

  • 🏎️ Two-Tier Hybrid — Fast regex (handles ~80% of commands instantly) + LLM fallback
  • 🔢 Spoken Number Support — "thirty minutes" → 1800s, "an hour and a half" → 5400s
  • 📇 Contact Resolution — "email Sarah" auto-resolves from contacts registry
  • 💬 Spoken Email Normalization — "jane at example dot com" → jane@example.com
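Both normalizations are deterministic and cheap, which is why they can live in the fast regex tier. A sketch under the assumption of a small word-number vocabulary (function names and the vocabulary itself are illustrative):

```python
import re

# Illustrative vocabulary -- a real parser would cover more numbers.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "ten": 10, "fifteen": 15, "twenty": 20, "thirty": 30,
    "forty": 40, "fifty": 50, "a": 1, "an": 1,
}
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}

def spoken_duration_to_seconds(phrase: str) -> int:
    """'thirty minutes' -> 1800, 'an hour and a half' -> 5400."""
    total, qty, last_unit = 0.0, 0.0, None
    for tok in re.findall(r"[a-z]+", phrase.lower()):
        if tok == "half":
            qty = 0.5                      # "a half" means 0.5, not 1.5
        elif tok in NUMBER_WORDS:
            qty += NUMBER_WORDS[tok]
        elif tok.rstrip("s") in UNIT_SECONDS:
            last_unit = UNIT_SECONDS[tok.rstrip("s")]
            total += qty * last_unit
            qty = 0.0
    if qty and last_unit:                  # trailing "... and a half"
        total += qty * last_unit
    return int(total)

def normalize_spoken_email(phrase: str) -> str:
    """'jane at example dot com' -> 'jane@example.com'."""
    s = re.sub(r"\s+at\s+", "@", phrase.lower().strip())
    s = re.sub(r"\s+dot\s+", ".", s)
    return s.replace(" ", "")
```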

Architecture

  Mic (Omi Pendant / Apple Watch)
        │ BLE
  Phone App (streams audio)
        │ Webhook
  Percept Receiver (FastAPI, port 8900)
   ├─ Webhook authentication (Bearer token / URL token)
   ├─ Speaker authorization gate (allowlist check)
   ├─ Wake word detection (from DB settings)
   ├─ Intent parser (regex + LLM, injection-resistant)
   ├─ Conversation segmentation (3s command / 60s summary)
   ├─ Entity extraction + relationship graph
   ├─ SQLite persistence (conversations, utterances, speakers, actions)
   ├─ LanceDB vector indexing (NVIDIA NIM embeddings)
   ├─ Security audit log (blocked attempts)
   └─ Action dispatch → OpenClaw / stdout / webhook
        │
  Dashboard (port 8960)
   ├─ Live transcript feed
   ├─ Conversation history + search
   ├─ Analytics (words/day, speakers, actions)
   ├─ Settings management (wake words, contacts, speakers)
   └─ Data export + purge
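The segmentation step in the diagram is silence-gap based: a ~3s gap closes a command window and a ~60s gap closes a conversation for summarization. A minimal sketch of gap-based grouping — the thresholds come from the diagram, everything else is illustrative:

```python
def segment_by_silence(events, gap_s=3.0):
    """Group (timestamp, text) events into segments split on silence gaps.
    Use gap_s=3.0 for command boundaries and gap_s=60.0 for summary
    boundaries, per the pipeline above; the rest is an illustrative sketch."""
    segments, current, last_t = [], [], None
    for t, text in events:
        if last_t is not None and t - last_t > gap_s:
            segments.append(current)   # silence exceeded: close the segment
            current = []
        current.append(text)
        last_t = t
    if current:
        segments.append(current)
    return segments
```

For example, utterances at 0.0s, 1.2s, and 70.0s split into two segments at the default 3s gap.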

Supported Hardware

Device Status Notes
Omi Pendant ✅ Live Primary device. BLE to phone, all-day battery
Apple Watch 🔜 Beta watchOS app built (push-to-talk, raise-to-speak). Needs real device testing
AirPods 🔜 Planned Via phone mic passthrough
Any Webhook Source ✅ Ready Standard HTTP webhook interface — any device that POSTs transcripts

Supported Actions

Action Voice Example Resolution
Email "Hey Jarvis, email Sarah about the meeting" Contact lookup → email
Text "Hey Jarvis, text Rob I'm running late" Contact lookup → phone
Reminder "Hey Jarvis, remind me in thirty minutes to call the dentist" Spoken number parsing
Search "Hey Jarvis, look up the weather in Cape Town" Web search
Note "Hey Jarvis, remember the API key is in the shared doc" Context capture
Calendar "Hey Jarvis, schedule a call with Mike tomorrow at 2pm" Calendar integration
Summary "Hey Jarvis, summarize this conversation" On-demand summary

CLI Quick Reference

percept serve                  # Start receiver + dashboard
percept listen                 # Start receiver, output JSON events
percept status                 # Pipeline health check
percept transcripts            # List recent transcripts
percept transcripts --today    # Today's transcripts only
percept actions                # List recent voice actions
percept search "budget"        # Semantic search over conversations
percept audit                  # Data stats (conversations, utterances, storage)
percept purge --older-than 90  # Delete old data
percept config                 # Show configuration
percept config --set whisper.model_size=small
percept speakers list          # Show authorized + known speakers
percept speakers authorize SPEAKER_0  # Authorize a speaker
percept speakers revoke SPEAKER_0     # Revoke a speaker
percept config set webhook_secret <token>  # Set webhook auth token
percept security-log           # View blocked attempts

See CLI Reference for full details.

MCP Server (Claude Desktop / Anthropic Ecosystem)

Percept exposes all capabilities as MCP (Model Context Protocol) tools, so Claude can natively search your conversations, check transcripts, and more.

# Start MCP server (stdio transport)
percept mcp

Claude Desktop Configuration

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "percept": {
      "command": "/path/to/percept/.venv/bin/python",
      "args": ["/path/to/percept/run_mcp.py"]
    }
  }
}

Restart Claude Desktop after editing. The Percept tools will appear automatically.

Available MCP Tools

Tool Description
percept_search Full-text search across conversations
percept_transcripts List recent transcripts
percept_actions Voice command history
percept_speakers Known speakers with word counts
percept_status Pipeline health check
percept_security_log Blocked attempts log
percept_conversations Conversations with summaries
percept_listen Live transcript stream

MCP Resources

  • percept://status — Current pipeline status
  • percept://speakers — Known speakers list

Dashboard

The web dashboard runs on port 8960 and provides:

  • Live transcript feed — real-time stream of what's being said
  • Conversation history — searchable archive with speaker labels
  • Analytics — words/day, segments/hour, speaker breakdown, action history
  • Settings page — manage wake words, speakers, contacts, transcriber config from DB
  • Entity graph — browse extracted entities and relationships
  • Search — FTS5 keyword search with LanceDB vector search fallback
  • Data management — export all data as JSON, purge by TTL or manually

Transcription

Transcriber Status Use Case
Omi on-device ✅ Default Omi app transcribes locally, sends text via webhook
faster-whisper ✅ Built Local transcription for raw audio (base model, int8, M-series optimized)
NVIDIA Parakeet ✅ Tested NVIDIA NIM ASR via gRPC. Superior accuracy, requires API key
Deepgram 🔜 Planned Cloud ASR option

Three-tier strategy: Local (faster-whisper) → NVIDIA (Parakeet NIM) → Cloud (Deepgram)
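The fallback order can be expressed as a simple cascade that tries each backend in priority order and falls through on failure. A sketch — the backend callables are stand-ins, not Percept's actual transcriber interfaces:

```python
def transcribe_with_fallback(audio, backends):
    """Try each transcriber in priority order (local -> NVIDIA -> cloud),
    returning the first successful result. `backends` is a list of
    (name, callable) pairs -- an illustrative stand-in for the real
    transcriber interfaces."""
    errors = []
    for name, backend in backends:
        try:
            return backend(audio)
        except Exception as exc:       # fall through to the next tier
            errors.append((name, exc))
    raise RuntimeError(f"all transcription backends failed: {errors}")
```

This keeps the local path primary: cloud backends are only contacted when every earlier tier raises.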

Data Model (SQLite)

Table Purpose Records
conversations Full conversation records with transcripts, summaries Core
utterances Atomic speech units (FTS5 indexed, porter stemming) CIL atomic unit
speakers Speaker profiles with word counts, relationships Identity
contacts Name → email/phone lookup with aliases Resolution
actions Voice command history with status tracking Audit
entity_mentions Entity occurrences per conversation CIL extraction
relationships Weighted entity graph (source, target, type, evidence) CIL knowledge
authorized_speakers Speaker allowlist for command authorization Security
security_log Blocked attempts (unauthorized, invalid auth, injection) Security
settings Runtime config (wake words, timeouts, transcriber) Config
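The FTS5 + porter setup is plain SQLite and easy to try in isolation. A self-contained example, assuming a build of SQLite with FTS5 enabled (the table and column names are illustrative, not Percept's exact schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Porter stemming lets a search for "budget" match "budgets", "budgeting", etc.
db.execute("CREATE VIRTUAL TABLE utterances_fts USING fts5(text, tokenize='porter')")
db.executemany(
    "INSERT INTO utterances_fts (text) VALUES (?)",
    [("Sarah is reviewing the budgets for Q3",),
     ("Mike asked about lunch",)],
)
rows = db.execute(
    "SELECT text FROM utterances_fts WHERE utterances_fts MATCH ?", ("budget",)
).fetchall()
```

The stemmed match returns the "budgets" row even though the query term is singular.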

Percept Protocol

The Percept Protocol defines a framework-agnostic JSON schema for voice→intent→action handoff:

  • 6 event types: transcript, conversation, intent, action_request, action_response, summary
  • 3 transports: JSON Lines on stdout, WebSocket, Webhook
  • Unix composable: percept listen | jq 'select(.type == "intent")' | my-agent
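A consumer of the JSON Lines transport just filters on the type field. A minimal Python equivalent of the jq pipeline above — the type names come from the protocol, but the other fields in the sample events are illustrative assumptions:

```python
import io
import json

# Sample JSON Lines stream, as `percept listen` would emit on stdout.
# "type" values match the protocol's event types; other fields are
# illustrative assumptions, not the exact schema.
stream = io.StringIO(
    '{"type": "transcript", "text": "hey jarvis remind me in thirty minutes"}\n'
    '{"type": "intent", "action": "reminder", "delay_s": 1800}\n'
    '{"type": "summary", "text": "Discussed reminders."}\n'
)

intents = [
    event for event in map(json.loads, stream)
    if event["type"] == "intent"
]
```

In production the stream would be `sys.stdin` at the end of a `percept listen | my-agent` pipe.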

📖 Documentation

Doc Description
Getting Started Install, configure Omi, first voice command
Configuration Config file, wake words, transcriber, CIL settings, environment variables
CLI Reference Every command, every flag, with examples
API Reference Webhook endpoints, dashboard API, request/response formats
Architecture Pipeline diagram, CIL design, data flow, extending Percept
Percept Protocol JSON event protocol for agent integration
OpenClaw Integration Using Percept with OpenClaw
Decisions Architecture Decision Records — what we chose and why
Roadmap Current status and what's next
Contributing Dev setup, PR guidelines, good first issues

Built for OpenClaw

Percept is designed as a first-class OpenClaw skill, but works standalone with any agent framework — LangChain, CrewAI, AutoGen, or a simple webhook.

# With OpenClaw
openclaw skill install percept

# Without OpenClaw — pipe events anywhere
percept listen --format json | your-agent-consumer

Five skill components: percept-listen, percept-voice-cmd, percept-summarize, percept-speaker-id, percept-ambient

See OpenClaw Integration for details.

Project Structure

percept/
├── src/
│   ├── receiver.py        # FastAPI server, webhooks, wake word, action dispatch
│   ├── transcriber.py     # faster-whisper transcription, conversation tracking
│   ├── intent_parser.py   # Two-tier intent parser (regex + LLM fallback)
│   ├── database.py        # SQLite persistence (11 tables, FTS5, WAL mode)
│   ├── context_engine.py  # CIL: Context packet assembly, entity resolution
│   ├── entity_extractor.py # CIL: Two-pass entity extraction + relationship building
│   ├── vector_store.py    # NVIDIA NIM embeddings + LanceDB semantic search
│   ├── context.py         # Context extraction, conversation file saving
│   └── cli.py             # CLI entry point (9 commands)
├── config/config.json     # Server, whisper, audio settings
├── data/
│   ├── percept.db         # SQLite database (WAL mode)
│   ├── vectors/           # LanceDB vector store
│   ├── conversations/     # Conversation markdown files
│   ├── summaries/         # Auto-generated summaries
│   ├── speakers.json      # Speaker ID → name mapping
│   └── contacts.json      # Contact registry
├── dashboard/
│   ├── server.py          # Dashboard FastAPI backend (port 8960)
│   └── index.html         # Dashboard web UI
├── protocol/
│   ├── PROTOCOL.md        # Event protocol specification
│   └── schemas/           # JSON Schema for 6 event types
├── landing/               # getpercept.ai landing page (port 8950)
├── watch-app/             # Apple Watch app (push-to-talk, raise-to-speak)
├── scripts/               # Utility scripts (backfill, vector indexing)
├── research/              # Research notes (OpenHome, Zuna BCI, etc.)
└── docs/                  # Full documentation

Contributing

We'd love your help:

  1. Star the repo — helps more than you think
  2. 🧪 Try it — install, use it for a day, file issues
  3. 🔧 Build — language packs, hardware integrations, new action types
  4. 📣 Share — blog about it, tweet about it

See Contributing Guide for dev setup and PR guidelines.

License

MIT — do whatever you want with it.


"Fei-Fei Li gave AI eyes with ImageNet. We're giving AI agents ears."
