percept

The open-source cognition layer for goal-driven, proactive vision agents.

You state a goal in plain language — "nudge me when I drink a coffee", "tell me when the kettle boils" — and percept turns a live audio-visual stream into an agent that:

reasons over entities across time — tracked things with stable ids, attributes carrying provenance (observed vs you-told-me) and freshness;
fires proactively, but only on the rising edge of a condition becoming true;
refuses to guess: a three-state gate maps known-yes → act · known-no → silent · unknown → ask, so it never delivers a confidently wrong nag.

The wedge is temporal cognition (entity memory + the three-state gate + reasoning over events), which a raw VLM-in-a-loop and a per-frame pipeline both lack. percept builds on frontier models for perception behind vendor-neutral seams.
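The gate behaviour described above can be sketched in a few lines. This is a toy illustration, not percept's implementation: `ToyGate`, `Action`, and the use of `None` to stand for an unknown verdict are assumptions made for the example, and the refractory period is left out.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    FIRE = "fire"      # known-yes: act
    ASK = "ask"        # unknown: ask the user instead of guessing
    SILENT = "silent"  # known-no, or nothing new to report


class ToyGate:
    """Hypothetical three-state, rising-edge gate (illustration only)."""

    def __init__(self) -> None:
        # The condition starts out known-false.
        self._last: Optional[bool] = False

    def step(self, satisfied: Optional[bool]) -> Action:
        # satisfied: True = known-yes, False = known-no, None = unknown.
        previous, self._last = self._last, satisfied
        if satisfied is None:
            # Ask once on entering the unknown state, not on every frame.
            return Action.ASK if previous is not None else Action.SILENT
        if satisfied and previous is not True:
            # Rising edge: the condition has just become true.
            return Action.FIRE
        return Action.SILENT


gate = ToyGate()
stream = [False, True, True, None, None, False, True]
actions = [gate.step(s).value for s in stream]
print(actions)
```

The second `True` and the second `None` both stay silent, which is the difference between an edge-triggered gate and a level-triggered one.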

⚠️ Research preview, v0.1.0. Published for real-life testing and feedback, not for production. The cognition core (gate, entity graph, executor, events, scheduler, consent) runs and is tested; the benchmark card below is a v0.5 DRAFT. APIs may change between 0.1.x releases, so pin a version. Issues and feedback welcome (see Status).

The envelope (refusals, stated proudly)

Assistant-class, never the safety mechanism. percept informs a human who stays responsible. It is never the thing standing between a person and harm on a clock it can't guarantee. Out: driver-drowsiness intervention, turn detection. In: a post-hoc driving debrief.
Watches the user's own world, with the user as beneficiary — never a non-consenting third party to the wearer's advantage. Out: covert analysis of someone you're negotiating with; card-counting against a live casino. In: a play-money practice trainer.

These refusals are a feature. Do not lead with surveillance demos.

Install

pip install percept-vision        # core: PURE STDLIB — runs offline with fake backends, no keys

The core has zero dependencies and runs with deterministic fake backends, so it installs and runs with no API keys. Frontier backends are opt-in extras:

pip install "percept-vision[gemini]"     # GeminiVision
pip install "percept-vision[claude]"     # AnthropicVision (Claude)
pip install "percept-vision[deepgram]"   # Deepgram STT + TTS (voice)
pip install "percept-vision[audio]"      # numpy fast-path for acoustic/edge
pip install "percept-vision[all]"        # everything

Python >= 3.10. The package name is percept-vision; the import is import percept.

60-second quickstart (no keys)

Fully offline — fake backends, deterministic, nothing to configure. This script runs as-is:

import asyncio
from percept import Percept, Goal

async def main():
    # Fake backends by default — offline, no keys. discover_plugins=False skips plugin lookup.
    agent = Percept.create(discover_plugins=False)

    agent.add_goal(Goal(
        id="caffeine",
        condition="the user is drinking coffee",
        say="Heads up — stepping back from caffeine?",
    ))

    # Each frame is judged; the gate fires ONCE on the rising edge.
    # ("sip-coffee" is a token the fake vision backend scripts as a confident YES.)
    fires = await agent.perceive_judged("sip-coffee")
    for ev in fires:                       # ev is a FireEvent
        print(ev.action, ev.goal_id, ev.text)   # -> fire caffeine Heads up — stepping back from caffeine?

    # The same frame again does NOT re-fire — rising-edge, not level-triggered.
    print(await agent.perceive_judged("sip-coffee"))   # -> []

asyncio.run(main())

perceive_judged(frame) returns a list of FireEvent(goal_id, action, text, entity_id, verdict), where action is "fire" or "ask". With real eyes, swap the fake for a frontier vision backend — the cognition layer above is unchanged:

agent = Percept.create(vision="gemini")          # needs percept-vision[gemini] + GEMINI_API_KEY
# frame is now real image bytes; everything downstream (gate, graph, executor) is identical.

Backends are selected by name ("fake" · "gemini" · "claude") or by passing an adapter instance, and can also be set via env (PERCEPT_VISION_BACKEND, etc.).
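As a rough sketch of how name-or-env selection could resolve, the helper below picks a vision backend name. Only the backend names and the `PERCEPT_VISION_BACKEND` variable come from the text above; the function `resolve_vision_backend`, the precedence order (explicit argument, then environment, then `"fake"`), and the lower-casing are assumptions, not percept's documented behaviour.

```python
import os
from typing import Mapping, Optional

KNOWN_VISION_BACKENDS = ("fake", "gemini", "claude")


def resolve_vision_backend(explicit: Optional[str] = None,
                           env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical resolver: explicit name, then env var, then 'fake'."""
    env = os.environ if env is None else env
    name = explicit or env.get("PERCEPT_VISION_BACKEND") or "fake"
    name = name.strip().lower()
    if name not in KNOWN_VISION_BACKENDS:
        raise ValueError(f"unknown vision backend: {name!r}")
    return name


# An empty environment falls back to the offline fake backend.
print(resolve_vision_backend(env={}))
# The environment variable is used when no explicit name is given.
print(resolve_vision_backend(env={"PERCEPT_VISION_BACKEND": "gemini"}))
```

Passing the environment as a parameter keeps the helper testable without mutating `os.environ`.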

Architecture — two layers

percept cleanly splits the eyes from the brain. Perception is one stateless seam; everything stateful and proactive is cognition.

┌───────────────────────── COGNITION — the "brain" ──────────────────────────┐
│  three-state GATE       fire (known-yes) · ask (unknown) · silent (known-no) │
│      ▲ rising edge: fires only on false→true, with a refractory (no nag)      │
│  EXECUTOR               one firing path (sense · key · accumulate);           │
│                         transitions A→B, counting, deadlines/absence, verify  │
│  ENTITY GRAPH           stable ids; attributes with provenance + freshness    │
│                         (observed vs you-told-me)                             │
│  EDGE DETECTOR REGISTRY opt-in cheap signals propose; the brain counts/gates  │
│  general · deterministic · cheap · STATEFUL (has memory)                      │
└──────────────────────────────────────────────────────────────────────────────┘
                          ▲  feeds on  Verdict{satisfied, confidence}
┌───────────────────────── PERCEPTION — the "eyes" ──────────────────────────┐
│  vision.judge(condition, frame) -> Verdict        ★ ONE seam                  │
│  GeminiVision / AnthropicVision · stateless · one frame · ~1s/call cloud      │
└──────────────────────────────────────────────────────────────────────────────┘

The gate turns a noisy verdict stream into at most one alert on the rising edge, and falls back to ASK rather than guess when confidence is unreliable. The executor is the single firing path for every concern shape (watch, transition, count, deadline/absence, verification). The entity graph carries memory: stable ids and attributes that know whether they were observed or asserted by the user, and whether they're still fresh.
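One way to picture attributes that know their provenance and freshness is the sketch below. The `Attribute` and `Entity` classes, the `"observed"` / `"told"` labels, and the time-to-live field are all hypothetical names chosen for illustration; percept's entity graph may model this differently.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Attribute:
    value: str
    provenance: str    # "observed" or "told" (hypothetical labels)
    updated_at: float  # seconds on the caller's clock
    ttl: float         # how long the value counts as fresh, in seconds

    def is_fresh(self, now: float) -> bool:
        return (now - self.updated_at) <= self.ttl


@dataclass
class Entity:
    id: str  # stable id that survives across frames
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def known(self, name: str, now: float) -> Optional[Attribute]:
        # A missing or stale attribute is treated as unknown, not as false.
        attr = self.attributes.get(name)
        return attr if attr is not None and attr.is_fresh(now) else None


kettle = Entity("kettle-1")
kettle.attributes["state"] = Attribute("heating", "observed", updated_at=0.0, ttl=30.0)
kettle.attributes["owner"] = Attribute("mine", "told", updated_at=0.0, ttl=3600.0)

print(kettle.known("state", now=10.0))   # still fresh
print(kettle.known("state", now=60.0))   # stale, so reported as unknown
print(kettle.known("owner", now=60.0).provenance)
```

Returning `None` for a stale attribute lines up with the gate's unknown state: the sensible response to an expired observation is to ask or look again, not to assume it still holds.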

The edge detector registry is the edge-proposes · brain-counts · VLM-confirms split: cheap, opt-in detectors emit timed boundaries that the brain accumulates — counting reps without spending a vision token per frame. Three reference skills ship as registered detectors:

motion-periodicity — frame-diff motion peaks (rep boundaries);
acoustic-onset — energy spikes on the existing mic stream (no new capture path);
pose-openness — BlazePose-based rep peaks (opt-in: percept-harness[pose], MediaPipe).
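To make "edge proposes, brain counts" concrete, here is a minimal peak picker over a one-dimensional motion-energy signal. It is a sketch under assumptions: the function name `peak_times`, the strict-threshold rule, and the sample values are invented for illustration and are not the shipped motion-periodicity detector.

```python
from typing import List, Sequence


def peak_times(signal: Sequence[float], times: Sequence[int],
               threshold: float) -> List[int]:
    """Timestamps of local maxima that rise strictly above threshold."""
    peaks = []
    for i in range(1, len(signal) - 1):
        rising = signal[i - 1] < signal[i]
        not_rising_after = signal[i] >= signal[i + 1]
        if signal[i] > threshold and rising and not_rising_after:
            peaks.append(times[i])
    return peaks


# Made-up per-frame motion energy, one sample every 100 ms.
motion = [0.0, 0.2, 0.9, 0.3, 0.1, 0.8, 0.8, 0.2, 0.05, 0.04]
times_ms = [i * 100 for i in range(len(motion))]

boundaries = peak_times(motion, times_ms, threshold=0.5)
print(boundaries, "->", len(boundaries), "proposed rep boundaries")

# Subtle motion that never crosses the threshold proposes nothing.
print(peak_times([0.03, 0.06, 0.04], [0, 100, 200], threshold=0.10))
```

The detector only emits timestamps; accumulating them into a count, and deciding whether a count is worth a confirming vision call, stays with the cognition layer.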

A full request trace — "tell me when the milk is about to boil over" — is in docs/e2e-flow-milk-boilover.md, including an honest map of where the perception ceiling bites.

The three packages

package	what	install
percept-vision	the cognition core — gate, entity graph, executor, events, scheduler, consent, fakes. Pure stdlib.	`pip install percept-vision` (`packages/percept-vision/`)
percept-harness	server-side transport shell + tier-0 salience gate (WatchSpec down / Tier0Signal up); the reference home of the edge detector skills.	`packages/percept-harness/`
@percept/edge	on-device reactive edge in JS/WASM: VAD + motion gate over the same WatchSpec/Tier0Signal wire-contract.	`packages/percept-edge/` (npm `@percept/edge`)

Benchmark — Percept Benchmark v1 (v0.5 DRAFT)

The benchmark holds the backbone fixed and measures the orchestration delta across a raw → core → e2e config ladder (no composite score — a vector of headlines). The current card is a v0.5 DRAFT (sampled, N≈12/track, gemini-2.5-flash) on the DeepMind Perception Test (CC-BY-4.0), the private golden ambiguity corpus, and RepCount-A; the e2e-relational accuracy cell is modeled (flagged ~), so the card is stamped DRAFT.

track	measured (DRAFT, N≈12)	reading
counting (RepCount-A)	pose OBO 0.33 vs 0.29 (TransRAC, CVPR'22); nMAE 0.80 vs 0.44	training-free pose ties a trained baseline on OBO, but its failures on low-amplitude actions cost it on MAE
acoustic (PT Sound Loc., unseen source)	vision-only recall 0.00 → 0.25–0.33 fused; fusion-FP 0–0.17	the recall-flip: fusion rescues ~a third of sound events vision-only never hears (honest at scale, not the single-clip 1.0)
relational (golden)	confidence AUROC ≈ 0.51 (≈ chance); ASK-rate ≈ 0.4	the VLM's confidence does not separate right from wrong → justifies the ASK discipline over threshold-tuning
timeliness (PT action onsets)	P-PAUC ≈ 60; reaction λ p50 ≈ 1s	first real proactive-timeliness numbers (adapted PAUC)
edge event-recall (PT onsets)	0.00 @ thr 0.10	⚠️ the motion-gate escalates on none of the subtle-action onsets (motion ~0.03–0.06 < 0.10) → e2e ≠ core there; a real calibration finding
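For readers unfamiliar with the counting metrics, the sketch below uses the definitions commonly applied to repetition counting: OBO is the share of clips whose predicted count is within one of the ground truth, and normalized MAE is the mean of the absolute error divided by the true count. The benchmark's exact definitions live in its plan document and may differ; the counts here are made up and are not benchmark data.

```python
from typing import Sequence


def obo_accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of clips with |predicted - true| <= 1 (off-by-one accuracy)."""
    hits = sum(abs(p - t) <= 1 for p, t in zip(pred, truth))
    return hits / len(truth)


def normalized_mae(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Mean of |predicted - true| / true; true counts must be positive."""
    return sum(abs(p - t) / t for p, t in zip(pred, truth)) / len(truth)


# Illustrative counts only.
predicted = [10, 4, 6, 2]
actual = [10, 5, 12, 8]

print("OBO :", obo_accuracy(predicted, actual))
print("nMAE:", round(normalized_mae(predicted, actual), 4))
```

The two metrics can disagree: a method that is nearly right on most clips but badly undercounts a few can score well on OBO while its normalized MAE stays high.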

A v0.5 DRAFT, not a RELEASE claim. Two findings the benchmark surfaced that a self-congratulatory eval would hide: the flat confidence AUROC (why the gate refuses to guess) and the edge event-recall of 0 on subtle actions (the motion-gate is mis-calibrated for them). Full plan, data, and references: packages/percept-vision/eval/BENCHMARK_PLAN.md.
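A flat confidence AUROC is easier to read with the metric spelled out. The sketch below computes AUROC as the probability that a correct answer carries higher confidence than an incorrect one, counting ties as half. The function name and the sample data are illustrative assumptions, not the benchmark's evaluation code or results.

```python
from typing import Sequence


def auroc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Pairwise AUROC: P(conf of a correct answer > conf of a wrong one)."""
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Confidence that is always high, whether right or wrong: no separation.
print(auroc([0.9, 0.9, 0.9, 0.9], [True, False, True, False]))

# Confidence that perfectly separates right from wrong.
print(auroc([0.9, 0.8, 0.2, 0.1], [True, True, False, False]))
```

When the score sits near 0.5, no threshold on confidence can reliably split right answers from wrong ones, which is the case for asking instead of tuning a cutoff.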

Layout

packages/percept-vision/    the SDK (pip install percept-vision; import percept)
packages/percept-harness/   server-side tier-0 edge reference + detector skills
packages/percept-edge/      @percept/edge — on-device JS/WASM edge
docs/                       architecture & flow traces
eval/                       golden corpus (benchmark plan: packages/percept-vision/eval/)
Spec/                       the spec + implementation plan
Makefile                    test · eval-live · e2e · bench · check · reproduce

Docs

Quickstart — the 60-second offline snippet above; make test runs the fake-only unit lane.
Architecture / concepts — docs/e2e-flow-milk-boilover.md (the two layers, the firing path, the perception ceiling) and Spec/ (the full spec + phase plan).
Backends — select by name (fake · gemini · claude · deepgram) or env (PERCEPT_*_BACKEND); add via the percept.backends entry-point group. Extras: [gemini] · [claude] · [deepgram] · [audio].
Benchmark — packages/percept-vision/eval/BENCHMARK_PLAN.md.

Status

v0.1.0 — early / first public release. The cognition core runs and is tested offline with fakes (the L1 lane, no keys); the frontier backends (Gemini, Claude, Deepgram) and the edge packages are wired behind their seams. The benchmark is a v0.5 DRAFT card. We are now in real-life testing — APIs and numbers may change. Issues and contributions welcome.

License

Apache-2.0.

Release history

0.1.0 (Jun 23, 2026)
