Research preview — the open-source cognition layer for goal-driven, proactive vision agents.
percept-vision
The open-source cognition layer for goal-driven, proactive vision agents.
Package: percept-vision · import: import percept (name ≠ import).
⚠️ Research preview (v0.6.0). Published for real-life testing and feedback, not for production. The cognition core (gate, entity graph, executor, events, scheduler, consent) runs and is tested offline with fakes. APIs may change between 0.6.x releases, so pin a version. No benchmark numbers are published yet (the model-sweep phase comes later). Issues and feedback welcome.
You state a goal in plain language — "nudge me when I drink a coffee", "tell me when the kettle boils" — and percept turns a live audio-visual stream into an agent that reasons over entities across time, fires only on the rising edge of a condition becoming true, and refuses to guess — a three-state gate maps known → act · not → silent · unknown → ask.
The wedge is temporal cognition — entity memory + the three-state gate + reasoning over events — which a raw VLM-in-a-loop and a per-frame pipeline both lack. percept builds on frontier models for perception behind vendor-neutral seams.
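To make the gate's behaviour concrete, here is a minimal stdlib sketch of a three-state, edge-triggered gate. The names `GateState` and `RisingEdgeGate` are hypothetical and are not percept's classes. Treating the "ask" outcome as edge-triggered as well is an assumption made for illustration.

```python
from enum import Enum


class GateState(Enum):
    KNOWN = "known"      # condition confidently true
    NOT = "not"          # condition confidently false
    UNKNOWN = "unknown"  # not enough evidence either way


class RisingEdgeGate:
    """Hypothetical three-state gate; an illustration, not percept's implementation."""

    def __init__(self):
        self._last = {}  # goal_id -> previous GateState

    def step(self, goal_id, state):
        """Return "act", "ask", or "silent" for the newest judgement of one goal."""
        previous = self._last.get(goal_id, GateState.NOT)
        self._last[goal_id] = state
        if state is previous:
            return "silent"  # no transition, so nothing new to say
        if state is GateState.KNOWN:
            return "act"     # rising edge: the condition just became true
        if state is GateState.UNKNOWN:
            return "ask"     # refuse to guess; ask instead
        return "silent"      # the condition just became false


gate = RisingEdgeGate()
stream = [GateState.NOT, GateState.KNOWN, GateState.KNOWN, GateState.NOT, GateState.KNOWN]
print([gate.step("caffeine", s) for s in stream])
# -> ['silent', 'act', 'silent', 'silent', 'act']
```

A level-triggered design would return "act" on every frame where the condition holds; the edge-triggered version speaks once per transition.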
Install
pip install percept-vision # core: PURE STDLIB — runs offline with fake backends, no keys
The core has zero dependencies and runs with deterministic fake backends, so pip install then run
works with no API keys. Python ≥ 3.10. Frontier backends are opt-in extras:
pip install "percept-vision[gemini]" # GeminiVision
pip install "percept-vision[claude]" # AnthropicVision (Claude)
pip install "percept-vision[deepgram]" # Deepgram STT + TTS (voice)
pip install "percept-vision[runtime]" # camera/video/RTSP runtime worker
pip install "percept-vision[bench]" # benchmark + scorecard tooling
pip install "percept-vision[edge]" # Python tier-0 gate + detector registry
pip install "percept-vision[all]" # everything
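Since APIs may change between 0.6.x releases, it can help to confirm at startup that the installed distribution matches your pin. The sketch below uses only the standard library's `importlib.metadata`; the helper names `installed_version` and `matches_pin` are hypothetical and not part of percept.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def matches_pin(dist_name, pinned):
    """True only when the distribution is installed at exactly the pinned version."""
    return installed_version(dist_name) == pinned


# The distribution name is percept-vision, even though the import name is percept.
print(installed_version("percept-vision"))  # a version string, or None if not installed
```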
60-second quickstart (no keys)
Fully offline — fake backends, deterministic, nothing to configure. This script runs as-is:
import asyncio

from percept import Percept, Goal


async def main():
    # Fake backends by default: offline, no keys. discover_plugins=False skips plugin lookup.
    agent = Percept.create(discover_plugins=False)
    agent.add_goal(Goal(
        id="caffeine",
        condition="the user is drinking coffee",
        say="Heads up — stepping back from caffeine?",
    ))

    # Each frame is judged; the gate fires ONCE on the rising edge.
    # ("sip-coffee" is a token the fake vision backend scripts as a confident YES.)
    fires = await agent.perceive_judged("sip-coffee")
    for ev in fires:  # ev is a FireEvent
        print(ev.action, ev.goal_id, ev.text)  # -> fire caffeine Heads up — stepping back from caffeine?

    # The same frame again does NOT re-fire: rising-edge, not level-triggered.
    print(await agent.perceive_judged("sip-coffee"))  # -> []


asyncio.run(main())
perceive_judged(frame) returns a list of FireEvent(goal_id, action, text, entity_id, verdict),
where action is "fire" or "ask". Swap the fake for real eyes — the cognition layer is unchanged:
agent = Percept.create(vision="gemini") # needs percept-vision[gemini] + GEMINI_API_KEY
Backends are selected by name ("fake" · "gemini" · "claude"), by adapter instance, or by env
(PERCEPT_VISION_BACKEND).
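If you want the choice of backend to be explicit in your own code, a small resolver like the one below can sit in front of Percept.create. This is a sketch: the function `resolve_vision_backend` is hypothetical, and the precedence it applies (explicit argument, then PERCEPT_VISION_BACKEND, then "fake") is an assumption for illustration, not percept's documented behaviour.

```python
import os

KNOWN_BACKENDS = ("fake", "gemini", "claude")  # the names listed above


def resolve_vision_backend(explicit=None, environ=None):
    """Pick a backend name: explicit argument, then env var, then "fake".

    The precedence order here is assumed, not taken from percept itself.
    """
    env = os.environ if environ is None else environ
    name = explicit or env.get("PERCEPT_VISION_BACKEND") or "fake"
    if name not in KNOWN_BACKENDS:
        raise ValueError(f"unknown vision backend: {name!r}")
    return name


print(resolve_vision_backend(environ={}))                                    # -> fake
print(resolve_vision_backend(environ={"PERCEPT_VISION_BACKEND": "gemini"}))  # -> gemini

# Usage with the factory shown above (requires percept-vision to be installed):
#   agent = Percept.create(vision=resolve_vision_backend())
```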
Architecture — two layers, four tiers
percept cleanly splits the eyes from the brain: a cheap, stateful System-1 cognition that
reasons over a stream of small structured facts (the three-state gate, the entity graph, the concern
primitive — always in the loop), and an expensive, stateless System-2 perception it summons only
when it must (vision.judge(condition, frame) → Verdict, behind vendor-neutral registry seams). A raw
VLM-in-a-loop collapses the two; percept keeps them apart so the common case costs nothing.
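The split can be illustrated with a toy wrapper that consults the expensive judge only when the cheap structured facts have changed. Everything here is hypothetical (`CheapFirstJudge`, the `fact_key` argument, the fake judge); percept's real cognition layer is richer, so read this as a sketch of the idea rather than the implementation.

```python
class CheapFirstJudge:
    """Hypothetical sketch: call the expensive judge only when cheap facts change."""

    def __init__(self, expensive_judge):
        self._judge = expensive_judge  # callable(condition, frame) -> bool
        self._cache = {}               # condition -> (fact_key, verdict)
        self.expensive_calls = 0

    def judge(self, condition, frame, fact_key):
        cached = self._cache.get(condition)
        if cached is not None and cached[0] == fact_key:
            return cached[1]           # System 1: facts unchanged, reuse the verdict
        self.expensive_calls += 1      # System 2: summon the model
        verdict = self._judge(condition, frame)
        self._cache[condition] = (fact_key, verdict)
        return verdict


def fake_judge(condition, frame):
    # Stand-in for a frontier model call; here a trivial substring check.
    return "coffee" in frame


judge = CheapFirstJudge(fake_judge)
judge.judge("drinking coffee", "frame-coffee-1", fact_key="cup-raised")
judge.judge("drinking coffee", "frame-coffee-2", fact_key="cup-raised")  # served from cache
judge.judge("drinking coffee", "frame-desk", fact_key="cup-down")
print(judge.expensive_calls)  # -> 2
```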
Full architecture — the four tiers (Feeds → Measures → Cognition → VLM), every claim anchored to code, plus the generated architecture diagram — is in the repo: Architecture · Concepts · docs index.
The envelope (refusals, stated proudly)
- Assistant-class, never the safety mechanism. percept informs a human who stays responsible. It is never the thing standing between a person and harm on a clock it can't guarantee.
- Watches the user's own world, with the user as beneficiary — never a non-consenting third party.
These refusals are a feature. Do not lead with surveillance demos.
Packages & versioning
- percept-vision (this package): the Python SDK, covering the cognition core, fakes, runtime worker, the Python tier-0 edge/harness, and benchmark tooling (extras). The tier-0 harness and the frozen wire-contract are bundled in as the percept.harness and percept.contracts subpackages.
- percept-edge (npm): the on-device reactive edge in JS/WASM, a VAD + motion gate that share the Tier0Signal/detector wire-contract. The edge's detector + gate outputs are byte-for-byte identical to the Python harness (verified, 12/12); the WatchSpec it consumes is a documented subset (server-only fields like schedule/cadence are enforced in the cognition layer, not on-device).
Versioning. Package version percept-vision 0.6.0 (research preview). The wire-contract version is separate: CONTRACT_VERSION = "1.2.0" (additive-only). A 0.6.0 install exporting contract 1.2.0 is expected, not a bug.
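One way a consumer might check a contract version is sketched below. It assumes that "additive-only" means a provider is compatible when it shares the consumer's major version and is at least as new; that reading, and the helper names `parse_version` and `contract_compatible`, are assumptions for illustration and not part of percept.

```python
def parse_version(text):
    """Split "MAJOR.MINOR.PATCH" into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def contract_compatible(provided, required):
    """True if `provided` can serve a consumer built against `required`.

    Assumes additive-only evolution: same major version, and provided >= required.
    """
    p, r = parse_version(provided), parse_version(required)
    return p[0] == r[0] and p >= r


print(contract_compatible("1.2.0", "1.1.0"))  # -> True
print(contract_compatible("1.2.0", "1.3.0"))  # -> False
```

Note that the check never looks at the package version, which is the point of keeping the two version numbers separate.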
Benchmark status
No Percept accuracy numbers are published yet. The benchmark is designed but the reproducible model-sweep run is a separate, later phase — see the methodology. The scorecard tooling operates on a manifest you provide; the bundled golden set ships with the source repo, not the wheel, so the default-manifest commands run from a repo checkout:
# from a repo checkout (the golden set is repo data, not packaged):
percept-bench scorecard --manifest eval/golden-v1/MANIFEST.json --out out/scorecard.txt
# or point --manifest at your own labelled-clip manifest, anywhere.
Status
v0.6.0 — research preview. The cognition core runs and is tested offline with fakes (the L1 lane, no keys); the frontier backends and the edge packages are wired behind their seams. No benchmark numbers yet. We are in real-life testing — APIs may change. Issues and contributions welcome: github.com/divi-vijayakumar/Percept.
License
Apache-2.0.
Download files
File details
Details for the file percept_vision-0.6.0.tar.gz.
File metadata
- Download URL: percept_vision-0.6.0.tar.gz
- Upload date:
- Size: 341.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 969b6157d1e79d29cdba6bde804ba16a8ffb7d5253af84ebab16d88764f8e2a9 |
| MD5 | 47ed87c77a7ee04f85d28adfb743f433 |
| BLAKE2b-256 | a9c1c2a3aaac6e1bd52953ee1ec20d394c204cf8c6045c04ea1fef7850af3464 |
File details
Details for the file percept_vision-0.6.0-py3-none-any.whl.
File metadata
- Download URL: percept_vision-0.6.0-py3-none-any.whl
- Upload date:
- Size: 291.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 847bd2cde89ca227f58d31a561c4d48b305c8d57a29a4c5a3fd299cae446f773 |
| MD5 | 73727dc6655c5e6819f84f34b5eb8c36 |
| BLAKE2b-256 | 652f1be0f9367621d1769bf0d030af91201f9d38fea30f7f63e0d3aa2d8606f5 |