perceptron

Perceptron multimodal SDK

The platform for physical AI

Perceptron is the Python SDK for building with perceptive-language models like Isaac 0.1. Designed for physical AI applications—robotics, manufacturing, logistics, and security—it provides a unified interface for grounded perception: detection, localization, OCR, and visual Q&A with structured outputs ready for robotics, analytics, and edge deployment. Route tasks to specialized models, swap providers per call, and compose complex multimodal flows with a typed DSL. Efficient enough for edge deployment, flexible enough for any real-world task.

Website · Docs · Community

Why Perceptron?

Grounded, spatial intelligence: Get precise localization and grounded answers with conversational pointing, where every claim is visually cited. Ask "what's broken in this machine?" and get highlighted regions with robust spatial reasoning that handles occlusions, relationships, and object interactions.

In-context learning for perception: Show a few annotated examples (defects, safety conditions, custom categories) in your prompt and the model adapts, with no YOLO-style fine-tuning or custom detector stacks required. Learn novel tasks from a handful of examples.

Efficient frontier for real-world deployment: Isaac 0.1 matches models 50x its size while delivering edge-ready latencies and drastically lower serving costs. Perception workloads are continuous and latency-sensitive, so Perceptron is built for the efficient frontier where capability meets real-world constraints.

Prompt for anything, control the output type: Ask for whatever you need in natural language ("find safety violations", "locate damaged components", "identify obstacles") and specify the output format: bounding boxes, points, polygons, or text. You get the flexibility of language models with the structure your application needs.

Installation

Prerequisites: Python 3.10+ and either pip 23+ or uv.

pip install perceptron

# Optional extras
pip install "perceptron[torch]"   # Tensor utilities (requires PyTorch)
pip install "perceptron[dev]"     # Ruff, pytest, pre-commit

Using uv:

uv pip install perceptron

# Optional: PyTorch helpers for tensor utilities
uv pip install "perceptron[torch]"

# Optional: Dev tools (ruff, pytest, pre-commit)
uv pip install "perceptron[dev]"

The CLI entry point perceptron is available after install.

Quick Start

from perceptron import detect, caption

# Detect objects with structured bounding boxes
result = detect(
    "warehouse.jpg",
    classes=["forklift", "person", "pallet"]
)

for box in result.points or []:
    print(f"{box.mention}: ({box.top_left.x}, {box.top_left.y})")

# Generate image captions
desc = caption("scene.png", style="detailed")
print(desc.text)

No credentials? The SDK returns compile-only payloads when API keys are missing, letting you inspect requests before sending them.

Configuration

Set credentials once via environment, code, or the CLI. The SDK ships with the Perceptron backend enabled by default, and you can add or swap providers (e.g., fal) by extending perceptron.client._PROVIDER_CONFIG.

Environment variables (pick what you need):

PERCEPTRON_PROVIDER – provider identifier (perceptron by default)
PERCEPTRON_API_KEY – API key for the selected provider
Provider-specific keys (e.g., FAL_KEY) when targeting alternates

export PERCEPTRON_PROVIDER=perceptron
export PERCEPTRON_API_KEY=sk_live_...
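
Before calling any helper, it can be useful to confirm which of these variables the current process actually sees. The following is a minimal standard-library sketch, not part of the SDK: `credential_status` is a hypothetical helper, and pairing `FAL_KEY` with a provider named `fal` is an assumption drawn from the example above.

```python
import os


def credential_status(env=None):
    """Report the selected provider and whether a usable key is visible."""
    env = os.environ if env is None else env
    # The README states that "perceptron" is the default provider.
    provider = env.get("PERCEPTRON_PROVIDER", "perceptron")
    # Assumption: an alternate provider may also read its own key (FAL_KEY for "fal").
    key_names = ["PERCEPTRON_API_KEY"]
    if provider == "fal":
        key_names.append("FAL_KEY")
    has_key = any(env.get(name) for name in key_names)
    return {"provider": provider, "checked": key_names, "has_key": has_key}


# Pass a plain dict to test without touching the real environment.
print(credential_status({"PERCEPTRON_API_KEY": "sk_live_example"}))
```

Calling `credential_status()` with no argument inspects `os.environ`, which helps explain an unexpected compile-only result.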

Programmatic:

from perceptron import configure, config

configure(provider="perceptron", api_key="sk_live_...")

with config(max_tokens=512):
    ...  # temporary overrides while inside the context

CLI:

perceptron config --provider perceptron --api-key sk_live_...

No credentials? Helpers return compile-only payloads so you can inspect tasks before sending requests.

Core Features

Detection with structured outputs

Get normalized bounding boxes (0-1000 coordinate space) ready for downstream tasks:

from perceptron import detect

result = detect("factory_floor.jpg", classes=["defect", "warning"])

for box in result.points or []:
    print(f"{box.mention}: {box.top_left} → {box.bottom_right}")
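
Because coordinates are normalized to the 0-1000 space, a quick sanity check can catch malformed boxes before they reach downstream code. This is a minimal sketch under stated assumptions: `Point` and `Box` are hypothetical stand-ins that mirror only the attributes used above (`mention`, `top_left`, `bottom_right`), and `is_valid_box` is not an SDK function.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Box:
    mention: str
    top_left: Point
    bottom_right: Point


def is_valid_box(box, lo=0, hi=1000):
    """Return True when the box lies inside [lo, hi] and has positive area."""
    coords = (box.top_left.x, box.top_left.y, box.bottom_right.x, box.bottom_right.y)
    in_range = all(lo <= c <= hi for c in coords)
    ordered = box.top_left.x < box.bottom_right.x and box.top_left.y < box.bottom_right.y
    return in_range and ordered


boxes = [
    Box("defect", Point(100, 200), Point(300, 400)),
    Box("warning", Point(900, 900), Point(1100, 950)),  # x exceeds 1000
]
valid = [b for b in boxes if is_valid_box(b)]
print([b.mention for b in valid])  # ['defect']
```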

Image captioning

from perceptron import caption

result = caption("product.png", style="concise")
print(result.text)  # e.g. "A blue widget on a white background"

OCR with custom prompts

from perceptron import ocr

result = ocr("schematic.png", prompt="Extract all component labels and their values")
print(result.text)

Streaming responses

Stream incremental text and coordinate deltas for real-time applications:

from perceptron import detect

for event in detect("frame.png", classes=["person"], stream=True):
    if event["type"] == "text.delta":
        print(event["chunk"], end="", flush=True)
    elif event["type"] == "points.delta":
        print(f"Detection: {event['points']}")
    elif event["type"] == "final":
        result = event["result"]
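
The event loop above can be wrapped in a small accumulator that is testable without network access. The sketch below consumes any iterable of event dicts with the three types shown; `consume_stream` is a hypothetical helper, and treating `event["points"]` as a list of new detections is an assumption.

```python
def consume_stream(events):
    """Collect text chunks and point deltas; return them with the final result."""
    text_parts, points, final = [], [], None
    for event in events:
        kind = event["type"]
        if kind == "text.delta":
            text_parts.append(event["chunk"])
        elif kind == "points.delta":
            # Assumption: each delta carries a list of newly detected points.
            points.extend(event["points"])
        elif kind == "final":
            final = event["result"]
    return "".join(text_parts), points, final


# Simulated events with the shapes used above; no request is sent.
fake_events = [
    {"type": "text.delta", "chunk": "Found "},
    {"type": "text.delta", "chunk": "1 person"},
    {"type": "points.delta", "points": [{"mention": "person"}]},
    {"type": "final", "result": {"text": "Found 1 person"}},
]
text, points, final = consume_stream(fake_events)
print(text)  # Found 1 person
```

In real use the iterable would be the generator returned by a helper called with `stream=True`.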

High-level helper surface

caption(image, *, style="concise", stream=False, **kwargs) – describe or summarize images.
detect(image, *, classes=None, examples=None, stream=False, **kwargs) – grounded detection with points/boxes/polygons.
ocr(image, *, prompt=None, stream=False, **kwargs) – text extraction with optional instructions.
detect_from_coco(dataset_dir, *, split=None, classes=None, shots=0, limit=None, **kwargs) – auto-build few-shot prompts from datasets.
perceive(nodes, *, expects="text", stream=False, **kwargs) / @perceive – compose arbitrary multimodal workflows with the DSL.

CLI Usage

The CLI provides quick access to core features for batch processing and scripting:

# Caption single image or directory
perceptron caption image.jpg
perceptron caption ./images --style detailed

# OCR with custom prompt
perceptron ocr document.png --prompt "Extract table data"

# Detect objects (writes detections.json)
perceptron detect ./frames --classes forklift,person,pallet

# Visual Q&A with grounding
perceptron question scene.jpg "Where is the safety equipment?" --expects box

Directory mode disables streaming, writes JSON summaries (detections.json) alongside the input folder, and logs per-file validation issues for easier auditing.

Advanced Usage

Few-shot detection with COCO datasets

Automatically build balanced in-context examples from annotated datasets:

from perceptron import detect_from_coco

results = detect_from_coco(
    "/datasets/custom",
    split="train",
    shots=4,  # balanced examples per class
    classes=["defect", "ok"]
)

for sample in results:
    print(f"{sample.image_path.name}: {len(sample.result.points or [])} detections")
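
To make "balanced examples per class" concrete, the sketch below picks up to `shots` distinct images per class from a COCO-style annotation dict. It only illustrates the idea: the README does not document the SDK's actual selection strategy, and `pick_balanced_shots` is a hypothetical helper.

```python
from collections import defaultdict


def pick_balanced_shots(coco, classes, shots):
    """Pick up to `shots` distinct image ids per class from a COCO-style dict."""
    name_to_id = {c["name"]: c["id"] for c in coco["categories"]}
    images_by_category = defaultdict(list)
    for ann in coco["annotations"]:
        image_ids = images_by_category[ann["category_id"]]
        if ann["image_id"] not in image_ids:  # count each image once per class
            image_ids.append(ann["image_id"])
    return {
        name: images_by_category[name_to_id[name]][:shots]
        for name in classes
        if name in name_to_id
    }


# Toy annotations in the standard COCO layout (categories + annotations).
coco = {
    "categories": [{"id": 1, "name": "defect"}, {"id": 2, "name": "ok"}],
    "annotations": [
        {"image_id": 10, "category_id": 1},
        {"image_id": 10, "category_id": 1},
        {"image_id": 11, "category_id": 1},
        {"image_id": 12, "category_id": 2},
        {"image_id": 13, "category_id": 1},
    ],
}
selection = pick_balanced_shots(coco, ["defect", "ok"], shots=2)
print(selection)  # {'defect': [10, 11], 'ok': [12]}
```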

Coordinate scaling

Outputs use normalized 0-1000 coordinates. Convert to pixels for rendering or metrics:

from PIL import Image
from perceptron import detect, scale_points_to_pixels

result = detect("frame.png", classes=["forklift"])
width, height = Image.open("frame.png").size

# Option 1: helper function
pixel_boxes = scale_points_to_pixels(result.points, width=width, height=height)

# Option 2: convenience method on PerceiveResult
pixel_boxes = result.points_to_pixels(width, height)

for box in pixel_boxes or []:
    x1, y1 = box.top_left.x, box.top_left.y
    x2, y2 = box.bottom_right.x, box.bottom_right.y
    print(f"{box.mention}: [{x1}, {y1}, {x2}, {y2}]")
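
The arithmetic behind the conversion is a linear rescale from the 0-1000 range to the image dimensions. The sketch below shows that calculation for a single point; `scale_to_pixels` is a hypothetical function, and the SDK helpers may round or clamp differently.

```python
def scale_to_pixels(x, y, width, height, scale=1000):
    """Map a normalized (x, y) in [0, scale] to integer pixel coordinates."""
    return round(x / scale * width), round(y / scale * height)


# The center-left quarter point of a 1920x1080 frame.
print(scale_to_pixels(500, 250, width=1920, height=1080))  # (960, 270)
```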

Composing tasks with the DSL

For complex workflows, compose multimodal prompts with typed nodes and the @perceive decorator:

from perceptron import perceive, image, text

@perceive(expects="box", stream=True)
def find_safety_equipment(image_path):
    return [
        image(image_path),
        text("Locate all safety equipment including helmets, vests, and signs")
    ]

# Use the decorated function
for event in find_safety_equipment("warehouse.jpg"):
    if event["type"] == "final":
        for box in event["result"]["points"] or []:
            print(f"{box['mention']}: {box['top_left']}")

# Inspect compiled payload without executing
payload = find_safety_equipment.inspect("warehouse.jpg")
print(payload)

Available DSL nodes: image, text, system, point, box, polygon, collection

Troubleshooting

Symptom	Likely cause	Resolution
Compile-only result (no text)	Missing provider credentials	Export `PERCEPTRON_API_KEY` / `FAL_KEY` or call `configure(api_key=...)`.
`stream_buffer_overflow` warning	Streaming responses exceeded buffer	Raise `max_buffer_bytes` via `configure(...)` or disable streaming.
Empty detections in directory mode	No supported image extensions discovered	Limit inputs to `.jpg`, `.png`, `.webp`, `.gif`, `.bmp`, `.tif`, `.tiff`, `.heic`, `.heif`.
Bounding-box coordinate errors	Inconsistent annotations or detached image payload	Validate annotation bounds and ensure each request attaches the relevant image.
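
For the empty-detections case, it can help to list which files in a folder carry one of the supported extensions before running directory mode. This is a minimal sketch: `list_supported_images` is a hypothetical helper, the extension set is copied from the table above, and matching case-insensitively without recursing into subfolders are both assumptions.

```python
import tempfile
from pathlib import Path

# Extensions listed in the troubleshooting table.
SUPPORTED_EXTENSIONS = {
    ".jpg", ".png", ".webp", ".gif", ".bmp", ".tif", ".tiff", ".heic", ".heif",
}


def list_supported_images(directory):
    """Return sorted paths in `directory` whose suffix is in the supported set."""
    return sorted(
        p
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


# Demo on a throwaway folder so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.PNG", "notes.txt"):
        (Path(tmp) / name).touch()
    found_names = [p.name for p in list_supported_images(tmp)]

print(found_names)  # ['a.jpg', 'b.PNG']
```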

Development

Clone the repo and install in editable mode with dev dependencies:

git clone https://github.com/perceptron-ai-inc/perceptron.git
cd perceptron
uv pip install -e ".[dev]"
pre-commit install

Run tests and checks:

pytest                          # Run tests with coverage
pre-commit run --all-files      # Run linters and formatters

Repository structure:

src/perceptron/ – SDK core, client, DSL, providers
tests/ – Test suite with coverage reporting
cookbook/ – Example notebooks and scripts
papers/ – Research publications
tools/ – Development utilities

Coverage reports are automatically published to Codecov via CI.

Documentation & Support

Full Documentation: docs.perceptron.inc
Research Paper: papers/isaac_01.pdf
Technical Support: support@perceptron.inc
Commercial Licensing: sales@perceptron.inc
Careers: join-us@perceptron.inc

License

Model weights are released under the Creative Commons Attribution-NonCommercial 4.0 International License. For commercial licensing, contact sales@perceptron.inc.

Release history

0.2.0 (this version): Jan 14, 2026
0.1.4: Nov 12, 2025
0.1.3: Nov 12, 2025
0.1.1: Sep 17, 2025
0.1.0: Sep 17, 2025

Download files

Source Distribution

perceptron-0.2.0.tar.gz (87.4 kB), uploaded Jan 14, 2026

Built Distribution

perceptron-0.2.0-py3-none-any.whl (61.1 kB), uploaded Jan 14, 2026, Python 3

File details

Details for the file perceptron-0.2.0.tar.gz.

File metadata

Download URL: perceptron-0.2.0.tar.gz
Upload date: Jan 14, 2026
Size: 87.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for perceptron-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`369ff3078ba7ac9e3b5f30d9f75ff44d72991b64c94f93c5267e751552cab3f6`
MD5	`34e69865afdef26d75a7c91d51aabb7b`
BLAKE2b-256	`c6ff87efbc3988094e09eb29261d545c84cd0a21376daa997435f5566281e2d2`

File details

Details for the file perceptron-0.2.0-py3-none-any.whl.

File metadata

Download URL: perceptron-0.2.0-py3-none-any.whl
Upload date: Jan 14, 2026
Size: 61.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for perceptron-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7dc7713778b797f3cb013406eb507ae729ca360347dba8196e82361134a436e8`
MD5	`7dfabc624d5cbf3f0526ffe3cabffe82`
BLAKE2b-256	`8b83983a6663a7814c0772eabdf3f2e616758abd50a244dfbd770785c9c2ab95`
