Perceptron multimodal SDK
The platform for physical AI
Perceptron is the Python SDK for building with perceptive-language models like Isaac 0.1. Designed for physical AI applications—robotics, manufacturing, logistics, and security—it provides a unified interface for grounded perception: detection, localization, OCR, and visual Q&A with structured outputs ready for robotics, analytics, and edge deployment. Route tasks to specialized models, swap providers per call, and compose complex multimodal flows with a typed DSL. Efficient enough for edge deployment, flexible enough for any real-world task.
Why Perceptron?
- Grounded, spatial intelligence: Get precise localization and grounded answers with conversational pointing—every claim is visually cited. Ask "what's broken in this machine?" and get highlighted regions with robust spatial reasoning that handles occlusions, relationships, and object interactions.
- In-context learning for perception: Show a few annotated examples (defects, safety conditions, custom categories) in your prompt and the model adapts—no YOLO-style fine-tuning or custom detector stacks required. Learn novel tasks from a handful of examples.
- Efficient frontier for real-world deployment: Isaac 0.1 matches models 50x its size while delivering edge-ready latencies and drastically lower serving costs. Perception workloads are continuous and latency-sensitive—Perceptron is built for the efficient frontier where capability meets real-world constraints.
- Prompt for anything, control the output type: Ask for whatever you need in natural language—"find safety violations", "locate damaged components", "identify obstacles"—and specify the output format: bounding boxes, points, polygons, or text. The flexibility of language models with the structure your application needs.
Installation
- Prerequisites: Python 3.10+ and pip 23+ (or uv)
- Works with standard pip if you don't use uv.
pip install perceptron
# Optional extras
pip install "perceptron[torch]" # Tensor utilities (requires PyTorch)
pip install "perceptron[dev]" # Ruff, pytest, pre-commit
Using uv:
uv pip install perceptron
# Optional: PyTorch helpers for tensor utilities
uv pip install "perceptron[torch]"
# Optional: Dev tools (ruff, pytest, pre-commit)
uv pip install "perceptron[dev]"
The CLI entry point perceptron is available after install.
Quick Start
from perceptron import detect, caption
# Detect objects with structured bounding boxes
result = detect(
    "warehouse.jpg",
    classes=["forklift", "person", "pallet"],
)
for box in result.points or []:
    print(f"{box.mention}: ({box.top_left.x}, {box.top_left.y})")
# Generate image captions
desc = caption("scene.png", style="detailed")
print(desc.text)
No credentials? The SDK returns compile-only payloads when API keys are missing, letting you inspect requests before sending them.
Configuration
Set credentials once via environment, code, or the CLI. The SDK ships with the Perceptron backend enabled by default, and you can add or swap providers (e.g., fal) by extending perceptron.client._PROVIDER_CONFIG.
Environment variables (pick what you need):
- PERCEPTRON_PROVIDER – provider identifier (perceptron by default)
- PERCEPTRON_API_KEY – API key for the selected provider
- Provider-specific keys (e.g., FAL_KEY) when targeting alternate providers
export PERCEPTRON_PROVIDER=perceptron
export PERCEPTRON_API_KEY=sk_live_...
Programmatic:
from perceptron import configure, config
configure(provider="perceptron", api_key="sk_live_...")
with config(max_tokens=512):
    ...  # temporary overrides while inside the context
CLI:
perceptron config --provider perceptron --api-key sk_live_...
No credentials? Helpers return compile-only payloads so you can inspect tasks before sending requests.
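For example, with no API key configured, a helper call yields the compiled request instead of a model response. A minimal sketch (the exact payload structure is not documented here):
from perceptron import detect
# Without PERCEPTRON_API_KEY set, this returns a compile-only payload rather than detections
payload = detect("warehouse.jpg", classes=["forklift"])
print(payload)  # inspect the request that would have been sent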
Core Features
Detection with structured outputs
Get normalized bounding boxes (0-1000 coordinate space) ready for downstream tasks:
from perceptron import detect
result = detect("factory_floor.jpg", classes=["defect", "warning"])
for box in result.points or []:
    print(f"{box.mention}: {box.top_left} → {box.bottom_right}")
Image captioning
from perceptron import caption
result = caption("product.png", style="concise")
print(result.text) # "A blue widget on a white background"
OCR with custom prompts
from perceptron import ocr
result = ocr("schematic.png", prompt="Extract all component labels and their values")
print(result.text)
Streaming responses
Stream incremental text and coordinate deltas for real-time applications:
from perceptron import detect
for event in detect("frame.png", classes=["person"], stream=True):
    if event["type"] == "text.delta":
        print(event["chunk"], end="", flush=True)
    elif event["type"] == "points.delta":
        print(f"Detection: {event['points']}")
    elif event["type"] == "final":
        result = event["result"]
High-level helper surface
- caption(image, *, style="concise", stream=False, **kwargs) – describe or summarize images.
- detect(image, *, classes=None, examples=None, stream=False, **kwargs) – grounded detection with points/boxes/polygons.
- ocr(image, *, prompt=None, stream=False, **kwargs) – text extraction with optional instructions.
- detect_from_coco(dataset_dir, *, split=None, classes=None, shots=0, limit=None, **kwargs) – auto-build few-shot prompts from datasets.
- perceive(nodes, *, expects="text", stream=False, **kwargs) / @perceive – compose arbitrary multimodal workflows with the DSL (see the sketch below).
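Besides the decorator form shown later, perceive can be called directly with a list of nodes. A minimal sketch (assuming it returns the same PerceiveResult shape as detect; check the documentation for exact return types):
from perceptron import perceive, image, text
# Compose nodes directly and request grounded box output
result = perceive(
    [image("warehouse.jpg"), text("Locate all forklifts")],
    expects="box",
)
print(result.points)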
CLI Usage
The CLI provides quick access to core features for batch processing and scripting:
# Caption single image or directory
perceptron caption image.jpg
perceptron caption ./images --style detailed
# OCR with custom prompt
perceptron ocr document.png --prompt "Extract table data"
# Detect objects (writes detections.json)
perceptron detect ./frames --classes forklift,person,pallet
# Visual Q&A with grounding
perceptron question scene.jpg "Where is the safety equipment?" --expects box
Directory mode disables streaming, writes JSON summaries (detections.json) alongside the input folder, and logs per-file validation issues for easier auditing.
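To post-process a run, load the summary file. A minimal sketch (the exact location and schema of detections.json may vary; adjust for your layout):
import json
from pathlib import Path
# Directory mode writes detections.json alongside the input folder
summary_path = Path("detections.json")
with summary_path.open() as f:
    summary = json.load(f)
# Print the top-level structure without assuming a fixed schema
if isinstance(summary, dict):
    for key in summary:
        print(key)
else:
    print(f"{len(summary)} entries")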
Advanced Usage
Few-shot detection with COCO datasets
Automatically build balanced in-context examples from annotated datasets:
from perceptron import detect_from_coco
results = detect_from_coco(
    "/datasets/custom",
    split="train",
    shots=4,  # balanced examples per class
    classes=["defect", "ok"],
)
for sample in results:
    print(f"{sample.image_path.name}: {len(sample.result.points or [])} detections")
Coordinate scaling
Outputs use normalized 0-1000 coordinates. Convert to pixels for rendering or metrics:
from PIL import Image
from perceptron import detect, scale_points_to_pixels
result = detect("frame.png", classes=["forklift"])
width, height = Image.open("frame.png").size
# Option 1: helper function
pixel_boxes = scale_points_to_pixels(result.points, width=width, height=height)
# Option 2: convenience method on PerceiveResult
pixel_boxes = result.points_to_pixels(width, height)
for box in pixel_boxes or []:
    x1, y1 = box.top_left.x, box.top_left.y
    x2, y2 = box.bottom_right.x, box.bottom_right.y
    print(f"{box.mention}: [{x1}, {y1}, {x2}, {y2}]")
Composing tasks with the DSL
For complex workflows, compose multimodal prompts with typed nodes and the @perceive decorator:
from perceptron import perceive, image, text
@perceive(expects="box", stream=True)
def find_safety_equipment(image_path):
    return [
        image(image_path),
        text("Locate all safety equipment including helmets, vests, and signs"),
    ]
# Use the decorated function
for event in find_safety_equipment("warehouse.jpg"):
    if event["type"] == "final":
        for box in event["result"]["points"]:
            print(f"{box['mention']}: {box['top_left']}")
# Inspect compiled payload without executing
payload = find_safety_equipment.inspect("warehouse.jpg")
print(payload)
Available DSL nodes: image, text, system, point, box, polygon, collection
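For example, a system node can steer the response style alongside image and text nodes. A minimal sketch (node signatures beyond image and text are assumptions; see the documentation for details):
from perceptron import perceive, image, text, system
# Sketch: prepend a system instruction (assumed signature system(str))
nodes = [
    system("You are an inspection assistant. Answer tersely."),
    image("line_3.jpg"),
    text("List any visible defects."),
]
result = perceive(nodes, expects="text")
print(result.text)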
Troubleshooting
| Symptom | Likely cause | Resolution |
|---|---|---|
| Compile-only result (no text) | Missing provider credentials | Export PERCEPTRON_API_KEY / FAL_KEY or call configure(api_key=...). |
| stream_buffer_overflow warning | Streaming responses exceeded buffer | Raise max_buffer_bytes via configure(...) or disable streaming. |
| Empty detections in directory mode | No supported image extensions discovered | Limit inputs to .jpg, .png, .webp, .gif, .bmp, .tif, .tiff, .heic, .heif. |
| Bounding-box coordinate errors | Inconsistent annotations or detached image payload | Validate annotation bounds and ensure each request attaches the relevant image. |
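For the buffer warning, a minimal sketch of raising the limit (the default and accepted range are not documented here; the value below is illustrative):
from perceptron import configure
# Raise the streaming buffer limit if long responses trigger stream_buffer_overflow
configure(max_buffer_bytes=4 * 1024 * 1024)  # 4 MiB, illustrative value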
Development
Clone the repo and install in editable mode with dev dependencies:
git clone https://github.com/perceptron-ai-inc/perceptron.git
cd perceptron
uv pip install -e ".[dev]"
pre-commit install
Run tests and checks:
pytest # Run tests with coverage
pre-commit run --all-files # Run linters and formatters
Repository structure:
- src/perceptron/ – SDK core, client, DSL, providers
- tests/ – Test suite with coverage reporting
- cookbook/ – Example notebooks and scripts
- papers/ – Research publications
- tools/ – Development utilities
Coverage reports are automatically published to Codecov via CI.
Documentation & Support
- Full Documentation: docs.perceptron.inc
- Research Paper: papers/isaac_01.pdf
- Technical Support: support@perceptron.inc
- Commercial Licensing: sales@perceptron.inc
- Careers: join-us@perceptron.inc
License
Model weights are released under the Creative Commons Attribution-NonCommercial 4.0 International License. For commercial licensing, contact sales@perceptron.inc.
Download files
File details
Details for the file perceptron-0.2.0.tar.gz.
File metadata
- Download URL: perceptron-0.2.0.tar.gz
- Upload date:
- Size: 87.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 369ff3078ba7ac9e3b5f30d9f75ff44d72991b64c94f93c5267e751552cab3f6 |
| MD5 | 34e69865afdef26d75a7c91d51aabb7b |
| BLAKE2b-256 | c6ff87efbc3988094e09eb29261d545c84cd0a21376daa997435f5566281e2d2 |
File details
Details for the file perceptron-0.2.0-py3-none-any.whl.
File metadata
- Download URL: perceptron-0.2.0-py3-none-any.whl
- Upload date:
- Size: 61.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7dc7713778b797f3cb013406eb507ae729ca360347dba8196e82361134a436e8 |
| MD5 | 7dfabc624d5cbf3f0526ffe3cabffe82 |
| BLAKE2b-256 | 8b83983a6663a7814c0772eabdf3f2e616758abd50a244dfbd770785c9c2ab95 |