Autonomous, self-evolving Photo Agents — a perceive / reason / act framework for photo-aware agents.

Photo Agents

Autonomous, self-evolving Photo Agents. A perceive / reason / act framework for photo-aware agents that operate your computer the way you do.

"100% autonomous, self-evolving agents." photo-agents.com

About

Photo Agents is building the next generation of LLM-driven agents that ground themselves in what they actually see on screen. Instead of dumping longer chat transcripts into a model and hoping for the best, we treat memory the way biology does. Vision in. Bound observations stored in layers. Skills written by the agent itself from real success.

The package in this repo is the runtime that ships that idea. It runs locally, so you keep ownership of your screen, your data, and your keys.

Bio

A small team obsessed with photographic memory for machines. We come from computational neuroscience research and applied LLM systems work. The thesis is simple: bigger context windows are not memory. Real agents need perception that binds, layered memory that retrieves, and skills that compound across sessions.

If that resonates, follow @photoagents on X for build notes, demos, and the occasional rant about why text-only agents will never see your UI.

Hashtags

#PhotoAgents #AIagents #LLM #AgentMemory #PhotographicMemory #VisionAgents #ComputerUse #SelfEvolvingAgents #OpenSource #Python

What it is

Photo Agents is a single Python package that bundles:

  • A streaming agent loop (photoagents.core.loop.run_agent_session) that drives any tool-calling LLM through a perceive → reason → act cycle.
  • A multi-provider LLM router (photoagents.llm.router) with first-class support for Anthropic Claude (native), OpenAI GPT (native), and a mixin failover session.
  • A physical-execution toolset: file I/O, sandboxed code execution (Python / PowerShell / bash), browser automation via a Chrome DevTools Protocol bridge, and a layered memory system (working / global / SOP / session archive).
  • Pluggable clients: a polished Streamlit web app, a PyQt desktop app, a desktop companion, and ready-to-run bots for Telegram, QQ, Feishu, WeCom and DingTalk.
  • Optional observability via Langfuse and a cron-style scheduler.
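The perceive → reason → act cycle that run_agent_session drives can be pictured with a small, framework-free sketch. It does not use the photoagents API: Action, run_session, and the perceive, reason and tools callables are hypothetical stand-ins, with reason playing the role of a tool-calling LLM. The loop observes, asks the model for its next step, runs the chosen tool, and feeds the result back until the model answers or the step budget runs out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """What the (hypothetical) model decided: call a tool, or answer."""
    tool: str | None                          # tool name, or None when finished
    args: dict = field(default_factory=dict)  # keyword arguments for the tool
    answer: str | None = None                 # final answer when tool is None


def run_session(
    perceive: Callable[[], str],
    reason: Callable[[list[str]], Action],
    tools: dict[str, Callable[..., str]],
    max_steps: int = 10,
) -> str | None:
    """Drive perceive -> reason -> act until the model answers or the budget runs out."""
    history: list[str] = []
    for _ in range(max_steps):
        history.append(f"observation: {perceive()}")  # perceive: e.g. current screen state
        action = reason(history)                       # reason: model picks the next step
        if action.tool is None:
            return action.answer                       # model produced a final answer
        result = tools[action.tool](**action.args)     # act: run the chosen tool
        history.append(f"{action.tool} -> {result}")  # feed the tool result back in
    return None  # step budget exhausted without an answer
```

The step budget (max_steps) is a common guard against a model that never settles on an answer; whether run_agent_session uses a similar limit is not stated here.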

The runtime and its clients are gated by a remote-validated Photo Agents API key, so usage stays accountable.

Install

pip install photoagents
# or, with every optional client and integration
pip install "photoagents[all]"

Photo Agents needs Python 3.10+.

Get an API key

Photo Agents requires a license key, validated against https://photo-agents.com/v1/keys/validate. Sign in and create one at:

https://photo-agents.com/account/keys

Then make it available to the runtime in any of these ways (checked in order):

  1. Environment variable: PHOTOAGENTS_API_KEY=pk_live_...
  2. Saved config: ~/.photoagents/config.json field api_key
  3. Interactive prompt on first run (offered to be saved automatically)
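A minimal sketch of the lookup order above, assuming config.json is a flat JSON object. resolve_api_key and its parameters are hypothetical and not part of the photoagents package; the prompt callable stands in for the interactive first-run prompt, and saving the typed key is left out.

```python
from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Callable, Mapping


def resolve_api_key(
    env: Mapping[str, str] | None = None,
    config_path: Path | None = None,
    prompt: Callable[[], str] | None = None,
) -> str | None:
    """Return the first key found: env var, then config.json, then the prompt."""
    env = os.environ if env is None else env
    if config_path is None:
        config_path = Path.home() / ".photoagents" / "config.json"

    # 1. Environment variable.
    key = env.get("PHOTOAGENTS_API_KEY", "").strip()
    if key:
        return key

    # 2. Saved config: the "api_key" field of config.json.
    try:
        data = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        data = {}  # a missing or unreadable config counts as "no key"
    value = data.get("api_key") if isinstance(data, dict) else None
    if isinstance(value, str) and value.strip():
        return value.strip()

    # 3. Interactive prompt (persisting the answer is omitted in this sketch).
    if prompt is not None:
        key = prompt().strip()
        return key or None
    return None
```

In a terminal session, passing prompt=lambda: input("Photo Agents API key: ") would reproduce step 3.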

A successful validation is cached for 24 hours so the gate stays fast.
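The 24-hour cache amounts to a timestamp comparison. The sketch below assumes a cache record holding the validated key and a Unix timestamp; the field names (key, validated_at) and the needs_revalidation helper are hypothetical, since the actual layout of the cache inside config.json is not documented here.

```python
from __future__ import annotations

import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # validations are reused for 24 hours


def needs_revalidation(cache: dict, key: str, now: float | None = None) -> bool:
    """True if `key` must be checked against the remote validation endpoint again."""
    now = time.time() if now is None else now
    # Hypothetical cache record: {"key": "pk_live_...", "validated_at": <Unix seconds>}
    if cache.get("key") != key:
        return True  # nothing cached, or a different key was validated
    validated_at = cache.get("validated_at")
    if not isinstance(validated_at, (int, float)):
        return True  # malformed record: fail closed and revalidate
    age = now - validated_at
    # A timestamp in the future (e.g. after a clock change) also forces a recheck.
    return age < 0 or age >= CACHE_TTL_SECONDS
```

When it returns True, a runtime built this way would call the validation endpoint again and, on success, refresh validated_at.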

LLM credentials

Copy the credentials template and fill in your provider key:

# from the repo root
cp photoagents/config/keys_template.py credentials.py
# then edit credentials.py and uncomment one of the provider configs

The runtime also accepts a JSON form (credentials.json) with the same shape.
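How the two credential forms might be loaded side by side, as a sketch. load_credentials is hypothetical, and it assumes that credentials.py takes precedence over credentials.json when both exist and that the Python template defines its provider configs as module-level variables; the README specifies neither detail. Loading a .py credentials file executes it, so only load files you wrote yourself.

```python
from pathlib import Path
import importlib.util
import json


def load_credentials(root: Path) -> dict:
    """Load credentials.py (assumed preferred) or credentials.json from `root`."""
    py_path = root / "credentials.py"
    if py_path.is_file():
        spec = importlib.util.spec_from_file_location("credentials", py_path)
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load {py_path}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes the file: trusted input only
        # Keep public module-level names; dunders and _private helpers are skipped.
        return {k: v for k, v in vars(module).items() if not k.startswith("_")}

    json_path = root / "credentials.json"
    if json_path.is_file():
        return json.loads(json_path.read_text(encoding="utf-8"))
    return {}  # no credentials file found
```

Any modules that credentials.py itself imports would also show up as public names, so a real loader would likely filter more strictly.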

Run

# Interactive REPL on your terminal
python -m photoagents

# One-shot file-IO mode
python -m photoagents --task my_task --input "List the largest files in this directory."

# Reflect / watchdog mode (your check() function fires the next task)
python -m photoagents --reflect photoagents/evolution/scheduler.py
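The run examples only say that check() fires the next task. Assuming the runtime imports the --reflect script, calls check() periodically, and treats a non-empty string as the next task prompt (and None as nothing to do), a script that queues one task per weekday morning might look like this. The task text and the _last_run guard are illustrative, not taken from photoagents/evolution/scheduler.py.

```python
from __future__ import annotations

from datetime import date, datetime

# Hypothetical guard so the task fires at most once per calendar day,
# however often the runtime polls check().
_last_run: date | None = None


def check(now: datetime | None = None) -> str | None:
    """Return the next task prompt, or None when there is nothing to do."""
    global _last_run
    now = datetime.now() if now is None else now
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    if is_weekday and now.hour >= 9 and _last_run != now.date():
        _last_run = now.date()
        return "Summarize the files added to ~/Downloads since yesterday."
    return None
```

Keeping the "already ran today" state inside the script keeps the sketch independent of how often the runtime polls.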

GUI clients

Photo Agents ships several optional frontends. Pick whichever fits your workflow:

Client                          Launch command
Streamlit web app + webview     pythonw -m photoagents.cli.launcher
Service hub (start/stop)        pythonw -m photoagents.cli.hub
Desktop app (PyQt)              python -m photoagents.clients.desktop_app
Desktop companion               pythonw -m photoagents.clients.companion_v2
Telegram bot                    python -m photoagents.clients.telegram_client
Feishu / WeCom / DingTalk / QQ  python -m photoagents.clients.<feishu

The launcher and hub both call the same API key gate before starting any service, so they will refuse to launch anything if your key is missing or revoked.

On-disk state

Path                           What lives there
~/.photoagents/config.json     API key + license validation cache
~/.photoagents/global_mem.txt  Long-term L2 facts
~/.photoagents/sessions/       L4 raw session archives
~/.photoagents/skill_index/    Vector index for skill / SOP search
~/.photoagents/temp/           Per-task scratch (logs, intermediate output)
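A read-only sketch for inspecting that state directory, for example before backing it up or clearing it. summarize_state is hypothetical; it relies only on the paths listed above and assumes global_mem.txt is plain text, without interpreting the formats of the session archives or the skill index.

```python
from __future__ import annotations

from pathlib import Path


def summarize_state(base: Path | None = None) -> dict[str, object]:
    """Report what is stored under ~/.photoagents without modifying anything."""
    base = Path.home() / ".photoagents" if base is None else base
    global_mem = base / "global_mem.txt"
    sessions = base / "sessions"
    temp = base / "temp"
    return {
        "has_config": (base / "config.json").is_file(),
        # L2 long-term facts; counting lines assumes a plain-text file.
        "global_mem_lines": (
            len(global_mem.read_text(encoding="utf-8", errors="replace").splitlines())
            if global_mem.is_file() else 0
        ),
        # L4 raw session archives: count entries, whatever their format.
        "session_archives": sum(1 for _ in sessions.iterdir()) if sessions.is_dir() else 0,
        "has_skill_index": (base / "skill_index").is_dir(),
        # Per-task scratch space, in bytes.
        "temp_bytes": (
            sum(p.stat().st_size for p in temp.rglob("*") if p.is_file())
            if temp.is_dir() else 0
        ),
    }
```

Calling print(summarize_state()) with no argument inspects your own ~/.photoagents directory.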

Project layout

photoagents/
├── auth/        License gate (remote-validated API key)
├── cli/         python -m photoagents, GUI launcher, service hub
├── clients/     Web / desktop / chat-platform frontends
├── config/      credentials.py template
├── core/        Agent loop and tool dispatcher
├── evolution/   Reflection / scheduler scripts (the "self-evolving" loop)
├── integrations/ Optional third-party hooks (Langfuse, etc.)
├── llm/         Multi-provider session router
├── resources/   System prompt, tool schema, CDP bridge, demo media
├── skills/      L3 SOPs and helper modules (browser, vision, OCR, ...)
└── web/         DOM simplifier and Chrome DevTools Protocol driver

License

MIT. See LICENSE.

Status

Status: beta. APIs may change before 1.0.

Download files

Source Distribution

photoagents-0.1.0.tar.gz (6.3 kB)

Uploaded Source

Built Distribution

photoagents-0.1.0-py3-none-any.whl (5.7 kB)

Uploaded Python 3

File details

Details for the file photoagents-0.1.0.tar.gz.

File metadata

  • Download URL: photoagents-0.1.0.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for photoagents-0.1.0.tar.gz
Algorithm Hash digest
SHA256 283b532ed5f06816f7cd8f6894aec5cfcdf6f675a4e4fe0831c5515143771ca8
MD5 6674315a8ab7adebc68f6061f6b9ca0d
BLAKE2b-256 b8d02daa7b897dbccbd63f826b160f76d67cab207708248027e005615c128837

File details

Details for the file photoagents-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: photoagents-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for photoagents-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 084841dd101027c6568e783d651d2bd620c6c49509640b6f68efe95d1c314fb1
MD5 2360243dc465ee7a222b056213e3d113
BLAKE2b-256 2237f1b372683995300dd36dfa1a3b1435300b0e07b79b7249cb95abbbceb919
