Autonomous, self-evolving Photo Agents — a perceive / reason / act framework for photo-aware agents.
Photo Agents
Autonomous self-evolving Photo Agents. A perceive / reason / act framework for photo-aware agents that operate your computer the way you do.
"100% autonomous, self-evolving agents." photo-agents.com
About
Photo Agents is building the next generation of LLM-driven agents that ground themselves in what they actually see on screen. Instead of dumping longer chat transcripts into a model and hoping for the best, we treat memory the way biology does. Vision in. Bound observations stored in layers. Skills written by the agent itself from real success.
The package in this repo is the runtime that ships that idea. It runs locally, so you keep ownership of your screen, your data, and your keys.
- Website: https://photo-agents.com
- X / Twitter: https://x.com/photoagents
- Docs: https://photo-agents.com/docs
- Account and API keys: https://photo-agents.com/account/keys
Bio
A small team obsessed with photographic memory for machines. We come from computational neuroscience research and applied LLM systems work. The thesis is simple. Bigger context windows are not memory. Real agents need perception that binds, layered memory that retrieves, and skills that compound across sessions.
If that resonates, follow @photoagents on X for build notes, demos, and the occasional rant about why text-only agents will never see your UI.
Hashtags
#PhotoAgents #AIagents #LLM #AgentMemory #PhotographicMemory #VisionAgents #ComputerUse #SelfEvolvingAgents #OpenSource #Python
What it is
Photo Agents is a single Python package that bundles:
- A streaming agent loop (photoagents.core.loop.run_agent_session) that drives any tool-calling LLM through a perceive → reason → act cycle.
- A multi-provider LLM router (photoagents.llm.router) with first-class support for Anthropic Claude (native), OpenAI GPT (native), and a mixin failover session.
- A physical-execution toolset: file I/O, sandboxed code execution (Python / PowerShell / bash), browser automation via a Chrome DevTools Protocol bridge, and a layered memory system (working / global / SOP / session archive).
- Pluggable clients: a polished Streamlit web app, a PyQt desktop app, a desktop companion, and ready-to-run bots for Telegram, QQ, Feishu, WeCom, and DingTalk.
- Optional observability via Langfuse and a cron-style scheduler.
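The perceive → reason → act cycle named in the first bullet can be pictured with a minimal sketch. This is not the code behind photoagents.core.loop.run_agent_session, whose signature this page does not document. The names run_cycle and Step and the three callables are hypothetical, chosen only to illustrate the shape of such a loop.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One decision from the reasoning stage (hypothetical shape)."""
    tool: Optional[str]   # name of the tool to call, or None when finished
    argument: str = ""    # a single string argument keeps the sketch small


def run_cycle(
    perceive: Callable[[], str],
    reason: Callable[[str, list[str]], Step],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> list[str]:
    """Loop perceive -> reason -> act until the reasoner stops or the budget ends."""
    history: list[str] = []
    for _ in range(max_steps):
        observation = perceive()                    # perceive: capture current state
        step = reason(observation, history)         # reason: an LLM call in a real agent
        if step.tool is None:                       # the reasoner signals completion
            break
        result = tools[step.tool](step.argument)    # act: dispatch the chosen tool
        history.append(f"{step.tool}({step.argument!r}) -> {result}")
    return history


# Toy stand-ins so the sketch runs without a model or a screen capture.
def fake_perceive() -> str:
    return "screen: empty editor"


def fake_reason(observation: str, history: list[str]) -> Step:
    # Call the echo tool once, then stop.
    return Step(tool="echo", argument="hello") if not history else Step(tool=None)


demo_history = run_cycle(fake_perceive, fake_reason, {"echo": lambda s: s.upper()})
print(demo_history)
```

The step budget (max_steps) is a design choice of this sketch: it keeps a confused reasoner from looping forever.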
The whole thing is gated by a remote-validated Photo Agents API key so usage stays accountable.
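The "mixin failover session" in the router bullet above is not specified further on this page. One common way to get failover, sketched here under that caveat, is to try providers in a fixed order and return the first reply that does not raise. The function complete_with_failover and both toy providers are hypothetical and do not reflect the photoagents.llm.router API.

```python
from typing import Callable, Sequence

Provider = tuple[str, Callable[[str], str]]  # (label, function that answers a prompt)


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain raised an exception."""


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in order; return (label, reply) from the first that succeeds."""
    errors: list[str] = []
    for label, call in providers:
        try:
            return label, call(prompt)
        except Exception as exc:  # a real router would catch narrower error types
            errors.append(f"{label}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def steady_backup(prompt: str) -> str:
    return f"backup reply to: {prompt}"


used, reply = complete_with_failover(
    "ping", [("primary", flaky_primary), ("backup", steady_backup)]
)
print(used, "|", reply)
```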
Install
pip install photoagents
# or, with every optional client and integration
pip install "photoagents[all]"
Photo Agents needs Python 3.10+.
Get an API key
Photo Agents requires a license key, validated against https://photo-agents.com/v1/keys/validate. Sign in and create one on the Account and API keys page: https://photo-agents.com/account/keys
Then make it available to the runtime in any of these ways (checked in order):
- Environment variable: PHOTOAGENTS_API_KEY=pk_live_...
- Saved config: ~/.photoagents/config.json, field api_key
- Interactive prompt on first run (you are offered the option to save the key automatically)
A successful validation is cached for 24 hours so the gate stays fast.
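The lookup order and the 24-hour cache described above can be sketched as follows. This is an illustration, not the package's auth module: resolve_api_key and cache_is_fresh are hypothetical names, and how the runtime actually stores the validation timestamp is not documented here. The key value in the demo is a dummy placeholder.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional

CACHE_TTL_SECONDS = 24 * 60 * 60  # "cached for 24 hours"


def resolve_api_key(
    env: dict[str, str],
    config_path: Path,
    prompt: Callable[[], str],
) -> Optional[str]:
    """Return the first key found: environment, then saved config, then prompt."""
    key = env.get("PHOTOAGENTS_API_KEY")
    if key:
        return key
    if config_path.is_file():
        try:
            saved = json.loads(config_path.read_text(encoding="utf-8")).get("api_key")
        except (json.JSONDecodeError, AttributeError):
            saved = None  # unreadable or non-object config: fall through to the prompt
        if saved:
            return saved
    return prompt().strip() or None


def cache_is_fresh(validated_at: float, now: Optional[float] = None) -> bool:
    """True while a previous successful validation is under 24 hours old."""
    now = time.time() if now is None else now
    return 0 <= now - validated_at < CACHE_TTL_SECONDS


# Demo with a dummy key; the environment wins, so the prompt is never called.
demo_key = resolve_api_key(
    {"PHOTOAGENTS_API_KEY": "pk_live_example"}, Path("missing.json"), lambda: ""
)
print(demo_key)
```

In real use you would pass dict(os.environ), the path ~/.photoagents/config.json, and a function that wraps input().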
LLM credentials
Copy the credentials template and fill in your provider key:
# from the repo root
cp photoagents/config/keys_template.py credentials.py
# then edit credentials.py and uncomment one of the provider configs
The runtime also accepts a JSON form (credentials.json) with the same shape.
Run
# Interactive REPL on your terminal
python -m photoagents
# One-shot file-IO mode
python -m photoagents --task my_task --input "List the largest files in this directory."
# Reflect / watchdog mode (your check() function fires the next task)
python -m photoagents --reflect photoagents/evolution/scheduler.py
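Reflect mode points the runtime at a script that exposes a check() function. The exact contract of check() is not documented on this page, so the sketch below assumes, purely for illustration, that it returns the next task as a string, or None when there is nothing to do. The inbox folder is hypothetical as well.

```python
import tempfile
from pathlib import Path
from typing import Optional

WATCH_DIR = Path("inbox")  # hypothetical folder to watch for new files
_seen: set[str] = set()    # file names already turned into tasks


def check(watch_dir: Path = WATCH_DIR) -> Optional[str]:
    """Return a task for the first unseen file, else None (assumed contract)."""
    if not watch_dir.is_dir():
        return None
    for path in sorted(watch_dir.iterdir()):
        if path.is_file() and path.name not in _seen:
            _seen.add(path.name)
            return f"Summarize the new file {path.name}."
    return None


# Demo in a throwaway directory so nothing on disk is touched.
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp)
    first = check(inbox)                          # empty folder
    (inbox / "report.txt").write_text("draft")
    second = check(inbox)                         # a new file appears
    third = check(inbox)                          # the file was already seen
print(first, second, third, sep="\n")
```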
GUI clients
Photo Agents ships several optional frontends. Pick whichever fits your workflow:
| Client | Launch command |
|---|---|
| Streamlit web app + webview | pythonw -m photoagents.cli.launcher |
| Service hub (start/stop) | pythonw -m photoagents.cli.hub |
| Desktop app (PyQt) | python -m photoagents.clients.desktop_app |
| Desktop companion | pythonw -m photoagents.clients.companion_v2 |
| Telegram bot | python -m photoagents.clients.telegram_client |
| Feishu / WeCom / DingTalk / QQ | `python -m photoagents.clients.<feishu |
The launcher and hub both call the same API key gate before starting any service, so they will refuse to launch anything if your key is missing or revoked.
On-disk state
| Path | What lives there |
|---|---|
| ~/.photoagents/config.json | API key + license validation cache |
| ~/.photoagents/global_mem.txt | Long-term L2 facts |
| ~/.photoagents/sessions/ | L4 raw session archives |
| ~/.photoagents/skill_index/ | Vector index for skill / SOP search |
| ~/.photoagents/temp/ | Per-task scratch (logs, intermediate output) |
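To see which of these entries exist on a given machine, a small read-only helper along these lines may be handy. It is a sketch, not part of the package: describe_state is a hypothetical name, and the entry names are copied from the on-disk state listing.

```python
from pathlib import Path

# Entry names and purposes as listed for ~/.photoagents/.
STATE_ENTRIES = {
    "config.json": "API key + license validation cache",
    "global_mem.txt": "Long-term L2 facts",
    "sessions": "L4 raw session archives",
    "skill_index": "Vector index for skill / SOP search",
    "temp": "Per-task scratch (logs, intermediate output)",
}


def describe_state(root: Path) -> list[str]:
    """Report which documented entries exist under root. Never writes to disk."""
    lines = []
    for name, purpose in STATE_ENTRIES.items():
        marker = "present" if (root / name).exists() else "missing"
        lines.append(f"{marker:<8} {root / name}  ({purpose})")
    return lines


report = describe_state(Path.home() / ".photoagents")
print("\n".join(report))
```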
Project layout
photoagents/
├── auth/ License gate (remote-validated API key)
├── cli/ python -m photoagents, GUI launcher, service hub
├── clients/ Web / desktop / chat-platform frontends
├── config/ credentials.py template
├── core/ Agent loop and tool dispatcher
├── evolution/ Reflection / scheduler scripts (the "self-evolving" loop)
├── integrations/ Optional third-party hooks (Langfuse, etc.)
├── llm/ Multi-provider session router
├── resources/ System prompt, tool schema, CDP bridge, demo media
├── skills/ L3 SOPs and helper modules (browser, vision, OCR, ...)
└── web/ DOM simplifier and Chrome DevTools Protocol driver
License
MIT. See LICENSE.
Status
Status: beta. APIs may change before 1.0.