Autonomous AI agent framework — visual computer use, MCP tools, and multi-agent swarms

Maintainers

Ale777123

gantrygraph

Autonomous agent framework for Python. Screenshot → think → act. LangGraph inside. Zero boilerplate outside.

from gantrygraph import GantryEngine, gantry_tool
from gantrygraph.perception import DesktopScreen
from gantrygraph.actions import MouseKeyboardTools
from langchain_anthropic import ChatAnthropic

@gantry_tool
async def read_jira(ticket_id: str) -> str:
    """Fetch a Jira ticket and return its description."""
    # jira_client is a placeholder for your own async Jira client; it is not part of gantrygraph.
    return await jira_client.get(ticket_id)

agent = GantryEngine(
    llm=ChatAnthropic(model="claude-sonnet-4-6"),
    perception=DesktopScreen(),
    tools=[MouseKeyboardTools(), read_jira],
    max_steps=50,
)
agent.run("Open PROJ-123 in Jira and submit the fix.")

Full docs at gantrygraph.com.

Why gantrygraph?

| Feature | gantrygraph | Raw LangGraph | AutoGen |
|---|---|---|---|
| Visual computer use (screenshot + click) | ✅ built-in | ❌ | ❌ |
| MCP tool servers | ✅ built-in | ❌ | ❌ |
| `@gantry_tool`: any function in 1 line | ✅ | ❌ | partial |
| Human-in-the-loop (suspend / resume) | ✅ | manual | partial |
| Stealth browser (bot-detection bypass) | ✅ built-in | ❌ | ❌ |
| Persistent browser sessions | ✅ `profile_dir` | ❌ | ❌ |
| Set-of-Mark: click by element ID, not coords | ✅ built-in | ❌ | ❌ |
| Screenshot pipeline (annotate / compress) | ✅ built-in | ❌ | ❌ |
| `import gantrygraph` never fails | ✅ | n/a | n/a |
| Strict-typed (mypy strict) | ✅ | partial | ❌ |

Install

# Core only (no GUI, no browser, no cloud)
pip install gantrygraph

# Desktop automation (screenshot + mouse/keyboard)
pip install 'gantrygraph[desktop]'

# Web scraping / form filling / browser automation
pip install 'gantrygraph[browser]'
playwright install chromium

# REST server (POST /run, SSE streaming)
pip install 'gantrygraph[cloud]'

# Persistent semantic memory (ChromaDB)
pip install 'gantrygraph[memory]'

# Ultra-light TTL memory — Rust HNSW, 48 bytes/vector, entries auto-expire
pip install 'gantrygraph[minivecdb]'

# Everything
pip install 'gantrygraph[all]'
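
Splitting heavy dependencies into extras usually goes together with deferring their import until the feature is used, which is one way a bare `import gantrygraph` can keep working on a core-only install. The sketch below shows that general pattern with the standard library only. It is not gantrygraph's actual code: `require_extra` and `load_desktop_backend` are hypothetical names, and the idea that the `desktop` extra is what installs `mss` is an assumption.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or explain which extra provides it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f"Install it with: pip install 'gantrygraph[{extra}]'"
        ) from exc


def load_desktop_backend() -> ModuleType:
    # Hypothetical feature entry point: the optional import happens on first
    # use, so importing the package itself never touches 'mss'.
    # Assumption: the 'desktop' extra is what installs 'mss'.
    return require_extra("mss", extra="desktop")
```

With this shape, a missing extra surfaces as an actionable error at the point of use rather than as a failure at import time.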

Common patterns

Add a custom tool

from gantrygraph import GantryEngine, gantry_tool
from langchain_anthropic import ChatAnthropic

@gantry_tool
def search_orders(query: str, limit: int = 5) -> str:
    """Search the order management system by keyword."""
    # db is a placeholder for your own database connection.
    rows = db.execute("SELECT * FROM orders WHERE ... LIMIT ?", query, limit)
    return "\n".join(str(r) for r in rows)

agent = GantryEngine(
    llm=ChatAnthropic(model="claude-sonnet-4-6"),
    tools=[search_orders],
)
agent.run("Find all overdue orders and summarise them.")

Use @gantry_tool(destructive=True) to tag a tool as destructive — the engine automatically gates it behind approval_callback without any manual guardrail config.
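
To make the gating idea concrete, here is a minimal sketch of how a decorator flag and an approval callback can fit together. It is an illustration under stated assumptions, not the engine's implementation: `tool`, `call_tool`, `delete_order`, and `get_order` are hypothetical names, and the callback signature `(tool_name, args) -> bool` is only inferred from the `approval_callback` lambda shown in the Security section.

```python
from typing import Any, Callable, Optional

ApprovalCallback = Callable[[str, dict], bool]


def tool(destructive: bool = False) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical stand-in for a tool decorator: records a 'destructive' flag."""
    def decorate(fn: Callable[..., Any]) -> Callable[..., Any]:
        fn.destructive = destructive  # plain attribute read later by call_tool
        return fn
    return decorate


def call_tool(fn: Callable[..., Any], approval_callback: Optional[ApprovalCallback], **kwargs: Any) -> Any:
    """Run fn, asking for approval first when it is tagged destructive."""
    if getattr(fn, "destructive", False):
        # Fail closed: no callback, or a 'no' answer, means the tool does not run.
        if approval_callback is None or not approval_callback(fn.__name__, kwargs):
            return f"DENIED: {fn.__name__} requires approval"
    return fn(**kwargs)


@tool(destructive=True)
def delete_order(order_id: str) -> str:
    return f"deleted {order_id}"


@tool()
def get_order(order_id: str) -> str:
    return f"order {order_id}"
```

The sketch fails closed: a destructive tool with no callback configured is refused rather than silently executed.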

Filesystem and shell with security

from gantrygraph import GantryEngine
from gantrygraph.actions import FileSystemTools, ShellTools
from gantrygraph.security import WorkspacePolicy, ShellDenylist, BudgetPolicy
from langchain_anthropic import ChatAnthropic

agent = GantryEngine(
    llm=ChatAnthropic(model="claude-sonnet-4-6"),
    workspace_policy=WorkspacePolicy.restricted("/app"),   # auto-wires FS + shell
    tools=[
        ShellTools(
            workspace="/app",
            allowed_commands=["pytest", "ruff", "git"],
            denylist=ShellDenylist.strict(),               # blocks rm -rf, curl|bash, etc.
            max_output_chars=2000,                         # prevent log-file token floods
        ),
    ],
    budget=BudgetPolicy(max_steps=30, max_wall_seconds=120.0),
)
agent.run("Run the test suite, fix any lint errors, and commit the result.")
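
The two checks configured above, a path sandbox and a command allowlist with a denylist, can be sketched with the standard library. This is a simplified illustration, not the logic inside `WorkspacePolicy` or `ShellDenylist`: the function names and the two regular expressions are assumptions, and a real denylist would need to be far more thorough.

```python
import re
import shlex
from pathlib import Path

# Illustrative patterns only; a real denylist would cover many more cases.
DENY_PATTERNS = [
    re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r"),
    re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b"),
]


def is_inside_workspace(path: str, workspace: str) -> bool:
    """True if path resolves to a location inside workspace after following '..'."""
    root = Path(workspace).resolve()
    target = (root / path).resolve()  # an absolute 'path' replaces root entirely
    return target == root or root in target.parents


def is_command_allowed(command: str, allowed: list[str]) -> bool:
    """Allow a command only if no deny pattern matches and its executable is allowlisted."""
    if any(pattern.search(command) for pattern in DENY_PATTERNS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    return bool(argv) and argv[0] in allowed
```

Checking only the first token is safe only when the command is then executed as an argument list without a shell; otherwise metacharacters such as `;` or `&&` could chain additional commands past the allowlist.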

Browser agent with stealth + persistent session

Stays logged in across runs — no QR scan on every execution.

from gantrygraph import GantryEngine
from gantrygraph.actions import BrowserTools
from langchain_anthropic import ChatAnthropic

# First run: headless=False so you can complete the login once.
# All subsequent runs use headless=True — session is saved on disk.
tools = BrowserTools(
    headless=True,
    stealth=True,                                        # bot-detection bypass (default)
    profile_dir="~/.gantrygraph/profiles/whatsapp",     # persistent cookies + IndexedDB
)

agent = GantryEngine(
    llm=ChatAnthropic(model="claude-sonnet-4-6"),
    tools=[tools],
    perception_mode="axtree",   # accessibility tree instead of screenshots — 80% cheaper
    message_window=20,          # bound context growth on long runs
    enable_caching=True,        # Anthropic prompt cache
    max_steps=30,
)

result = agent.run(
    "Open WhatsApp Web, find 'Mamma', and send the Cacio e Pepe recipe."
)

Vision pipeline — annotated screenshots and precise clicks

Attach a PerceptionPipeline to WebPage and BrowserTools to transform screenshots before they reach the LLM. The SetOfMarkAnnotator draws numbered boxes over every interactive element; the agent then calls browser_click_som(element_id=N) instead of guessing pixel coordinates — 100% precision on every labelled element.

from gantrygraph import GantryEngine
from gantrygraph.actions import BrowserTools
from gantrygraph.perception import WebPage
from gantrygraph.vision import (
    PerceptionPipeline,
    SetOfMarkAnnotator,   # numbered boxes on every interactive element
    Downsample,           # cap resolution before encoding
    ConvertToWebP,        # ~30% smaller payload vs PNG
)
from langchain_anthropic import ChatAnthropic

pipeline = PerceptionPipeline([
    SetOfMarkAnnotator(),        # run first — bounding boxes at full resolution
    Downsample(max_width=1280),  # then shrink
    ConvertToWebP(quality=85),   # then compress
])

web = WebPage(url="https://example.com", vision_pipeline=pipeline)
tools = BrowserTools(web_page=web, vision_pipeline=pipeline)
# pipeline enables browser_click_som — click element #3 on the annotated screenshot

agent = GantryEngine(
    llm=ChatAnthropic(model="claude-sonnet-4-6"),
    perception=web,
    tools=[tools],
    max_steps=30,
)
agent.run("Click the 'Accept cookies' button and fill in the signup form.")

Filters are composable — subclass ImageFilter to add your own (custom crop, watermark removal, domain-specific annotation).
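
The ordering used in the pipeline above (annotate first, then shrink) matters because element boxes are measured at full resolution and must be scaled along with the image. The sketch below illustrates that with a toy data model. It does not use the real `ImageFilter` interface, which is not shown here: `Frame`, `AttachBoxes`, `ShrinkToWidth`, and `run_pipeline` are hypothetical names, and no pixels are processed.

```python
from dataclasses import dataclass, replace
from typing import Callable

Box = tuple[int, int, int, int]  # x, y, width, height


@dataclass(frozen=True)
class Frame:
    """Toy stand-in for a screenshot: its size plus labelled element boxes."""
    width: int
    height: int
    boxes: tuple[Box, ...] = ()


class AttachBoxes:
    """Record element boxes; a real annotator would also draw them."""
    def __init__(self, boxes: list[Box]):
        self.boxes = tuple(boxes)

    def __call__(self, frame: Frame) -> Frame:
        return replace(frame, boxes=self.boxes)


class ShrinkToWidth:
    """Cap the width, scaling the height and any existing boxes by the same factor."""
    def __init__(self, max_width: int):
        self.max_width = max_width

    def __call__(self, frame: Frame) -> Frame:
        if frame.width <= self.max_width:
            return frame
        scale = self.max_width / frame.width
        boxes = tuple(tuple(round(v * scale) for v in box) for box in frame.boxes)
        return Frame(self.max_width, round(frame.height * scale), boxes)


def run_pipeline(filters: list[Callable[[Frame], Frame]], frame: Frame) -> Frame:
    """Apply each filter in order, feeding the output of one into the next."""
    for apply_filter in filters:
        frame = apply_filter(frame)
    return frame
```

For example, a 2560x1440 frame with one box at `(100, 200, 300, 50)` comes out as 1280x720 with the box at `(50, 100, 150, 25)`, so element IDs still line up with what the model sees.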

Connect an MCP server

import asyncio

from gantrygraph import GantryEngine
from gantrygraph.mcp import MCPClient
from langchain_anthropic import ChatAnthropic

async def main():
    async with MCPClient("npx -y @modelcontextprotocol/server-github") as mcp:
        agent = GantryEngine(
            llm=ChatAnthropic(model="claude-sonnet-4-6"),
            tools=[mcp],
            max_steps=20,
        )
        result = await agent.arun("Open a PR that adds a CHANGELOG entry for v1.2.0.")
        print(result)

# Start the event loop; without this call main() is defined but never runs.
asyncio.run(main())

Token cost controls

Four orthogonal knobs that together reduce token spend by 5–10× on long tasks:

from gantrygraph import GantryEngine
from gantrygraph.actions import ShellTools
from langchain_anthropic import ChatAnthropic

agent = GantryEngine(
    llm=ChatAnthropic(model="claude-sonnet-4-6"),
    tools=[ShellTools(max_output_chars=2000)],   # (1) truncate shell output
    perception_mode="axtree",                    # (2) text tree instead of screenshots
    message_window=20,                           # (3) sliding context window
    enable_caching=True,                         # (4) Anthropic prompt cache
)

| Control | What it does | Savings |
|---|---|---|
| `max_output_chars` on `ShellTools` | Truncates stdout/stderr | Prevents 100k-token log dumps |
| `perception_mode="axtree"` | Text accessibility tree vs screenshot | ~80% per observe step |
| `message_window=N` | Keeps only the last N messages | Caps O(N²) history growth |
| `enable_caching=True` | Anthropic prompt cache on system messages | Up to 90% input-token discount |

See the Cost Optimization guide for a full breakdown.
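
Two of these knobs, output truncation and the sliding message window, are simple enough to sketch in plain Python. The code below is an illustration of the general techniques, not gantrygraph's implementation: the function names are hypothetical, and both keeping the tail of long output and always retaining system messages are assumptions about reasonable behaviour.

```python
def truncate_output(text: str, max_chars: int) -> str:
    """Keep the tail of long output (errors usually appear last) and note the cut."""
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return f"[... {dropped} chars truncated ...]\n" + text[-max_chars:]


def window_messages(messages: list[dict], window: int) -> list[dict]:
    """Keep all system messages plus the last `window` non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-window:] if window > 0 else []
    return system + recent
```

A production version of the window would also need to keep each tool call together with its result, since most chat APIs reject a tool result whose originating call has been dropped.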

Security

GantryGraph ships with layered, opt-in security controls:

import os
from gantrygraph import GantryEngine, gantry_tool
from gantrygraph.actions import ShellTools
from gantrygraph.security import (
    GuardrailPolicy, WorkspacePolicy, BudgetPolicy,
    ShellDenylist, GantrySecrets,
)

@gantry_tool(destructive=True)
def drop_table(table: str) -> str:
    """Drop a database table permanently."""
    ...

agent = GantryEngine(
    llm=my_llm,                                 # placeholder: any LangChain chat model, e.g. ChatAnthropic(model="claude-sonnet-4-6")
    tools=[
        ShellTools(
            workspace="/app",
            denylist=ShellDenylist.strict(),   # blocks rm -rf /, dd wipe, curl|bash, etc.
        ),
        drop_table,                             # auto-requires approval
    ],
    workspace_policy=WorkspacePolicy.restricted("/app"),
    guardrail=GuardrailPolicy(requires_approval={"shell_run"}),
    budget=BudgetPolicy(max_steps=50, max_tokens=20_000, max_wall_seconds=300),
    approval_callback=lambda tool, args: input(f"Allow {tool}({args})? [y/N] ") == "y",
    secrets=GantrySecrets({"DB_PASS": os.environ["DB_PASSWORD"]}),
)

| Layer | Class | What it does |
|---|---|---|
| Approval gate | `GuardrailPolicy` | Require human sign-off before listed tools run |
| Automatic gating | `@gantry_tool(destructive=True)` | Tag a tool as destructive; it is added to the approval gate automatically |
| Shell firewall | `ShellDenylist` | Block `rm -rf /`, fork bombs, `curl\|bash`, SSH key reads |
| Blind secrets | `GantrySecrets` | Keep credentials out of the LLM context window |
| Path sandbox | `WorkspacePolicy` | Restrict file/shell tools to allowed directories |
| Cost cap | `BudgetPolicy` | Hard limit on steps, tokens, and wall-clock time |
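
The "blind secrets" layer can be pictured as a substitution step that sits between the model and the tools: the model only ever sees a secret's name, and the value is swapped in just before a tool runs and masked again in the output. The sketch below shows that idea only. `SecretVault` and the `{{NAME}}` placeholder syntax are assumptions for illustration and are not taken from `GantrySecrets`.

```python
import re


class SecretVault:
    """Toy secret store: the model sees names, tools receive values."""

    _PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

    def __init__(self, secrets: dict[str, str]):
        self._secrets = dict(secrets)

    def inject(self, text: str) -> str:
        """Replace {{NAME}} placeholders with real values just before a tool runs."""
        return self._PLACEHOLDER.sub(
            lambda m: self._secrets.get(m.group(1), m.group(0)), text
        )

    def redact(self, text: str) -> str:
        """Mask any secret value that leaks into tool output."""
        for name, value in self._secrets.items():
            if value:  # skip empty values, which would match everywhere
                text = text.replace(value, f"{{{{{name}}}}}")
        return text
```

Running `redact` on every tool result before it is appended to the conversation is what keeps a value echoed by a command from reaching the context window.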

Architecture

gantrygraph/
  core/         ABCs and shared types — no I/O, no side effects
  engine/       LangGraph graph wiring (observe → think → act → review)
  perception/   Desktop screenshot (mss+PIL), web accessibility (Playwright)
  actions/      Mouse/keyboard (pyautogui), browser (Playwright), filesystem, shell
  mcp/          MCP client — dynamic StructuredTool generation from any MCP server
  memory/       InMemoryStore, ChromaDB, MiniVecDb (Rust HNSW + TTL)
  security/     GuardrailPolicy, WorkspacePolicy, BudgetPolicy, ShellDenylist, GantrySecrets
  swarm/        Multi-agent supervisor pattern
  cloud/        FastAPI REST server + SSE streaming
  telemetry/    OpenTelemetry span exporter
  vision/       Screenshot pipeline (SetOfMarkAnnotator, Downsample, ConvertToWebP, Grayscale)
  tool.py       @gantry_tool decorator

The agent loop is a LangGraph StateGraph:

START → memory_recall → observe → think → act → review
                                                   │
                              ┌────────────────────┘
                              ▼
                          is_done or max_steps?
                              │yes          │no
                             END          observe

Every node is a pure async def. Callbacks (on_event, approval_callback) support both def and async def via ensure_awaitable.
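
The loop and the dual `def` / `async def` callback support can be approximated in a few lines of `asyncio`. This is a sketch of the pattern, not the library's graph or its `ensure_awaitable` helper: `call_maybe_async` and `run_loop` are hypothetical names, the nodes are stubs, and the "done after three iterations" rule exists only to make the example terminate.

```python
import asyncio
import inspect
from typing import Any, Callable


async def call_maybe_async(fn: Callable[..., Any], *args: Any) -> Any:
    """Call fn and await the result only if it is awaitable."""
    result = fn(*args)
    if inspect.isawaitable(result):
        result = await result
    return result


async def run_loop(task: str, max_steps: int, on_event: Callable[..., Any]) -> int:
    """Toy observe -> think -> act -> review loop; returns the number of steps taken."""
    steps = 0
    is_done = False
    while not is_done and steps < max_steps:
        for node in ("observe", "think", "act"):
            await call_maybe_async(on_event, node, steps)
        steps += 1
        # review: this stub declares the task done after three iterations.
        is_done = steps >= 3
        await call_maybe_async(on_event, "review", steps)
    return steps
```

Because the helper inspects the return value rather than the function, it also handles callables such as `functools.partial` objects that wrap a coroutine function.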

Development

git clone https://github.com/GantryGraph/GantryGraph
cd GantryGraph
pip install -e ".[all,dev]"

pytest tests/unit/           # fast, no display needed
pytest tests/integration/    # needs MCP subprocess + Playwright
mypy src/gantrygraph --strict
ruff check src/ tests/
ruff format src/ tests/

See CONTRIBUTING.md for the full contributor guide.

License

MIT — see LICENSE.

Release history

0.8.0 (Jun 4, 2026)
0.7.0 (Jun 3, 2026), this version
0.6.0 (Jun 2, 2026)
0.5.0 (May 7, 2026)
0.3.0 (May 5, 2026)
0.2.0 (May 6, 2026)
0.1.0 (May 3, 2026)

Download files

Source Distribution

gantrygraph-0.7.0.tar.gz (122.3 kB view details)

Uploaded Jun 3, 2026 Source

Built Distribution

gantrygraph-0.7.0-py3-none-any.whl (96.8 kB view details)

Uploaded Jun 3, 2026 Python 3

File details

Details for the file gantrygraph-0.7.0.tar.gz.

File metadata

Download URL: gantrygraph-0.7.0.tar.gz
Upload date: Jun 3, 2026
Size: 122.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gantrygraph-0.7.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `867cac09e8a85735f692b65c6e87a6240b0d7bedff4c71c82d0e43ed5a3389c0` |
| MD5 | `2ba8f2d54f735b8f4196687fad6f0081` |
| BLAKE2b-256 | `93e0e7bbe2a2e8ed918931b3215e672060c0a1d59a2594ec62ba4b5d3d3b79ee` |
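
A downloaded file can be checked against the SHA256 digest listed above using only the standard library. The helper names below (`sha256_of`, `matches`) are hypothetical, and the local path in the usage note is an assumption about where the file was saved.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For the source distribution, the call would look like `matches("gantrygraph-0.7.0.tar.gz", "867cac09e8a85735f692b65c6e87a6240b0d7bedff4c71c82d0e43ed5a3389c0")`, assuming the file sits in the current directory.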

Provenance

The following attestation bundles were made for gantrygraph-0.7.0.tar.gz:

Publisher: publish.yml on GantryGraph/GantryGraph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gantrygraph-0.7.0.tar.gz
- Subject digest: 867cac09e8a85735f692b65c6e87a6240b0d7bedff4c71c82d0e43ed5a3389c0
- Sigstore transparency entry: 1710981235
- Sigstore integration time: Jun 3, 2026
Source repository:
- Permalink: GantryGraph/GantryGraph@3a80dd49b4f906792dbcb9156446cd7cec16d00c
- Branch / Tag: refs/tags/v0.7.0
- Owner: https://github.com/GantryGraph
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3a80dd49b4f906792dbcb9156446cd7cec16d00c
- Trigger Event: release

File details

Details for the file gantrygraph-0.7.0-py3-none-any.whl.

File metadata

Download URL: gantrygraph-0.7.0-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 96.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gantrygraph-0.7.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f6b83010f3216cb3c35f8941e03409821c2e7fa9ab183a3aa1a0ab5975edf804` |
| MD5 | `8b4f25bd6a4a69360f89c087102980dd` |
| BLAKE2b-256 | `db04c5f311caf1e19e612641292020d7e1d286264d00c38ddedd6c4f4b859474` |

Provenance

The following attestation bundles were made for gantrygraph-0.7.0-py3-none-any.whl:

Publisher: publish.yml on GantryGraph/GantryGraph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gantrygraph-0.7.0-py3-none-any.whl
- Subject digest: f6b83010f3216cb3c35f8941e03409821c2e7fa9ab183a3aa1a0ab5975edf804
- Sigstore transparency entry: 1710981305
- Sigstore integration time: Jun 3, 2026
Source repository:
- Permalink: GantryGraph/GantryGraph@3a80dd49b4f906792dbcb9156446cd7cec16d00c
- Branch / Tag: refs/tags/v0.7.0
- Owner: https://github.com/GantryGraph
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3a80dd49b4f906792dbcb9156446cd7cec16d00c
- Trigger Event: release
