AGI SDK - tools for building and evaluating AI web agents

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

NamanGarg20

These details have not been verified by PyPI

Project links

Leaderboard

Development Status
- 4 - Beta
Intended Audience
- Developers
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

🚀 AGI SDK

📄 Paper • 📝 Blog • 🏢 AGI Inc • 🏆 Leaderboard

Build, evaluate, and level up your AI agents for the real web.

REAL benchmark demo

✨ What is AGI SDK?

AGI SDK is a toolkit for building and evaluating AI browser agents in real-world environments.

It powers REAL Bench: the first high-fidelity benchmark for AI agents navigating modern websites like Amazon, DoorDash, Airbnb, and more.

🔹 Train agents to browse and interact with real apps
🔹 Benchmark agents with robust, standardized tasks
🔹 Submit to the leaderboard and see how your agents stack up!

TL;DR: Go from “idea” to “benchmarked agent” in <60 seconds

🛠️ Installation (30 s)

# Install the SDK
pip install agisdk

# Install Playwright browser dependencies
playwright install --force

# Set your LLM API key (for evaluation)
export OPENAI_API_KEY="your-api-key"   # any supported provider key works

✅ Supports OpenAI, Anthropic, OpenRouter, and custom models!

On Apple Silicon, run brew install --cask playwright first.

⏱️ 60-second Quick-Start

Here's a minimal example that benchmarks an AI agent on the REAL Bench environment:

from agisdk import REAL

harness = REAL.harness(
    model="gpt-4o",       # any LLM tag
    task_type="omnizon",  # Amazon-like store
    headless=False        # watch it click in real-time!
)

print(harness.run())      # 🎉

Need more control? See full examples ›

🔥 Features

Full-stack web replicas of top real-world apps (Amazon, Uber, Gmail, Airbnb, etc.)
Robust agent API: Observations, Actions, Memory, Errors
Leaderboard integration (REAL Bench)
Customizable harness: plug your own agents
Multi-model support: OpenAI, Anthropic, OpenRouter, or your own model
Parallel evaluation for faster experiments

Running Custom Agents

Check out the README.md in the example folder. The example directory contains three custom-agent examples:

example/starter.py: A simple example to get you started
example/custom.py: A more complex example with a custom agent
example/nova.py: For running custom agents which already have browsers running (in this case, Amazon NovaAct)

Additionally, there is a hackable example in example/hackable.py, which can be configured for better performance and makes a good starting point.

Local Development

If you want to develop locally, you can install from source:

# Clone the repository
git clone https://github.com/agi-inc/agisdk.git
cd agisdk

# Install in development mode
pip install -e .

🌐 Available Tasks

Versioning: The SDK ships both v1 and v2 task sets. If you omit the version when selecting tasks or running experiments, the harness defaults to v1. Specify task_version="v2" (or use v2.* task ids) to target the newer scenarios.
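
The defaulting rule can be sketched as a small helper. This is an illustration only: resolve_task_version is a hypothetical name, not an SDK function, and the choice to let an explicit task_version take precedence over the task id is an assumption.

```python
from typing import Optional


def resolve_task_version(task_name: Optional[str] = None,
                         task_version: Optional[str] = None) -> str:
    """Return the task-set version implied by the arguments (hypothetical helper)."""
    if task_version is not None:
        return task_version  # assumption: an explicit version takes precedence
    if task_name is not None and task_name.startswith("v2."):
        return "v2"          # a v2.* task id selects the v2 task set
    return "v1"              # documented default when no version is given


print(resolve_task_version())                          # nothing specified
print(resolve_task_version(task_name="v2.omnizon-1"))  # v2.* task id
print(resolve_task_version(task_version="v2"))         # explicit version
```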

The AGI SDK includes high-fidelity, fully-deterministic websites for agents to explore. These are modern web stack sites (React + Next.js) with rich functionality for core user flows, realistic mock data, and consistent behavior for testing and evaluation.

The benchmark includes these environments:

App Clone	Task Prefix	Example Use Case
🛒 Amazon → Omnizon	`v2.omnizon-*`	Buy a laptop, find a gift
🍔 DoorDash → DashDish	`v2.dashdish-*`	Order dinner
✈️ United → FlyUnified	`v2.flyunified-*`	Book a flight
🏡 Airbnb → Staynb	`v2.staynb-*`	Reserve accommodation
📅 Google Calendar → GoCalendar	`v2.gocalendar-*`	Schedule a meeting
📬 Gmail → GoMail	`v2.gomail-*`	Compose an email
🍽️ OpenTable → OpenDining	`v2.opendining-*`	Book a restaurant
👔 LinkedIn → NetworkIn	`v2.networkin-*`	Accept a connection
🚗 Uber → Udriver	`v2.udriver-*`	Book a ride
💼 UpWork → TopWork	`v2.topwork-*`	Find a freelance gig
🏠 Zillow → Zilloft	`v2.zilloft-*`	Browse houses

Each task comes with human-written goals designed to stress-test agent capabilities.
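
To show how the task prefixes in the table can be used, here is a minimal sketch that maps a task name back to its environment with glob matching. The prefixes are copied from the table; TASK_PREFIXES and environment_for are hypothetical names and are not part of the SDK.

```python
from fnmatch import fnmatch
from typing import Optional

# Task prefixes copied from the table above (clone name -> glob pattern).
TASK_PREFIXES = {
    "Omnizon": "v2.omnizon-*",
    "DashDish": "v2.dashdish-*",
    "FlyUnified": "v2.flyunified-*",
    "Staynb": "v2.staynb-*",
    "GoCalendar": "v2.gocalendar-*",
    "GoMail": "v2.gomail-*",
    "OpenDining": "v2.opendining-*",
    "NetworkIn": "v2.networkin-*",
    "Udriver": "v2.udriver-*",
    "TopWork": "v2.topwork-*",
    "Zilloft": "v2.zilloft-*",
}


def environment_for(task_name: str) -> Optional[str]:
    """Return the clone whose prefix matches task_name, or None if none does."""
    for env, pattern in TASK_PREFIXES.items():
        if fnmatch(task_name, pattern):
            return env
    return None


print(environment_for("v2.omnizon-1"))
print(environment_for("v2.gomail-12"))
```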

🔑 API Keys

To use models from other providers, set their respective API keys:

# For Anthropic models (like sonnet-3.7)
export ANTHROPIC_API_KEY="your-anthropic-api-key"
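
Before a long run it can help to confirm that the key for the chosen provider is actually set. The sketch below is a hypothetical pre-flight check, not SDK code. It assumes the provider can be guessed from the model tag, and OPENROUTER_API_KEY is an assumed variable name that this README does not state.

```python
import os
from typing import Mapping, Optional


def required_key(model: str) -> str:
    """Guess which environment variable a model tag needs (heuristic, assumed)."""
    if model.startswith("openrouter/"):
        return "OPENROUTER_API_KEY"  # assumed name, not given in this README
    if "sonnet" in model:
        return "ANTHROPIC_API_KEY"
    return "OPENAI_API_KEY"


def missing_key(model: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the unset variable, or None if the key is present."""
    env = os.environ if env is None else env
    name = required_key(model)
    return None if env.get(name) else name


problem = missing_key("sonnet-3.7")
if problem:
    print(f"Set {problem} before running the harness")
```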

👁️ Observation Structure

Your agent gets access to the following observation structure:

{
    'chat_messages': [...],          # History of chat messages
    'goal': "...",                   # Text description of the goal
    'goal_object': [...],            # Structured goal object with text and images
    'open_pages_urls': [...],        # List of open page URLs
    'active_page_index': 0,          # Index of the active page
    'url': "...",                    # Current URL
    'screenshot': np.array(...),     # Screenshot as numpy array
    'dom_object': {...},             # DOM structure
    'axtree_object': {...},          # Accessibility tree
    'extra_element_properties': {...}, # Additional element properties
    'focused_element_bid': "...",    # ID of the focused element
    'last_action': "...",            # Last action performed
    'last_action_error': "...",      # Error from last action (if any)
    'elapsed_time': 0.0,             # Time elapsed in the episode
    'browser': {...}                 # Playwright browser object (for direct control)
}
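
As an illustration of reading this structure, the sketch below condenses an observation into a few lines of text that could be placed in a prompt. summarize_observation is a hypothetical helper, and the sample dictionary contains made-up values and only a subset of the keys listed above.

```python
def summarize_observation(obs: dict) -> str:
    """Build a short text summary from the documented observation keys."""
    urls = obs.get("open_pages_urls", [])
    lines = [
        f"Goal: {obs.get('goal', '')}",
        f"URL: {obs.get('url', '')}",
        f"Tabs: {len(urls)} open, active index {obs.get('active_page_index', 0)}",
    ]
    if obs.get("last_action"):
        lines.append(f"Last action: {obs['last_action']}")
    if obs.get("last_action_error"):
        lines.append(f"Last error: {obs['last_action_error']}")
    return "\n".join(lines)


# Made-up sample values, for illustration only.
example_obs = {
    "goal": "Buy a laptop",
    "url": "https://example.com/cart",
    "open_pages_urls": ["https://example.com/cart"],
    "active_page_index": 0,
    "last_action": "click('a12')",
    "last_action_error": "",
}
print(summarize_observation(example_obs))
```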

🎯 Actions

Actions are specified as strings in the format of function calls. Here are some commonly used actions:

# Navigation
"goto('https://www.google.com')"
"go_back()"
"go_forward()"

# Interaction
"click('element_id')"
"fill('input_id', 'text to enter')"
"press('Enter')"

# Communication
"send_msg_to_user('I found the answer: $42.99')"

# Reporting infeasible tasks
"report_infeasible('The requested item is out of stock')"
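
Because actions are plain strings, building them by hand invites quoting mistakes. The sketch below shows one possible way to generate and inspect them with the standard library. The helpers action and parse_action are hypothetical and not part of the SDK, and the sketch does not claim to match how the SDK itself parses actions.

```python
import ast
from typing import Any, List, Tuple


def action(name: str, *args: Any) -> str:
    """Format a call-style action string; repr() handles quoting and escaping."""
    return f"{name}({', '.join(repr(a) for a in args)})"


def parse_action(text: str) -> Tuple[str, List[Any]]:
    """Split a call-style action string into its name and literal arguments."""
    call = ast.parse(text, mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError(f"not a simple function call: {text!r}")
    return call.func.id, [ast.literal_eval(arg) for arg in call.args]


print(action("click", "element_id"))
print(action("fill", "input_id", "text to enter"))
print(parse_action("press('Enter')"))
```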

⚙️ Harness Configuration

The harness function accepts the following parameters:

REAL.harness(
    # Agent configuration (provide exactly one of these)
    model="gpt-4o",                                # OpenAI models
    # model="sonnet-3.7",                          # Anthropic models
    # model="openrouter/deepseek/deepseek-chat-v3-0324", # OpenRouter models (with openrouter/ prefix)
    # agentargs=MyAgentArgs(),                     # Or provide your own agent arguments

    # Task selection (provide one of these, or none to run all tasks)
    task_name="v2.omnizon-1",         # Specific task to run
    # task_type="omnizon",            # Run all tasks of this type
    # task_id=1,                      # Run specific task ID within a type

    # Browser configuration
    headless=False,                   # Whether to show the browser
    max_steps=25,                     # Maximum number of steps
    browser_dimensions=(1280, 720),   # Browser window dimensions

    # Observation options
    use_html=False,                   # Include HTML in observations
    use_axtree=True,                  # Include accessibility tree
    use_screenshot=True,              # Include screenshots

    # Leaderboard submission
    leaderboard=False,                # Whether to submit to leaderboard
    run_id="my_unique_id",            # Unique ID for the submission

    # Execution options
    num_workers=4,                    # Number of parallel workers
    use_cache=True,                   # Use cached results when available
    cache_only=False,                 # Only use cached results
    force_refresh=False,              # Force re-running tasks

    # Output options
    results_dir="./results"           # Where to store results
)
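
The "provide one of these" comments suggest a few combinations worth checking before a run starts. The following is a hypothetical pre-flight check based only on a reading of those comments. It is not SDK code, and the SDK may enforce different rules.

```python
from typing import Any, List


def check_harness_kwargs(**kwargs: Any) -> List[str]:
    """Return human-readable problems found in a set of harness arguments."""
    problems = []
    has_model = "model" in kwargs
    has_agent = "agentargs" in kwargs
    if has_model and has_agent:
        problems.append("pass either model or agentargs, not both")
    if not has_model and not has_agent:
        problems.append("pass one of model or agentargs")
    # Assumption: task_id only makes sense together with task_type.
    if "task_id" in kwargs and "task_type" not in kwargs:
        problems.append("task_id needs task_type")
    # Assumption: a leaderboard submission needs a run_id.
    if kwargs.get("leaderboard") and not kwargs.get("run_id"):
        problems.append("leaderboard=True needs a run_id")
    return problems


print(check_harness_kwargs(model="gpt-4o", task_name="v2.omnizon-1"))
print(check_harness_kwargs(model="gpt-4o", task_id=1, leaderboard=True))
```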

🏆 Submitting to the REAL Leaderboard

Create an API key – use the leaderboard portal (Account → API Keys) to generate a key tied to your Supabase user.
Mint a run ID
- From the portal UI: open the Profile page, click Create Run, pick your model, and copy the run_id that appears in the runs table.
- From the API (same endpoint the SDK uses):
```
curl "https://www.realevals.ai/api/runKey?api_key=<API_KEY>&model_name=<MODEL_NAME>&run_name=<RUN_NAME>"
```
The JSON response returns newRunId. If you want to use a different domain, set REAL_API_BASE=https://… before running the SDK to override the default domain.

Run the harness in leaderboard mode:

harness = REAL.harness(
    model="gpt-4o",
    task_type="omnizon",
    leaderboard=True,
    api_key="<API_KEY>",
    run_name="<RUN_NAME>",
    model_id_name="<MODEL_NAME>",
    run_id="<newRunId>",
)
harness.run()

The harness sets RUNID so each clone posts results to the REAL API. Use force_refresh=True or delete cached runs in example/results/ when you need a fresh submission.

Inspect the submission – either open the leaderboard UI or call
```
https://web-eval-leaderboard.vercel.app/api/getRunTask?api_key=<API_KEY>&display_name=<RUN_NAME>&task_id=<TASK_ID>
```
to fetch stored results (use bare task IDs such as omnizon-1; inside the SDK you reference tasks with the v2. prefix).
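
The run-ID request and the task-ID convention above can be sketched in Python as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the sketch assumes REAL_API_BASE simply replaces the scheme and host in front of /api/runKey, and it only builds the URL and parses a response body without sending any request.

```python
import json
import os
from typing import Mapping, Optional
from urllib.parse import urlencode

DEFAULT_BASE = "https://www.realevals.ai"  # domain used in the curl example above


def run_key_url(api_key: str, model_name: str, run_name: str,
                env: Optional[Mapping[str, str]] = None) -> str:
    """Build the runKey URL, honoring REAL_API_BASE if it is set (assumed behavior)."""
    env = os.environ if env is None else env
    base = env.get("REAL_API_BASE", DEFAULT_BASE).rstrip("/")
    query = urlencode({"api_key": api_key, "model_name": model_name, "run_name": run_name})
    return f"{base}/api/runKey?{query}"


def extract_run_id(body: str) -> str:
    """Pull newRunId out of the JSON response body."""
    return json.loads(body)["newRunId"]


def bare_task_id(task_name: str) -> str:
    """Drop the v2. prefix, since stored results are looked up by bare task ID."""
    return task_name[len("v2."):] if task_name.startswith("v2.") else task_name


print(run_key_url("<API_KEY>", "gpt-4o", "my run", env={}))
print(bare_task_id("v2.omnizon-1"))
```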

🤝 Contributing

We welcome contributions of all kinds:

📢 Feature requests? Open an Issue
🐛 Bug reports? Create a ticket
📈 Improve REAL tasks? Join our Project Board
🛠️ Submit code? Fork + PR - we love clean commits!

Let's build the future of agents together. 🔥

💬 Community

Join our Discord (coming soon!)
Follow AGI Inc. on LinkedIn

⭐️ Why AGI SDK?

Because your agents deserve better than toy environments.
Because the real web is messy and that's where the magic happens.
Because the future is agentic and it starts here.

Release history

This version

0.3.5

Nov 3, 2025

0.2.9

Sep 20, 2025

0.2.6

Sep 16, 2025

0.2.5

Jul 30, 2025

0.2.4

Jul 18, 2025

0.2.3

Jul 17, 2025

0.2.2

Jul 17, 2025

0.2.1

Jul 11, 2025

0.2.0

Jun 13, 2025

0.1.23

Jun 12, 2025

0.1.22

May 30, 2025

0.1.21

May 16, 2025

0.1.19

May 13, 2025

0.1.18

May 9, 2025

0.1.17

Apr 29, 2025

0.1.16

Apr 29, 2025

0.1.15

Apr 27, 2025

0.1.14

Apr 25, 2025

0.1.13

Apr 25, 2025

0.1.12

Apr 24, 2025

0.1.11

Apr 23, 2025

0.1.10

Apr 19, 2025

0.1.9

Apr 18, 2025

0.1.8

Apr 18, 2025

0.1.7

Apr 18, 2025

0.1.5

Apr 18, 2025

0.1.4

Apr 18, 2025

0.1.1

Apr 18, 2025

0.1.0

Apr 18, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agisdk-0.3.5.tar.gz (365.8 kB view details)

Uploaded Nov 3, 2025 Source

Built Distribution

agisdk-0.3.5-py3-none-any.whl (566.4 kB view details)

Uploaded Nov 3, 2025 Python 3

File details

Details for the file agisdk-0.3.5.tar.gz.

File metadata

Download URL: agisdk-0.3.5.tar.gz
Upload date: Nov 3, 2025
Size: 365.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agisdk-0.3.5.tar.gz
Algorithm	Hash digest
SHA256	`d501969dde7c1b060bd19fe83fc67d206f5c194741f953905cccfbdc48794d48`
MD5	`bb9f5c03f2590f87e2c5a6d7485a325e`
BLAKE2b-256	`c7ffdd2feb64ab6d0a1a15b1ea1a08a40c5a3c08ba2878fb8b9dba000201ef88`

See more details on using hashes here.
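
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard hashlib module. The sketch below is one minimal way to do that. The function names are illustrative, and the local file path is whatever location you saved the download to.

```python
import hashlib

# SHA256 digest for agisdk-0.3.5.tar.gz, copied from the table above.
EXPECTED_SHA256 = "d501969dde7c1b060bd19fe83fc67d206f5c194741f953905cccfbdc48794d48"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, matches("agisdk-0.3.5.tar.gz") returns True only if the file in the current directory has the published digest.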

Provenance

The following attestation bundles were made for agisdk-0.3.5.tar.gz:

Publisher: python-publish.yml on agi-inc/agisdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agisdk-0.3.5.tar.gz
- Subject digest: d501969dde7c1b060bd19fe83fc67d206f5c194741f953905cccfbdc48794d48
- Sigstore transparency entry: 661850343
- Sigstore integration time: Nov 3, 2025
Source repository:
- Permalink: agi-inc/agisdk@ec25c1a3002954eb457c0c2d06d8b7b8f3acf52f
- Branch / Tag: refs/tags/0.3.5
- Owner: https://github.com/agi-inc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@ec25c1a3002954eb457c0c2d06d8b7b8f3acf52f
- Trigger Event: release

File details

Details for the file agisdk-0.3.5-py3-none-any.whl.

File metadata

Download URL: agisdk-0.3.5-py3-none-any.whl
Upload date: Nov 3, 2025
Size: 566.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agisdk-0.3.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`168596cf1fd3c8d8bef1336d19838eb5beb95905857450fa37bce24e1b7e0ccb`
MD5	`57218833437daf95baef45663070d4f8`
BLAKE2b-256	`5b053974e5586c733ada18c926267b08aad5efe12d78a17468866bb8f1237fec`

See more details on using hashes here.

Provenance

The following attestation bundles were made for agisdk-0.3.5-py3-none-any.whl:

Publisher: python-publish.yml on agi-inc/agisdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agisdk-0.3.5-py3-none-any.whl
- Subject digest: 168596cf1fd3c8d8bef1336d19838eb5beb95905857450fa37bce24e1b7e0ccb
- Sigstore transparency entry: 661850353
- Sigstore integration time: Nov 3, 2025
Source repository:
- Permalink: agi-inc/agisdk@ec25c1a3002954eb457c0c2d06d8b7b8f3acf52f
- Branch / Tag: refs/tags/0.3.5
- Owner: https://github.com/agi-inc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@ec25c1a3002954eb457c0c2d06d8b7b8f3acf52f
- Trigger Event: release
