
Official API of OpenAGI Foundation


OAGI Python SDK

Python SDK for the OAGI API: vision-based task automation.

Installation

# Recommended: All features (desktop automation + server)
pip install oagi

# Or install core only (minimal dependencies)
pip install oagi-core

# Or install with specific features
pip install "oagi-core[desktop]"  # Desktop automation support
pip install "oagi-core[server]"   # Server support

Requires Python >= 3.10

Installation Options

  • oagi (Recommended): Metapackage that includes all features (desktop + server). Equivalent to oagi-core[desktop,server].
  • oagi-core: Core SDK with minimal dependencies (httpx, pydantic). Suitable for server deployments or custom automation setups.
  • oagi-core[desktop]: Adds pyautogui and pillow for desktop automation features like screenshot capture and GUI control.
  • oagi-core[server]: Adds FastAPI and Socket.IO dependencies for running the real-time server for browser extensions.

Note: Features requiring desktop dependencies (like PILImage.from_screenshot(), PyautoguiActionHandler, ScreenshotMaker) will show helpful error messages if you try to use them without installing the desktop extra.
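To find out up front whether the desktop extra is installed, rather than waiting for one of those errors, you can probe for its modules with the standard library. This is a minimal sketch, not an SDK API: `has_desktop_extra` is a hypothetical helper, and it assumes the extra's dependencies import as `pyautogui` and `PIL` (Pillow's import name).

```python
import importlib.util

# Assumed import names for the desktop extra: pyautogui, and Pillow (imported as PIL)
DESKTOP_MODULES = ("pyautogui", "PIL")


def has_desktop_extra(modules=DESKTOP_MODULES):
    """Return True if every top-level module in `modules` can be found, without importing it."""
    return all(importlib.util.find_spec(name) is not None for name in modules)


if __name__ == "__main__":
    if has_desktop_extra():
        print("Desktop automation features are available.")
    else:
        print('Desktop extra missing; install with: pip install "oagi-core[desktop]"')
```

Using `find_spec` rather than `import` avoids side effects such as PyAutoGUI trying to connect to a display on a headless machine.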

Quick Start

Set your API credentials:

export OAGI_API_KEY="your-api-key"
export OAGI_BASE_URL="https://api.oagi.com"  # or your server URL
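If you want a script to fail fast with a clear message when these variables are missing, a small standard-library check is enough. `load_oagi_settings` is a hypothetical helper, not part of the SDK; it assumes the SDK picks up `OAGI_API_KEY` and `OAGI_BASE_URL` from the environment on its own and only validates them before any API call is made.

```python
import os


def load_oagi_settings(environ=None):
    """Read OAGI settings from the environment and fail fast if any are missing.

    Hypothetical helper: it validates the variables, it does not configure the SDK.
    """
    env = os.environ if environ is None else environ
    required = ("OAGI_API_KEY", "OAGI_BASE_URL")
    # Treat unset and empty values the same way
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return {name: env[name] for name in required}


if __name__ == "__main__":
    settings = load_oagi_settings()
    print(f"Using OAGI server at {settings['OAGI_BASE_URL']}")
```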

Single-Step Analysis

Analyze a screenshot and get recommended actions:

from oagi import single_step

step = single_step(
    task_description="Click the submit button",
    screenshot="screenshot.png"  # or bytes, or Image object
)

print(f"Actions: {step.actions}")
print(f"Complete: {step.is_complete}")

Automated Task Execution

Run tasks automatically with screenshot capture and action execution:

from oagi import ShortTask, ScreenshotMaker, PyautoguiActionHandler

task = ShortTask()
completed = task.auto_mode(
    "Search weather on Google",
    max_steps=10,
    executor=PyautoguiActionHandler(),  # Executes mouse/keyboard actions
    image_provider=ScreenshotMaker(),    # Captures screenshots
)
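Conceptually, this kind of automation alternates three steps until the model reports completion or the step budget runs out: capture a screenshot, ask the API for the next actions, and execute them. The sketch below models that loop with plain callables so it runs without the SDK. `run_auto_loop`, `FakeStep`, and the stand-in functions are hypothetical; this is not `ShortTask.auto_mode`'s actual implementation, and it assumes the boolean result means "completed within `max_steps`".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeStep:
    """Stand-in for the SDK's step result (fields mirror step.actions / step.is_complete)."""
    actions: List[str] = field(default_factory=list)
    is_complete: bool = False


def run_auto_loop(
    task: str,
    analyze: Callable[[str, bytes], FakeStep],
    capture: Callable[[], bytes],
    execute: Callable[[List[str]], None],
    max_steps: int = 10,
) -> bool:
    """Capture -> analyze -> execute until complete; return False if max_steps is reached."""
    for _ in range(max_steps):
        screenshot = capture()            # role played by ScreenshotMaker
        step = analyze(task, screenshot)  # role played by one OAGI API call
        if step.is_complete:
            return True
        execute(step.actions)             # role played by PyautoguiActionHandler
    return False


if __name__ == "__main__":
    # Scripted responses: two rounds of actions, then completion
    responses = iter([FakeStep(["click"]), FakeStep(["type"]), FakeStep(is_complete=True)])
    executed = []
    done = run_auto_loop(
        "Search weather on Google",
        analyze=lambda task, shot: next(responses),
        capture=lambda: b"fake-png-bytes",
        execute=executed.extend,
    )
    print(done, executed)
```

Keeping the capture and execute steps as injected callables is what makes the loop testable offline, which mirrors why `auto_mode` takes `executor` and `image_provider` as parameters.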

Configure PyAutoGUI behavior with custom settings:

from oagi import PyautoguiActionHandler, PyautoguiConfig, ScreenshotMaker, ShortTask

# Customize action behavior
config = PyautoguiConfig(
    drag_duration=1.0,       # Slower drags for precision (default: 0.5)
    scroll_amount=50,        # Larger scroll steps (default: 30)
    wait_duration=2.0,       # Longer waits (default: 1.0)
    action_pause=0.2,        # More pause between actions (default: 0.1)
    hotkey_interval=0.1,     # Interval between keys in hotkey combinations (default: 0.1)
    capslock_mode="session", # Caps lock mode: 'session' or 'system' (default: 'session')
)

executor = PyautoguiActionHandler(config=config)
task = ShortTask()
task.auto_mode("Complete form", executor=executor, image_provider=ScreenshotMaker())

Image Processing

Process and optimize images before sending them to the API:

from oagi import PILImage, ImageConfig, single_step

# Load and compress an image
image = PILImage.from_file("large_screenshot.png")
config = ImageConfig(
    format="JPEG",
    quality=85,
    width=1260,
    height=700
)
compressed = image.transform(config)

# Use with single_step
step = single_step("Click button", screenshot=compressed)
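The example passes a fixed `width=1260, height=700`. The README does not say whether `ImageConfig` preserves the source aspect ratio, so if you want to avoid distortion you can compute the target dimensions yourself first. `fit_within` is a hypothetical pure-Python helper (no SDK calls) that scales a size down to fit a bounding box without changing its proportions, and never upscales.

```python
def fit_within(width: int, height: int, max_width: int, max_height: int) -> tuple[int, int]:
    """Scale (width, height) to fit inside (max_width, max_height), keeping the aspect ratio.

    Images already smaller than the box are returned unchanged (no upscaling).
    """
    scale = min(max_width / width, max_height / height, 1.0)
    # Round to the nearest pixel; max(1, ...) guards against zero-size results
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # A 2560x1440 (16:9) screenshot fitted into the 1260x700 box used above
    print(fit_within(2560, 1440, 1260, 700))  # (1244, 700)
```

The resulting pair can then be passed as `width` and `height` to `ImageConfig`.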

Async Support

Use the async API for non-blocking operations and better concurrency:

import asyncio
from oagi import async_single_step, AsyncShortTask

async def main():
    # Single-step async analysis
    step = await async_single_step(
        "Find the search bar",
        screenshot="screenshot.png"
    )
    print(f"Found {len(step.actions)} actions")
    
    # Async task automation
    task = AsyncShortTask()
    async with task:
        await task.init_task("Complete the form")
        # ... continue with async operations

asyncio.run(main())
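Because the async calls are coroutines, several screenshots can be analyzed concurrently with `asyncio.gather`, while an `asyncio.Semaphore` keeps the number of in-flight requests bounded. The sketch uses a hypothetical stand-in, `fake_single_step`, so it runs without credentials; in real code you would await `async_single_step(...)` in its place. The limit of 3 is an arbitrary example value, not a documented SDK or API limit.

```python
import asyncio


async def fake_single_step(task_description: str, screenshot: str) -> str:
    """Hypothetical stand-in for `async_single_step`; simulates network latency."""
    await asyncio.sleep(0.01)
    return f"{task_description} @ {screenshot}"


async def analyze_all(task_description: str, screenshots: list[str], limit: int = 3) -> list[str]:
    """Analyze many screenshots concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def analyze_one(shot: str) -> str:
        async with semaphore:
            return await fake_single_step(task_description, shot)

    # gather() returns results in the same order as the input list
    return await asyncio.gather(*(analyze_one(s) for s in screenshots))


if __name__ == "__main__":
    shots = [f"screenshot_{i}.png" for i in range(5)]
    print(asyncio.run(analyze_all("Find the search bar", shots)))
```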

Examples

See the examples/ directory for more usage patterns:

  • google_weather.py - Basic task execution with ShortTask
  • single_step.py - Basic single-step inference
  • screenshot_with_config.py - Image compression and optimization
  • execute_task_auto.py - Automated task execution
  • socketio_server_basic.py - Socket.IO server example
  • socketio_client_example.py - Socket.IO client implementation

Socket.IO Server (Optional)

The SDK includes an optional Socket.IO server for real-time bidirectional communication with browser extensions or custom clients.

Installation

# Install with server support
pip install oagi  # Includes server features
# Or
pip install "oagi-core[server]"  # Core + server only

Running the Server

import uvicorn
from oagi.server import create_app, ServerConfig

# Create FastAPI app with Socket.IO
app = create_app()

# Run server
uvicorn.run(app, host="0.0.0.0", port=8000)

Or use the example script:

export OAGI_API_KEY="your-api-key"
python examples/socketio_server_basic.py

Server Features

  • Dynamic namespaces: Each session gets its own namespace (/session/{session_id})
  • Simplified events: The client sends a single init event containing the instruction
  • Action execution: The server emits individual actions (click, type, scroll, etc.) to the client
  • S3 integration: The server sends presigned URLs so clients can upload screenshots directly to S3
  • Session management: In-memory session storage with timeout cleanup
  • REST API: Health checks and session management endpoints
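Since each session lives under its own `/session/{session_id}` namespace, a client or test harness needs to build that path from a session ID, and a server-side component may need to recover the ID from it. A minimal sketch with two hypothetical helpers; it assumes session IDs are non-empty and contain no `/`, which the README does not specify.

```python
NAMESPACE_PREFIX = "/session/"


def session_namespace(session_id: str) -> str:
    """Build the per-session namespace path, e.g. '/session/abc123'."""
    # Assumption: session IDs are non-empty and contain no '/'
    if not session_id or "/" in session_id:
        raise ValueError(f"invalid session id: {session_id!r}")
    return NAMESPACE_PREFIX + session_id


def session_id_from_namespace(namespace: str) -> str:
    """Recover the session ID from a '/session/{session_id}' namespace."""
    if not namespace.startswith(NAMESPACE_PREFIX):
        raise ValueError(f"not a session namespace: {namespace!r}")
    session_id = namespace[len(NAMESPACE_PREFIX):]
    session_namespace(session_id)  # Reuse the same validation; raises ValueError if invalid
    return session_id


if __name__ == "__main__":
    ns = session_namespace("my_session_id")
    print(ns, session_id_from_namespace(ns))
```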

Client Integration

Clients connect to a session namespace and handle action events:

import asyncio

import socketio

sio = socketio.AsyncClient()
namespace = "/session/my_session_id"

@sio.on("request_screenshot", namespace=namespace)
async def on_screenshot(data):
    # Upload the screenshot to S3 using the presigned URL sent by the server
    return {"success": True}  # python-socketio sends this back as the event's acknowledgement

@sio.on("click", namespace=namespace)
async def on_click(data):
    # Execute a click at the coordinates given in `data`
    return {"success": True}

async def main():
    await sio.connect("http://localhost:8000", namespaces=[namespace])
    await sio.emit("init", {"instruction": "Click the button"}, namespace=namespace)
    await sio.wait()  # Keep handling server events until disconnected

asyncio.run(main())

See examples/socketio_client_example.py for a complete implementation.
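The `request_screenshot` handler above leaves the upload itself as a comment. A presigned S3 PUT URL accepts an ordinary HTTP PUT of the image bytes, so a standard-library sketch looks like this. Assumptions, labeled as such: the event payload carries the URL under a key named `presigned_url` (hypothetical; check `examples/socketio_client_example.py` for the real field names), the URL was presigned for PUT rather than POST, and the screenshot is PNG.

```python
import urllib.request


def build_screenshot_upload(presigned_url: str, png_bytes: bytes) -> urllib.request.Request:
    """Build an HTTP PUT request that uploads PNG bytes to a presigned S3 URL."""
    return urllib.request.Request(
        presigned_url,
        data=png_bytes,
        method="PUT",
        # If the URL was signed with a Content-Type, this header must match it
        headers={"Content-Type": "image/png"},
    )


def upload_screenshot(data: dict, png_bytes: bytes, timeout: float = 30.0) -> bool:
    """Upload using the URL from the event payload; the 'presigned_url' key is hypothetical."""
    request = build_screenshot_upload(data["presigned_url"], png_bytes)
    # urlopen raises HTTPError for 4xx/5xx responses, so reaching the check means success
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return 200 <= response.status < 300
```

`urlopen` blocks, so inside an async handler you would call it as `await asyncio.to_thread(upload_screenshot, data, png_bytes)` to keep the event loop responsive.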

Documentation

License

MIT



Download files

Download the file for your platform.

Source Distribution

oagi_core-0.9.0.tar.gz (165.2 kB)

Built Distribution

oagi_core-0.9.0-py3-none-any.whl (73.0 kB)

File details

Details for the file oagi_core-0.9.0.tar.gz.

File metadata

  • Download URL: oagi_core-0.9.0.tar.gz
  • Size: 165.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for oagi_core-0.9.0.tar.gz
Algorithm Hash digest
SHA256 89dd7bfa36462c85ae3815d529bf5c836ef67165ccc6fc5df2f2a14f8de658df
MD5 470ccb30d58e1260cd9aeaaf1249835b
BLAKE2b-256 8656996834f9392c4394fb2ffc46b3637cf01109a3b1a54054484115f6c67f2e

File details

Details for the file oagi_core-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: oagi_core-0.9.0-py3-none-any.whl
  • Size: 73.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for oagi_core-0.9.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b8ae9641ddfce20e8ff414e5a10964775fa3d1b34b6d8567f4e4d08db094fcc2
MD5 7fbef787f28176de283023213bb0a28a
BLAKE2b-256 67cc4b3691ff158fae82d779d23b34c563e6e1de074387f6707dbda6d4462ab4
