Official API of OpenAGI Foundation (metapackage with all features)
OAGI Python SDK
Python SDK for the OAGI API - vision-based task automation.
What is OAGI?
OAGI is the Python SDK for Lux, the world's most advanced computer-use model from the OpenAGI Foundation.
Computer Use is AI's ability to operate human-facing software — not just through APIs, but by operating computers natively, just as human users do. It's a paradigm shift in what AI can do: not just generating, reasoning, or researching, but actually operating on your computer.
Lux comes in three modes, giving you control over depth, speed, and style of execution:
- Tasker — Strictly follows step-by-step instructions with ultra-stable, controllable execution
- Actor — Ideal for immediate tasks, completing actions at near-instant speed
- Thinker — Understands vague, complex goals, performing hour-long executions
Use Cases
With Lux, possibilities are endless. Here are a few examples:
- Web Scraping & Data Crawl — Navigate websites, sort results, and collect product information autonomously
- Software QA — Automate repetitive testing tasks, navigate applications, perform test actions, and validate expected behaviors
- Financial Data Extraction — Navigate to sites like NASDAQ and extract insider activity data
- Data Entry — Enter accurate data across dashboards and forms
- Workflow Automation — Chain together multi-step tasks across different applications
Installation
# Recommended: All features (desktop automation + server)
pip install oagi
# Or install core only (minimal dependencies)
pip install oagi-core
# Or install with specific features
pip install oagi-core[desktop] # Desktop automation support
pip install oagi-core[server] # Server support
Requires Python >= 3.10
Installation Options
- oagi (Recommended): Metapackage that includes all features (desktop + server). Equivalent to oagi-core[desktop,server].
- oagi-core: Core SDK with minimal dependencies (httpx, pydantic). Suitable for server deployments or custom automation setups.
- oagi-core[desktop]: Adds pyautogui and pillow for desktop automation features like screenshot capture and GUI control.
- oagi-core[server]: Adds FastAPI and Socket.IO dependencies for running the real-time server for browser extensions.
Note: Features requiring desktop dependencies (like PILImage.from_screenshot(), PyautoguiActionHandler, ScreenshotMaker) will show helpful error messages if you try to use them without installing the desktop extra.
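A script can also check up front whether the optional desktop dependencies are importable, rather than waiting for the error at first use. The sketch below uses only the standard library. The helper name missing_modules is hypothetical and not part of the SDK, and it assumes the desktop extra's import names are pyautogui and PIL (Pillow's import name).

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


# Assumed import names for the desktop extra: pyautogui, and PIL for pillow.
missing = missing_modules(["pyautogui", "PIL"])
if missing:
    print(f"Desktop extra not fully installed, missing: {missing}")
```

If the list is non-empty, installing oagi or oagi-core[desktop] as described above should provide the missing packages.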
Quick Start
Set your API credentials:
export OAGI_API_KEY="your-api-key" # get your API key from https://developer.agiopen.org/
# export OAGI_BASE_URL="https://api.agiopen.org/" # optional, defaults to production endpoint
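To fail early when the key is missing, a script can validate the environment before creating any agent. This is a minimal sketch, not SDK code: load_oagi_settings is a hypothetical helper, and it only reads the two variables named above. It returns None for the base URL when the variable is unset, on the assumption that the SDK then applies its own production default.

```python
import os


def load_oagi_settings(env=None):
    """Read OAGI settings from an environment mapping; raise if the key is absent."""
    env = os.environ if env is None else env
    api_key = env.get("OAGI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OAGI_API_KEY is not set; export it before running an agent")
    # None means "unset": leave the endpoint choice to the SDK default.
    base_url = env.get("OAGI_BASE_URL") or None
    return {"api_key": api_key, "base_url": base_url}
```

Calling load_oagi_settings() with no argument reads the real process environment. Passing a plain dict makes the helper easy to exercise in tests.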
Automated Task Execution
Run tasks automatically with screenshot capture and action execution:
import asyncio

from oagi import AsyncDefaultAgent, AsyncPyautoguiActionHandler, AsyncScreenshotMaker


async def main():
    agent = AsyncDefaultAgent(max_steps=10)
    completed = await agent.execute(
        "Search weather on Google",
        action_handler=AsyncPyautoguiActionHandler(),  # Executes mouse/keyboard actions
        image_provider=AsyncScreenshotMaker(),  # Captures screenshots
    )
    return completed


asyncio.run(main())
Configure PyAutoGUI behavior with custom settings:
from oagi import AsyncPyautoguiActionHandler, PyautoguiConfig
# Customize action behavior
config = PyautoguiConfig(
    drag_duration=1.0,  # Slower drags for precision (default: 0.5)
    scroll_amount=50,  # Larger scroll steps (default: 2 on macOS, 100 on others)
    wait_duration=2.0,  # Longer waits for WAIT action (default: 1.0)
    action_pause=0.2,  # Pause between PyAutoGUI calls (default: 0.1)
    hotkey_interval=0.1,  # Interval between keys in hotkey combos (default: 0.1)
    capslock_mode="session",  # Caps lock mode: 'session' or 'system' (default: 'session')
    macos_ctrl_to_cmd=True,  # Replace ctrl with cmd on macOS (default: True)
    click_pre_delay=0.1,  # Delay after move before click (default: 0.1)
    post_batch_delay=1.0,  # Delay after actions before next screenshot (default: 1.0)
)
action_handler = AsyncPyautoguiActionHandler(config=config)
Command Line Interface
Run agents directly from the terminal:
# Run with actor model
oagi agent run "Go to nasdaq.com, search for AAPL. Under More, go to Insider Activity" --model lux-actor-1
# Run with thinker mode (uses lux-thinker-1 model with more steps)
oagi agent run "Look up the store hours for the nearest Apple Store to zip code 23456 using the Apple Store Locator" --model lux-thinker-1
# Run pre-configured tasker workflows (no instruction needed)
oagi agent run --mode tasker:software_qa
# List all available modes
oagi agent modes
# Check macOS permissions (screen recording & accessibility)
oagi agent permission
# Print all available screens and their indices
oagi agent screens
# Export execution history
oagi agent run "Complete the form" --export html --export-file report.html
# Run with a specific screen
oagi agent run "Search weather on Google" --screen-index 1
CLI options:
- --mode: Agent mode (default: actor). Use oagi agent modes to list available modes
- --model: Override the model (default: determined by mode)
- --max-steps: Maximum steps (default: determined by mode)
- --temperature: Sampling temperature (default: determined by mode)
- --step-delay: Delay after each action before next screenshot (default: 0.3s)
- --export: Export format (markdown, html, json)
- --export-file: Output file path for export
- --screen-index: Screen index for multi-screen environments
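When the CLI is driven from another Python program, building the argument list in one place avoids quoting mistakes. The sketch below only assembles a command from the flags listed above and does not run it. The function build_agent_command is a hypothetical helper, not part of the SDK.

```python
import shlex


def build_agent_command(instruction=None, *, mode=None, model=None, max_steps=None,
                        export=None, export_file=None, screen_index=None):
    """Assemble an `oagi agent run` argument list from the documented options."""
    cmd = ["oagi", "agent", "run"]
    if instruction is not None:
        cmd.append(instruction)
    options = [
        ("--mode", mode),
        ("--model", model),
        ("--max-steps", max_steps),
        ("--export", export),
        ("--export-file", export_file),
        ("--screen-index", screen_index),
    ]
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_agent_command("Complete the form", export="html", export_file="report.html")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The returned list can be passed to subprocess.run(cmd, check=True), which avoids shell interpolation of the instruction text.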
Image Processing
Process and optimize images before sending to API:
from oagi import PILImage, ImageConfig
# Load and compress an image
image = PILImage.from_file("large_screenshot.png")
config = ImageConfig(
    format="JPEG",
    quality=85,
    width=1260,
    height=700,
)
compressed = image.transform(config)
Manual Control with Actor
For step-by-step control over task execution:
import asyncio

from oagi import AsyncActor, AsyncPyautoguiActionHandler, AsyncScreenshotMaker


async def main():
    async with AsyncActor() as actor:
        await actor.init_task("Complete the form")
        image_provider = AsyncScreenshotMaker()
        action_handler = AsyncPyautoguiActionHandler()
        for _ in range(10):
            image = await image_provider()
            step = await actor.step(image)
            if step.stop:
                break
            await action_handler(step.actions)


asyncio.run(main())
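The loop above ends either because the model signals stop or because the step budget runs out, and the caller often needs to tell the two apart. The following sketch factors the loop into a hypothetical run_steps helper that reports which case occurred. It runs against stand-in fakes so it needs no SDK or display; FakeStep models only the stop and actions attributes used above, and everything else here is an assumption for illustration.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeStep:
    """Stand-in for the object returned by actor.step(); models stop/actions only."""
    stop: bool
    actions: list = field(default_factory=list)


async def run_steps(step_fn, image_provider, action_handler, max_steps=10):
    """Drive the capture -> step -> act loop; return (steps_taken, stopped_by_model)."""
    for taken in range(1, max_steps + 1):
        image = await image_provider()
        step = await step_fn(image)
        if step.stop:
            return taken, True
        await action_handler(step.actions)
    return max_steps, False


async def demo():
    executed = []
    # Scripted responses: two action steps, then a stop signal.
    script = iter([FakeStep(False, ["click"]), FakeStep(False, ["type"]), FakeStep(True)])

    async def fake_image_provider():
        return b"fake-screenshot"

    async def fake_step(image):
        return next(script)

    async def fake_handler(actions):
        executed.extend(actions)

    result = await run_steps(fake_step, fake_image_provider, fake_handler)
    return result, executed


print(asyncio.run(demo()))
```

With the real SDK, step_fn would presumably be actor.step and the other two arguments the AsyncScreenshotMaker and AsyncPyautoguiActionHandler instances from the example above.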
Run On System With Wayland
The SDK supports desktop automation on systems that use the Wayland display server, such as Ubuntu and Debian. It uses ydotool for mouse and keyboard actions and flameshot for screenshot capture. Install both tools in advance and make sure the ydotoold server is running in the background when you run the script.
Refer to ydotool and flameshot for installation instructions. Disable mouse acceleration for more precise mouse control. (In GNOME, run gsettings set org.gnome.desktop.peripherals.mouse accel-profile 'flat')
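A quick preflight check can confirm the required tools are on PATH before an agent starts. This sketch uses only the standard library. The helper missing_tools is hypothetical, and the binary names ydotool, ydotoold, and flameshot are assumed to match how your distribution packages them. It does not verify that the ydotoold daemon is actually running.

```python
import shutil


def missing_tools(tool_names):
    """Return the tools from tool_names that are not found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


# Assumed binary names; adjust if your distribution packages them differently.
missing = missing_tools(["ydotool", "ydotoold", "flameshot"])
if missing:
    print(f"Install these before running on Wayland: {', '.join(missing)}")
```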
Run tasks automatically with screenshot capture and action execution:
import asyncio

from oagi import AsyncDefaultAgent, AsyncYdotoolActionHandler, AsyncScreenshotMaker


async def main():
    agent = AsyncDefaultAgent(max_steps=10)
    completed = await agent.execute(
        "Search weather on Google",
        action_handler=AsyncYdotoolActionHandler(),  # Executes mouse/keyboard actions, based on 'ydotool'
        image_provider=AsyncScreenshotMaker(),  # Captures screenshots, based on 'flameshot'
    )
    return completed


asyncio.run(main())
Configure Ydotool behavior with custom settings:
from oagi import AsyncYdotoolActionHandler, YdotoolConfig
# Customize action behavior
config = YdotoolConfig(
    scroll_amount=50,  # Larger scroll steps (default: 20)
    wait_duration=2.0,  # Longer waits for WAIT action (default: 1.0)
    action_pause=1.0,  # Pause between Ydotool calls (default: 0.5)
    capslock_mode="session",  # Caps lock mode: 'session' or 'system' (default: 'session')
    socket_address="/tmp/ydotool.sock",  # Custom socket address (default: YDOTOOL_SOCKET env var)
    post_batch_delay=1.0,  # Delay after actions before next screenshot (default: 1.0)
)
action_handler = AsyncYdotoolActionHandler(config=config)
Multi-Screen Execution
In multi-screen environments, you can choose which screen to use for task execution. The ScreenManager class provides methods to list available screens, while the AsyncPyautoguiActionHandler and AsyncScreenshotMaker classes let you set the target screen for actions and screenshots. In the result of get_all_screens, the primary screen is always first, and the remaining screens follow in ascending order of their origin coordinates.
import asyncio
import sys

from oagi import ScreenManager

# Must be initialized before importing pyautogui to ensure correct DPI awareness on Windows
if sys.platform == "win32":
    ScreenManager.enable_windows_dpi_awareness()

from oagi import (
    AsyncDefaultAgent,
    AsyncPyautoguiActionHandler,
    AsyncScreenshotMaker,
)


def print_all_screens():
    """Print all available screens."""
    screen_manager = ScreenManager()
    all_screens = screen_manager.get_all_screens()
    print("Available screens:")
    for screen_index, screen in enumerate(all_screens):
        print(f" - Index {screen_index}: {screen}")


async def main():
    agent = AsyncDefaultAgent(max_steps=10)
    action_handler = AsyncPyautoguiActionHandler()
    image_provider = AsyncScreenshotMaker()
    # Get all available screens
    screen_manager = ScreenManager()
    all_screens = screen_manager.get_all_screens()
    # Choose a screen for task execution
    screen_index = 1  # Use the second screen as example
    target_screen = all_screens[screen_index]
    # Set the target screen for handlers
    action_handler.set_target_screen(target_screen)
    image_provider.set_target_screen(target_screen)
    completed = await agent.execute(
        "Search weather on Google",
        action_handler=action_handler,
        image_provider=image_provider,
    )
    return completed


asyncio.run(main())
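Indexing all_screens directly raises IndexError on a machine with fewer screens than expected. One way to guard against that is a small selection helper that falls back to the primary screen, which the text above says is always at index 0. The function pick_screen is a hypothetical helper, not part of the SDK, and it works on any list.

```python
def pick_screen(all_screens, screen_index):
    """Return all_screens[screen_index], falling back to the primary screen (index 0)."""
    if not all_screens:
        raise ValueError("no screens detected")
    if 0 <= screen_index < len(all_screens):
        return all_screens[screen_index]
    print(f"Screen index {screen_index} out of range; using primary screen")
    return all_screens[0]
```

In the example above, this would replace the direct lookup: target_screen = pick_screen(all_screens, screen_index).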
Examples
See the examples/ directory for more usage patterns:
- execute_task_auto.py - Automated task execution with AsyncDefaultAgent
- execute_task_manual.py - Manual step-by-step control with Actor
- multi_screen_execution.py - Automated task execution on multi-screen environments
- continued_session.py - Continuing tasks across sessions
- screenshot_with_config.py - Image compression and optimization
- socketio_server_basic.py - Socket.IO server example
- socketio_client_example.py - Socket.IO client implementation
Socket.IO Server (Optional)
The SDK includes an optional Socket.IO server for real-time bidirectional communication with browser extensions or custom clients.
Installation
# Install with server support
pip install oagi # Includes server features
# Or
pip install oagi-core[server] # Core + server only
Running the Server
import uvicorn
from oagi.server import create_app, ServerConfig
# Create FastAPI app with Socket.IO
app = create_app()
# Run server
uvicorn.run(app, host="0.0.0.0", port=8000)
Or use the example script:
export OAGI_API_KEY="your-api-key"
python examples/socketio_server_basic.py
Server Features
- Dynamic namespaces: Each session gets its own namespace (/session/{session_id})
- Simplified events: Single init event from client with instruction
- Action execution: Emit individual actions (click, type, scroll, etc.) to client
- S3 integration: Server sends presigned URLs for direct screenshot uploads
- Session management: In-memory session storage with timeout cleanup
- REST API: Health checks and session management endpoints
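To make the session-management idea concrete, here is a toy illustration of in-memory storage with timeout cleanup. It is not the server's implementation: the SessionStore class, its method names, and the 300-second default are all assumptions made for this sketch. Only the /session/{session_id} namespace form comes from the list above.

```python
import time


class SessionStore:
    """Toy in-memory session registry with idle-timeout cleanup."""

    def __init__(self, timeout_seconds=300.0):
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}  # session_id -> last activity timestamp

    def namespace(self, session_id):
        """Build the namespace path in the /session/{session_id} form."""
        return f"/session/{session_id}"

    def touch(self, session_id, now=None):
        """Create the session or refresh its activity timestamp."""
        self._last_seen[session_id] = time.monotonic() if now is None else now

    def cleanup(self, now=None):
        """Drop sessions idle longer than the timeout; return the removed ids."""
        now = time.monotonic() if now is None else now
        expired = [sid for sid, seen in self._last_seen.items()
                   if now - seen > self.timeout_seconds]
        for sid in expired:
            del self._last_seen[sid]
        return expired

    def active(self):
        """Return the ids of sessions that have not been cleaned up."""
        return sorted(self._last_seen)
```

Passing now explicitly keeps the behavior deterministic in tests; in a running server the cleanup call would typically sit in a periodic background task.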
Client Integration
Clients connect to a session namespace and handle action events:
import asyncio
import socketio

sio = socketio.AsyncClient()
namespace = "/session/my_session_id"

@sio.on("request_screenshot", namespace=namespace)
async def on_screenshot(data):
    # Upload screenshot to S3 using presigned URL
    return {"success": True}

@sio.on("click", namespace=namespace)
async def on_click(data):
    # Execute click at coordinates
    return {"success": True}

async def main():
    await sio.connect("http://localhost:8000", namespaces=[namespace])
    await sio.emit("init", {"instruction": "Click the button"}, namespace=namespace)
    await sio.wait()  # Keep the client alive to receive action events

asyncio.run(main())
See examples/socketio_client_example.py for a complete implementation.
Documentation
For full Lux documentation and guides, visit the OAGI Developer Documentation.
License
MIT