Orion Sandbox
Orion is a server that provides a terminal emulator and a browser proxy. It can run as a standalone service or be imported as a Python package.
Orion Sandbox (this repo) is a container-based environment that provides a secure, isolated space for AI agents (particularly LLMs like Claude) to interact with terminal environments and web browsers. It acts as a bridge between the AI system and computing resources, allowing the AI to execute real-world tasks like:
- Running terminal commands
- Automating browser actions
- Managing files and directories
- Editing text files
This sandbox creates a controlled environment where AI systems can safely perform actions without having direct access to the host system.
Architecture
┌───────────────────────────┐ ┌─────────────────┐ ┌────────────────────────────────────────────┐
│ │ │ │ │ Sandbox Container │
│ AI Agent (e.g. Claude) │ │ API Proxy │ │ │
│ │ │ │ │ ┌──────────┐ ┌─────────┐ ┌────────────┐ │
│ Orion │ API Requests │ - Auth check │ │ │ │ │ │ │ │ │
│ │◄──────────────►│ - Rate limiting├─────►│ │ Terminal │ │ Browser │ │ File/Text │ │
│ │ & Responses │ - Routing │ │ │ Service │ │ Service │ │ Operations │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ └────┬─────┘ └────┬────┘ └─────┬──────┘ │
└───────────────────────────┘ └─────────────────┘ │ │ │ │ │
x-sandbox-token │ │ │ │ │
authentication │ v v v │
│ ┌──────────────────────────────────────┐ │
│ │ FastAPI │ │
│ │ (app/server.py + router.py) │ │
│ └──────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────┘
Key Components
- AI Agent: The LLM (e.g., Claude) that sends API requests to the sandbox to perform tasks.
- API Proxy: An intermediary service (https://api.manus.im/apiproxy.v1.ApiProxyService/CallApi) that:
  - Authenticates requests using the x-sandbox-token header
  - Routes requests to the appropriate sandbox instance
  - Handles rate limiting and access control
- Sandbox Container: A Docker container that isolates the execution environment and provides:
  - FastAPI server (app/server.py) - The main entry point for HTTP requests
  - WebSocket server (app/terminal_socket_server.py) - For real-time terminal interaction
  - File and text editing capabilities (app/tools/text_editor.py)
- browser_use Library: A modified version of the browser-use library that:
  - Provides browser automation via Playwright
  - Has been specifically adapted to work with the Claude API (via browser_use/agent/service.py)
  - Handles browser actions, DOM interactions, and browser session management
browser_use Integration
The browser_use library is a key component of Orion Sandbox that enables browser automation. It provides a clean API for the AI to interact with web browsers programmatically.
It is MIT licensed, although the license file was missing from the original source code.
Key Classes and Components:
Agent Class (browser_use/agent/service.py)
The Agent class is the main entry point for browser automation. It handles:
- Initializing browser sessions
- Processing LLM outputs into actions
- Managing state history
- Handling errors and retries
class Agent:
    def __init__(
        self,
        task: str,
        llm: BaseChatModel,
        browser: Browser | None = None,
        # Many other parameters...
    ):
        # Initialize all components
        ...

    async def run(self, max_steps: int = 100) -> AgentHistoryList:
        # Main execution loop
        # Process LLM outputs and execute actions
        ...
Browser Context (browser_use/browser/context.py)
The BrowserContext class manages the browser state and provides methods for interacting with web pages:
class BrowserContext:
    async def navigate_to(self, url: str):
        """Navigate to a URL"""

    async def click_element(self, index: int):
        """Click an element using its index"""

    async def input_text_to_element(self, index: int, text: str, delay: float = 0):
        """Input text into an element"""
System Prompts (browser_use/agent/prompts.py)
The SystemPrompt class defines the instructions given to the LLM about how to interact with the browser:
class SystemPrompt:
    def important_rules(self) -> str:
        """
        Returns the important rules for the agent.
        """
        rules = """
1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format:
{
  "current_state": {
    "page_summary": "Quick detailed summary of new information from the current page which is not yet in the task history memory. Be specific with details which are important for the task. This is not on the meta level, but should be facts. If all the information is already in the task history memory, leave this empty.",
    "evaluation_previous_goal": "Success|Failed|Unknown - Analyze the current elements and the image to check if the previous goals/actions are successful like intended by the task. Ignore the action result. The website is the ground truth. Also mention if something unexpected happened like new suggestions in an input field. Shortly state why/why not",
    "memory": "Description of what has been done and what you need to remember. Be very specific. Count here ALWAYS how many times you have done something and how many remain. E.g. 0 out of 10 websites analyzed. Continue with abc and xyz",
    "next_goal": "What needs to be done with the next actions"
  },
  "action": [
    {
      "one_action_name": {
        // action-specific parameter
      }
    },
    // ... more actions in sequence
  ]
}
"""
        # More rules follow...
        return rules
The prompt instructs the LLM on:
- How to format its responses (JSON structure)
- Rules for interacting with browser elements
- Navigation and error handling
- Task completion criteria
- Element interaction guidelines
Controller Registry (browser_use/controller/registry/service.py)
The Registry class provides a way to register and execute actions:
class Registry:
    def action(
        self,
        description: str,
        param_model: Optional[Type[BaseModel]] = None,
    ):
        """Decorator for registering actions"""

    async def execute_action(
        self,
        action_name: str,
        params: dict,
        browser: Optional[BrowserContext] = None,
        # Other parameters
    ) -> Any:
        """Execute a registered action"""
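The registration pattern can be illustrated with a minimal stand-in; this sketch shows the decorator mechanism only and is not the actual browser_use implementation:

```python
from typing import Any, Callable


class MiniRegistry:
    """Minimal sketch of a decorator-based action registry."""

    def __init__(self) -> None:
        self._actions: dict[str, Callable[..., Any]] = {}
        self._descriptions: dict[str, str] = {}

    def action(self, description: str) -> Callable:
        """Decorator that registers a function under its own name."""
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._actions[func.__name__] = func
            self._descriptions[func.__name__] = description
            return func
        return decorator

    def execute_action(self, action_name: str, params: dict) -> Any:
        """Look up a registered action by name and call it with params."""
        if action_name not in self._actions:
            raise KeyError(f"Unknown action: {action_name}")
        return self._actions[action_name](**params)


registry = MiniRegistry()


@registry.action("Navigate to a URL")
def go_to_url(url: str) -> str:
    return f"navigated to {url}"


print(registry.execute_action("go_to_url", {"url": "https://example.com"}))
# → navigated to https://example.com
```

The real Registry additionally accepts a Pydantic `param_model` for validating parameters and runs actions asynchronously against a BrowserContext.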
How AI-Sandbox Communication Works
The communication between an AI agent (like Claude) and the sandbox follows this flow:
1. AI Agent Formulates a Request:
   - The AI decides on an action to perform (e.g., run a terminal command, navigate a browser)
   - It constructs an appropriate API request following the sandbox API specification
2. Request Transmission:
   - The AI sends an HTTP request either directly to the sandbox container (if exposed) or through an API proxy service (https://api.manus.im/apiproxy.v1.ApiProxyService/CallApi)
3. Authentication:
   - The request includes an API token (x-sandbox-token header)
   - The token is verified against the value stored in $HOME/.secrets/sandbox_api_token
4. Request Processing:
   - The sandbox FastAPI server receives and processes the request
   - It routes the request to the appropriate service (terminal, browser, file operations)
   - The requested action is performed within the isolated container environment
5. Response Return:
   - Results of the action are formatted as JSON or binary data (for file downloads)
   - The response is sent back to the AI agent
6. Real-time Communication (for terminal):
   - Terminal sessions use WebSockets for bidirectional, real-time communication
   - The AI can receive terminal output as it's generated and send new commands
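Steps 1-3 above can be sketched as follows. This is an illustration only: the payload shape and the `api_name` field are assumptions, not the proxy's documented wire format.

```python
import json
from pathlib import Path


def load_sandbox_token(secrets_dir: Path) -> str:
    """Read the API token from the sandbox secrets file."""
    return (secrets_dir / "sandbox_api_token").read_text().strip()


def build_sandbox_request(token: str, api_name: str, body: dict) -> dict:
    """Assemble the pieces of an authenticated sandbox API call."""
    return {
        "url": "https://api.manus.im/apiproxy.v1.ApiProxyService/CallApi",
        "headers": {
            "x-sandbox-token": token,  # checked at the proxy layer
            "Content-Type": "application/json",
        },
        "payload": json.dumps({"api_name": api_name, "body": body}),
    }


req = build_sandbox_request(
    "dummy-token",
    "terminal_execute",
    {"command": "ls -la", "terminal_id": "main"},
)
print(req["headers"]["x-sandbox-token"])  # → dummy-token
```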
Example Flow: AI Running a Shell Command
┌─────────────┐ ┌───────────────┐ ┌──────────────────┐
│ │ 1. HTTP Request │ │ 2. Route to │ │
│ AI Agent │────────────────►│ Sandbox API │─────────────►│ Terminal Service │
│ │ │ (FastAPI) │ │ │
│ │◄────────────────│ │◄─────────────│ │
└─────────────┘ 4. JSON Response└───────────────┘ 3. Execute └──────────────────┘
Command
API Client Usage
The sandbox includes a Python API client (data_api.py) that communicates with the proxy service:
from data_api import ApiClient
# Initialize the client
api_client = ApiClient()
# Call a terminal command
response = api_client.call_api(
    "terminal_execute",
    body={
        "command": "ls -la",
        "terminal_id": "main"
    }
)
print(response)
LLM Response Format for Browser Automation
When interacting with browser_use, the LLM (like Claude) must format its responses as JSON according to the schema defined in the system prompt:
{
  "current_state": {
    "page_summary": "Found search page with 10 results for 'electric cars'",
    "evaluation_previous_goal": "Success - successfully navigated to search page and performed search as intended",
    "memory": "Completed search for 'electric cars'. Need to extract information from first 3 results (0 of 3 done)",
    "next_goal": "Extract detailed information from first search result"
  },
  "action": [
    {
      "click_element": {
        "index": 12
      }
    }
  ]
}
This response structure allows the Agent to:
- Track the LLM's understanding of the current page
- Evaluate the success of previous actions
- Maintain memory across interactions
- Execute the next action(s)
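Consuming such a response on the agent side can be sketched as a small parser; this is a minimal illustration, not the actual browser_use parsing code:

```python
import json


def parse_llm_response(raw: str) -> tuple[dict, list]:
    """Split an LLM browser-automation reply into state and action list."""
    data = json.loads(raw)
    state = data["current_state"]
    actions = data["action"]  # each item is {action_name: params}
    return state, actions


raw = '''{
  "current_state": {
    "page_summary": "Found search page with 10 results",
    "evaluation_previous_goal": "Success",
    "memory": "0 of 3 results analyzed",
    "next_goal": "Open first result"
  },
  "action": [{"click_element": {"index": 12}}]
}'''

state, actions = parse_llm_response(raw)
name, params = next(iter(actions[0].items()))
print(name, params)  # → click_element {'index': 12}
```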
Available Browser Actions
The browser_use library provides a wide range of actions for web automation:
Navigation Actions
- go_to_url: Navigate to a specific URL
- search_google: Perform a Google search
- go_back: Navigate back in browser history
- open_tab: Open a new browser tab
- switch_tab: Switch between browser tabs
Element Interaction
- click_element: Click on a page element by its index
- input_text: Type text into a form field
- scroll_down / scroll_up: Scroll the page
- scroll_to_text: Scroll to find specific text
- select_dropdown_option: Select from dropdown menus
Content Extraction
- extract_content: Extract and process page content
- get_dropdown_options: Get all options from a dropdown
Task Completion
- done: Mark the task as complete and return results
Integration with LLM Systems
To integrate an LLM with this sandbox:
1. API Client Implementation: Create an API client in the LLM's execution environment
2. Task Planning: The LLM should break down user requests into specific API calls
3. Sequential Operations: Complex tasks often require multiple API calls in sequence
4. Error Handling: The LLM should interpret error responses and adjust its approach
5. State Management: For multi-step operations, the LLM needs to track the state of the environment
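Error handling in this setting often reduces to retrying transient failures before giving up. A minimal sketch, assuming a callable that raises on failure (the error type and call shape are stand-ins, not part of the sandbox API):

```python
import time


def call_with_retry(call, max_attempts: int = 3, delay: float = 0.0):
    """Retry a sandbox call a few times before giving up."""
    last_error = None
    for _attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError as exc:  # stand-in for a transport/API error
            last_error = exc
            time.sleep(delay)
    raise last_error


# Simulate a call that fails twice, then succeeds.
attempts = {"n": 0}

def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient error")
    return {"status": "ok"}


result = call_with_retry(flaky_call)
print(result, attempts["n"])  # → {'status': 'ok'} 3
```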
Example Workflow: LLM Using the Sandbox
1. User asks the LLM to "Create a Python script that fetches weather data and saves it"
2. LLM plans the steps:
   - Create a new Python file
   - Write the code to fetch weather data
   - Save the file
   - Run the script to test it
   - Show the results to the user
3. LLM executes each step by making API calls to the sandbox:
   - POST /text_editor with command: "create" to create a new file
   - POST /text_editor with command: "write" to write the code
   - POST /terminal/{id}/write to run the script
   - GET /terminal/{id} to get the output
   - Return the results to the user
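The call sequence above can be sketched with a stub client that records calls instead of sending them. The endpoint names follow the list above; the payload fields and the file path are illustrative assumptions:

```python
class StubSandboxClient:
    """Records API calls instead of sending them, to show the call order."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict]] = []

    def call(self, endpoint: str, body: dict) -> dict:
        self.calls.append((endpoint, body))
        return {"status": "ok"}


def create_and_run_script(client: StubSandboxClient) -> None:
    # 1. Create the file
    client.call("/text_editor", {"command": "create", "path": "/tmp/weather.py"})
    # 2. Write the code
    client.call("/text_editor", {"command": "write", "path": "/tmp/weather.py",
                                 "content": "print('fetching weather...')"})
    # 3. Run the script in a terminal
    client.call("/terminal/main/write", {"text": "python /tmp/weather.py\n"})
    # 4. Read the output back
    client.call("/terminal/main", {})


client = StubSandboxClient()
create_and_run_script(client)
print([endpoint for endpoint, _ in client.calls])
```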
Security Considerations
1. Multi-layered Authentication:
   - API token authentication using the x-sandbox-token header (NOT IMPLEMENTED IN THIS CODE)
   - Token verification happens at the proxy layer before requests reach the FastAPI application (NOT IMPLEMENTED IN THIS CODE)
   - Tokens are stored securely in $HOME/.secrets/sandbox_api_token
2. Proxy Service Protection:
   - The proxy service provides an additional layer of security
   - Acts as a gatekeeper for all requests to the sandbox
   - Can implement rate limiting, request validation, and access control
3. Isolation:
   - The Docker container provides isolation from the host system
   - Prevents the AI from affecting the host machine directly
4. Resource Limitations:
   - The sandbox can be configured with resource constraints (CPU, memory) at the Docker level
   - Prevents resource exhaustion attacks
5. Action Restrictions:
   - The API can be configured to restrict certain dangerous operations
   - Browser automation is contained within the sandbox environment
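Since token verification is not implemented in this code, the following is only a sketch of what the check could look like, using a constant-time comparison to avoid timing side channels:

```python
import hmac
import tempfile
from pathlib import Path


def verify_sandbox_token(presented: str, secrets_dir: Path) -> bool:
    """Compare a presented x-sandbox-token value against the stored secret.

    Uses hmac.compare_digest so the comparison time does not leak
    how many leading characters matched.
    """
    token_file = secrets_dir / "sandbox_api_token"
    if not token_file.exists():
        return False
    expected = token_file.read_text().strip()
    return hmac.compare_digest(presented, expected)


# Demonstrate with a throwaway secrets directory.
with tempfile.TemporaryDirectory() as d:
    secrets = Path(d)
    (secrets / "sandbox_api_token").write_text("s3cret\n")
    accepted = verify_sandbox_token("s3cret", secrets)
    rejected = verify_sandbox_token("wrong", secrets)

print(accepted, rejected)  # → True False
```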
Deployment with Docker
The sandbox is designed to run in a Docker container. The provided Dockerfile was not part of the original code, but it illustrates what the container could include:
- A Python 3.12 environment
- Chromium browser for web automation
- All necessary dependencies
- API token initialization
To build and run the container:
# Build the container
docker build -t orion-sandbox .
# Run the container
docker run -p 8080:8080 orion-sandbox
Method 2: Install from source (recommended for development)
git clone https://github.com/yourusername/orion.git
cd orion
pip install -e .
Usage
Running as a service
Method 1: Using the command-line tool
After installation, start the service directly with the command-line tool:
# Start with the default configuration
orion-server
# Specify port and log level
orion-server --port 8888 --log-level debug
# Development mode (auto-reload)
orion-server --reload
Method 2: Starting with a Python script
python start_server.py --port 8330
Using as a library
Orion can also be imported and used as a Python library, for example:
import asyncio
from app import BrowserManager, terminal_manager, text_editor

# Initialize the browser manager
async def browser_example():
    browser = BrowserManager(headless=False)
    await browser.initialize()
    # Perform browser operations...
    await browser.close()

# Use the terminal manager
async def terminal_example():
    terminal = await terminal_manager.create_or_get_terminal("my_terminal")
    await terminal.execute_command("ls -la")
    history = terminal.get_history(True, True)
    # Process the terminal output...

# Run the example
asyncio.run(browser_example())
For more examples, see examples/use_as_package.py.
Docker Deployment
# Build the container
docker build -t orion-server .
# Run the container
docker run -p 8330:8330 orion-server
API Documentation
After starting the service, visit http://localhost:8330/docs to view the API documentation.
License
MIT
File details
Details for the file orion_browser-0.2.1.tar.gz.
File metadata
- Download URL: orion_browser-0.2.1.tar.gz
- Upload date:
- Size: 362.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7fac59225bc4dae4fb7aea47dd942b4446edd96c9b8de212563d5e658ae8d4b8 |
| MD5 | ff9cdc1b138bf0dbb3c7242a29945d4d |
| BLAKE2b-256 | 2751129ecba3369584f70b4bdaa84fefbf356ede7bd15ea2de6d46d59b8fda96 |
File details
Details for the file orion_browser-0.2.1-py3-none-any.whl.
File metadata
- Download URL: orion_browser-0.2.1-py3-none-any.whl
- Upload date:
- Size: 391.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3bbbeaba607d2b1f04963c85e72c5bb574dd8d5a5c43870752842224b8372c73 |
| MD5 | 7572244f9c60ed94b94755e211480600 |
| BLAKE2b-256 | cdaec258e96a676f0382a302d950e74cca79a8245265a9bab8fbba5863024e13 |