Orion Sandbox
Orion is a server that provides a terminal emulator and a browser proxy. It can run as a standalone service or be imported as a Python package.
Orion Sandbox (this repo) is a container-based environment that provides a secure, isolated space for AI agents (particularly LLMs like Claude) to interact with terminal environments and web browsers. It acts as a bridge between the AI system and computing resources, allowing the AI to execute real-world tasks like:
- Running terminal commands
- Automating browser actions
- Managing files and directories
- Editing text files
This sandbox creates a controlled environment where AI systems can safely perform actions without having direct access to the host system.
Architecture
┌───────────────────────────┐ ┌─────────────────┐ ┌────────────────────────────────────────────┐
│ │ │ │ │ Sandbox Container │
│ AI Agent (e.g. Claude) │ │ API Proxy │ │ │
│ │ │ │ │ ┌──────────┐ ┌─────────┐ ┌────────────┐ │
│ Orion │ API Requests │ - Auth check │ │ │ │ │ │ │ │ │
│ │◄──────────────►│ - Rate limiting├─────►│ │ Terminal │ │ Browser │ │ File/Text │ │
│ │ & Responses │ - Routing │ │ │ Service │ │ Service │ │ Operations │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ └────┬─────┘ └────┬────┘ └─────┬──────┘ │
└───────────────────────────┘ └─────────────────┘ │ │ │ │ │
x-sandbox-token │ │ │ │ │
authentication │ v v v │
│ ┌──────────────────────────────────────┐ │
│ │ FastAPI │ │
│ │ (app/server.py + router.py) │ │
│ └──────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────┘
Key Components
- AI Agent: The LLM (e.g., Claude) that sends API requests to the sandbox to perform tasks.
- API Proxy: An intermediary service (https://api.manus.im/apiproxy.v1.ApiProxyService/CallApi) that:
  - Authenticates requests using the x-sandbox-token header
  - Routes requests to the appropriate sandbox instance
  - Handles rate limiting and access control
- Sandbox Container: A Docker container that isolates the execution environment and provides:
  - FastAPI server (app/server.py) - The main entry point for HTTP requests
  - WebSocket server (app/terminal_socket_server.py) - For real-time terminal interaction
  - File and text editing capabilities (app/tools/text_editor.py)
- browser_use Library: A modified version of the browser-use library that:
  - Provides browser automation via Playwright
  - Has been specifically adapted to work with the Claude API (via browser_use/agent/service.py)
  - Handles browser actions, DOM interactions, and browser session management
browser_use Integration
The browser_use library is a key component of Orion Sandbox that enables browser automation. It provides a clean API for the AI to interact with web browsers programmatically.
It is MIT licensed, although the license file was missing from the original source code.
Key Classes and Components:
Agent Class (browser_use/agent/service.py)
The Agent class is the main entry point for browser automation. It handles:
- Initializing browser sessions
- Processing LLM outputs into actions
- Managing state history
- Handling errors and retries
class Agent:
    def __init__(
        self,
        task: str,
        llm: BaseChatModel,
        browser: Browser | None = None,
        # Many other parameters...
    ):
        # Initialize all components
        ...

    async def run(self, max_steps: int = 100) -> AgentHistoryList:
        # Main execution loop
        # Process LLM outputs and execute actions
        ...
Browser Context (browser_use/browser/context.py)
The BrowserContext class manages the browser state and provides methods for interacting with web pages:
class BrowserContext:
    async def navigate_to(self, url: str):
        """Navigate to a URL"""

    async def click_element(self, index: int):
        """Click an element using its index"""

    async def input_text_to_element(self, index: int, text: str, delay: float = 0):
        """Input text into an element"""
System Prompts (browser_use/agent/prompts.py)
The SystemPrompt class defines the instructions given to the LLM about how to interact with the browser:
class SystemPrompt:
    def important_rules(self) -> str:
        """
        Returns the important rules for the agent.
        """
        rules = """
1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format:
{
  "current_state": {
    "page_summary": "Quick detailed summary of new information from the current page which is not yet in the task history memory. Be specific with details which are important for the task. This is not on the meta level, but should be facts. If all the information is already in the task history memory, leave this empty.",
    "evaluation_previous_goal": "Success|Failed|Unknown - Analyze the current elements and the image to check if the previous goals/actions are successful like intended by the task. Ignore the action result. The website is the ground truth. Also mention if something unexpected happened like new suggestions in an input field. Shortly state why/why not",
    "memory": "Description of what has been done and what you need to remember. Be very specific. Count here ALWAYS how many times you have done something and how many remain. E.g. 0 out of 10 websites analyzed. Continue with abc and xyz",
    "next_goal": "What needs to be done with the next actions"
  },
  "action": [
    {
      "one_action_name": {
        // action-specific parameter
      }
    },
    // ... more actions in sequence
  ]
}
"""
        # More rules follow...
        return rules
The prompt instructs the LLM on:
- How to format its responses (JSON structure)
- Rules for interacting with browser elements
- Navigation and error handling
- Task completion criteria
- Element interaction guidelines
Controller Registry (browser_use/controller/registry/service.py)
The Registry class provides a way to register and execute actions:
class Registry:
    def action(
        self,
        description: str,
        param_model: Optional[Type[BaseModel]] = None,
    ):
        """Decorator for registering actions"""

    async def execute_action(
        self,
        action_name: str,
        params: dict,
        browser: Optional[BrowserContext] = None,
        # Other parameters
    ) -> Any:
        """Execute a registered action"""
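The registration pattern can be illustrated with a minimal stand-in; this sketch shows the decorator mechanism only and is not the actual browser_use implementation:

```python
from typing import Any, Callable


class MiniRegistry:
    """Minimal sketch of a decorator-based action registry."""

    def __init__(self) -> None:
        self._actions: dict[str, Callable[..., Any]] = {}
        self._descriptions: dict[str, str] = {}

    def action(self, description: str) -> Callable:
        """Decorator that registers a function under its own name."""
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._actions[func.__name__] = func
            self._descriptions[func.__name__] = description
            return func
        return decorator

    def execute_action(self, action_name: str, params: dict) -> Any:
        """Look up a registered action by name and call it with params."""
        if action_name not in self._actions:
            raise KeyError(f"Unknown action: {action_name}")
        return self._actions[action_name](**params)


registry = MiniRegistry()


@registry.action("Navigate to a URL")
def go_to_url(url: str) -> str:
    return f"navigated to {url}"


print(registry.execute_action("go_to_url", {"url": "https://example.com"}))
# → navigated to https://example.com
```

The real Registry additionally accepts a Pydantic `param_model` for validating parameters and runs actions asynchronously against a BrowserContext.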
How AI-Sandbox Communication Works
The communication between an AI agent (like Claude) and the sandbox follows this flow:
1. AI Agent Formulates a Request:
   - The AI decides on an action to perform (e.g., run a terminal command, navigate a browser)
   - It constructs an appropriate API request following the sandbox API specification
2. Request Transmission:
   - The AI sends an HTTP request either directly to the sandbox container (if exposed) or through an API proxy service (https://api.manus.im/apiproxy.v1.ApiProxyService/CallApi)
3. Authentication:
   - The request includes an API token (x-sandbox-token header)
   - The token is verified against the value stored in $HOME/.secrets/sandbox_api_token
4. Request Processing:
   - The sandbox FastAPI server receives and processes the request
   - It routes the request to the appropriate service (terminal, browser, file operations)
   - The requested action is performed within the isolated container environment
5. Response Return:
   - Results of the action are formatted as JSON or binary data (for file downloads)
   - The response is sent back to the AI agent
6. Real-time Communication (for terminal):
   - Terminal sessions use WebSockets for bidirectional, real-time communication
   - The AI can receive terminal output as it's generated and send new commands
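Steps 1-3 above can be sketched as follows. This is an illustration only: the payload shape and the `api_name` field are assumptions, not the proxy's documented wire format.

```python
import json
from pathlib import Path


def load_sandbox_token(secrets_dir: Path) -> str:
    """Read the API token from the sandbox secrets file."""
    return (secrets_dir / "sandbox_api_token").read_text().strip()


def build_sandbox_request(token: str, api_name: str, body: dict) -> dict:
    """Assemble the pieces of an authenticated sandbox API call."""
    return {
        "url": "https://api.manus.im/apiproxy.v1.ApiProxyService/CallApi",
        "headers": {
            "x-sandbox-token": token,  # checked at the proxy layer
            "Content-Type": "application/json",
        },
        "payload": json.dumps({"api_name": api_name, "body": body}),
    }


req = build_sandbox_request(
    "dummy-token",
    "terminal_execute",
    {"command": "ls -la", "terminal_id": "main"},
)
print(req["headers"]["x-sandbox-token"])  # → dummy-token
```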
Example Flow: AI Running a Shell Command
┌─────────────┐ ┌───────────────┐ ┌──────────────────┐
│ │ 1. HTTP Request │ │ 2. Route to │ │
│ AI Agent │────────────────►│ Sandbox API │─────────────►│ Terminal Service │
│ │ │ (FastAPI) │ │ │
│ │◄────────────────│ │◄─────────────│ │
└─────────────┘ 4. JSON Response└───────────────┘ 3. Execute └──────────────────┘
Command
API Client Usage
The sandbox includes a Python API client (data_api.py) that communicates with the proxy service:
from data_api import ApiClient
# Initialize the client
api_client = ApiClient()
# Call a terminal command
response = api_client.call_api(
    "terminal_execute",
    body={
        "command": "ls -la",
        "terminal_id": "main"
    }
)
print(response)
LLM Response Format for Browser Automation
When interacting with browser_use, the LLM (like Claude) must format its responses as JSON according to the schema defined in the system prompt:
{
  "current_state": {
    "page_summary": "Found search page with 10 results for 'electric cars'",
    "evaluation_previous_goal": "Success - successfully navigated to search page and performed search as intended",
    "memory": "Completed search for 'electric cars'. Need to extract information from first 3 results (0 of 3 done)",
    "next_goal": "Extract detailed information from first search result"
  },
  "action": [
    {
      "click_element": {
        "index": 12
      }
    }
  ]
}
This response structure allows the Agent to:
- Track the LLM's understanding of the current page
- Evaluate the success of previous actions
- Maintain memory across interactions
- Execute the next action(s)
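Consuming such a response on the agent side can be sketched as a small parser; this is a minimal illustration, not the actual browser_use parsing code:

```python
import json


def parse_llm_response(raw: str) -> tuple[dict, list]:
    """Split an LLM browser-automation reply into state and action list."""
    data = json.loads(raw)
    state = data["current_state"]
    actions = data["action"]  # each item is {action_name: params}
    return state, actions


raw = '''{
  "current_state": {
    "page_summary": "Found search page with 10 results",
    "evaluation_previous_goal": "Success",
    "memory": "0 of 3 results analyzed",
    "next_goal": "Open first result"
  },
  "action": [{"click_element": {"index": 12}}]
}'''

state, actions = parse_llm_response(raw)
name, params = next(iter(actions[0].items()))
print(name, params)  # → click_element {'index': 12}
```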
Available Browser Actions
The browser_use library provides a wide range of actions for web automation:
Navigation Actions
- go_to_url: Navigate to a specific URL
- search_google: Perform a Google search
- go_back: Navigate back in browser history
- open_tab: Open a new browser tab
- switch_tab: Switch between browser tabs
Element Interaction
- click_element: Click on a page element by its index
- input_text: Type text into a form field
- scroll_down / scroll_up: Scroll the page
- scroll_to_text: Scroll to find specific text
- select_dropdown_option: Select from dropdown menus
Content Extraction
- extract_content: Extract and process page content
- get_dropdown_options: Get all options from a dropdown
Task Completion
- done: Mark the task as complete and return results
Integration with LLM Systems
To integrate an LLM with this sandbox:
1. API Client Implementation: Create an API client in the LLM's execution environment
2. Task Planning: The LLM should break down user requests into specific API calls
3. Sequential Operations: Complex tasks often require multiple API calls in sequence
4. Error Handling: The LLM should interpret error responses and adjust its approach
5. State Management: For multi-step operations, the LLM needs to track the state of the environment
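Error handling in this setting often reduces to retrying transient failures before giving up. A minimal sketch, assuming a callable that raises on failure (the error type and call shape are stand-ins, not part of the sandbox API):

```python
import time


def call_with_retry(call, max_attempts: int = 3, delay: float = 0.0):
    """Retry a sandbox call a few times before giving up."""
    last_error = None
    for _attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError as exc:  # stand-in for a transport/API error
            last_error = exc
            time.sleep(delay)
    raise last_error


# Simulate a call that fails twice, then succeeds.
attempts = {"n": 0}

def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient error")
    return {"status": "ok"}


result = call_with_retry(flaky_call)
print(result, attempts["n"])  # → {'status': 'ok'} 3
```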
Example Workflow: LLM Using the Sandbox
1. User asks the LLM to "Create a Python script that fetches weather data and saves it"
2. LLM plans the steps:
   - Create a new Python file
   - Write the code to fetch weather data
   - Save the file
   - Run the script to test it
   - Show the results to the user
3. LLM executes each step by making API calls to the sandbox:
   - POST /text_editor with command: "create" to create a new file
   - POST /text_editor with command: "write" to write the code
   - POST /terminal/{id}/write to run the script
   - GET /terminal/{id} to get the output
   - Return the results to the user
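The call sequence above can be sketched with a stub client that records calls instead of sending them. The endpoint names follow the list above; the payload fields and the file path are illustrative assumptions:

```python
class StubSandboxClient:
    """Records API calls instead of sending them, to show the call order."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict]] = []

    def call(self, endpoint: str, body: dict) -> dict:
        self.calls.append((endpoint, body))
        return {"status": "ok"}


def create_and_run_script(client: StubSandboxClient) -> None:
    # 1. Create the file
    client.call("/text_editor", {"command": "create", "path": "/tmp/weather.py"})
    # 2. Write the code
    client.call("/text_editor", {"command": "write", "path": "/tmp/weather.py",
                                 "content": "print('fetching weather...')"})
    # 3. Run the script in a terminal
    client.call("/terminal/main/write", {"text": "python /tmp/weather.py\n"})
    # 4. Read the output back
    client.call("/terminal/main", {})


client = StubSandboxClient()
create_and_run_script(client)
print([endpoint for endpoint, _ in client.calls])
```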
Security Considerations
1. Multi-layered Authentication:
   - API token authentication using the x-sandbox-token header (NOT IMPLEMENTED IN THIS CODE)
   - Token verification happens at the proxy layer before requests reach the FastAPI application (NOT IMPLEMENTED IN THIS CODE)
   - Tokens are stored securely in $HOME/.secrets/sandbox_api_token
2. Proxy Service Protection:
   - The proxy service provides an additional layer of security
   - Acts as a gatekeeper for all requests to the sandbox
   - Can implement rate limiting, request validation, and access control
3. Isolation:
   - The Docker container provides isolation from the host system
   - Prevents the AI from affecting the host machine directly
4. Resource Limitations:
   - The sandbox can be configured with resource constraints (CPU, memory) at the Docker level
   - Prevents resource exhaustion attacks
5. Action Restrictions:
   - The API can be configured to restrict certain dangerous operations
   - Browser automation is contained within the sandbox environment
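Since token verification is not implemented in this code, the following is only a sketch of what the check could look like, using a constant-time comparison to avoid timing side channels:

```python
import hmac
import tempfile
from pathlib import Path


def verify_sandbox_token(presented: str, secrets_dir: Path) -> bool:
    """Compare a presented x-sandbox-token value against the stored secret.

    Uses hmac.compare_digest so the comparison time does not leak
    how many leading characters matched.
    """
    token_file = secrets_dir / "sandbox_api_token"
    if not token_file.exists():
        return False
    expected = token_file.read_text().strip()
    return hmac.compare_digest(presented, expected)


# Demonstrate with a throwaway secrets directory.
with tempfile.TemporaryDirectory() as d:
    secrets = Path(d)
    (secrets / "sandbox_api_token").write_text("s3cret\n")
    accepted = verify_sandbox_token("s3cret", secrets)
    rejected = verify_sandbox_token("wrong", secrets)

print(accepted, rejected)  # → True False
```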
Deployment with Docker
The sandbox is designed to run in a Docker container. The provided Dockerfile was not part of the original code, but it illustrates what the container could include:
- A Python 3.12 environment
- Chromium browser for web automation
- All necessary dependencies
- API token initialization
To build and run the container:
# Build the container
docker build -t orion-sandbox .
# Run the container
docker run -p 8080:8080 orion-sandbox
Method 2: Install from source (recommended for development)
git clone https://github.com/yourusername/orion.git
cd orion
pip install -e .
Usage
Running as a service
Method 1: Using the command-line tool
After installation, start the service directly with the command-line tool:
# Start with the default configuration
orion-server
# Specify port and log level
orion-server --port 8888 --log-level debug
# Development mode (auto-reload)
orion-server --reload
Method 2: Starting with a Python script
python start_server.py --port 8330
Using as a library
Orion can also be imported and used as a Python library, for example:
import asyncio
from app import BrowserManager, terminal_manager, text_editor

# Initialize the browser manager
async def browser_example():
    browser = BrowserManager(headless=False)
    await browser.initialize()
    # Perform browser operations...
    await browser.close()

# Use the terminal manager
async def terminal_example():
    terminal = await terminal_manager.create_or_get_terminal("my_terminal")
    await terminal.execute_command("ls -la")
    history = terminal.get_history(True, True)
    # Process the terminal output...

# Run the example
asyncio.run(browser_example())
For more examples, see examples/use_as_package.py.
Docker Deployment
# Build the container
docker build -t orion-server .
# Run the container
docker run -p 8330:8330 orion-server
API Documentation
After starting the service, visit http://localhost:8330/docs to view the API documentation.
License
MIT
File details
Details for the file orion_browser-0.2.1.tar.gz.
File metadata
- Download URL: orion_browser-0.2.1.tar.gz
- Upload date:
- Size: 362.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7fac59225bc4dae4fb7aea47dd942b4446edd96c9b8de212563d5e658ae8d4b8 |
| MD5 | ff9cdc1b138bf0dbb3c7242a29945d4d |
| BLAKE2b-256 | 2751129ecba3369584f70b4bdaa84fefbf356ede7bd15ea2de6d46d59b8fda96 |
File details
Details for the file orion_browser-0.2.1-py3-none-any.whl.
File metadata
- Download URL: orion_browser-0.2.1-py3-none-any.whl
- Upload date:
- Size: 391.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3bbbeaba607d2b1f04963c85e72c5bb574dd8d5a5c43870752842224b8372c73 |
| MD5 | 7572244f9c60ed94b94755e211480600 |
| BLAKE2b-256 | cdaec258e96a676f0382a302d950e74cca79a8245265a9bab8fbba5863024e13 |