The Python SDK for Computer Use Agent lets you control a computer desktop environment from your applications.

Lumi CUA SDK Guide

Overview

SDK for Lumi Computer Use Application, providing programmatic access to sandbox management and remote control capabilities.

Installation

pip install lumi-cua-sdk

Usage

Setup Environment

  1. Deploy your own remote Computer Use Agent. You can learn more about Volcano Engine's OS Agent Services through the deployment links (in Chinese): Computer Use Agent
  2. Once the application is deployed, get the Sandbox Manager URL, Agent Planner URL, and Auth Token from the details page of the Computer Use Application in Volcano Engine's OS Agent Services:
    • Get the Sandbox Manager URL from the Computer Use Agent application.
    • Get the Agent Planner URL from the Computer Use Agent application.
    • Get the Auth Token from the Computer Use Agent application.
  3. Export the environment variables locally:
  export SANDBOX_MANAGER_URL=${your_sandbox_manager_url} 
  export AGENT_PLANNER_URL=${your_agent_planner_url}   
  export AUTH_TOKEN=${your_auth_token}
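
Since the examples in this guide construct `LumiCuaClient()` without arguments, it can help to confirm the three variables are present before running them. The following is a minimal sketch using only the standard library; the helper name `missing_env_vars` is hypothetical and not part of the SDK, and it assumes the client reads its configuration from these variables.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("SANDBOX_MANAGER_URL", "AGENT_PLANNER_URL", "AUTH_TOKEN")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper, not part of lumi-cua-sdk.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise in tests.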

Basic Usage

Here's a basic example of using the SDK.
For a Linux sandbox:

import asyncio
from lumi_cua_sdk import LumiCuaClient, Action, THINKING_DISABLED, THINKING_ENABLED

async def main():
    # Initialize the client
    client = LumiCuaClient()
    try:
        # List or start sandboxes
        sandboxes = await client.list_sandboxes()
        if not sandboxes:
            print("No existing sandboxes found. Starting a new Linux sandbox...")
            sandbox = await client.start_linux()
            print(f"Started Linux sandbox: ID={sandbox.id}, IP={sandbox.ip_address}, ToolServerEndpoint={sandbox.tool_server_endpoint}")
        else:
            sandbox = sandboxes[0] # Use the first available sandbox
            print(f"Using existing sandbox: ID={sandbox.id}, IP={sandbox.ip_address}")

        # Get sandbox stream url
        stream_url = await sandbox.get_stream_url()
        print(f"Stream URL: {stream_url}")

        # Take screenshot
        screenshot_result = await sandbox.screenshot()
        print(f"Screenshot taken (first 64 chars): {screenshot_result.base_64_image[:64]}...")

        # Sandbox computer operation action
        await sandbox.computer(action=Action.MOVE_MOUSE, coordinates=[100, 150])
        print("Mouse moved.")

        await sandbox.computer(action=Action.TYPE_TEXT, text="Hello from Lumi CUA SDK!")
        print("Text typed.")

        await sandbox.computer(action=Action.CLICK_MOUSE, coordinates=[200, 250], button="right")
        print("Mouse clicked.")

        await sandbox.computer(action=Action.SCROLL, coordinates=[300, 350], scroll_direction="up", scroll_amount=30)
        print("Scrolled.")

        await sandbox.computer(action=Action.PRESS_KEY, keys=["Enter"])
        print("Pressed Enter.")

        await sandbox.computer(action=Action.TAKE_SCREENSHOT)
        print("Screenshot taken.")

        await sandbox.computer(action=Action.WAIT, duration=10)
        print("Waited.")

        # Task Integration
        # Get available models and set thinking mode
        models = await client.list_models()
        thinking_type = THINKING_ENABLED if models[0].is_thinking else THINKING_DISABLED

        # Run task
        task_prompt = "open the browser"
        try:
            async for message in client.run_task(task_prompt, sandbox.id, models[0].name,
                                                 user_system_prompt="", thinking_type=thinking_type):
                print("summary:", message.summary)
                print("action:", message.action)
                print("screenshot:", message.screenshot)
                print("task_id:", message.task_id)
        except Exception as e:
            print("\nError occurred:", e)

        # Delete sandbox (optional)
        print(f"Deleting sandbox {sandbox.id}...")
        await sandbox.delete()
        print("Sandbox stopped and deleted.")

    except Exception as e:
        print(f"An error occurred: {e}")
        import traceback
        traceback.print_exc()

if __name__ == "__main__":
    asyncio.run(main())
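
The example above prints only the first 64 characters of `screenshot_result.base_64_image`. To inspect the image itself, the string can be decoded and written to disk. This is a minimal sketch that assumes the field holds plain base64-encoded image bytes; the image format is not stated in this guide, so the `.png` extension is a guess, and `save_base64_image` is a hypothetical helper rather than an SDK function.

```python
import base64
import tempfile
from pathlib import Path


def save_base64_image(b64_data: str, path) -> int:
    """Decode a base64 string, write the bytes to path, and return the byte count.

    Hypothetical helper, not part of lumi-cua-sdk.
    """
    raw = base64.b64decode(b64_data)
    Path(path).write_bytes(raw)
    return len(raw)


# Stand-in payload so the sketch runs without a sandbox: the 8-byte PNG signature.
# With a real sandbox, pass screenshot_result.base_64_image instead.
sample = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
out_path = Path(tempfile.gettempdir()) / "lumi_screenshot_demo.png"
written = save_base64_image(sample, out_path)
print(f"Wrote {written} bytes to {out_path}")
```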

For Windows Sandbox:

import asyncio
from lumi_cua_sdk import LumiCuaClient, Action, THINKING_DISABLED, THINKING_ENABLED

async def main():
    # Initialize the client
    client = LumiCuaClient()
    try:
        # List or start sandboxes
        sandboxes = await client.list_sandboxes()
        if not sandboxes:
            print("No existing sandboxes found. Starting a new Windows sandbox...")
            # Assumption: start_windows() is the Windows counterpart of start_linux(); confirm against the SDK.
            sandbox = await client.start_windows()
            print(f"Started Windows sandbox: ID={sandbox.id}, IP={sandbox.ip_address}, ToolServerEndpoint={sandbox.tool_server_endpoint}")
        else:
            sandbox = sandboxes[0] # Use the first available sandbox
            print(f"Using existing sandbox: ID={sandbox.id}, IP={sandbox.ip_address}")

        # Get sandbox stream url
        stream_url = await sandbox.get_stream_url()
        print(f"Stream URL: {stream_url}")

        async with sandbox.rdp_session() as rdp_client:
            if rdp_client is None:
                print("Failed to establish RDP session, skipping operations")
                return
            
            # Take screenshot
            screenshot_result = await sandbox.screenshot()
            print(f"Screenshot taken (first 64 chars): {screenshot_result.base_64_image[:64]}...")
    
            # Sandbox computer operation action
            await sandbox.computer(action=Action.MOVE_MOUSE, coordinates=[100, 150])
            print("Mouse moved.")
    
            await sandbox.computer(action=Action.TYPE_TEXT, text="Hello from Lumi CUA SDK!")
            print("Text typed.")
    
            await sandbox.computer(action=Action.CLICK_MOUSE, coordinates=[200, 250], button="right")
            print("Mouse clicked.")
    
            await sandbox.computer(action=Action.SCROLL, coordinates=[300, 350], scroll_direction="up", scroll_amount=30)
            print("Scrolled.")
    
            await sandbox.computer(action=Action.PRESS_KEY, keys=["Enter"])
            print("Pressed Enter.")
    
            await sandbox.computer(action=Action.TAKE_SCREENSHOT)
            print("Screenshot taken.")
    
            await sandbox.computer(action=Action.WAIT, duration=10)
            print("Waited.")
    
            # Task Integration
            # Get available models and set thinking mode
            models = await client.list_models()
            thinking_type = THINKING_ENABLED if models[0].is_thinking else THINKING_DISABLED
    
            # Run task
            task_prompt = "open the browser"
            try:
                async for message in client.run_task(task_prompt, sandbox.id, models[0].name,
                                                     user_system_prompt="", thinking_type=thinking_type):
                    print("summary:", message.summary)
                    print("action:", message.action)
                    print("screenshot:", message.screenshot)
                    print("task_id:", message.task_id)
            except Exception as e:
                print("\nError occurred:", e)

        # Delete sandbox (optional)
        print(f"Deleting sandbox {sandbox.id}...")
        await sandbox.delete()
        print("Sandbox stopped and deleted.")

    except Exception as e:
        print(f"An error occurred: {e}")
        import traceback
        traceback.print_exc()

if __name__ == "__main__":
    asyncio.run(main())
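
In both examples, `sandbox.delete()` runs only when every earlier step succeeds, and in the Windows example the early `return` also skips it. If a sandbox is meant to be temporary, one option is to wrap it so that deletion always happens. The sketch below uses `contextlib.asynccontextmanager`; `auto_delete` and `FakeSandbox` are hypothetical names used for illustration, and the only SDK behavior assumed is the awaitable `delete()` method shown above.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def auto_delete(sandbox):
    """Yield the sandbox and always await sandbox.delete() on exit.

    Hypothetical helper, not part of lumi-cua-sdk.
    """
    try:
        yield sandbox
    finally:
        await sandbox.delete()


class FakeSandbox:
    """Stand-in for an SDK sandbox object; only delete() is modelled."""

    def __init__(self):
        self.deleted = False

    async def delete(self):
        self.deleted = True


async def demo():
    fake = FakeSandbox()
    try:
        async with auto_delete(fake):
            raise RuntimeError("simulated failure mid-session")
    except RuntimeError:
        pass  # The error still propagates out of the block; cleanup ran first.
    return fake.deleted


deleted_after_error = asyncio.run(demo())
print("Deleted after error:", deleted_after_error)
```

This is not appropriate when you intend to reuse an existing sandbox, as the examples do when `list_sandboxes()` returns one.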

Features

  • List available sandboxes.
  • Start and delete sandboxes (Linux and Windows).
  • Get a streaming URL for sandbox interaction.
  • Remote computer control:
    • Mouse movements, clicks, drags, scrolls.
    • Keyboard typing and key presses.
    • Take screenshots.
  • Agent integration for computer use task automation.
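
For agent integration, the basic examples take `models[0]` directly, which raises an `IndexError` if `list_models()` returns an empty list. A slightly more defensive selection could look like the sketch below. `pick_model` is a hypothetical helper, not part of the SDK; it relies only on the `name` and `is_thinking` attributes used in the examples, and the models here are stand-in objects.

```python
from types import SimpleNamespace


def pick_model(models, prefer_thinking=True):
    """Return the first model matching the thinking preference, else the first model.

    Hypothetical helper, not part of lumi-cua-sdk.
    """
    if not models:
        raise ValueError("no models available")
    for model in models:
        if bool(model.is_thinking) == prefer_thinking:
            return model
    return models[0]


# Stand-in objects shaped like the entries used in the examples above.
available = [
    SimpleNamespace(name="model-a", is_thinking=False),
    SimpleNamespace(name="model-b", is_thinking=True),
]
chosen = pick_model(available)
print(chosen.name, chosen.is_thinking)
```

The chosen model's `is_thinking` flag can then be mapped to `THINKING_ENABLED` or `THINKING_DISABLED` exactly as the examples do.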

Development

Clone the repository and install dependencies for development:

git clone https://github.com/lelili2021/lumi-cua-sdk.git
cd lumi-cua-sdk
pip install -e .[dev]

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
