
This is the Python SDK for the Computer Use Agent. It lets you control a remote computer desktop environment from your applications.


Lumi CUA SDK Guide

Overview

SDK for Lumi Computer Use Application, providing programmatic access to sandbox management and remote control capabilities.

Installation

pip install lumi-cua-sdk

Usage

Setup Environment

  1. Deploy your own remote Computer Use Agent. You can find deployment links (in Chinese) for the Computer Use Agent in Volcano Engine's OS Agent Services.
  2. After the application deployment completes, open the details page of the Computer Use Application in Volcano Engine's OS Agent Services and copy the following values:
    • Sandbox Manager URL
    • Agent Planner URL
    • Auth Token
  3. Export them as environment variables locally:
  export SANDBOX_MANAGER_URL="<your_sandbox_manager_url>"
  export AGENT_PLANNER_URL="<your_agent_planner_url>"
  export AUTH_TOKEN="<your_auth_token>"
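
Because the examples below construct the client without arguments, it presumably reads these three variables from the environment (an assumption; the source does not state it explicitly). A minimal sketch of a pre-flight check that reports missing variables before any SDK call is made; `missing_env_vars` and `check_environment` are hypothetical helper names, not part of the SDK:

```python
import os

# Variable names come from the setup steps above. That the client reads them
# from the environment is an assumption based on the argument-free constructor.
REQUIRED_VARS = ("SANDBOX_MANAGER_URL", "AGENT_PLANNER_URL", "AUTH_TOKEN")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def check_environment(env=None):
    """Raise a readable error if any required variable is missing."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_environment()` at the top of `main()` turns a misconfigured shell into an immediate, readable error instead of a failure deep inside a request.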

Basic Usage

Here's a basic example of using the SDK.
For a Linux sandbox:

import asyncio
from lumi_cua_sdk import LumiCuaClient, Action, THINKING_DISABLED, THINKING_ENABLED

async def main():
    # Initialize the client
    client = LumiCuaClient()
    try:
        # List or start sandboxes
        sandboxes = await client.list_sandboxes()
        if not sandboxes:
            print("No existing sandboxes found. Starting a new Linux sandbox...")
            sandbox = await client.start_linux()
            print(f"Started Linux sandbox: ID={sandbox.id}, IP={sandbox.ip_address}, ToolServerEndpoint={sandbox.tool_server_endpoint}")
        else:
            sandbox = sandboxes[0] # Use the first available sandbox
            print(f"Using existing sandbox: ID={sandbox.id}, IP={sandbox.ip_address}")

        # Get sandbox stream url
        stream_url = await sandbox.get_stream_url()
        print(f"Stream URL: {stream_url}")

        # Take screenshot
        screenshot_result = await sandbox.screenshot()
        print(f"Screenshot taken (first 64 chars): {screenshot_result.base_64_image[:64]}...")

        # Sandbox computer operation action
        await sandbox.computer(action=Action.MOVE_MOUSE, coordinates=[100, 150])
        print("Mouse moved.")

        await sandbox.computer(action=Action.TYPE_TEXT, text="Hello from Lumi CUA SDK!")
        print("Text typed.")

        await sandbox.computer(action=Action.CLICK_MOUSE, coordinates=[200, 250], button="right")
        print("Mouse clicked.")

        await sandbox.computer(action=Action.SCROLL, coordinates=[300, 350], scroll_direction="up", scroll_amount=30)
        print("Scrolled.")

        await sandbox.computer(action=Action.PRESS_KEY, keys=["Enter"])
        print("Pressed Enter.")

        await sandbox.computer(action=Action.TAKE_SCREENSHOT)
        print("Screenshot taken.")

        await sandbox.computer(action=Action.WAIT, duration=10)
        print("Waited.")

        # Task Integration
        # Get available models and set thinking mode
        models = await client.list_models()
        thinking_type = THINKING_ENABLED if models[0].is_thinking else THINKING_DISABLED

        # Run task
        task_prompt = "open the browser"
        try:
            async for message in client.run_task(task_prompt, sandbox.id, models[0].name,
                                                 user_system_prompt="", thinking_type=thinking_type):
                print("summary:", message.summary)
                print("action:", message.action)
                print("screenshot:", message.screenshot)
                print("task_id:", message.task_id)
        except Exception as e:
            print("\nError occurred:", str(e))

        # Delete sandbox (optional)
        print(f"Deleting sandbox {sandbox.id}...")
        await sandbox.delete()
        print("Sandbox stopped and deleted.")

    except Exception as e:
        print(f"An error occurred: {e}")
        import traceback
        traceback.print_exc()

if __name__ == "__main__":
    asyncio.run(main())
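
The example above prints only the first 64 characters of `screenshot_result.base_64_image`. To inspect the image itself, the string can be decoded and written to disk. A minimal sketch, assuming the field holds standard base64 data; `save_screenshot` is a hypothetical helper, and both the optional data-URI prefix handling and the file extension you choose are guesses, since the source does not state the image format:

```python
import base64
from pathlib import Path


def save_screenshot(base64_image: str, path: str) -> int:
    """Decode a base64 string, write it to `path`, and return the byte count."""
    # Assumption: if a data-URI header such as "data:image/png;base64," is
    # present, everything up to the first comma is not part of the payload.
    if base64_image.startswith("data:") and "," in base64_image:
        base64_image = base64_image.split(",", 1)[1]
    data = base64.b64decode(base64_image)
    Path(path).write_bytes(data)
    return len(data)
```

With the SDK, the call would look something like `save_screenshot(screenshot_result.base_64_image, "screenshot.png")`.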

For a Windows sandbox:

import asyncio
from lumi_cua_sdk import LumiCuaClient, Action, THINKING_DISABLED, THINKING_ENABLED

async def main():
    # Initialize the client
    client = LumiCuaClient()
    try:
        # List or start sandboxes
        sandboxes = await client.list_sandboxes()
        if not sandboxes:
            print("No existing sandboxes found. Starting a new Windows sandbox...")
            # Assumption: start_windows() is the Windows counterpart of start_linux().
            sandbox = await client.start_windows()
            print(f"Started Windows sandbox: ID={sandbox.id}, IP={sandbox.ip_address}, ToolServerEndpoint={sandbox.tool_server_endpoint}")
        else:
            sandbox = sandboxes[0] # Use the first available sandbox
            print(f"Using existing sandbox: ID={sandbox.id}, IP={sandbox.ip_address}")

        # Get sandbox stream url
        stream_url = await sandbox.get_stream_url()
        print(f"Stream URL: {stream_url}")

        async with sandbox.rdp_session() as rdp_client:
            if rdp_client is None:
                print("Failed to establish RDP session, skipping operations")
                return
            
            # Take screenshot
            screenshot_result = await sandbox.screenshot()
            print(f"Screenshot taken (first 64 chars): {screenshot_result.base_64_image[:64]}...")
    
            # Sandbox computer operation action
            await sandbox.computer(action=Action.MOVE_MOUSE, coordinates=[100, 150])
            print("Mouse moved.")
    
            await sandbox.computer(action=Action.TYPE_TEXT, text="Hello from Lumi CUA SDK!")
            print("Text typed.")
    
            await sandbox.computer(action=Action.CLICK_MOUSE, coordinates=[200, 250], button="right")
            print("Mouse clicked.")
    
            await sandbox.computer(action=Action.SCROLL, coordinates=[300, 350], scroll_direction="up", scroll_amount=30)
            print("Scrolled.")
    
            await sandbox.computer(action=Action.PRESS_KEY, keys=["Enter"])
            print("Pressed Enter.")
    
            await sandbox.computer(action=Action.TAKE_SCREENSHOT)
            print("Screenshot taken.")
    
            await sandbox.computer(action=Action.WAIT, duration=10)
            print("Waited.")
    
            # Task Integration
            # Get available models and set thinking mode
            models = await client.list_models()
            thinking_type = THINKING_ENABLED if models[0].is_thinking else THINKING_DISABLED
    
            # Run task
            task_prompt = "open the browser"
            try:
                async for message in client.run_task(task_prompt, sandbox.id, models[0].name,
                                                     user_system_prompt="", thinking_type=thinking_type):
                    print("summary:", message.summary)
                    print("action:", message.action)
                    print("screenshot:", message.screenshot)
                    print("task_id:", message.task_id)
            except Exception as e:
                print("\nError occurred:", str(e))

        # Delete sandbox (optional)
        print(f"Deleting sandbox {sandbox.id}...")
        await sandbox.delete()
        print("Sandbox stopped and deleted.")

    except Exception as e:
        print(f"An error occurred: {e}")
        import traceback
        traceback.print_exc()

if __name__ == "__main__":
    asyncio.run(main())
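
In both examples, `sandbox.delete()` is skipped if an earlier step raises, which can leave a sandbox running. One way to guarantee cleanup is a small async context manager. This is a sketch, not part of the SDK: `managed_sandbox` is a hypothetical helper, and `FakeSandbox` is a stand-in so the snippet runs without a deployment. The only SDK behavior it relies on is the awaitable `delete()` method shown above.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def managed_sandbox(sandbox):
    """Yield the sandbox and always await its delete() on exit."""
    try:
        yield sandbox
    finally:
        await sandbox.delete()


class FakeSandbox:
    """Offline stand-in for an SDK sandbox object (hypothetical)."""

    def __init__(self):
        self.id = "fake-1"
        self.deleted = False

    async def delete(self):
        self.deleted = True


async def demo():
    """Show that delete() runs even when the body raises."""
    sandbox = FakeSandbox()
    try:
        async with managed_sandbox(sandbox):
            raise RuntimeError("task failed")
    except RuntimeError:
        pass
    return sandbox.deleted
```

Wrap only sandboxes you intend to discard. A sandbox reused from `list_sandboxes()` would also be deleted on exit.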

Features

  • List available sandboxes.
  • Start and delete sandboxes (Linux and Windows).
  • Get a streaming URL for sandbox interaction.
  • Remote computer control:
    • Mouse movements, clicks, drags, scrolls.
    • Keyboard typing and key presses.
    • Take screenshots.
  • Agent integration for computer use task automation.
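
For the agent integration, the usage examples consume `client.run_task(...)` with `async for` and read `summary`, `action`, `screenshot`, and `task_id` from each message. A minimal sketch of collecting such a stream into a list with an upper bound on the number of messages. `collect_messages` is a hypothetical helper, and `FakeMessage` and `fake_run_task` are offline stand-ins that only mimic those field names:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Stand-in with the fields printed in the usage examples."""
    summary: str
    action: str
    screenshot: str
    task_id: str


async def fake_run_task(prompt):
    """Hypothetical offline replacement for client.run_task."""
    for step in range(3):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield FakeMessage(f"step {step} of {prompt!r}", "noop", "", "task-1")


async def collect_messages(stream, max_messages=50):
    """Drain an async message stream, stopping after max_messages."""
    messages = []
    async for message in stream:
        messages.append(message)
        if len(messages) >= max_messages:
            break
    return messages
```

Capping the count is a simple guard against a task that keeps emitting steps. The real stream's termination behavior is not described in the source.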

Development

Clone the repository and install dependencies for development:

git clone https://github.com/lelili2021/lumi-cua-sdk.git
cd lumi-cua-sdk
pip install -e ".[dev]"

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
