This is the Python SDK for Computer Use Agent, allowing you to easily control the computer desktop environment from your applications.

Lumi CUA SDK Guide

Overview

SDK for Lumi Computer Use Application, providing programmatic access to sandbox management and remote control capabilities.

Installation

pip install lumi-cua-sdk

Usage

Setup Environment

  1. Deploy your own remote Computer Use Agent. To learn more, follow the deployment links (in Chinese) for Computer Use Agent in Volcano Engine's OS Agent Services.
  2. Once the application is deployed, get the Sandbox Manager URL, Agent Planner URL, and Auth Token from the details page of the Computer Use Application in Volcano Engine's OS Agent Services:
    • Get the Sandbox Manager URL from the Computer Use Agent application.
    • Get the Agent Planner URL from the Computer Use Agent application.
    • Get the Auth Token from the Computer Use Agent application.
  3. Export the environment variables locally, replacing each placeholder with your own value:
  export SANDBOX_MANAGER_URL=${your_sandbox_manager_url}
  export AGENT_PLANNER_URL=${your_agent_planner_url}
  export AUTH_TOKEN=${your_auth_token}
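
Before creating a client, it can help to confirm that the three variables from step 3 are visible to the Python process. The following is a minimal sketch using only the standard library. It assumes the SDK reads these variables from the environment (the examples below construct LumiCuaClient() without arguments, which suggests this, but it is not confirmed here). The helper name missing_env_vars is hypothetical and not part of the SDK.

```python
import os

# Variable names taken from the export step above.
REQUIRED_VARS = ("SANDBOX_MANAGER_URL", "AGENT_PLANNER_URL", "AUTH_TOKEN")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or blank.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    This helper is illustrative and not part of lumi_cua_sdk.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running this check at the top of a script gives a clearer message than a connection or authentication error raised later.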

Basic Usage

Here is a basic example of using the SDK.
For a Linux sandbox or a Windows sandbox using the VNC stream protocol:

import asyncio
from lumi_cua_sdk import LumiCuaClient, Action, THINKING_DISABLED, THINKING_ENABLED

async def main():
    #  Initialize Client
    client = LumiCuaClient()
    try:
        # List or start sandboxes
        sandboxes = await client.list_sandboxes()
        if not sandboxes:
            print("No existing sandboxes found. Starting a new Linux sandbox...")
            sandbox = await client.start_linux()
            print(f"Started Linux sandbox: ID={sandbox.id}, IP={sandbox.ip_address}, ToolServerEndpoint={sandbox.tool_server_endpoint}")
        else:
            sandbox = sandboxes[0] # Use the first available sandbox
            print(f"Using existing sandbox: ID={sandbox.id}, IP={sandbox.ip_address}")

        # Get sandbox stream url
        stream_url = await sandbox.get_stream_url()
        print(f"Stream URL: {stream_url}")

        # Take screenshot
        screenshot_result = await sandbox.screenshot()
        print(f"Screenshot taken (first 64 chars): {screenshot_result.base_64_image[:64]}...")

        # Sandbox computer operation action
        await sandbox.computer(action=Action.MOVE_MOUSE, coordinates=[100, 150])
        print("Mouse moved.")

        await sandbox.computer(action=Action.TYPE_TEXT, text="Hello from Lumi CUA SDK!")
        print("Text typed.")

        await sandbox.computer(action=Action.CLICK_MOUSE, coordinates=[200, 250], button="right")
        print("Mouse clicked.")

        await sandbox.computer(action=Action.SCROLL, coordinates=[300, 350], scroll_direction="up", scroll_amount=30)
        print("Scrolled.")

        await sandbox.computer(action=Action.PRESS_KEY, keys=["Enter"])
        print("Pressed Enter.")

        await sandbox.computer(action=Action.TAKE_SCREENSHOT)
        print("Screenshot taken.")

        await sandbox.computer(action=Action.WAIT, duration=10)
        print("Waited.")

        # Task Integration
        # Get available models and set thinking mode
        models = await client.list_models()
        thinking_type = THINKING_ENABLED if models[0].is_thinking else THINKING_DISABLED

        # Run task
        task_prompt = "open the browser"
        try:
            async for message in client.run_task(task_prompt, sandbox.id, models[0].name,
                                                 user_system_prompt="", thinking_type=thinking_type):
                print("summary:", message.summary)
                print("action:", message.action)
                print("screenshot:", message.screenshot)
                print("task_id:", message.task_id)
        except Exception as e:
            print("\nError occurred:", str(e))

        # Delete sandbox (optional)
        print(f"Deleting sandbox {sandbox.id}...")
        await sandbox.delete()
        print("Sandbox stopped and deleted.")

    except Exception as e:
        print(f"An error occurred: {e}")
        import traceback
        traceback.print_exc()

if __name__ == "__main__":
    asyncio.run(main())
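
The example above prints only the first 64 characters of the screenshot. To inspect the image itself, the base64 string can be decoded and written to disk. This is a minimal sketch using only the standard library. It assumes base_64_image holds a standard base64-encoded image, possibly with a data-URI prefix. The actual image format is not stated in this guide, so the file extension you choose is also an assumption. The helper name save_base64_image is hypothetical and not part of the SDK.

```python
import base64
from pathlib import Path


def save_base64_image(b64_data: str, path: str) -> int:
    """Decode a base64 string, write the bytes to `path`, and return the byte count.

    Illustrative helper, not part of lumi_cua_sdk.
    """
    # Tolerate an optional data-URI prefix such as "data:image/png;base64,".
    if b64_data.startswith("data:") and "," in b64_data:
        b64_data = b64_data.split(",", 1)[1]
    raw = base64.b64decode(b64_data)
    Path(path).write_bytes(raw)
    return len(raw)
```

Inside main(), a call could look like save_base64_image(screenshot_result.base_64_image, "screenshot.png"), assuming the sandbox returns a PNG.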

For a Windows sandbox using the GUACAMOLE stream protocol:

import asyncio
from lumi_cua_sdk import LumiCuaClient, Action, THINKING_DISABLED, THINKING_ENABLED

async def main():
    #  Initialize Client
    client = LumiCuaClient()
    try:
        # List or start sandboxes
        sandboxes = await client.list_sandboxes()
        if not sandboxes:
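            # NOTE: this example targets a Windows sandbox, but the call below starts a Linux one.
            # Replace it with the SDK's Windows start call if no Windows sandbox exists yet.
            print("No existing sandboxes found. Starting a new Linux sandbox...")
            sandbox = await client.start_linux()
            print(f"Started Linux sandbox: ID={sandbox.id}, IP={sandbox.ip_address}, ToolServerEndpoint={sandbox.tool_server_endpoint}")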
        else:
            sandbox = sandboxes[0] # Use the first available sandbox
            print(f"Using existing sandbox: ID={sandbox.id}, IP={sandbox.ip_address}")

        # Get sandbox stream url
        stream_url = await sandbox.get_stream_url()
        print(f"Stream URL: {stream_url}")

        async with sandbox.rdp_session() as rdp_client:
            if rdp_client is None:
                print("Failed to establish RDP session, skipping operations")
                return
            
            # Take screenshot
            screenshot_result = await sandbox.screenshot()
            print(f"Screenshot taken (first 64 chars): {screenshot_result.base_64_image[:64]}...")
    
            # Sandbox computer operation action
            await sandbox.computer(action=Action.MOVE_MOUSE, coordinates=[100, 150])
            print("Mouse moved.")
    
            await sandbox.computer(action=Action.TYPE_TEXT, text="Hello from Lumi CUA SDK!")
            print("Text typed.")
    
            await sandbox.computer(action=Action.CLICK_MOUSE, coordinates=[200, 250], button="right")
            print("Mouse clicked.")
    
            await sandbox.computer(action=Action.SCROLL, coordinates=[300, 350], scroll_direction="up", scroll_amount=30)
            print("Scrolled.")
    
            await sandbox.computer(action=Action.PRESS_KEY, keys=["Enter"])
            print("Pressed Enter.")
    
            await sandbox.computer(action=Action.TAKE_SCREENSHOT)
            print("Screenshot taken.")
    
            await sandbox.computer(action=Action.WAIT, duration=10)
            print("Waited.")
    
            # Task Integration
            # Get available models and set thinking mode
            models = await client.list_models()
            thinking_type = THINKING_ENABLED if models[0].is_thinking else THINKING_DISABLED
    
            # Run task
            task_prompt = "open the browser"
            try:
                async for message in client.run_task(task_prompt, sandbox.id, models[0].name,
                                                     user_system_prompt="", thinking_type=thinking_type):
                    print("summary:", message.summary)
                    print("action:", message.action)
                    print("screenshot:", message.screenshot)
                    print("task_id:", message.task_id)
            except Exception as e:
                print("\nError occurred:", str(e))

        # Delete sandbox (optional)
        print(f"Deleting sandbox {sandbox.id}...")
        await sandbox.delete()
        print("Sandbox stopped and deleted.")

    except Exception as e:
        print(f"An error occurred: {e}")
        import traceback
        traceback.print_exc()

if __name__ == "__main__":
    asyncio.run(main())
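
In both examples, sandbox.delete() runs only if every earlier step succeeds. An exception in a sandbox.computer() call, for instance, jumps straight to the outer except block and leaves the sandbox running. One way to guarantee cleanup is a try/finally wrapper. The sketch below shows the pattern with a hypothetical stand-in class (FakeSandbox) so that it runs without the SDK. Only the id attribute and the async delete() method mirror the examples above; run_with_cleanup is an illustrative name and not part of the SDK.

```python
import asyncio


class FakeSandbox:
    """Hypothetical stand-in for an SDK sandbox, used only for this demo."""

    def __init__(self, sandbox_id):
        self.id = sandbox_id
        self.deleted = False

    async def delete(self):
        self.deleted = True


async def run_with_cleanup(sandbox, work):
    """Await work(sandbox) and delete the sandbox afterwards, even on failure."""
    try:
        return await work(sandbox)
    finally:
        await sandbox.delete()


async def failing_work(sandbox):
    # Simulates an operation that fails partway through a session.
    raise RuntimeError("simulated failure")


async def demo():
    sandbox = FakeSandbox("sbx-demo")
    try:
        await run_with_cleanup(sandbox, failing_work)
    except RuntimeError:
        pass  # The error still propagates to the caller; cleanup has already run.
    return sandbox.deleted


if __name__ == "__main__":
    print("Sandbox deleted after failure:", asyncio.run(demo()))
```

With a real sandbox object, the same wrapper could take a coroutine function that performs the screenshot, computer, and task steps. Whether unconditional deletion is desirable depends on whether you reuse existing sandboxes, as the examples do.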

Features

  • List available sandboxes.
  • Start and delete sandboxes (Linux and Windows).
  • Get a streaming URL for sandbox interaction.
  • Remote computer control:
    • Mouse movements, clicks, drags, scrolls.
    • Keyboard typing and key presses.
    • Take screenshots.
  • Agent integration for computer use task automation.

Development

Clone the repository and install dependencies for development:

git clone https://github.com/lelili2021/lumi-cua-sdk.git
cd lumi-cua-sdk
pip install -e ".[dev]"

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
