ScreenEnv

A powerful Python library for creating and managing isolated desktop environments using Docker containers. ScreenEnv provides a sandboxed Ubuntu desktop environment with XFCE4 that you can programmatically control for GUI automation, testing, and development.

Features

  • 🖥️ Isolated Desktop Environment: Full Ubuntu desktop with XFCE4 running in Docker
  • 🎮 GUI Automation: Complete mouse and keyboard control
  • 🌐 Web Automation: Built-in browser automation with Playwright
  • 📹 Screen Recording: Capture video recordings of all actions
  • 📸 Screenshot Capabilities: Desktop and browser screenshots
  • 🖱️ Mouse Control: Click, drag, scroll, and mouse movement
  • ⌨️ Keyboard Input: Text typing and key combinations
  • 🪟 Window Management: Launch, activate, and close applications
  • 📁 File Operations: Upload, download, and file management
  • 🐚 Terminal Access: Execute commands and capture output
  • 🤖 MCP Server Support: Model Context Protocol integration for AI/LLM automation
  • 🐳 Docker Ready: Pre-built Docker image with all dependencies

Quick Start

Installation

  1. Clone the repository (only needed when installing from source):

    git clone https://github.com/huggingface/screenenv
    cd screenenv
    
  2. Install the package (choose one):

    latest release:

    pip install screenenv
    # or
    uv pip install screenenv
    

    from source:

    pip install .
    # or
    uv sync
    

Basic Usage

from screenenv import Sandbox

# Create a sandbox environment
sandbox = Sandbox()

try:
    # Launch a terminal
    sandbox.launch("xfce4-terminal")

    # Type some text
    sandbox.write("echo 'Hello from ScreenEnv!'")
    sandbox.press("Enter")

    # Take a screenshot
    screenshot = sandbox.screenshot()
    with open("screenshot.png", "wb") as f:
        f.write(screenshot)

finally:
    # Clean up
    sandbox.close()

For more detailed usage, see the source code in examples/sandbox_demo.py.
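
The try/finally pattern above can also be wrapped in a small context manager so that close() is never forgotten. This is only a sketch: `managed_sandbox` is a hypothetical helper, not part of ScreenEnv, and it assumes nothing beyond the `close()` method shown above.

```python
from contextlib import contextmanager


@contextmanager
def managed_sandbox(factory, **kwargs):
    """Create a sandbox with factory(**kwargs) and always close it on exit.

    `managed_sandbox` is a hypothetical helper; it only relies on the
    object having a close() method, as in the example above.
    """
    sandbox = factory(**kwargs)
    try:
        yield sandbox
    finally:
        # Runs even if the body raises, mirroring the try/finally above.
        sandbox.close()


# Usage (requires Docker and screenenv to be installed):
#   from screenenv import Sandbox
#   with managed_sandbox(Sandbox, headless=True) as sandbox:
#       sandbox.launch("xfce4-terminal")
```

Passing the class as `factory` keeps the helper importable on machines without Docker, which can be convenient in unit tests.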

MCP Server Support

ScreenEnv includes full support for the Model Context Protocol (MCP), enabling seamless integration with AI/LLM systems for desktop automation.

What is MCP?

The Model Context Protocol (MCP) is a standard for AI assistants to interact with external tools and data sources. ScreenEnv's MCP server provides desktop automation capabilities that can be used by any MCP-compatible AI system.

MCP Server Features

  • 30+ Automation Tools: Complete desktop control via MCP
  • Streamable HTTP Transport: Efficient communication protocol

Starting the MCP Server

from screenenv import MCPRemoteServer

# Start MCP server
server = MCPRemoteServer()

print(f"MCP Server URL: {server.server_url}")
print(f"Server Configuration: {server.mcp_server_json}")

MCP Client Usage

import asyncio
import base64
import io

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client
from PIL import Image  # Pillow, used to decode and save the screenshot
from screenenv import MCPRemoteServer

async def mcp_automation():
    # Start MCP server
    server = MCPRemoteServer(headless=False)

    try:
        # Connect to MCP server
        async with streamablehttp_client(server.server_url) as (
            read_stream, write_stream, _
        ):
            async with ClientSession(read_stream, write_stream) as session:
                await session.initialize()

                # Launch terminal
                await session.call_tool("launch", {
                    "application": "xfce4-terminal",
                    "wait_for_window": True
                })

                # Type commands
                await session.call_tool("write", {"text": "echo 'Hello MCP!'"})
                await session.call_tool("press", {"key": ["Enter"]})

                # Take screenshot
                response = await session.call_tool("screenshot", {})
                screenshot_base64 = response.content[0].data

                screenshot_bytes = base64.b64decode(screenshot_base64)
                image = Image.open(io.BytesIO(screenshot_bytes))
                image.save("screenshot.png")
                ...

                print("MCP automation completed!")

    finally:
        server.close()

# Run the automation
asyncio.run(mcp_automation())
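
If Pillow is not available, the base64 payload can be decoded with the standard library alone. The sketch below assumes the screenshot tool returns base64-encoded PNG data, which the example above suggests but does not state; `decode_png_payload` is a hypothetical helper name.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png_payload(data_b64: str) -> bytes:
    """Decode a base64 string and check that the result starts like a PNG file.

    Assumption: the payload is PNG data. If the server returns another
    image format, the signature check would need to change.
    """
    raw = base64.b64decode(data_b64, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not PNG data")
    return raw


# Usage inside the client example:
#   png_bytes = decode_png_payload(response.content[0].data)
#   with open("screenshot.png", "wb") as f:
#       f.write(png_bytes)
```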

Available MCP Tools

System Operations

  • execute_command - Execute shell commands
  • get_platform - Get system platform information
  • get_screen_size - Get screen dimensions
  • get_desktop_path - Get desktop directory path
  • get_directory_tree - List directory contents
  • get_file - Get file contents
  • download_file - Download file from URL
  • start_recording - Start screen recording
  • end_recording - End screen recording

Application Management

  • wait - Wait for specified milliseconds
  • open - Open file or URL
  • launch - Launch application
  • get_current_window_id - Get current window ID
  • get_application_windows - Get windows for application
  • get_window_name - Get window name/title
  • get_window_size - Get window size
  • activate_window - Activate window
  • close_window - Close window
  • get_terminal_output - Get terminal output

GUI Automation

  • screenshot - Take screenshot
  • left_click - Left click at coordinates
  • double_click - Double click at coordinates
  • right_click - Right click at coordinates
  • middle_click - Middle click at coordinates
  • scroll - Scroll mouse wheel
  • move_mouse - Move mouse to coordinates
  • mouse_press - Press mouse button
  • mouse_release - Release mouse button
  • get_cursor_position - Get cursor position
  • write - Type text
  • press - Press keys
  • drag - Drag mouse from one position to another

MCP Server Configuration

# Advanced MCP server configuration
server = MCPRemoteServer(
    os_type="Ubuntu",
    provider_type="docker",
    headless=True,
    resolution=(1920, 1080),
    disk_size="32G",
    ram_size="4G",
    cpu_cores="4",
    session_password="your_password",
    stream_server=True,
    dpi=96,
    timeout=1000
)

Sandbox Instantiation

Basic Configuration

from screenenv import Sandbox

# Minimal configuration
sandbox = Sandbox()

# With custom settings
sandbox = Sandbox(
    os_type="Ubuntu",           # Currently only Ubuntu is supported
    provider_type="docker",     # Currently only Docker is supported
    headless=True,              # Run without VNC viewer
    resolution=(1920, 1080),
    disk_size="32G",
    ram_size="4G",
    cpu_cores="4",
    session_password="your_password",
    stream_server=True,
    dpi=96,
    timeout=1000
)
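
When the same settings are shared between Sandbox and MCPRemoteServer, it may help to keep them in one validated object. The sketch below is an assumption-laden illustration: `SandboxSettings` is hypothetical, its defaults simply copy the values from the example above (they are not necessarily the library's defaults), and the validation rules are my own.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SandboxSettings:
    """A subset of the keyword arguments shown in the configuration example."""

    os_type: str = "Ubuntu"
    provider_type: str = "docker"
    headless: bool = True
    resolution: tuple = (1920, 1080)
    dpi: int = 96
    timeout: int = 1000

    def __post_init__(self):
        width, height = self.resolution
        if width <= 0 or height <= 0:
            raise ValueError("resolution must be positive")
        # The comments in the example say only Ubuntu and Docker are supported.
        if self.os_type != "Ubuntu" or self.provider_type != "docker":
            raise ValueError("only os_type='Ubuntu' with provider_type='docker' is supported")

    def to_kwargs(self) -> dict:
        """Return the settings as a dict suitable for ** unpacking."""
        return asdict(self)


# Usage (requires Docker and screenenv):
#   settings = SandboxSettings(headless=False, resolution=(1280, 720))
#   sandbox = Sandbox(**settings.to_kwargs())
```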

Core Features

Mouse Control

# Click operations
sandbox.left_click(x=100, y=200)
sandbox.right_click(x=300, y=400)
sandbox.double_click(x=500, y=600)

# Mouse movement
sandbox.move_mouse(x=800, y=900)

# Drag and drop
sandbox.drag(fr=(100, 100), to=(200, 200))

# Scrolling
sandbox.scroll(direction="down", amount=3)

# Press and release a mouse button

sandbox.mouse_press(button="left")
sandbox.mouse_release(button="left")

Keyboard Input

# Type text
sandbox.write("Hello, World!", delay_in_ms=50)

# Key combinations
sandbox.press(["Ctrl", "C"])  # Copy
sandbox.press(["Ctrl", "V"])  # Paste
sandbox.press(["Alt", "Tab"]) # Switch windows
sandbox.press("Enter")        # Single key
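
Since press() takes either a single key or a list of keys, shortcut strings such as "Ctrl+C" need to be split first. The helper below is a hypothetical convenience, not part of ScreenEnv, and it does not handle a literal "+" key.

```python
def parse_key_combo(combo: str):
    """Turn "Ctrl+C" into ["Ctrl", "C"]; a single key such as "Enter" stays a string."""
    keys = [part.strip() for part in combo.split("+") if part.strip()]
    if not keys:
        raise ValueError(f"empty key combination: {combo!r}")
    return keys[0] if len(keys) == 1 else keys


# Usage:
#   sandbox.press(parse_key_combo("Ctrl+Shift+T"))
```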

Application Management

# Launch applications
sandbox.launch("xfce4-terminal")
sandbox.launch("libreoffice --writer")
sandbox.open("https://www.google.com")

# Window management
windows = sandbox.get_application_windows("xfce4-terminal")
window_id = windows[0]
sandbox.activate_window(window_id)

window_id = sandbox.get_current_window_id()  # Get the ID of the currently active window
sandbox.window_size(window_id)
sandbox.get_window_title(window_id)
sandbox.close_window(window_id)
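
Indexing `windows[0]` right after launch() can fail if the window has not appeared yet. A polling helper is one way around that. The sketch below is hypothetical and assumes `get_application_windows()` returns an empty list while no window exists, which the source does not confirm.

```python
import time


def wait_for_window(sandbox, application, timeout=10.0, interval=0.5, sleep=time.sleep):
    """Poll get_application_windows() until a window appears; return the first one.

    Assumption: an empty (falsy) result means no window is open yet.
    """
    deadline = time.monotonic() + timeout
    while True:
        windows = sandbox.get_application_windows(application)
        if windows:
            return windows[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no window for {application!r} after {timeout}s")
        sleep(interval)


# Usage:
#   sandbox.launch("xfce4-terminal")
#   window_id = wait_for_window(sandbox, "xfce4-terminal")
#   sandbox.activate_window(window_id)
```

The `sleep` parameter exists only so that tests can replace the real delay with a no-op.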

File Operations

# Upload files to sandbox
sandbox.upload_file_to_remote("local_file.txt", "/home/user/remote_file.txt")

# Download files from sandbox
sandbox.download_file_from_remote("/home/user/remote_file.txt", "local_file.txt")

# Download from URL
sandbox.download_url_file_to_remote("https://example.com/file.txt", "/home/user/file.txt")
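
To confirm that a transfer arrived intact, one option is to download the file again and compare checksums. This is a sketch using only the upload and download calls shown above; `sha256_of` and `upload_and_verify` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def sha256_of(path) -> str:
    """Return the hex SHA-256 digest of a local file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def upload_and_verify(sandbox, local_path, remote_path, check_path):
    """Upload a file, download it back to `check_path`, and compare digests."""
    sandbox.upload_file_to_remote(local_path, remote_path)
    sandbox.download_file_from_remote(remote_path, check_path)
    return sha256_of(local_path) == sha256_of(check_path)


# Usage:
#   ok = upload_and_verify(sandbox, "local_file.txt",
#                          "/home/user/remote_file.txt", "roundtrip.txt")
```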

Screenshots and Recording

# Start recording
sandbox.start_recording()

# Take screenshots
desktop_screenshot = sandbox.desktop_screenshot()

# Stop recording and save it locally to a file 'demo.mp4'
sandbox.end_recording("demo.mp4")
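
If the code between start_recording() and end_recording() raises, the recording is never saved. Wrapping the pair in a context manager avoids that. The `recording` helper below is hypothetical and uses only the two calls shown above.

```python
from contextlib import contextmanager


@contextmanager
def recording(sandbox, output_path):
    """Record everything done inside the with-block and save it to output_path."""
    sandbox.start_recording()
    try:
        yield
    finally:
        # Save the video even if the automation inside the block failed.
        sandbox.end_recording(output_path)


# Usage:
#   with recording(sandbox, "demo.mp4"):
#       sandbox.launch("xfce4-terminal")
```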

Terminal Operations

# Execute commands
response = sandbox.execute_command("ls -la")
print(response.output)

# Python commands
response = sandbox.execute_python_command("print('Hello')", ["os"])
print(response.output)

# Get terminal output
# Get terminal output. This only works while a desktop terminal application is running;
# to capture a command's output, use execute_command() instead.
output = sandbox.get_terminal_output()
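
When a command string is built from variable input such as a file name, quoting the argument avoids surprises with spaces or shell metacharacters. The sketch below assumes execute_command() passes the string to a POSIX shell and returns an object with an `output` attribute, as the example above suggests; `list_directory` is a hypothetical helper.

```python
import shlex


def list_directory(sandbox, path):
    """Run `ls -la` on `path`, quoting it so spaces and metacharacters are safe."""
    # Assumption: the command string is interpreted by a POSIX shell.
    command = f"ls -la {shlex.quote(path)}"
    return sandbox.execute_command(command).output


# Usage:
#   print(list_directory(sandbox, "/home/user/My Documents"))
```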

Examples

Complete GUI Automation Demo

from screenenv import Sandbox
import time

def demo_automation():
    sandbox = Sandbox(headless=False)

    try:
        # Launch terminal
        sandbox.launch("xfce4-terminal")
        time.sleep(2)

        # Type commands
        sandbox.write("echo 'Starting automation demo'")
        sandbox.press("Enter")

        # Open web browser
        sandbox.open("https://www.python.org")
        time.sleep(3)

        # Take screenshot
        screenshot = sandbox.screenshot()
        with open("demo_screenshot.png", "wb") as f:
            f.write(screenshot)

    finally:
        sandbox.close()

if __name__ == "__main__":
    demo_automation()

Web Automation with Playwright

from screenenv import Sandbox

def web_automation():
    sandbox = Sandbox(headless=True)

    try:
        # Open website
        sandbox.open("https://www.example.com")

        # Take browser screenshot
        screenshot = sandbox.playwright_screenshot(full_page=True)
        with open("web_screenshot.png", "wb") as f:
            f.write(screenshot)

        # Access the Playwright browser for further automation
        playwright_browser = sandbox.playwright_browser()

    finally:
        sandbox.close()

MCP Server Demo

python -m examples.mcp_server_demo
# If your user is not in the docker group:
sudo -E python -m examples.mcp_server_demo

Sandbox Demo

python -m examples.sandbox_demo
# If your user is not in the docker group:
sudo -E python -m examples.sandbox_demo

Desktop Agent Demo

python -m examples.desktop_agent
# If your user is not in the docker group:
sudo -E python -m examples.desktop_agent

System Requirements

  • Docker: Must be installed and running
  • Python: 3.10 or higher
  • Playwright: For web automation features
  • Memory: At least 4GB RAM recommended

Sandbox Image

The sandbox uses a custom Ubuntu 22.04 Docker image with:

  • XFCE4 desktop environment
  • VNC server for remote access
  • Google Chrome/Chromium browser
  • LibreOffice suite
  • Python development tools
  • MCP server support
