ScreenEnv

A powerful Python library for creating and managing isolated desktop environments using Docker containers. ScreenEnv provides a sandboxed Ubuntu desktop environment with XFCE4 that you can programmatically control for GUI automation, testing, and development.

Features

  • 🖥️ Isolated Desktop Environment: Full Ubuntu desktop with XFCE4 running in Docker
  • 🎮 GUI Automation: Complete mouse and keyboard control
  • 🌐 Web Automation: Built-in browser automation with Playwright
  • 📹 Screen Recording: Capture video recordings of all actions
  • 📸 Screenshot Capabilities: Desktop and browser screenshots
  • 🖱️ Mouse Control: Click, drag, scroll, and mouse movement
  • ⌨️ Keyboard Input: Text typing and key combinations
  • 🪟 Window Management: Launch, activate, resize, and close applications
  • 📁 File Operations: Upload, download, and file management
  • 🐚 Terminal Access: Execute commands and capture output
  • 🔒 Secure: Isolated environment with session-based authentication
  • 🤖 MCP Server Support: Model Context Protocol integration for AI/LLM automation
  • 🐳 Docker Ready: Pre-built Docker image with all dependencies

Quick Start

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd screenenv
    
  2. Install the package (choose one):

    Using pip:

    pip install .
    

    Using uv:

    uv sync
    

Basic Usage

from screenenv import Sandbox

# Create a sandbox environment
sandbox = Sandbox()

try:
    # Launch a terminal
    sandbox.launch("xfce4-terminal")

    # Type some text
    sandbox.write("echo 'Hello from ScreenEnv!'")
    sandbox.press("Enter")

    # Take a screenshot
    screenshot = sandbox.screenshot()
    with open("screenshot.png", "wb") as f:
        f.write(screenshot)

finally:
    # Clean up
    sandbox.close()

For a more complete usage example, see examples/sandbox_demo.py.
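The try/finally cleanup shown in Basic Usage can also be expressed with the standard library's contextlib.closing, which calls close() on exit. The sketch below takes a factory argument so it can be exercised without Docker; run_hello and sandbox_factory are illustrative names, not part of ScreenEnv.

```python
from contextlib import closing


def run_hello(sandbox_factory):
    """Run the Basic Usage steps and return the screenshot bytes.

    `sandbox_factory` is any zero-argument callable returning an object with
    launch/write/press/screenshot/close methods (for example screenenv.Sandbox).
    """
    # closing() calls sandbox.close() on exit, even if a step raises.
    with closing(sandbox_factory()) as sandbox:
        sandbox.launch("xfce4-terminal")
        sandbox.write("echo 'Hello from ScreenEnv!'")
        sandbox.press("Enter")
        return sandbox.screenshot()
```

With ScreenEnv installed and Docker running, this would presumably be called as run_hello(Sandbox).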

MCP Server Support

ScreenEnv includes full support for the Model Context Protocol (MCP), enabling seamless integration with AI/LLM systems for desktop automation.

What is MCP?

The Model Context Protocol (MCP) is a standard for AI assistants to interact with external tools and data sources. ScreenEnv's MCP server provides desktop automation capabilities that can be used by any MCP-compatible AI system.

MCP Server Features

  • 30+ Automation Tools: Complete desktop control via MCP
  • Streamable HTTP Transport: Efficient communication protocol

Starting the MCP Server

from screenenv import MCPRemoteServer

# Start MCP server
server = MCPRemoteServer()

print(f"MCP Server URL: {server.server_url}")
print(f"Server Configuration: {server.mcp_server_json}")

MCP Client Usage

import asyncio
import base64
import io

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client
from PIL import Image  # Pillow, used to save the screenshot
from screenenv import MCPRemoteServer

async def mcp_automation():
    # Start MCP server
    server = MCPRemoteServer(headless=False)

    try:
        # Connect to MCP server
        async with streamablehttp_client(server.server_url) as (
            read_stream, write_stream, _
        ):
            async with ClientSession(read_stream, write_stream) as session:
                await session.initialize()

                # Launch terminal
                await session.call_tool("launch", {
                    "application": "xfce4-terminal",
                    "wait_for_window": True
                })

                # Type commands
                await session.call_tool("write", {"text": "echo 'Hello MCP!'"})
                await session.call_tool("press", {"key": ["Enter"]})

                # Take screenshot
                response = await session.call_tool("screenshot", {})
                screenshot_base64 = response.content[0].data

                screenshot_bytes = base64.b64decode(screenshot_base64)
                image = Image.open(io.BytesIO(screenshot_bytes))
                image.save("screenshot.png")
                ...

                print("MCP automation completed!")

    finally:
        server.close()

# Run the automation
asyncio.run(mcp_automation())
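The client example reads the screenshot as base64 text from response.content[0].data. If Pillow is not needed, the payload can be decoded with the standard library alone. This is a minimal sketch: decode_image_payload and looks_like_png are hypothetical helper names, and whether the server returns PNG data is an assumption, which is why the format check is kept separate.

```python
import base64

# Standard 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_image_payload(data_b64: str) -> bytes:
    """Decode a base64 image payload and return the raw bytes."""
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(data_b64, validate=True)
    if not raw:
        raise ValueError("empty image payload")
    return raw


def looks_like_png(raw: bytes) -> bool:
    """Return True if the bytes start with the PNG signature."""
    return raw.startswith(PNG_SIGNATURE)
```

The decoded bytes can then be written with open("screenshot.png", "wb"), as in the Basic Usage example.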

Available MCP Tools

System Operations

  • execute_command - Execute shell commands
  • get_platform - Get system platform information
  • get_screen_size - Get screen dimensions
  • get_desktop_path - Get desktop directory path
  • get_directory_tree - List directory contents
  • get_file - Get file contents
  • download_file - Download file from URL
  • start_recording - Start screen recording
  • end_recording - End screen recording

Application Management

  • wait - Wait for specified milliseconds
  • open - Open file or URL
  • launch - Launch application
  • get_current_window_id - Get current window ID
  • get_application_windows - Get windows for application
  • get_window_name - Get window name/title
  • get_window_size - Get window size
  • activate_window - Activate window
  • close_window - Close window
  • get_terminal_output - Get terminal output

GUI Automation

  • screenshot - Take screenshot
  • left_click - Left click at coordinates
  • double_click - Double click at coordinates
  • right_click - Right click at coordinates
  • middle_click - Middle click at coordinates
  • scroll - Scroll mouse wheel
  • move_mouse - Move mouse to coordinates
  • mouse_press - Press mouse button
  • mouse_release - Release mouse button
  • get_cursor_position - Get cursor position
  • write - Type text
  • press - Press keys
  • drag - Drag mouse from one position to another

MCP Server Configuration

from screenenv import MCPRemoteServer

# Advanced MCP server configuration
server = MCPRemoteServer(
    os_type="Ubuntu",
    provider_type="docker",
    headless=True,
    resolution=(1920, 1080),
    disk_size="32G",
    ram_size="4G",
    cpu_cores="4",
    session_password="your_password",
    stream_server=True,
    dpi=96,
    timeout=1000
)

Sandbox Instantiation

Basic Configuration

from screenenv import Sandbox

# Minimal configuration
sandbox = Sandbox()

# With custom settings
sandbox = Sandbox(
    os_type="Ubuntu",           # Currently only Ubuntu is supported
    provider_type="docker",     # Currently only Docker is supported
    headless=True,              # Run without VNC viewer
    screen_size="1920x1080",    # Desktop resolution
    volumes=[],                 # Docker volumes to mount
    auto_ssl=False              # Enable SSL for VNC (experimental)
)
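The Sandbox example above passes the desktop size as a "1920x1080" string, while the MCPRemoteServer example passes a (1920, 1080) tuple. If both are configured from one setting, a small converter keeps them consistent. This is only a sketch; parse_screen_size and format_screen_size are hypothetical helpers, not ScreenEnv functions.

```python
def parse_screen_size(value: str) -> tuple[int, int]:
    """Convert a 'WIDTHxHEIGHT' string into a (width, height) tuple."""
    width_text, sep, height_text = value.lower().partition("x")
    if not sep:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError(f"screen size must be positive, got {value!r}")
    return width, height


def format_screen_size(width: int, height: int) -> str:
    """Convert a (width, height) pair back into a 'WIDTHxHEIGHT' string."""
    return f"{width}x{height}"
```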

Core Features

Mouse Control

# Click operations
sandbox.left_click(x=100, y=200)
sandbox.right_click(x=300, y=400)
sandbox.double_click(x=500, y=600)

# Mouse movement
sandbox.move_mouse(x=800, y=900)

# Drag and drop
sandbox.drag(fr=(100, 100), to=(200, 200))

# Scrolling
sandbox.scroll(direction="down", amount=3)

# Press and release a mouse button manually
sandbox.mouse_press(button="left")
sandbox.mouse_release(button="left")
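Some applications only react to a drag if the pointer passes through intermediate positions. One way to approximate that is to compose mouse_press, move_mouse, and mouse_release, as sketched below. interpolate_path and slow_drag are hypothetical helpers, and the sketch assumes the three methods accept the keyword arguments used in the snippet above; how sandbox.drag behaves internally is not covered here.

```python
def interpolate_path(fr, to, steps):
    """Return `steps` evenly spaced points from `fr` (exclusive) to `to` (inclusive)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = fr, to
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(1, steps + 1)
    ]


def slow_drag(sandbox, fr, to, steps=10):
    """Drag from `fr` to `to` by moving through intermediate points."""
    sandbox.move_mouse(x=fr[0], y=fr[1])
    sandbox.mouse_press(button="left")
    try:
        for x, y in interpolate_path(fr, to, steps):
            sandbox.move_mouse(x=x, y=y)
    finally:
        # Always release the button, even if a move fails.
        sandbox.mouse_release(button="left")
```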

Keyboard Input

# Type text
sandbox.write("Hello, World!", delay_in_ms=50)

# Key combinations
sandbox.press(["Ctrl", "C"])  # Copy
sandbox.press(["Ctrl", "V"])  # Paste
sandbox.press(["Alt", "Tab"]) # Switch windows
sandbox.press("Enter")        # Single key
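Key combinations are passed to press as a list of key names. When combinations come from configuration or user input in the familiar "Ctrl+C" form, they can be split into that list first. parse_combo is a hypothetical helper; note that this simple form cannot express the "+" key itself.

```python
def parse_combo(combo: str) -> list[str]:
    """Split a 'Ctrl+Alt+T' style string into a list of key names."""
    keys = [part.strip() for part in combo.split("+")]
    if not all(keys):
        raise ValueError(f"malformed key combination: {combo!r}")
    return keys
```

For example, sandbox.press(parse_combo("Ctrl+C")) would pass ["Ctrl", "C"], matching the copy shortcut above.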

Application Management

# Launch applications
sandbox.launch("xfce4-terminal")
sandbox.launch("libreoffice --writer")
sandbox.open("https://www.google.com")

# Window management
windows = sandbox.get_application_windows("xfce4-terminal")
window_id = windows[0]
sandbox.activate_window(window_id)

window_id = sandbox.get_current_window_id()  # Get the ID of the currently active window
sandbox.window_size(window_id)
sandbox.get_window_title(window_id)
sandbox.close_window(window_id)

File Operations

# Upload files to sandbox
sandbox.upload_file_to_remote("local_file.txt", "/home/user/remote_file.txt")

# Download files from sandbox
sandbox.download_file_from_remote("/home/user/remote_file.txt", "local_file.txt")

# Download from URL
sandbox.download_url_file_to_remote("https://example.com/file.txt", "/home/user/file.txt")

Screenshots and Recording

# Start recording
sandbox.start_recording()

# Take screenshots
desktop_screenshot = sandbox.desktop_screenshot()

# Stop recording and save it locally to a file 'demo.mp4'
sandbox.end_recording("demo.mp4")
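If the code between start_recording and end_recording raises, the recording is never saved. Wrapping the pair in a context manager guarantees that end_recording runs. This is a minimal sketch; recording is a hypothetical helper built only on the two methods shown above.

```python
from contextlib import contextmanager


@contextmanager
def recording(sandbox, output_path):
    """Record the screen for the duration of the with-block."""
    sandbox.start_recording()
    try:
        yield sandbox
    finally:
        # Runs on normal exit and on exceptions, so the video is always saved.
        sandbox.end_recording(output_path)
```

Usage would look like: with recording(sandbox, "demo.mp4"): sandbox.launch("xfce4-terminal").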

Terminal Operations

# Execute commands
response = sandbox.execute_command("ls -la")
print(response.output)

# Python commands
response = sandbox.execute_python_command("print('Hello')", ["os"])
print(response.output)

# Get terminal output
# Get terminal output. This only works while a desktop terminal application
# is running; to capture a command's output, use execute_command() instead.
output = sandbox.get_terminal_output()

Examples

Complete GUI Automation Demo

from screenenv import Sandbox
import time

def demo_automation():
    sandbox = Sandbox(headless=False)

    try:
        # Launch terminal
        sandbox.launch("xfce4-terminal")
        time.sleep(2)

        # Type commands
        sandbox.write("echo 'Starting automation demo'")
        sandbox.press("Enter")

        # Open web browser
        sandbox.open("https://www.python.org")
        time.sleep(3)

        # Take screenshot
        screenshot = sandbox.screenshot()
        with open("demo_screenshot.png", "wb") as f:
            f.write(screenshot)

    finally:
        sandbox.close()

if __name__ == "__main__":
    demo_automation()
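The fixed time.sleep calls in the demo can be too short on a slow machine and wasteful on a fast one. A polling helper is one alternative. The sketch below is generic; wait_until is a hypothetical name, and using it with get_application_windows assumes that method returns an empty result while no window exists yet.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Call `predicate` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

For example, wait_until(lambda: sandbox.get_application_windows("xfce4-terminal")) could replace time.sleep(2) after launching the terminal.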

Web Automation with Playwright

from screenenv import Sandbox

def web_automation():
    sandbox = Sandbox(headless=True)

    try:
        # Open website
        sandbox.open("https://www.example.com")

        # Take browser screenshot
        screenshot = sandbox.playwright_screenshot(full_page=True)
        with open("web_screenshot.png", "wb") as f:
            f.write(screenshot)

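        # Access the underlying Playwright browser for further automation
        playwright_browser = sandbox.playwright_browser()

    finally:
        sandbox.close()

if __name__ == "__main__":
    web_automation()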

Benefits

  • Single Entry Point: All services accessible through one port
  • Clean URLs: Organized by service type (/api, /novnc, /browser, /mcp)
  • Load Balancing Ready: Easy to add multiple backend instances
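Since every service sits behind one port, each service URL can be derived from a single base URL and the path prefixes listed above. The sketch below illustrates this; service_urls is a hypothetical helper, and the host and port in the usage note are placeholders rather than ScreenEnv defaults.

```python
# Path prefixes as listed above, one per service.
SERVICE_PATHS = {
    "api": "/api",
    "novnc": "/novnc",
    "browser": "/browser",
    "mcp": "/mcp",
}


def service_urls(base_url: str) -> dict[str, str]:
    """Build one URL per service from a shared base URL."""
    base = base_url.rstrip("/")
    return {name: base + path for name, path in SERVICE_PATHS.items()}
```

Calling service_urls("http://localhost:8080") would map "mcp" to "http://localhost:8080/mcp".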

MCP Server Demo

python -m examples.mcp_server_demo # or sudo -E python -m examples.mcp_server_demo if not in docker group

Sandbox Demo

python -m examples.sandbox_demo # or sudo -E python -m examples.sandbox_demo if not in docker group

Computer Agent Demo

cd examples/computer_agent
python app.py # or sudo -E python app.py if not in docker group

System Requirements

  • Docker: Must be installed and running
  • Python: 3.10 or higher
  • Playwright: For web automation features
  • Memory: At least 4GB RAM recommended

Docker Image

The sandbox uses a custom Ubuntu 22.04 Docker image with:

  • XFCE4 desktop environment
  • VNC server for remote access
  • Google Chrome/Chromium browser
  • LibreOffice suite
  • Python development tools
  • MCP server support
  • Nginx reverse proxy

Troubleshooting

Common Issues

  1. Docker not running:

    # Start Docker service
    sudo systemctl start docker
    sudo -E python3 -m examples.sandbox_demo
    
  2. Docker image not found:

    # Build the image locally
    cd dockerfiles/desktop
    docker build -f Dockerfile.ubuntu_xfce4 -t amhma/ubuntu-desktop:22.04-0.0.1-dev .
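Both issues above start with Docker itself, so it can help to check Docker before creating a sandbox. The following sketch uses only the standard library. docker_status is a hypothetical helper, and it assumes that docker info exits with a non-zero status when the daemon cannot be reached (which also covers a user who is not in the docker group).

```python
import shutil
import subprocess


def docker_status() -> str:
    """Return 'missing', 'unreachable', or 'ok' for the local Docker setup."""
    # The docker CLI is not on PATH at all.
    if shutil.which("docker") is None:
        return "missing"
    try:
        result = subprocess.run(
            ["docker", "info"], capture_output=True, timeout=15
        )
    except (subprocess.TimeoutExpired, OSError):
        return "unreachable"
    # Assumption: a non-zero exit code means the daemon could not be reached.
    return "ok" if result.returncode == 0 else "unreachable"
```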
    

Getting Help

  • Check the examples directory for working code samples
  • Review the MCP server documentation
  • Ensure all dependencies are installed: pip install -r requirements.txt
  • For Docker issues, verify Docker is running and has sufficient resources
