ScreenEnv
A powerful Python library for creating and managing isolated desktop environments using Docker containers. ScreenEnv provides a sandboxed Ubuntu desktop environment with XFCE4 that you can programmatically control for GUI automation, testing, and development.
Features
- 🖥️ Isolated Desktop Environment: Full Ubuntu desktop with XFCE4 running in Docker
- 🎮 GUI Automation: Complete mouse and keyboard control
- 🌐 Web Automation: Built-in browser automation with Playwright
- 📹 Screen Recording: Capture video recordings of all actions
- 📸 Screenshot Capabilities: Desktop and browser screenshots
- 🖱️ Mouse Control: Click, drag, scroll, and mouse movement
- ⌨️ Keyboard Input: Text typing and key combinations
- 🪟 Window Management: Launch, activate, resize, and close applications
- 📁 File Operations: Upload, download, and file management
- 🐚 Terminal Access: Execute commands and capture output
- 🔒 Secure: Isolated environment with session-based authentication
- 🤖 MCP Server Support: Model Context Protocol integration for AI/LLM automation
- 🐳 Docker Ready: Pre-built Docker image with all dependencies
Quick Start
Installation
- Clone the repository:
    git clone <repository-url>
    cd screenenv
- Install the package (choose one):
  Using pip:
    pip install .
  Using uv:
    uv sync
Basic Usage
from screenenv import Sandbox

# Create a sandbox environment
sandbox = Sandbox()
try:
    # Launch a terminal
    sandbox.launch("xfce4-terminal")
    # Type some text
    sandbox.write("echo 'Hello from ScreenEnv!'")
    sandbox.press("Enter")
    # Take a screenshot
    screenshot = sandbox.screenshot()
    with open("screenshot.png", "wb") as f:
        f.write(screenshot)
finally:
    # Clean up
    sandbox.close()
For a complete usage example, see examples/sandbox_demo.py.
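The try/finally cleanup shown in Basic Usage can also be written with the standard library's contextlib.closing, which calls close() when the block exits. The sketch below uses a stand-in class (FakeSandbox, hypothetical) so that it runs without Docker; it assumes only that the real Sandbox exposes close(), as the example above shows.

```python
from contextlib import closing


class FakeSandbox:
    """Hypothetical stand-in for screenenv.Sandbox; only close() is modelled."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_with_cleanup(factory):
    """Create a sandbox-like object, use it, and always close it afterwards."""
    with closing(factory()) as sandbox:
        # Real automation calls (launch, write, press, ...) would go here.
        pass
    return sandbox


sandbox = run_with_cleanup(FakeSandbox)
print(sandbox.closed)
```

With the real library, the same pattern would presumably read `with closing(Sandbox()) as sandbox:`.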
MCP Server Support
ScreenEnv includes full support for the Model Context Protocol (MCP), enabling seamless integration with AI/LLM systems for desktop automation.
What is MCP?
The Model Context Protocol (MCP) is a standard for AI assistants to interact with external tools and data sources. ScreenEnv's MCP server provides desktop automation capabilities that can be used by any MCP-compatible AI system.
MCP Server Features
- 30+ Automation Tools: Complete desktop control via MCP
- Streamable HTTP Transport: Efficient communication protocol
Starting the MCP Server
from screenenv import MCPRemoteServer
# Start MCP server
server = MCPRemoteServer()
print(f"MCP Server URL: {server.server_url}")
print(f"Server Configuration: {server.mcp_server_json}")
MCP Client Usage
import asyncio
import base64
import io

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client
from PIL import Image
from screenenv import MCPRemoteServer


async def mcp_automation():
    # Start MCP server
    server = MCPRemoteServer(headless=False)
    try:
        # Connect to MCP server
        async with streamablehttp_client(server.server_url) as (
            read_stream, write_stream, _
        ):
            async with ClientSession(read_stream, write_stream) as session:
                await session.initialize()
                # Launch terminal
                await session.call_tool("launch", {
                    "application": "xfce4-terminal",
                    "wait_for_window": True
                })
                # Type commands
                await session.call_tool("write", {"text": "echo 'Hello MCP!'"})
                await session.call_tool("press", {"key": ["Enter"]})
                # Take screenshot (returned as base64-encoded image data)
                response = await session.call_tool("screenshot", {})
                screenshot_base64 = response.content[0].data
                screenshot_bytes = base64.b64decode(screenshot_base64)
                image = Image.open(io.BytesIO(screenshot_bytes))
                image.save("screenshot.png")
                ...
        print("MCP automation completed!")
    finally:
        server.close()


# Run the automation
asyncio.run(mcp_automation())
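The client example decodes the screenshot from base64 before saving it. A minimal sketch of that decoding step, without Pillow, is shown below. The helper name decode_screenshot is hypothetical, and the check assumes the payload is a PNG, which the example suggests by saving to screenshot.png but which the source does not state outright.

```python
import base64

# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(data_b64: str) -> bytes:
    """Decode a base64 payload and check that it looks like a PNG (assumed format)."""
    raw = base64.b64decode(data_b64, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG image")
    return raw


# Demo with a fake payload: the PNG signature followed by four filler bytes.
sample = base64.b64encode(PNG_SIGNATURE + b"demo").decode("ascii")
print(len(decode_screenshot(sample)))
```

The returned bytes can be written straight to a file opened in "wb" mode, or wrapped in io.BytesIO for an image library.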
Available MCP Tools
System Operations
- execute_command: Execute shell commands
- get_platform: Get system platform information
- get_screen_size: Get screen dimensions
- get_desktop_path: Get desktop directory path
- get_directory_tree: List directory contents
- get_file: Get file contents
- download_file: Download file from URL
- start_recording: Start screen recording
- end_recording: End screen recording
Application Management
- wait: Wait for specified milliseconds
- open: Open file or URL
- launch: Launch application
- get_current_window_id: Get current window ID
- get_application_windows: Get windows for application
- get_window_name: Get window name/title
- get_window_size: Get window size
- activate_window: Activate window
- close_window: Close window
- get_terminal_output: Get terminal output
GUI Automation
- screenshot: Take screenshot
- left_click: Left click at coordinates
- double_click: Double click at coordinates
- right_click: Right click at coordinates
- middle_click: Middle click at coordinates
- scroll: Scroll mouse wheel
- move_mouse: Move mouse to coordinates
- mouse_press: Press mouse button
- mouse_release: Release mouse button
- get_cursor_position: Get cursor position
- write: Type text
- press: Press keys
- drag: Drag mouse from one position to another
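When driving the server from a script, it can help to keep the tool names listed above in a small lookup table, so a typo is caught before the call reaches the server. The sketch below is a client-side convenience only: TOOL_GROUPS and category_of are hypothetical names, not part of ScreenEnv, and the table simply mirrors the three lists in this section.

```python
# Tool names copied from the "Available MCP Tools" lists in this README.
TOOL_GROUPS = {
    "System Operations": [
        "execute_command", "get_platform", "get_screen_size",
        "get_desktop_path", "get_directory_tree", "get_file",
        "download_file", "start_recording", "end_recording",
    ],
    "Application Management": [
        "wait", "open", "launch", "get_current_window_id",
        "get_application_windows", "get_window_name", "get_window_size",
        "activate_window", "close_window", "get_terminal_output",
    ],
    "GUI Automation": [
        "screenshot", "left_click", "double_click", "right_click",
        "middle_click", "scroll", "move_mouse", "mouse_press",
        "mouse_release", "get_cursor_position", "write", "press", "drag",
    ],
}


def category_of(tool_name: str) -> str:
    """Return the documented category of a tool, or raise KeyError if unknown."""
    for category, names in TOOL_GROUPS.items():
        if tool_name in names:
            return category
    raise KeyError(f"unknown MCP tool: {tool_name}")


print(category_of("screenshot"))
print(sum(len(names) for names in TOOL_GROUPS.values()))
```

A running server remains the authoritative source for the tool list; this table only reflects what the README documents.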
MCP Server Configuration
# Advanced MCP server configuration
server = MCPRemoteServer(
    os_type="Ubuntu",
    provider_type="docker",
    headless=True,
    resolution=(1920, 1080),
    disk_size="32G",
    ram_size="4G",
    cpu_cores="4",
    session_password="your_password",
    stream_server=True,
    dpi=96,
    timeout=1000
)
Sandbox Instantiation
Basic Configuration
from screenenv import Sandbox

# Minimal configuration
sandbox = Sandbox()

# With custom settings
sandbox = Sandbox(
    os_type="Ubuntu",         # Currently only Ubuntu is supported
    provider_type="docker",   # Currently only Docker is supported
    headless=True,            # Run without VNC viewer
    screen_size="1920x1080",  # Desktop resolution
    volumes=[],               # Docker volumes to mount
    auto_ssl=False            # Enable SSL for VNC (experimental)
)
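The Sandbox example above passes the resolution as a "1920x1080" string, while the MCPRemoteServer example passes a (1920, 1080) tuple. If both appear in one script, a small converter keeps them in sync. The helper below is a hypothetical sketch and not part of ScreenEnv; it assumes the string is always of the form WIDTHxHEIGHT.

```python
def parse_screen_size(value: str) -> tuple[int, int]:
    """Convert a 'WIDTHxHEIGHT' string into a (width, height) tuple of ints."""
    width_text, separator, height_text = value.lower().partition("x")
    if not separator:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError(f"dimensions must be positive, got {value!r}")
    return width, height


print(parse_screen_size("1920x1080"))
```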
Core Features
Mouse Control
# Click operations
sandbox.left_click(x=100, y=200)
sandbox.right_click(x=300, y=400)
sandbox.double_click(x=500, y=600)
# Mouse movement
sandbox.move_mouse(x=800, y=900)
# Drag and drop
sandbox.drag(fr=(100, 100), to=(200, 200))
# Scrolling
sandbox.scroll(direction="down", amount=3)
# Low-level button press and release
sandbox.mouse_press(button="left")
sandbox.mouse_release(button="left")
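Coordinates computed by a script (for example from an image-matching step) can land outside the desktop. One defensive option is to clamp them before clicking. The helper below is a hypothetical sketch, not a ScreenEnv function, and it assumes pixel coordinates with the origin at the top-left corner of the screen.

```python
def clamp_point(x: int, y: int, width: int, height: int) -> tuple[int, int]:
    """Clamp (x, y) into the rectangle [0, width - 1] x [0, height - 1]."""
    clamped_x = min(max(x, 0), width - 1)
    clamped_y = min(max(y, 0), height - 1)
    return clamped_x, clamped_y


# A point right of and above a 1920x1080 screen is pulled back to its edge.
print(clamp_point(2000, -5, 1920, 1080))
```

The result could then be unpacked into a click call such as the left_click(x=..., y=...) shown above.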
Keyboard Input
# Type text
sandbox.write("Hello, World!", delay_in_ms=50)
# Key combinations
sandbox.press(["Ctrl", "C"]) # Copy
sandbox.press(["Ctrl", "V"]) # Paste
sandbox.press(["Alt", "Tab"]) # Switch windows
sandbox.press("Enter") # Single key
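The press examples take either a single key name or a list of key names. If key combinations come from a config file as strings such as "Ctrl+C", a small parser can turn them into the list form. This is a hypothetical helper, not part of ScreenEnv, and it assumes "+" is used only as the separator.

```python
def parse_hotkey(combo: str) -> list[str]:
    """Split a 'Ctrl+Shift+T' style string into a list of key names."""
    keys = [part.strip() for part in combo.split("+")]
    if not all(keys):
        raise ValueError(f"empty key name in {combo!r}")
    return keys


print(parse_hotkey("Ctrl+Shift+T"))
```

The resulting list matches the shape used in calls like `sandbox.press(["Ctrl", "C"])` above.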
Application Management
# Launch applications
sandbox.launch("xfce4-terminal")
sandbox.launch("libreoffice --writer")
sandbox.open("https://www.google.com")
# Window management
windows = sandbox.get_application_windows("xfce4-terminal")
window_id = windows[0]
sandbox.activate_window(window_id)
window_id = sandbox.get_current_window_id()  # Get the ID of the currently active window
sandbox.window_size(window_id)
sandbox.get_window_title(window_id)
sandbox.close_window(window_id)
File Operations
# Upload files to sandbox
sandbox.upload_file_to_remote("local_file.txt", "/home/user/remote_file.txt")
# Download files from sandbox
sandbox.download_file_from_remote("/home/user/remote_file.txt", "local_file.txt")
# Download from URL
sandbox.download_url_file_to_remote("https://example.com/file.txt", "/home/user/file.txt")
Screenshots and Recording
# Start recording
sandbox.start_recording()
# Take screenshots
desktop_screenshot = sandbox.desktop_screenshot()
# Stop recording and save it locally to a file 'demo.mp4'
sandbox.end_recording("demo.mp4")
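If the code between start_recording() and end_recording() raises, the recording is never saved. A context manager is one way to guarantee the closing call. The sketch below relies only on the two method names shown above; the recording helper and the FakeRecorder stand-in are hypothetical and exist so the example runs without Docker.

```python
from contextlib import contextmanager


@contextmanager
def recording(sandbox, output_path):
    """Start a recording and always end it, even if the body raises."""
    sandbox.start_recording()
    try:
        yield sandbox
    finally:
        sandbox.end_recording(output_path)


class FakeRecorder:
    """Hypothetical stand-in that logs calls instead of recording video."""

    def __init__(self):
        self.calls = []

    def start_recording(self):
        self.calls.append("start")

    def end_recording(self, path):
        self.calls.append(("end", path))


recorder = FakeRecorder()
with recording(recorder, "demo.mp4"):
    pass  # Automation steps would go here.
print(recorder.calls)
```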
Terminal Operations
# Execute commands
response = sandbox.execute_command("ls -la")
print(response.output)
# Python commands
response = sandbox.execute_python_command("print('Hello')", ["os"])
print(response.output)
# Get terminal output. This only works while a desktop terminal application is running;
# to capture the output of a command, use execute_command() instead.
output = sandbox.get_terminal_output()
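execute_command takes the command as a single string. When arguments contain spaces or shell metacharacters, building that string with shlex.join from the standard library avoids quoting mistakes. The sketch assumes the string is interpreted by a POSIX shell inside the sandbox, which the source does not confirm; build_command is a hypothetical helper name.

```python
import shlex


def build_command(args: list[str]) -> str:
    """Join arguments into one shell-safe command string (POSIX quoting)."""
    return shlex.join(args)


print(build_command(["ls", "-la", "/home/user/My Documents"]))
# prints: ls -la '/home/user/My Documents'
```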
Examples
Complete GUI Automation Demo
from screenenv import Sandbox
import time


def demo_automation():
    sandbox = Sandbox(headless=False)
    try:
        # Launch terminal
        sandbox.launch("xfce4-terminal")
        time.sleep(2)
        # Type commands
        sandbox.write("echo 'Starting automation demo'")
        sandbox.press("Enter")
        # Open web browser
        sandbox.open("https://www.python.org")
        time.sleep(3)
        # Take screenshot
        screenshot = sandbox.screenshot()
        with open("demo_screenshot.png", "wb") as f:
            f.write(screenshot)
    finally:
        sandbox.close()


if __name__ == "__main__":
    demo_automation()
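The demo above waits with fixed time.sleep() calls, which can be too short on a slow machine and wasteful on a fast one. A polling helper is a common alternative. The sketch below is generic Python and not part of ScreenEnv; wait_until is a hypothetical name.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Call predicate() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Demo: a condition that becomes true on the third check.
state = {"checks": 0}


def ready():
    state["checks"] += 1
    return state["checks"] >= 3


wait_until(ready, timeout=2.0, interval=0.01)
print(state["checks"])
```

In a sandbox script the predicate might be `lambda: sandbox.get_application_windows("xfce4-terminal")`, assuming that call returns an empty list until the window exists.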
Web Automation with Playwright
from screenenv import Sandbox


def web_automation():
    sandbox = Sandbox(headless=True)
    try:
        # Open website
        sandbox.open("https://www.example.com")
        # Take browser screenshot
        screenshot = sandbox.playwright_screenshot(full_page=True)
        with open("web_screenshot.png", "wb") as f:
            f.write(screenshot)
        playwright_browser = sandbox.playwright_browser()
    finally:
        sandbox.close()
Benefits
- Single Entry Point: All services accessible through one port
- Clean URLs: Organized by service type (/api, /novnc, /browser, /mcp)
- Load Balancing Ready: Easy to add multiple backend instances
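Because every service sits behind one port, the per-service URLs can be derived from a single base URL. The sketch below joins the four path prefixes listed above onto a base address. The base URL is a made-up placeholder, service_urls is a hypothetical helper, and the sketch assumes the prefixes hang directly off the server root.

```python
from urllib.parse import urljoin

# Path prefixes taken from the list above.
SERVICE_PATHS = {
    "api": "/api",
    "novnc": "/novnc",
    "browser": "/browser",
    "mcp": "/mcp",
}


def service_urls(base_url: str) -> dict[str, str]:
    """Build one URL per service from a shared base URL."""
    return {name: urljoin(base_url, path) for name, path in SERVICE_PATHS.items()}


# Placeholder address for illustration only.
for name, url in service_urls("http://localhost:8080").items():
    print(name, url)
```

For the MCP endpoint, the `server.server_url` attribute shown earlier is the value to trust over anything derived this way.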
MCP Server Demo
python -m examples.mcp_server_demo # or sudo -E python -m examples.mcp_server_demo if not in docker group
Sandbox Demo
python -m examples.sandbox_demo # or sudo -E python -m examples.sandbox_demo if not in docker group
Computer Agent Demo
cd examples/computer_agent
python app.py # or sudo -E python app.py if not in docker group
System Requirements
- Docker: Must be installed and running
- Python: 3.10 or higher
- Playwright: For web automation features
- Memory: At least 4GB RAM recommended
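Two of the requirements above can be checked from Python before a sandbox is created. The sketch below tests the interpreter version and whether a docker executable is on PATH. It does not verify that the Docker daemon is actually running, and check_requirements is a hypothetical helper, not part of ScreenEnv.

```python
import shutil
import sys


def check_requirements(min_python=(3, 10)):
    """Report whether the Python version and the docker CLI look usable."""
    return {
        "python_ok": sys.version_info >= min_python,
        "docker_on_path": shutil.which("docker") is not None,
    }


for name, ok in check_requirements().items():
    print(f"{name}: {'yes' if ok else 'no'}")
```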
Docker Image
The sandbox uses a custom Ubuntu 22.04 Docker image with:
- XFCE4 desktop environment
- VNC server for remote access
- Google Chrome/Chromium browser
- LibreOffice suite
- Python development tools
- MCP server support
- Nginx reverse proxy
Troubleshooting
Common Issues
- Docker not running:
    # Start Docker service
    sudo systemctl start docker
    sudo -E python3 -m examples.sandbox_demo
- Docker image not found:
    # Build the image locally
    cd dockerfiles/desktop
    docker build -f Dockerfile.ubuntu_xfce4 -t amhma/ubuntu-desktop:22.04-0.0.1-dev .
Getting Help
- Check the examples directory for working code samples
- Review the MCP server documentation
- Ensure all dependencies are installed:
    pip install -r requirements.txt
- For Docker issues, verify Docker is running and has sufficient resources