Like Selenium, but for your entire screen. Find and interact with text anywhere using OCR.

🎯 Screenium

Screenium is like Selenium after discovering there's a whole world outside the browser. Find and click any text on your screen using OCR. That "Accept Cookies" popup for the millionth time? Banish it.

🔍 Installation

pip install screenium

or

uv add screenium

✨ Use Cases

  • Auto-Accept Cookies: Run a background task that automatically clicks "Accept Cookies" popups
  • Meeting Assistant: Auto-join video calls by detecting "Join Meeting" buttons and automatically enable/disable your camera/mic based on meeting names
  • Game Automation: Automate repetitive game tasks by detecting and clicking on in-game text (e.g., "Collect Rewards", "Start Battle")
  • System Dialog Monitor: Handle system or application dialogs, such as "Are you sure?" or the "Run command" prompt in Cursor, according to rules you define
  • Multi-Browser Testing: Run UI tests on browsers that lack automation APIs or web drivers; no separate web driver installation is needed

🔍 Finding Text

The most basic way to find text is:

from screenium import Text

# Simple text matching
text = Text("Login")  # Finds "Login" anywhere on screen
text = Text("LOG")    # Matches the first of "LOGIN", "LOGOUT", etc.

# Exact and case-sensitive matching
text = Text("Login", exact_match=True)  # Only matches "Login" exactly
text = Text("login", case_sensitive=True)  # Case-sensitive matching
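
To make the effect of the two flags concrete, here is a minimal sketch in plain Python. It does not use Screenium and is not its implementation: `text_matches` and `first_match` are hypothetical helpers, and the sketch assumes the default is a case-insensitive substring match, which is how the examples above read.

```python
def text_matches(query, candidate, *, exact_match=False, case_sensitive=False):
    """Hypothetical helper: does the OCR string `candidate` satisfy `query`?"""
    if not case_sensitive:
        # Assumed default: ignore case on both sides.
        query, candidate = query.casefold(), candidate.casefold()
    if exact_match:
        return query == candidate
    # Assumed default: substring match.
    return query in candidate


def first_match(query, candidates, **flags):
    """Return the first candidate that matches, or None if nothing does."""
    return next((c for c in candidates if text_matches(query, c, **flags)), None)


ocr_words = ["LOGIN", "LOGOUT", "Login help"]
print(first_match("LOG", ocr_words))                         # LOGIN
print(first_match("Login", ocr_words, exact_match=True))     # LOGIN
print(first_match("Login", ocr_words, case_sensitive=True))  # Login help
```

Note that under these assumptions `exact_match=True` alone still ignores case; combine it with `case_sensitive=True` if both matter.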

📍 Spatial Relationships

You can find text based on its position relative to other text:

# Basic relationships
Text("Password").below("Username")  # Find "Password" below "Username"
Text("Cancel").left_of("Submit")    # Find "Cancel" left of "Submit"
Text("Help").right_of("Back")       # Find "Help" right of "Back"
Text("Title").above("Content")      # Find "Title" above "Content"

# Chain relationships
Text("Save").below("Options").right_of("Cancel")
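
If you want a mental model for what a chained relationship filters out, the sketch below works on plain bounding boxes. It is an illustration only, not Screenium's code: it assumes boxes are dicts with x, y, width and height, that (x, y) is the top-left corner, and that y grows downward. The helper names are hypothetical.

```python
def center(box):
    """Centre point of a box given as x, y (top-left, assumed), width, height."""
    return (box["x"] + box["width"] / 2, box["y"] + box["height"] / 2)


def is_below(box, anchor):
    # y grows downward (assumption), so "below" means a larger centre y.
    return center(box)[1] > center(anchor)[1]


def is_right_of(box, anchor):
    return center(box)[0] > center(anchor)[0]


options = {"x": 100, "y": 50, "width": 80, "height": 20}
cancel = {"x": 100, "y": 200, "width": 60, "height": 20}
save_candidates = [
    {"x": 20, "y": 200, "width": 50, "height": 20},   # left of "Cancel"
    {"x": 220, "y": 200, "width": 50, "height": 20},  # right of "Cancel"
]

# Equivalent in spirit to Text("Save").below("Options").right_of("Cancel")
hits = [b for b in save_candidates
        if is_below(b, options) and is_right_of(b, cancel)]
print(hits)  # only the second candidate survives both filters
```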

📐 Aligned Elements

For more precise positioning, use aligned to require elements to line up:

# Elements must be vertically aligned (same x-coordinate, i.e. stacked in one column)
Text("Email").aligned.below("Username")

# Elements must be horizontally aligned (same y-coordinate, i.e. side by side in one row)
Text("Back").aligned.left_of("Next")
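
One plausible way to think about "lining up" is that the two boxes overlap when projected onto the shared axis. The sketch below uses that definition; it is an assumption made for illustration, and Screenium may use a different rule or tolerance. The helper names are hypothetical.

```python
def spans_overlap(start_a, length_a, start_b, length_b):
    """True if the 1-D intervals [start, start + length) overlap."""
    return start_a < start_b + length_b and start_b < start_a + length_a


def vertically_aligned(a, b):
    # Same column: the horizontal extents overlap.
    return spans_overlap(a["x"], a["width"], b["x"], b["width"])


def horizontally_aligned(a, b):
    # Same row: the vertical extents overlap.
    return spans_overlap(a["y"], a["height"], b["y"], b["height"])


username = {"x": 40, "y": 100, "width": 90, "height": 18}
email = {"x": 40, "y": 150, "width": 50, "height": 18}
sidebar = {"x": 400, "y": 150, "width": 50, "height": 18}

print(vertically_aligned(email, username))    # True
print(vertically_aligned(sidebar, username))  # False
print(horizontally_aligned(email, sidebar))   # True
```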

🎨 Background Color Matching

Find text by its background color:

# Match text with specific background colors
Text("Error", background_color="red")
Text("Success", background_color="green")
Text("Info", background_color="#0088ff")

# Adjust color matching tolerance (0-100)
Text("Warning", background_color="yellow", background_tolerance=50)
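
To build intuition for the 0-100 tolerance, the sketch below maps the distance between two RGB colors onto that range. This is only one possible metric (Euclidean distance in RGB, scaled so black vs. white is 100); how Screenium actually compares colors is not documented here, so treat the numbers as illustrative. All helper names are hypothetical.

```python
import math


def hex_to_rgb(value):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    value = value.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


# Largest possible RGB distance: black to white.
MAX_DISTANCE = math.dist((0, 0, 0), (255, 255, 255))


def color_difference(a, b):
    """0 means identical colors, 100 means black vs. white (assumed scale)."""
    return 100 * math.dist(a, b) / MAX_DISTANCE


def within_tolerance(sample, target, tolerance):
    return color_difference(sample, target) <= tolerance


target = hex_to_rgb("#0088ff")
sampled = (10, 130, 240)  # what a screenshot pixel might actually contain
print(round(color_difference(sampled, target), 1))      # 4.3
print(within_tolerance(sampled, target, tolerance=10))  # True
```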

🖱️ Mouse Actions

Interact with matched text:

# Basic mouse actions
Text("Submit").click()
Text("Options").right_click()
Text("Link").double_click()
Text("Button").mouse_move()  # Just move mouse without clicking

# Type text
Text("Username").click().typewrite("myuser")
Text("Password").click().typewrite("mypass")

# Special keys
Text("Terminal").click().typewrite(["command", "k"])  # Clear terminal

⏰ Waiting and Monitoring

Handle timing and background monitoring:

# Wait for a specific duration
Text("Loading").wait(2).click()  # Wait 2 seconds then click
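
For background monitoring, a simple polling loop is often enough. The sketch below is a generic helper, not part of Screenium: `wait_until` is a hypothetical name, and it accepts any callable so it can be tried without a screen. With Screenium you might pass something like `lambda: Text("Join Meeting").matches`, assuming each new `Text` object re-scans the screen.

```python
import time


def wait_until(find, timeout=10.0, interval=0.5):
    """Call find() until it returns something truthy or `timeout` seconds pass.

    Returns the truthy result, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Stand-in for an OCR lookup: empty twice, then one match.
attempts = iter([[], [], [{"x": 10, "y": 20}]])
print(wait_until(lambda: next(attempts), timeout=1, interval=0.01))
```

Using `time.monotonic()` rather than `time.time()` keeps the timeout correct even if the system clock changes while the loop is running.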

🔍 Inspecting Matches

Get detailed information about matches:

# Get all matches
matches = Text("Button").matches
for match in matches:
    print(f"Found at: ({match['x']}, {match['y']})")
    print(f"Size: {match['width']}x{match['height']}")
    print(f"Confidence: {match['confidence']}")
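
When several matches come back, you may want only the most reliable one. The sketch below works on plain dicts with the keys shown above; it assumes a higher confidence is better and that (x, y) is the top-left corner of the match. `best_match` and `center_of` are hypothetical helpers, not Screenium functions.

```python
def best_match(matches):
    """Return the match with the highest confidence, or None if there are none."""
    return max(matches, key=lambda m: m["confidence"], default=None)


def center_of(match):
    """Centre point of a match, assuming (x, y) is its top-left corner."""
    return (match["x"] + match["width"] / 2, match["y"] + match["height"] / 2)


sample = [
    {"x": 100, "y": 40, "width": 60, "height": 20, "confidence": 0.62},
    {"x": 300, "y": 400, "width": 80, "height": 24, "confidence": 0.97},
]

best = best_match(sample)
print(center_of(best))  # (340.0, 412.0)
```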

🎨 Visual Debugging

Draw boxes around matches to debug:

# Draw green box for 3 seconds
Text("Username").draw(duration=3, color="green")

# Chain with other operations
Text("Password").below("Username").draw(duration=2, color="red").click()

🎯 Tips & Best Practices

  1. Start with basic text matching before adding spatial relationships
  2. Use draw() to visually verify matches
  3. Adjust background color tolerance if color matching is too strict/loose
  4. Use aligned when elements should line up precisely
  5. Chain operations for more precise matching

⚠️ Limitations

  • macOS only (uses Apple Vision framework)
  • Requires macOS Monterey (12.x) or higher
  • Screen resolution and scaling can affect matching
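
If your script may run on other machines, you can check the first two limitations up front with the standard library. This is a minimal sketch; `is_supported` is a hypothetical helper and not part of Screenium.

```python
import platform


def is_supported(system, mac_version):
    """True on macOS 12 (Monterey) or newer, False everywhere else."""
    if system != "Darwin":
        return False
    try:
        major = int(mac_version.split(".")[0])
    except ValueError:
        # Empty or malformed version string.
        return False
    return major >= 12


# platform.mac_ver()[0] is an empty string on non-macOS systems.
print(is_supported(platform.system(), platform.mac_ver()[0]))
```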

🔐 Required macOS Permissions

Screenium needs two key macOS permissions to function. The app running your Python script (e.g., Terminal, VS Code, PyCharm) will need:

  1. Screen Recording

    • Open System Settings > Privacy & Security > Screen Recording
    • Toggle on your terminal app/IDE
    • Required for:
      • Finding text on screen using OCR
      • Background monitoring
      • Visual debugging with draw()
  2. Accessibility

    • Open System Settings > Privacy & Security > Accessibility
    • Toggle on your terminal app/IDE
    • Required for:
      • Mouse actions (click(), right_click(), etc.)
      • Keyboard input (typewrite(), hotkey())

💡 Tip: If running from different apps, each one needs separate permissions. For example, if you run scripts from both Terminal and VS Code, you'll need to grant permissions to both.

