A gesture-based screen capture and keyboard shortcut automation system using MediaPipe and FastAPI

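| Branch | Version | Status |
|--------|---------|--------|
| master | 0.3.0 | passing |

| Platform | Python | System Dependencies |
|----------|--------|---------------------|
| Linux (X11/Wayland) | 3.12+ | libgl1, libglib2.0-0, notify-send, xdotool, ydotool |
| macOS (ARM64) | N/A | N/A |
| Windows (x86_64) | N/A | N/A |
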
  • Docs: [Link to documentation]

Introduction

Gesture Vision is a gesture control system written in Python that automates desktop actions through real-time gesture recognition.

The system is built around a computer vision pipeline: MediaPipe processes the camera stream to detect hand landmarks, and an asynchronous FastAPI backend handles control and configuration. This keeps event handling precise while limiting the impact on system resources.

Architecture

Main Features

  • High Precision Tracking: MediaPipe Hand Landmarker implementation for low-latency hand landmark detection.
  • Robust Trigger Control: Validation system based on gesture hold time and cooldown periods to eliminate false positives.
  • Swipe Gestures: Switch workspaces with horizontal hand movements (left/right).
  • Screenshot Capture: Detection of the "peace" gesture (index and middle fingers up) to take screenshots.
  • Invisible Mode (Background): Runs entirely in the background without preview windows, reducing GPU/CPU usage.
  • Keyboard Shortcuts: Associate gestures with system shortcuts like ctrl+shift+q to execute any action.
  • X11 and Wayland Support: Automatically detects whether to use xdotool (X11) or ydotool (Wayland).
  • Native Notifications: Integration with desktop notification system to instantly confirm actions.
  • Native Deployment: Simplified installation via pip with dedicated CLI commands for service management.
  • Web Panel: Graphical interface to configure gestures, shortcuts, and timing.
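
The hold-time and cooldown validation described under "Robust Trigger Control" can be sketched as a small state machine. This is a minimal illustration, not the project's actual implementation: the class name `GestureTrigger`, its `update` method, and the choice to restart the hold timer after each firing are assumptions made for this example. The default values mirror the `gesture_hold_seconds` and `cooldown_seconds` settings from the configuration file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GestureTrigger:
    """Hypothetical helper: fires after a steady hold, then enforces a cooldown."""

    hold_seconds: float = 1.0
    cooldown_seconds: float = 1.5
    _current: Optional[str] = None
    _held_since: float = 0.0
    _last_fired: float = float("-inf")

    def update(self, gesture: Optional[str], now: float) -> bool:
        """Feed the gesture seen in the latest frame; return True when the action should fire."""
        if gesture != self._current:
            # The gesture changed or disappeared: restart the hold timer.
            self._current = gesture
            self._held_since = now
            return False
        if gesture is None:
            return False
        held_long_enough = now - self._held_since >= self.hold_seconds
        cooled_down = now - self._last_fired >= self.cooldown_seconds
        if held_long_enough and cooled_down:
            self._last_fired = now
            # Assumption: a fresh hold is required before the next firing.
            self._held_since = now
            return True
        return False
```

In a capture loop, `update` would be called once per frame with the recognized gesture (or `None`) and a monotonic timestamp such as `time.monotonic()`. A gesture that flickers for a frame resets the hold timer, which is how brief misdetections are filtered out in this sketch.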

Quick Start

1. Installation

# Install from PyPI
pip install gesturevision

2. Run

# First time: automatically install system dependencies
gesturevision-start --install-deps

# Start the system
gesturevision-start

3. Control Panel

Access the web panel: http://localhost:8080

There you can:

  • Start/stop the system
  • Configure gesture and cooldown timing
  • Add gestures and associate keyboard shortcuts

Available Gesture Types

| Gesture | Description |
|---------|-------------|
| ✌️ Peace | Index and middle fingers up |
| Fist | Closed hand |
| Open Hand | 5 fingers extended |
| 👍 Three Fingers | Index, middle, and ring fingers up |
| ☝️ Point | Only index finger up |
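
To make the table concrete, here is a minimal sketch of how per-finger "up/down" states could be mapped to these gesture types. It assumes the finger states have already been derived from the hand landmarks. Only the identifier `peace_sign` appears in the project's configuration example; the other identifiers, the function name `classify_gesture`, and the decision to ignore the thumb for everything except the open hand are assumptions for illustration.

```python
from typing import Optional

# (index, middle, ring, pinky) -> gesture identifier.
# Only "peace_sign" comes from the project's config example; the other
# identifiers are hypothetical names chosen for this sketch.
FINGER_PATTERNS = {
    (True, True, False, False): "peace_sign",
    (False, False, False, False): "fist",
    (True, True, True, False): "three_fingers",
    (True, False, False, False): "point",
}


def classify_gesture(
    thumb: bool, index: bool, middle: bool, ring: bool, pinky: bool
) -> Optional[str]:
    """Return a gesture identifier for the given finger states, or None if unknown."""
    fingers = (index, middle, ring, pinky)
    if all(fingers):
        # Open hand requires all five fingers, thumb included.
        return "open_hand" if thumb else None
    # Assumption: the thumb is ignored for the remaining gestures.
    return FINGER_PATTERNS.get(fingers)
```

Any finger combination that is not in the table yields `None`, so unrecognized hand poses never reach the trigger logic.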

Supported Keyboard Shortcuts

You can use any of the predefined shortcuts or write your own:

  • Screenshot: Print
  • Close window: alt+F4
  • Minimize all: super+d
  • Lock screen: super+l
  • Switch window: alt+Tab
  • New tab: ctrl+t
  • Close tab: ctrl+w
  • Copy/Paste: ctrl+c / ctrl+v
  • And many more...
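
As an illustration of how a shortcut string could be dispatched, the sketch below picks an input tool from the session type and builds an `xdotool key` command. It is an assumption about how such logic might look, not the package's actual code: the function names are hypothetical, and detection relies on the standard `XDG_SESSION_TYPE` and `WAYLAND_DISPLAY` environment variables. The `ydotool` invocation is deliberately left out because its key syntax differs from `xdotool`'s.

```python
import os
from typing import List, Mapping, Optional


def pick_input_tool(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "ydotool" for Wayland sessions and "xdotool" otherwise."""
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session == "wayland":
        return "ydotool"
    if session == "x11":
        return "xdotool"
    # Session type unknown: fall back to the presence of a Wayland socket name.
    return "ydotool" if env.get("WAYLAND_DISPLAY") else "xdotool"


def normalize_shortcut(shortcut: str) -> str:
    """Lowercase the modifiers but keep the final key as written (e.g. F4, Tab)."""
    keys = [part.strip() for part in shortcut.split("+")]
    if not all(keys):
        raise ValueError(f"malformed shortcut: {shortcut!r}")
    *modifiers, key = keys
    return "+".join([m.lower() for m in modifiers] + [key])


def xdotool_command(shortcut: str) -> List[str]:
    """Build the argument list for `xdotool key <shortcut>`."""
    return ["xdotool", "key", normalize_shortcut(shortcut)]
```

The resulting list can be passed to `subprocess.run` on an X11 session. The final key keeps its original case because X11 key names such as `F4`, `Tab`, and `Print` are case-sensitive.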

Project Structure

  • src/gesturevision/main_vision.py: Gesture detection and capture engine.
  • src/gesturevision/api/: FastAPI backend for control and configuration.
  • src/gesturevision/static/: Control panel user interface.
  • pyproject.toml: Package dependencies and entry points definition.
  • gesturevision/assets/hand_landmarker.task: Pre-trained AI model.

Configuration

Settings are stored in: ~/gesturevision/config.json

Structure:

{
  "gesture_hold_seconds": 1.0,
  "cooldown_seconds": 1.5,
  "gestures": [
    {
      "name": "Peace (screenshot)",
      "gesture_type": "peace_sign",
      "required_hold_seconds": 1.0,
      "enabled": true,
      "action": "screenshot",
      "shortcut": null
    }
  ]
}
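
For scripts that need to read these settings, the structure above can be parsed into typed objects. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, and falling back to the global `gesture_hold_seconds` when a gesture omits `required_hold_seconds` is a guess at sensible behavior, not documented behavior.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

EXAMPLE = """
{
  "gesture_hold_seconds": 1.0,
  "cooldown_seconds": 1.5,
  "gestures": [
    {"name": "Peace (screenshot)", "gesture_type": "peace_sign",
     "required_hold_seconds": 1.0, "enabled": true,
     "action": "screenshot", "shortcut": null}
  ]
}
"""


@dataclass
class GestureConfig:
    name: str
    gesture_type: str
    required_hold_seconds: float
    enabled: bool = True
    action: Optional[str] = None
    shortcut: Optional[str] = None


@dataclass
class AppConfig:
    gesture_hold_seconds: float = 1.0
    cooldown_seconds: float = 1.5
    gestures: List[GestureConfig] = field(default_factory=list)


def parse_config(text: str) -> AppConfig:
    """Parse the JSON text of a config file into an AppConfig."""
    raw = json.loads(text)
    config = AppConfig(
        gesture_hold_seconds=float(raw.get("gesture_hold_seconds", 1.0)),
        cooldown_seconds=float(raw.get("cooldown_seconds", 1.5)),
    )
    for entry in raw.get("gestures", []):
        # Assumption: a missing per-gesture hold time falls back to the global one.
        hold = entry.get("required_hold_seconds", config.gesture_hold_seconds)
        config.gestures.append(
            GestureConfig(
                name=entry["name"],
                gesture_type=entry["gesture_type"],
                required_hold_seconds=float(hold),
                enabled=bool(entry.get("enabled", True)),
                action=entry.get("action"),
                shortcut=entry.get("shortcut"),
            )
        )
    return config
```

To load the real file, read it with `pathlib.Path` (expanding `~` via `expanduser()`) and pass the text to `parse_config`.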

Docker (Optional)

docker-compose up --build

Access the panel at: http://localhost:8080
