A gesture-based screen capture and keyboard shortcut automation system using MediaPipe and FastAPI

Branch | Version | Status
master | 0.3.0 | passing

Platform | Python | System Dependencies
Linux (X11/Wayland) | 3.12+ | libgl1, libglib2.0-0, notify-send, xdotool, ydotool
macOS (ARM64) | N/A | N/A
Windows (x86_64) | N/A | N/A

  • Docs: [Link to documentation]

Introduction

Gesture Vision is a gesture control system written in Python that automates desktop actions through real-time hand gesture recognition.

The system is built around a computer vision pipeline: MediaPipe processes the camera stream to detect hand landmarks, and an asynchronous FastAPI backend coordinates control and configuration. The design aims for precise event control with low impact on system resources.
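
To make the landmark step more concrete, here is a minimal sketch of how 21 hand landmarks could be reduced to per-finger "up/down" flags. It is not taken from the project's source. The landmark indices follow MediaPipe's hand model (fingertips at 8, 12, 16, 20, PIP joints at 6, 10, 14, 18), but the function name `fingers_up`, the upright-hand assumption, and the thumb heuristic are illustrative assumptions.

```python
from math import dist

# MediaPipe hand model indices (21 landmarks per hand).
THUMB_IP, THUMB_TIP = 3, 4
PINKY_MCP = 17
FINGER_JOINTS = {  # finger name -> (PIP joint index, fingertip index)
    "index": (6, 8),
    "middle": (10, 12),
    "ring": (14, 16),
    "pinky": (18, 20),
}


def fingers_up(landmarks):
    """Return {finger name: bool} for one hand (hypothetical helper).

    `landmarks` is a sequence of 21 (x, y) pairs in normalized image
    coordinates, where y grows downward. Assumes a roughly upright hand.
    """
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")

    state = {}
    for name, (pip, tip) in FINGER_JOINTS.items():
        # An extended finger has its tip above its PIP joint (smaller y).
        state[name] = landmarks[tip][1] < landmarks[pip][1]

    # The thumb extends sideways, so compare distances instead of heights:
    # it counts as extended when the tip is farther from the pinky base
    # than the thumb's IP joint is. This is a heuristic, not an exact test.
    state["thumb"] = (
        dist(landmarks[THUMB_TIP], landmarks[PINKY_MCP])
        > dist(landmarks[THUMB_IP], landmarks[PINKY_MCP])
    )
    return state
```

In a real pipeline the `(x, y)` pairs would come from the landmark detector's output for each frame; here they are plain tuples so the logic can be tried without a camera.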

Architecture

Main Features

  • High Precision Tracking: MediaPipe Hand Landmarker implementation for low-latency hand landmark detection.
  • Robust Trigger Control: Validation system based on gesture hold time and cooldown periods to eliminate false positives.
  • Swipe Gestures: Switch workspaces with horizontal hand movements (left/right).
  • Screenshot Capture: Detection of the "peace" gesture (index and middle fingers up) to take screenshots.
  • Invisible Mode (Background): Runs entirely in the background without preview windows, reducing GPU/CPU usage.
  • Keyboard Shortcuts: Associate gestures with system shortcuts like ctrl+shift+q to execute any action.
  • X11 and Wayland Support: Automatically detects whether to use xdotool (X11) or ydotool (Wayland).
  • Native Notifications: Integration with the desktop notification system to confirm actions instantly.
  • Native Deployment: Simplified installation via pip with dedicated CLI commands for service management.
  • Web Panel: Graphical interface to configure gestures, shortcuts, and timing.
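
The hold-time and cooldown validation mentioned under "Robust Trigger Control" can be sketched as a small state machine. This is an assumption about how such a check might work, not the project's implementation. The class name `GestureTrigger` and its `update` method are hypothetical, and the default values simply mirror the `gesture_hold_seconds` and `cooldown_seconds` settings.

```python
class GestureTrigger:
    """Fire a gesture only after it is held, and not again during cooldown."""

    def __init__(self, hold_seconds=1.0, cooldown_seconds=1.5):
        self.hold_seconds = hold_seconds
        self.cooldown_seconds = cooldown_seconds
        self._current = None      # gesture seen on the previous frame
        self._since = 0.0         # time the current gesture first appeared
        self._last_fired = None   # time of the most recent trigger, if any

    def update(self, gesture, now):
        """Feed one frame; return the gesture name if it should fire, else None."""
        if gesture != self._current:
            # The gesture changed (or disappeared): restart the hold timer.
            self._current = gesture
            self._since = now
            return None
        if gesture is None:
            return None
        if now - self._since < self.hold_seconds:
            return None  # not held long enough yet
        if (self._last_fired is not None
                and now - self._last_fired < self.cooldown_seconds):
            return None  # still cooling down from the last trigger
        self._last_fired = now
        self._since = now  # require a fresh hold before firing again
        return gesture
```

A caller would typically pass `time.monotonic()` as `now` once per processed frame. Taking the timestamp as a parameter keeps the logic deterministic and easy to test.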

Quick Start

1. Installation

# Install from PyPI
pip install gesturevision

2. Run

# First time: automatically install system dependencies
gesturevision-start --install-deps

# Start the system
gesturevision-start

3. Control Panel

Access the web panel: http://localhost:8080

There you can:

  • Start/stop the system
  • Configure gesture and cooldown timing
  • Add gestures and associate keyboard shortcuts

Available Gesture Types

Gesture | Description
✌️ Peace | Index and middle fingers up
Fist | Closed hand
Open Hand | 5 fingers extended
👍 Three Fingers | Index, middle, and ring fingers up
☝️ Point | Only index finger up
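
As an illustration of how the gesture types above could be told apart, the sketch below maps per-finger up/down flags to a gesture name. Only `peace_sign` appears as a `gesture_type` value in the configuration example. The other identifiers (`fist`, `open_hand`, `three_fingers`, `point`), the function name, and the choice to ignore the thumb for everything except the open hand are assumptions made for this example.

```python
# Patterns over (index, middle, ring, pinky); the thumb is ignored here.
# All names except "peace_sign" are hypothetical identifiers.
PATTERNS = {
    (True, True, False, False): "peace_sign",
    (True, True, True, False): "three_fingers",
    (True, False, False, False): "point",
    (False, False, False, False): "fist",
}


def classify_gesture(thumb, index, middle, ring, pinky):
    """Return a gesture name for the given finger flags, or None if unknown."""
    if thumb and index and middle and ring and pinky:
        return "open_hand"  # all five fingers extended
    return PATTERNS.get((index, middle, ring, pinky))
```

A lookup table keeps each gesture definition on a single line, so adding a new gesture means adding one entry rather than another branch.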

Supported Keyboard Shortcuts

You can use any of the predefined shortcuts or write your own:

  • Screenshot: Print
  • Close window: alt+F4
  • Minimize all: super+d
  • Lock screen: super+l
  • Switch window: alt+Tab
  • New tab: ctrl+t
  • Close tab: ctrl+w
  • Copy/Paste: ctrl+c / ctrl+v
  • And many more...
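
For readers curious how a shortcut string might reach the desktop, here is a hedged sketch. It picks a tool from the standard `XDG_SESSION_TYPE` environment variable and builds an `xdotool key` command line, which accepts combinations such as `ctrl+shift+q`. The function names are hypothetical, the project's actual detection logic may differ, and no `ydotool` command is built because its key syntax is different.

```python
import os


def pick_input_tool(env=None):
    """Return "ydotool" on Wayland sessions, otherwise "xdotool".

    Assumption: the session type is read from XDG_SESSION_TYPE only.
    """
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").lower()
    return "ydotool" if session == "wayland" else "xdotool"


def xdotool_command(shortcut):
    """Build an argv list for `xdotool key` from a string like "ctrl+shift+q"."""
    keys = [part.strip() for part in shortcut.split("+")]
    if not all(keys):
        raise ValueError(f"invalid shortcut: {shortcut!r}")
    return ["xdotool", "key", "+".join(keys)]
```

The resulting list can be passed to `subprocess.run(...)` on an X11 session. Building an argv list instead of a shell string avoids quoting problems with user-supplied shortcuts.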

Project Structure

  • src/gesturevision/main_vision.py: Gesture detection and capture engine.
  • src/gesturevision/api/: FastAPI backend for control and configuration.
  • src/gesturevision/static/: Control panel user interface.
  • pyproject.toml: Package dependencies and entry points definition.
  • gesturevision/assets/hand_landmarker.task: Pre-trained AI model.

Configuration

Settings are stored in: ~/gesturevision/config.json

Structure:

{
  "gesture_hold_seconds": 1.0,
  "cooldown_seconds": 1.5,
  "gestures": [
    {
      "name": "Peace (screenshot)",
      "gesture_type": "peace_sign",
      "required_hold_seconds": 1.0,
      "enabled": true,
      "action": "screenshot",
      "shortcut": null
    }
  ]
}
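
A minimal sketch of reading this structure into typed objects is shown below. The field names and defaults come from the JSON example above. The dataclass names and the `parse_settings` helper are assumptions for illustration and are not part of the package's API.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GestureConfig:
    name: str
    gesture_type: str
    required_hold_seconds: float = 1.0
    enabled: bool = True
    action: Optional[str] = None
    shortcut: Optional[str] = None


@dataclass
class Settings:
    gesture_hold_seconds: float = 1.0
    cooldown_seconds: float = 1.5
    gestures: list = field(default_factory=list)


def parse_settings(text):
    """Parse the JSON text of a config file into a Settings object."""
    raw = json.loads(text)
    return Settings(
        gesture_hold_seconds=float(raw.get("gesture_hold_seconds", 1.0)),
        cooldown_seconds=float(raw.get("cooldown_seconds", 1.5)),
        # Unknown keys in a gesture entry raise TypeError, surfacing typos early.
        gestures=[GestureConfig(**g) for g in raw.get("gestures", [])],
    )
```

To load the real file, the text could be read with `pathlib.Path.home() / "gesturevision" / "config.json"` and passed to `parse_settings`.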

Docker (Optional)

docker-compose up --build

Access the panel at: http://localhost:8080
