Utilities and models for ASL alphabet training used in the SmartGlasses project

Smart Gestures

A Python package for sign language alphabet recognition using PyTorch and MediaPipe hand tracking.

Overview

Smart Gestures is a toolkit for building sign language recognition systems. It provides pre-trained models, training utilities, and inference code for recognizing hand gestures from MediaPipe landmarks. The package is designed to integrate into existing applications while leaving room for researchers and developers to train custom models.

The package supports two sign language alphabets (ASL and VGT) and includes utilities for data preprocessing, augmentation, model training with callbacks (early stopping, learning rate scheduling, checkpointing), and real-time inference. It is meant for web APIs, mobile app backends, and research use.

What's Inside

Smart Gestures provides three main components:

ASL Model - American Sign Language alphabet recognition with a simple feedforward neural network trained on 21-landmark hand poses. Includes complete data loading, training, and inference utilities with CSV-based dataset support.

VGT Model - Vlaamse Gebarentaal (Flemish Sign Language) alphabet recognition with advanced normalization techniques and data augmentation. Features sophisticated training callbacks including early stopping, learning rate scheduling, and model checkpointing for optimal performance.

LSTM Model - Experimental sequence-based gesture recognition using LSTM networks for temporal gesture patterns. Supports dynamic gesture recognition beyond static alphabet poses.
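Early stopping, one of the training callbacks mentioned for the VGT model, can be sketched in a few lines. This is a minimal illustration of the general technique and not the package's own callback: the class name `EarlyStopping` and the parameters `patience` and `min_delta` are assumptions chosen for the example.

```python
class EarlyStopping:
    """Stop training when the validation loss stops improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Usage with made-up validation losses (hypothetical values)
stopper = EarlyStopping(patience=2)
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.60]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

In a real training loop, `step` would be called once per epoch with the measured validation loss, usually next to a checkpoint of the best weights so far.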

Features

Pre-trained Models - Ready-to-use ASL and VGT alphabet recognition models, with reported accuracy of about 95% (ASL) and 93% (VGT), intended for production deployment.

Data Loading & Preprocessing - Flexible data loaders supporting CSV and JSON formats with built-in normalization, augmentation, and batching.

Training Utilities - Complete training pipeline with advanced callbacks including early stopping, model checkpointing, learning rate scheduling (step decay, plateau), and progress tracking.

Model Architecture - Lightweight feedforward neural networks optimized for real-time inference with dropout regularization and configurable layer sizes.

Data Augmentation - Built-in augmentation techniques including rotation, Gaussian noise, and coordinate scaling to improve model robustness.

Production Ready - Easy integration with web frameworks (FastAPI, Flask), designed for REST APIs and real-time applications.

Real-time Inference - Optimized for low-latency predictions from MediaPipe hand landmarks with support for both CPU and GPU inference.

Flexible Dataset Support - Works with custom datasets in standardized formats, includes tools for dataset creation and validation.
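The augmentation techniques named in the feature list (rotation, Gaussian noise, coordinate scaling) could look roughly like the sketch below. It is an illustration under stated assumptions and not the package's implementation: the function name `augment`, its parameters, and the choice to rotate in the x-y plane around the first landmark are all hypothetical.

```python
import math
import random


def augment(landmarks, max_angle_deg=10.0, noise_std=0.01,
            scale_range=(0.9, 1.1), rng=random):
    """Return a randomly rotated, scaled, and jittered copy of a landmark list."""
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    scale = rng.uniform(*scale_range)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Rotate and scale around the first landmark (the wrist) so the hand stays in place
    ox, oy, oz = landmarks[0]["x"], landmarks[0]["y"], landmarks[0]["z"]
    out = []
    for lm in landmarks:
        dx, dy, dz = lm["x"] - ox, lm["y"] - oy, lm["z"] - oz
        rx = (dx * cos_a - dy * sin_a) * scale   # in-plane rotation, then scaling
        ry = (dx * sin_a + dy * cos_a) * scale
        rz = dz * scale
        out.append({
            "x": ox + rx + rng.gauss(0.0, noise_std),  # additive Gaussian noise
            "y": oy + ry + rng.gauss(0.0, noise_std),
            "z": oz + rz + rng.gauss(0.0, noise_std),
        })
    return out
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which helps when debugging a training run.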

Installation

Install Smart Gestures using pip. The package requires Python 3.9 to 3.12 and automatically installs all necessary dependencies, including PyTorch, MediaPipe, and NumPy.

From PyPI (Recommended)

pip install smart-gestures

From Source

pip install git+https://github.com/vives-project-xp/SmartGlasses.git#subdirectory=notebooks/package

Development Installation

git clone https://github.com/vives-project-xp/SmartGlasses.git
cd SmartGlasses/notebooks/package
pip install -e .

Requirements

Python 3.9 - 3.12
PyTorch 2.7.0+
MediaPipe 0.10.21
NumPy 1.26.4
Pandas 2.3.3
tqdm 4.67.1

All dependencies are automatically installed with the package.

Quick Start

Get up and running with Smart Gestures in minutes. This section shows you how to load a pre-trained model and make predictions from hand landmarks.

Basic Usage

ASL Model

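from smart_gestures.alphabet.asl_model import ASLModel, get_classes

# Load the class labels the model can predict
classes = get_classes()
# Create the model
model = ASLModel()
# Placeholder input: 21 landmarks with x, y, z keys (see Input Format)
landmarks = [{"x": 0.5, "y": 0.3, "z": 0.1}] * 21
# Make a prediction; the FastAPI example unpacks predict() as (label, confidence)
predicted_letter, confidence = model.predict(landmarks)
print(f"Predicted sign: {predicted_letter}")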

VGT Model

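from smart_gestures.alphabet.vgt_model import VGTModel, get_classes

# Load the class labels the model can predict
classes = get_classes()
# Create the model
model = VGTModel()
# Placeholder input: 21 landmarks with x, y, z keys (see Input Format)
landmarks = [{"x": 0.5, "y": 0.3, "z": 0.1}] * 21
# Make a prediction; the FastAPI example unpacks predict() as (label, confidence)
predicted_letter, confidence = model.predict(landmarks)
print(f"Predicted sign: {predicted_letter}")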

LSTM Model

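from smart_gestures.gestures.lstm_model import LSTMModel, get_classes

# Load the class labels the model can predict
classes = get_classes()
# Create the model
model = LSTMModel()
# input_sequence is a placeholder: a list of frames, each a list of
# landmark dictionaries (see Input Format)
# Make a prediction; the FastAPI example unpacks predict() as (label, confidence)
predicted_gesture, confidence = model.predict(input_sequence)
print(f"Predicted gesture: {predicted_gesture}")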

Package Structure

Understanding the package structure helps you navigate the codebase and extend functionality:

smart_gestures/
├── __init__.py                     # Main package entry point
├── alphabet/                       # Alphabet recognition models
│   ├── __init__.py
│   ├── asl_model/                  # American Sign Language
│   │   ├── __init__.py             # Exports: get_classes, ASLModel class
│   │   ├── model.py                # Script defining the ASLModel architecture and class
│   │   ├── data/                   # Dataset storage
│   │   │   └── hand_landmarks.csv  # Training data
│   │   └── models/                 # Pre-trained model
│   │       └── asl_model.pth
│   └── vgt_model/                  # Flemish Sign Language
│       ├── __init__.py             # Exports: get_classes, VGTModel class
│       ├── model.py                # Script defining the VGTModel architecture and class
│       ├── data/                   # Processed dataset storage
│       │   └── hand_landmarks.json # Training data
│       └── models/                 # Pre-trained model
│           └── vgt_model.pth       
└── gestures/                       # Dynamic gesture recognition
    └── lstm_model/                 # LSTM-based sequence models (experimental)
        ├── __init__.py             # Exports: get_classes, LSTMModel class
        ├── model.py                # Script defining the LSTMModel architecture and class
        ├── data/                   # Dataset storage
        │   └── gesture_map.json    # Training data
        └── models/                 # Pre-trained model
            └── lstm_model.pth

Data Format & Requirements

Smart Gestures works with MediaPipe hand landmarks for the alphabet recognition models (ASL and VGT) and sequences of hand and body landmarks for the LSTM model.

Hand Landmark Structure

MediaPipe provides 21 landmarks per hand:

0: Wrist
1-4: Thumb (CMC, MCP, IP, Tip)
5-8: Index finger (MCP, PIP, DIP, Tip)
9-12: Middle finger (MCP, PIP, DIP, Tip)
13-16: Ring finger (MCP, PIP, DIP, Tip)
17-20: Pinky (MCP, PIP, DIP, Tip)
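The index ranges above translate directly into a lookup table. The sketch below is only a convenience illustration: the names `FINGERS`, `WRIST`, and `fingertips` are hypothetical and are not part of the package.

```python
# Landmark index ranges per finger, following the list above
FINGERS = {
    "thumb": range(1, 5),
    "index": range(5, 9),
    "middle": range(9, 13),
    "ring": range(13, 17),
    "pinky": range(17, 21),
}
WRIST = 0


def fingertips(landmarks):
    """Return the tip landmark of each finger (the last index in each range)."""
    return {name: landmarks[idx[-1]] for name, idx in FINGERS.items()}
```

A table like this keeps magic numbers out of feature code, for example when measuring the distance between the thumb tip and the index tip.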

Body Landmark Structure (for LSTM)

MediaPipe provides 33 body landmarks:

0: Nose
1-10: Eyes, Ears, Mouth
11-22: Shoulders, Elbows, Wrists, Hands
23-32: Hips, Knees, Ankles, Feet

Landmark Coordinates

Each landmark has three coordinates:

x: Horizontal position (normalized 0-1)
y: Vertical position (normalized 0-1)
z: Depth relative to a reference landmark (the wrist for hand landmarks); smaller values are closer to the camera
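Because x and y depend on where the hand sits in the image, landmark coordinates are often normalized before classification. The sketch below shows one common approach, translating to the wrist and scaling by hand size. The normalization the package itself applies is not documented here, so treat the function name `normalize_to_wrist` and the method as assumptions.

```python
import math


def normalize_to_wrist(landmarks):
    """Translate landmarks so the wrist is the origin, then scale to unit size."""
    wx, wy, wz = landmarks[0]["x"], landmarks[0]["y"], landmarks[0]["z"]
    shifted = [(lm["x"] - wx, lm["y"] - wy, lm["z"] - wz) for lm in landmarks]
    # Largest wrist-to-landmark distance; fall back to 1.0 for a degenerate hand
    size = max(math.sqrt(x * x + y * y + z * z) for x, y, z in shifted) or 1.0
    return [{"x": x / size, "y": y / size, "z": z / size} for x, y, z in shifted]
```

After this step the same hand shape yields similar features regardless of its position in the frame or its distance from the camera.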

Input Format

The ASL and VGT models expect input as a list of 21 landmarks, each represented as a dictionary with x, y, z keys:

# List of dictionaries (from MediaPipe)
landmarks = [
    {"x": 0.5, "y": 0.3, "z": 0.1},
    {"x": 0.6, "y": 0.4, "z": 0.2},
    # ... 21 landmarks total
]
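The feedforward models take 63 input features, which corresponds to 21 landmarks with 3 coordinates each. A minimal sketch of that flattening step follows. The helper name `flatten_landmarks` and the ordering x0, y0, z0, x1, ... are assumptions for illustration; the package may order features differently internally.

```python
def flatten_landmarks(landmarks, expected=21):
    """Turn a list of {x, y, z} dicts into a flat [x0, y0, z0, x1, ...] list."""
    if len(landmarks) != expected:
        raise ValueError(f"Expected {expected} landmarks, got {len(landmarks)}")
    features = []
    for lm in landmarks:
        features.extend((float(lm["x"]), float(lm["y"]), float(lm["z"])))
    return features
```

Validating the landmark count early gives a clear error instead of a shape mismatch deep inside the model.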

The LSTM model expects a sequence of frames for dynamic gesture recognition, each frame holding the hand and body landmarks captured at one time step.

# List of frames; the landmarks in each frame supply 258 features in total
sequence = [
    [ {"x": 0.5, "y": 0.3, "z": 0.1}, ... ],  # Frame 1
    [ {"x": 0.6, "y": 0.4, "z": 0.2}, ... ],  # Frame 2
    # ... more frames
]
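The figure of 258 features per frame is not broken down in this description. One layout that produces exactly 258, offered purely as an assumption and not confirmed by the package, combines MediaPipe pose landmarks (which also carry a visibility score) with two hands:

```python
POSE_LANDMARKS = 33
HAND_LANDMARKS = 21

# Assumed layout (not confirmed by the package): pose landmarks carry
# x, y, z plus a visibility score; each of two hands carries x, y, z.
pose_features = POSE_LANDMARKS * 4
hand_features = 2 * HAND_LANDMARKS * 3
frame_features = pose_features + hand_features

print(frame_features)  # 258
```

If the package uses a different layout, the per-frame width should still be checked against the model's `input_size` before calling `predict`.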

FastAPI Integration

Smart Gestures is designed for easy integration with modern web frameworks. Here's a complete example of building a REST API with FastAPI for real-time sign language recognition.

Complete FastAPI Example

from fastapi import FastAPI, HTTPException
from smart_gestures.alphabet.asl_model import ASLModel, get_classes
# schemas is a module in your own application that defines the Pydantic
# request and response models (PredictBody, PredictResponse, ClassesResponse)
from schemas import ClassesResponse, PredictBody, PredictResponse

app = FastAPI()
# Load ASL model
classes = get_classes()
model = ASLModel()

@app.post("/predict")
async def predict(body: PredictBody) -> PredictResponse:
    landmarks = [landmark.model_dump() for landmark in body.landmarks]
    if len(landmarks) != 21:
        raise HTTPException(status_code=400, detail="Expected 21 landmarks.")
    try:
        prediction, confidence = model.predict(landmarks)
        return PredictResponse(prediction=prediction, confidence=confidence)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/classes", response_model=ClassesResponse)
async def get_classes_endpoint():
    return ClassesResponse(classes=classes)

Using the API

import requests

# Single prediction
response = requests.post(
    "http://localhost:8000/predict",
    json={
        "landmarks": [
            {"x": 0.5, "y": 0.3, "z": 0.1},
            {"x": 0.6, "y": 0.4, "z": 0.2},
            # ... 21 landmarks total
        ]
    }
)

result = response.json()
print(f"Predicted: {result['prediction']}")
print(f"Confidence: {result['confidence']:.2%}")

VGT Model with FastAPI

from fastapi import FastAPI, HTTPException
from smart_gestures.alphabet.vgt_model import VGTModel, get_classes
# schemas is a module in your own application that defines the Pydantic
# request and response models (PredictBody, PredictResponse, ClassesResponse)
from schemas import ClassesResponse, PredictBody, PredictResponse

app = FastAPI()
# Load VGT model
classes = get_classes()
model = VGTModel()

@app.post("/predict")
async def predict(body: PredictBody) -> PredictResponse:
    landmarks = [landmark.model_dump() for landmark in body.landmarks]
    if len(landmarks) != 21:
        raise HTTPException(status_code=400, detail="Expected 21 landmarks.")
    try:
        prediction, confidence = model.predict(landmarks)
        return PredictResponse(prediction=prediction, confidence=confidence)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/classes", response_model=ClassesResponse)
async def get_classes_endpoint():
    return ClassesResponse(classes=classes)

LSTM Model with FastAPI

from fastapi import FastAPI, HTTPException
from smart_gestures.gestures.lstm_model import LSTMModel, get_classes
# schemas is a module in your own application that defines the Pydantic
# request and response models; here PredictBody carries a sequence of
# frames, each with a landmarks list
from schemas import ClassesResponse, PredictBody, PredictResponse

app = FastAPI()
# Load LSTM model
classes = get_classes()
model = LSTMModel()

@app.post("/predict")
async def predict(body: PredictBody) -> PredictResponse:
    sequence = [
        [landmark.model_dump() for landmark in frame.landmarks]
        for frame in body.sequence
    ]
    try:
        prediction, confidence = model.predict(sequence)
        return PredictResponse(prediction=prediction, confidence=confidence)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/classes", response_model=ClassesResponse)
async def get_classes_endpoint():
    return ClassesResponse(classes=classes)

Model Architecture

Feedforward Neural Network

Both ASL and VGT models use a compact feedforward neural network optimized for real-time inference:

Input Layer: 63 features
├─ 21 hand landmarks × 3 coordinates (x, y, z)
│
Hidden Layer 1: 128 neurons
├─ Linear transformation (63 → 128)
├─ ReLU activation
└─ Dropout (p=0.3) for regularization
│
Hidden Layer 2: 64 neurons
├─ Linear transformation (128 → 64)
├─ ReLU activation
└─ Dropout (p=0.3) for regularization
│
Output Layer: num_classes neurons
├─ Linear transformation (64 → num_classes)
└─ Raw logits (apply softmax for probabilities)
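Since the output layer returns raw logits, a softmax step is needed to obtain a confidence value. The sketch below shows that conversion in plain Python; the helper names `softmax` and `top_prediction` are hypothetical, and in a PyTorch pipeline `torch.softmax` would normally do the same job.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, classes):
    """Return the most likely class label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]
```

For example, `top_prediction([1.0, 3.0, 0.5], ["A", "B", "C"])` picks `"B"`, the class with the largest logit.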

Key Features:

Lightweight: ~18K parameters (18,138 with 63 inputs and 26 classes) for fast inference
Regularization: Dropout prevents overfitting
Flexible: Configurable input dimensions and class count
Efficient: Optimized for CPU and GPU execution

PyTorch Implementation:

import torch.nn as nn

class HandGestureModel(nn.Module):
    def __init__(self, in_dim=63, num_classes=26):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(in_dim, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, num_classes)
        )
    
    def forward(self, x):
        return self.model(x)

LSTM Network

The LSTM model is designed for sequence-based gesture recognition:

Input: sequence of frames
├─ Each frame: 258 features built from hand and body landmarks
│
LSTM Layer 1: 128 hidden units
├─ Processes the input sequence
│
LSTM Layer 2: 128 hidden units
├─ Processes the output sequence of LSTM layer 1
│
Output Layer: num_classes neurons
├─ Linear transformation (128 → num_classes) on the last time step
└─ Raw logits (apply softmax for probabilities)

Key Features:

Temporal Modeling: Captures sequential patterns in gestures
Scalable: Handles variable-length input sequences
Robust: Suitable for dynamic gesture recognition tasks

PyTorch Implementation:

import torch.nn as nn

class GestureLSTMModel(nn.Module):
    def __init__(self, input_size=258, hidden_size=128, num_classes=10, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        h_lstm, _ = self.lstm(x)
        out = self.fc(h_lstm[:, -1, :])  # Use the last time step
        return out
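The size of this network can be estimated from the defaults in the constructor above. In `torch.nn.LSTM`, each layer holds four gates, and each gate has input weights, recurrent weights, and two bias vectors. The helper names below are hypothetical and the sketch only reproduces that arithmetic:

```python
def lstm_layer_params(input_size, hidden_size):
    """Parameters of one LSTM layer: 4 gates, each with input weights,
    recurrent weights, and two bias vectors (the layout torch.nn.LSTM uses)."""
    return 4 * hidden_size * (input_size + hidden_size + 2)


def lstm_model_params(input_size=258, hidden_size=128, num_layers=2, num_classes=10):
    """Total parameters of the stacked LSTM plus the final linear layer."""
    total = lstm_layer_params(input_size, hidden_size)
    # Layers after the first take the previous layer's hidden state as input
    total += (num_layers - 1) * lstm_layer_params(hidden_size, hidden_size)
    total += hidden_size * num_classes + num_classes  # final linear layer
    return total


print(lstm_model_params())  # 332042
```

With these defaults the first LSTM layer dominates the count (198,656 parameters), because its input width of 258 is larger than the hidden size.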

Performance Benchmarks

Model	Classes	Accuracy	Parameters	Inference Time*
ASL	26	~95%	~18K	<5ms
VGT	26	~93%	~18K	<5ms
LSTM	Custom	Varies	~330K (10 classes)	<10ms

*CPU inference time on Intel i7. GPU inference is typically <1ms.
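Inference times like those in the table depend on hardware, so it is worth measuring them locally. The sketch below is one way to do that with the standard library. The function `measure_latency_ms` is hypothetical, and `dummy_predict` is a stand-in for `model.predict` so the example runs without the package installed.

```python
import statistics
import time


def measure_latency_ms(predict, sample, warmup=10, runs=100):
    """Return the median wall-clock time of predict(sample) in milliseconds."""
    for _ in range(warmup):          # warm caches before timing
        predict(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Stand-in for model.predict (hypothetical; replace with the real model call)
def dummy_predict(landmarks):
    return sum(lm["x"] for lm in landmarks)


sample = [{"x": 0.5, "y": 0.3, "z": 0.1}] * 21
latency = measure_latency_ms(dummy_predict, sample)
```

The median is used instead of the mean because single slow runs, for example from garbage collection or scheduling, would otherwise skew the result.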

License

GNU General Public License v3.0 or later - see the LICENSE file for details.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Authors

Simon Stijnen
Lynn Deleare
Olivier Westerman

Maintained by VIVES University of Applied Sciences - Project XP

Acknowledgments

MediaPipe for hand tracking
PyTorch for the deep learning framework
VIVES University of Applied Sciences for supporting this project

Release history

0.3.3 (this version) - Nov 29, 2025
0.3.2 - Nov 27, 2025
0.3.1 - Nov 27, 2025
0.3.0 - Nov 24, 2025
0.2.3 - Nov 20, 2025
0.2.2 (yanked) - Nov 17, 2025
0.2.0 (yanked) - Nov 14, 2025
0.1.1 (yanked) - Nov 12, 2025
0.1.0 (yanked) - Nov 12, 2025

Reason the four yanked releases were yanked: non-Python files were not included in the package, so the package did not work as expected.

Download files

Source Distribution

smart_gestures-0.3.3.tar.gz (1.9 MB)
Upload date: Nov 29, 2025
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.13 (Ubuntu 24.04, CI)

Hashes for smart_gestures-0.3.3.tar.gz
Algorithm	Hash digest
SHA256	`afb6d2f16a13d14ca0770f61dadb8e0fbf5a7ea29c45a79a3909fa600b512b26`
MD5	`44ecd375cd2b429582dff9ec421944b4`
BLAKE2b-256	`514bed04de4d23acd17974f1d9805c10f1faf32030288e5fce07eab35ad33283`

Built Distribution

smart_gestures-0.3.3-py3-none-any.whl (1.9 MB)
Upload date: Nov 29, 2025
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.13 (Ubuntu 24.04, CI)

Hashes for smart_gestures-0.3.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0a6c838a3a89bafb6de7012e56edddec47be48aef679447bddbd4790e0158a41`
MD5	`8a436a202475d5a2f7db905173450173`
BLAKE2b-256	`f1936f6135d7f5a00193884d6a3e5d657555bfb693897f037f5bbd6c78699b92`