Skip to main content

Utilities and models for ASL alphabet training used in the SmartGlasses project

Reason this release was yanked:

Non python files aren't included in the package. Therefore the package isn't working as expected.

Project description

Smart Gestures

Python PyTorch License: GPL v3

A Python package for sign language alphabet recognition using PyTorch and MediaPipe hand tracking.

Overview

Smart Gestures provides PyTorch-based models and utilities for training and deploying sign language alphabet recognition systems. The package includes:

  • ASL (American Sign Language) alphabet recognition
  • VGT (Vlaamse Gebarentaal / Flemish Sign Language) alphabet recognition
  • LSTM-based gesture recognition (experimental)

Features

  • ๐Ÿค– Pre-trained models for ASL and VGT alphabets
  • ๐Ÿ“Š Data loading utilities with built-in augmentation support
  • ๐ŸŽฏ Training utilities with callbacks (early stopping, model checkpoints, learning rate scheduling)
  • ๐Ÿ”ง Model utilities for creating, loading, and evaluating models
  • ๐Ÿ“ˆ Real-time inference support with MediaPipe hand landmarks
  • ๐ŸŽจ Data augmentation (rotation, noise, scaling)

Installation

From PyPI (Recommended)

pip install smart-gestures

From Source

pip install git+https://github.com/vives-project-xp/SmartGlasses.git#subdirectory=notebooks/package

Development Installation

git clone https://github.com/vives-project-xp/SmartGlasses.git
cd SmartGlasses/notebooks/package
pip install -e .

Requirements

  • Python 3.12+
  • PyTorch 2.9.0+
  • MediaPipe 0.10.14+
  • NumPy 2.3.4+
  • Pandas 2.3.3+
  • tqdm 4.67.1+

Quick Start

Import the Package

from smart_gestures.alphabet import asl_model, vgt_model

Get Available Classes

# ASL alphabet classes
asl_classes = asl_model.get_classes()
print(f"ASL classes: {asl_classes}")

# VGT alphabet classes
vgt_classes = vgt_model.get_classes()
print(f"VGT classes: {vgt_classes}")

Load a Pre-trained Model

import torch
from smart_gestures.alphabet.asl_model import create_model, load_model, get_classes, DEVICE

# Get classes
classes = get_classes()
num_classes = len(classes)

# Create model architecture
model = create_model(num_classes=num_classes, in_dim=63)

# Load weights
model_path = "path/to/hand_gesture_model.pth"
model.load_state_dict(torch.load(model_path, map_location=DEVICE))
model.eval()

Make Predictions

import numpy as np
import torch

# Prepare input: 21 landmarks with x, y, z coordinates
landmarks = np.random.rand(21, 3).astype(np.float32)  # Replace with actual landmarks
input_tensor = torch.from_numpy(landmarks.reshape(1, 63)).to(DEVICE)

# Predict
with torch.no_grad():
    logits = model(input_tensor)
    pred_idx = int(torch.argmax(logits, dim=1).item())
    predicted_class = classes[pred_idx]
    
print(f"Prediction: {predicted_class}")

Normalize Landmarks (VGT Model)

from smart_gestures.alphabet.vgt_model import normalize_landmarks

# Raw landmarks from MediaPipe (list of dicts with x, y, z)
raw_landmarks = [{"x": 0.5, "y": 0.3, "z": 0.1}, ...]  # 21 landmarks

# Normalize (wrist-to-middle finger scaling)
normalized = normalize_landmarks(raw_landmarks, method="wrist_to_middle")

Training a Model

ASL Model Training

from smart_gestures.alphabet.asl_model import (
    get_classes, 
    get_loaders,
    create_model,
    train_model,
    evaluate_model
)
from smart_gestures.alphabet.asl_model.data_utils import (
    load_and_preprocess_dataset,
    split_dataset,
    HAND_LANDMARKS_CSV
)
from smart_gestures.alphabet.asl_model.model_utils import save_model

# Load data
classes = get_classes()
dataset = load_and_preprocess_dataset(HAND_LANDMARKS_CSV)
train_dataset, val_dataset = split_dataset(dataset, val_ratio=0.2, random_seed=42)
train_loader, val_loader = get_loaders(train_dataset, val_dataset, batch_size=32)

# Create model
in_dim = 63  # 21 landmarks * 3 coordinates
num_classes = len(classes)
model = create_model(num_classes, in_dim)

# Train
train_model(model, train_loader, epochs=20, lr=1e-3)

# Evaluate
accuracy = evaluate_model(model, val_loader)
print(f"Validation Accuracy: {accuracy:.2f}%")

# Save
save_model(model, path="my_model.pth")

VGT Model Training (Advanced)

from smart_gestures.alphabet.vgt_model import (
    create_model,
    train_model,
    evaluate_model
)
from smart_gestures.alphabet.vgt_model.data_utils import (
    load_dataset_normalized,
    split_dataset,
    get_loaders,
    get_classes,
    HAND_LANDMARKS_JSON
)
from smart_gestures.alphabet.vgt_model.model_utils import save_model

# Load normalized dataset with augmentation
dataset = load_dataset_normalized(
    HAND_LANDMARKS_JSON,
    as_sequence=False,
    scale_method="wrist_to_middle",
    augment=True,
    augment_prob=0.5,
    noise_std=0.02,
    rotate_deg=15
)

train_dataset, val_dataset = split_dataset(dataset, val_ratio=0.2, random_seed=42)
train_loader, val_loader = get_loaders(train_dataset, val_dataset, batch_size=32)

# Create model
classes = get_classes()
model = create_model(num_classes=len(classes), in_dim=63)

# Train with callbacks
train_model(
    model,
    train_loader,
    val_loader=val_loader,
    epochs=50,
    lr=1e-3,
    scheduler_type="plateau",
    scheduler_kwargs={"factor": 0.5, "patience": 5},
    early_stopping_kwargs={"patience": 10, "min_delta": 0.001},
    checkpoint_kwargs={"filepath": "checkpoints/best_model.pth"}
)

# Evaluate
accuracy = evaluate_model(model, val_loader)
print(f"Validation Accuracy: {accuracy:.2f}%")

Command-Line Training

Both ASL and VGT models include command-line training scripts:

ASL Training

python -m smart_gestures.alphabet.asl_model.run_training \
    --batch_size 32 \
    --epochs 20 \
    --lr 0.001 \
    --output models/hand_gesture_model.pth

VGT Training

python -m smart_gestures.alphabet.vgt_model.run_training \
    --batch_size 32 \
    --epochs 50 \
    --lr 0.001 \
    --augment \
    --augment_prob 0.5 \
    --scheduler plateau \
    --early_stopping \
    --output models/hand_gesture_model.pth

Package Structure

smart_gestures/
โ”œโ”€โ”€ __init__.py
โ”œโ”€โ”€ alphabet/
โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”œโ”€โ”€ const.py
โ”‚   โ”œโ”€โ”€ asl_model/
โ”‚   โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”‚   โ”œโ”€โ”€ data_utils.py      # Data loading and preprocessing
โ”‚   โ”‚   โ”œโ”€โ”€ model_utils.py     # Model architecture and utilities
โ”‚   โ”‚   โ”œโ”€โ”€ train_utils.py     # Training and evaluation functions
โ”‚   โ”‚   โ”œโ”€โ”€ run_training.py    # CLI training script
โ”‚   โ”‚   โ”œโ”€โ”€ data/              # Dataset files
โ”‚   โ”‚   โ””โ”€โ”€ models/            # Saved model checkpoints
โ”‚   โ””โ”€โ”€ vgt_model/
โ”‚       โ”œโ”€โ”€ __init__.py
โ”‚       โ”œโ”€โ”€ data_utils.py      # Data loading with normalization
โ”‚       โ”œโ”€โ”€ model_utils.py     # Model architecture
โ”‚       โ”œโ”€โ”€ train_utils.py     # Training with callbacks
โ”‚       โ”œโ”€โ”€ callbacks.py       # Training callbacks
โ”‚       โ”œโ”€โ”€ run_training.py    # CLI training script
โ”‚       โ”œโ”€โ”€ test_camera.py     # Real-time testing utility
โ”‚       โ”œโ”€โ”€ data/              # Dataset files
โ”‚       โ”œโ”€โ”€ dataset/           # Raw dataset
โ”‚       โ””โ”€โ”€ models/            # Saved model checkpoints
โ””โ”€โ”€ gestures/
    โ””โ”€โ”€ lstm_model/            # LSTM-based gesture recognition (experimental)

API Reference

ASL Model

from smart_gestures.alphabet import asl_model

# Data utilities
classes = asl_model.get_classes()
train_loader, val_loader = asl_model.get_loaders(train_dataset, val_dataset, batch_size=32)

# Model utilities
model = asl_model.create_model(num_classes=26, in_dim=63)
model = asl_model.load_model(path="model.pth", num_classes=26, in_dim=63)

# Training utilities
asl_model.train_model(model, train_loader, epochs=20, lr=1e-3)
accuracy = asl_model.evaluate_model(model, val_loader)

# Device
device = asl_model.DEVICE  # 'cuda' if available, else 'cpu'

VGT Model

from smart_gestures.alphabet import vgt_model

# Data utilities
classes = vgt_model.get_classes()
train_loader, val_loader = vgt_model.get_loaders(train_dataset, val_dataset, batch_size=32)
normalized = vgt_model.normalize_landmarks(landmarks, method="wrist_to_middle")

# Model utilities
model = vgt_model.create_model(num_classes=26, in_dim=63)
model = vgt_model.load_model(path="model.pth", num_classes=26, in_dim=63)

# Training utilities (with callbacks)
vgt_model.train_model(
    model, train_loader, val_loader=val_loader, 
    epochs=50, lr=1e-3,
    scheduler_type="plateau",
    early_stopping_kwargs={"patience": 10}
)
accuracy = vgt_model.evaluate_model(model, val_loader)

# Device and paths
device = vgt_model.DEVICE
model_dir = vgt_model.MODEL_DIR

Data Format

Input Format

Models expect hand landmarks in the following format:

# 21 hand landmarks with x, y, z coordinates
landmarks = [
    {"x": 0.5, "y": 0.3, "z": 0.1},
    {"x": 0.6, "y": 0.4, "z": 0.2},
    # ... 21 landmarks total
]

# Or as numpy array: shape (21, 3)
landmarks_array = np.array([[x1, y1, z1], [x2, y2, z2], ...])  # (21, 3)

# Flattened for model input: shape (1, 63)
model_input = landmarks_array.reshape(1, 63)

Dataset Files

  • ASL: CSV file at smart_gestures/alphabet/asl_model/data/hand_landmarks.csv
  • VGT: JSON file at smart_gestures/alphabet/vgt_model/data/hand_landmarks.json

Usage in Production

FastAPI Integration Example

from fastapi import APIRouter, HTTPException
import numpy as np
import torch
from smart_gestures.alphabet.asl_model import create_model, get_classes, DEVICE

classes = get_classes()
model = create_model(num_classes=len(classes), in_dim=63)
model.load_state_dict(torch.load("path/to/model.pth", map_location=DEVICE))
model.eval()

router = APIRouter()

@router.post("/predict")
async def predict(landmarks: list[dict]):
    """
    Predict sign language letter from hand landmarks.
    
    Args:
        landmarks: List of 21 hand landmarks with x, y, z coordinates
        
    Returns:
        Predicted letter
    """
    pts = np.array([[lm["x"], lm["y"], lm["z"]] for lm in landmarks], dtype=np.float32)
    
    if pts.shape != (21, 3):
        raise HTTPException(status_code=400, detail="Expected 21 landmarks")
    
    x = torch.from_numpy(pts.reshape(1, 63)).to(DEVICE)
    
    with torch.no_grad():
        logits = model(x)
        pred_idx = int(torch.argmax(logits, dim=1).item())
        predicted_letter = classes[pred_idx]
    
    return {"prediction": predicted_letter}

Flask Integration Example

from flask import Flask, request, jsonify
import numpy as np
import torch
from smart_gestures.alphabet.vgt_model import create_model, get_classes, normalize_landmarks, DEVICE

app = Flask(__name__)

classes = get_classes()
model = create_model(num_classes=len(classes), in_dim=63)
model.load_state_dict(torch.load("path/to/model.pth", map_location=DEVICE))
model.eval()

@app.route('/predict', methods=['POST'])
def predict():
    landmarks = request.json['landmarks']
    
    # Normalize landmarks
    normalized = normalize_landmarks(landmarks, method="wrist_to_middle")
    pts = np.array([[lm["x"], lm["y"], lm["z"]] for lm in normalized], dtype=np.float32)
    
    x = torch.from_numpy(pts.reshape(1, 63)).to(DEVICE)
    
    with torch.no_grad():
        logits = model(x)
        pred_idx = int(torch.argmax(logits, dim=1).item())
        
    return jsonify({'prediction': classes[pred_idx]})

Model Architecture

Both ASL and VGT models use a feedforward neural network architecture:

Input (63 features: 21 landmarks ร— 3 coordinates)
    โ†“
Linear(63 โ†’ 128) + ReLU + Dropout(0.3)
    โ†“
Linear(128 โ†’ 64) + ReLU + Dropout(0.3)
    โ†“
Linear(64 โ†’ num_classes)
    โ†“
Output (class logits)

Dataset Format

The package expects hand landmark data in the following formats:

CSV Format (ASL)

class,x0,y0,z0,x1,y1,z1,...,x20,y20,z20
A,0.5,0.3,0.1,0.6,0.4,0.2,...
B,0.4,0.2,0.0,0.5,0.3,0.1,...

JSON Format (VGT)

{
  "A": [
    [[x0,y0,z0], [x1,y1,z1], ..., [x20,y20,z20]],
    [[x0,y0,z0], [x1,y1,z1], ..., [x20,y20,z20]]
  ],
  "B": [...]
}

Performance

Model Classes Accuracy Parameters
ASL 26 ~95% ~10K
VGT 26 ~93% ~10K

Note: Actual performance depends on dataset quality and size.

Troubleshooting

CUDA Out of Memory

Reduce batch size:

train_loader, val_loader = get_loaders(train_dataset, val_dataset, batch_size=16)

Model Not Learning

Try adjusting learning rate:

train_model(model, train_loader, epochs=50, lr=1e-4)  # Lower LR

Poor Accuracy

  • Ensure landmarks are normalized correctly
  • Add data augmentation during training
  • Collect more training data
  • Verify dataset labels are correct

Citation

If you use this package in your research, please cite:

@software{smart_gestures2025,
  title = {Smart Gestures: Sign Language Alphabet Recognition},
  author = {Stijnen, Simon and Deleare, Lynn and Westerman, Olivier},
  year = {2025},
  organization = {VIVES University of Applied Sciences},
  url = {https://github.com/vives-project-xp/SmartGlasses}
}

License

GNU General Public License v3.0 or later - see the LICENSE file for details.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Authors

  • Simon Stijnen
  • Lynn Deleare
  • Olivier Westerman

Maintained by VIVES University of Applied Sciences - Project XP

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Links

Acknowledgments

  • MediaPipe for hand tracking
  • PyTorch for the deep learning framework
  • VIVES University of Applied Sciences for supporting this project

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

smart_gestures-0.2.2.tar.gz (30.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

smart_gestures-0.2.2-py3-none-any.whl (26.9 kB view details)

Uploaded Python 3

File details

Details for the file smart_gestures-0.2.2.tar.gz.

File metadata

  • Download URL: smart_gestures-0.2.2.tar.gz
  • Upload date:
  • Size: 30.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.10 {"installer":{"name":"uv","version":"0.9.10"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for smart_gestures-0.2.2.tar.gz
Algorithm Hash digest
SHA256 720619aca3ffaabeb8dc59167cb94fb6a123335ae29c64c17d81329978edd230
MD5 27b4afab773098da0eaee2c90cdcf414
BLAKE2b-256 bfb059151e5e51179af16a6747c63b72080e225bd600154b0cff3de07e8eb3fb

See more details on using hashes here.

File details

Details for the file smart_gestures-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: smart_gestures-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 26.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.10 {"installer":{"name":"uv","version":"0.9.10"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for smart_gestures-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 20c422bc7769c08d105bcde64442c8479f23c5ef192591df9ef8dc2f43ee8014
MD5 685b3af3465fe7d3ed8c88df7c4bc176
BLAKE2b-256 7d28b258c98668be0345e462b1cc5a386de98064ff68cba65ae6bf14062354e1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page