Utilities and models for ASL alphabet training used in the SmartGlasses project

Smart Gestures

A Python package for sign language alphabet recognition using PyTorch and MediaPipe hand tracking.

Overview

Smart Gestures provides PyTorch-based models and utilities for training and deploying sign language alphabet recognition systems. The package includes:

ASL (American Sign Language) alphabet recognition
VGT (Vlaamse Gebarentaal / Flemish Sign Language) alphabet recognition
LSTM-based gesture recognition (experimental)

Features

🤖 Pre-trained models for ASL and VGT alphabets
📊 Data loading utilities with built-in augmentation support
🎯 Training utilities with callbacks (early stopping, model checkpoints, learning rate scheduling)
🔧 Model utilities for creating, loading, and evaluating models
📈 Real-time inference support with MediaPipe hand landmarks
🎨 Data augmentation (rotation, noise, scaling)

Installation

From PyPI (Recommended)

pip install smart-gestures

From Source

pip install git+https://github.com/vives-project-xp/SmartGlasses.git#subdirectory=notebooks/package

Development Installation

git clone https://github.com/vives-project-xp/SmartGlasses.git
cd SmartGlasses/notebooks/package
pip install -e .

Requirements

Python 3.12+
PyTorch 2.9.0+
MediaPipe 0.10.14+
NumPy 2.3.4+
Pandas 2.3.3+
tqdm 4.67.1+
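
To confirm that an environment meets these minimums before installing, something like the following sketch may help. It uses only the standard library; the distribution names (for example "torch" for PyTorch) and the simple version parser are assumptions made for illustration, and the helper names are hypothetical rather than part of smart_gestures.

```python
from importlib import metadata

# Minimum versions copied from the Requirements list above.
# Keys are PyPI distribution names ("torch" is the distribution for PyTorch).
MINIMUMS = {
    "torch": "2.9.0",
    "mediapipe": "0.10.14",
    "numpy": "2.3.4",
    "pandas": "2.3.3",
    "tqdm": "4.67.1",
}


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '2.9.0+cu128' or '2.10.0rc1' into a comparable tuple of ints."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    return version_tuple(installed) >= version_tuple(minimum)


def check_requirements(minimums: dict[str, str] = MINIMUMS) -> dict[str, str]:
    """Return a status string per distribution: 'ok', 'too old (...)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

The parser ignores pre-release and local version suffixes, so it is only a rough check; pip remains the authority on what actually gets resolved.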

Quick Start

Import the Package

from smart_gestures.alphabet import asl_model, vgt_model

Get Available Classes

# ASL alphabet classes
asl_classes = asl_model.get_classes()
print(f"ASL classes: {asl_classes}")

# VGT alphabet classes
vgt_classes = vgt_model.get_classes()
print(f"VGT classes: {vgt_classes}")

Load a Pre-trained Model

import torch
from smart_gestures.alphabet.asl_model import create_model, load_model, get_classes, DEVICE

# Get classes
classes = get_classes()
num_classes = len(classes)

# Create model architecture
model = create_model(num_classes=num_classes, in_dim=63)

# Load weights and keep the model on the same device as the inputs
model_path = "path/to/hand_gesture_model.pth"
model.load_state_dict(torch.load(model_path, map_location=DEVICE))
model.to(DEVICE)
model.eval()

Make Predictions

import numpy as np
import torch

# Prepare input: 21 landmarks with x, y, z coordinates
landmarks = np.random.rand(21, 3).astype(np.float32)  # Replace with actual landmarks
input_tensor = torch.from_numpy(landmarks.reshape(1, 63)).to(DEVICE)

# Predict
with torch.no_grad():
    logits = model(input_tensor)
    pred_idx = int(torch.argmax(logits, dim=1).item())
    predicted_class = classes[pred_idx]
    
print(f"Prediction: {predicted_class}")
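
The model returns raw logits, so the argmax alone says nothing about how confident the prediction is. A minimal sketch of turning logits into probabilities and ranking the top candidates is shown here in plain Python; the function names and the example logits are hypothetical and not part of the package.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: list[float], labels: list[str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k most likely labels with their probabilities, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for a three-letter model
example = top_k([2.0, 0.5, -1.0], ["A", "B", "C"], k=2)
print(example)
```

With the PyTorch model from the previous step, `logits[0].tolist()` would supply the list of logits, and `torch.softmax(logits, dim=1)` performs the same conversion on the tensor directly.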

Normalize Landmarks (VGT Model)

from smart_gestures.alphabet.vgt_model import normalize_landmarks

# Raw landmarks from MediaPipe (list of 21 dicts with x, y, z)
raw_landmarks = [{"x": 0.5 + 0.01 * i, "y": 0.3, "z": 0.1} for i in range(21)]  # placeholder values

# Normalize (wrist-to-middle finger scaling)
normalized = normalize_landmarks(raw_landmarks, method="wrist_to_middle")
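
For intuition, wrist-to-middle scaling can be sketched as follows: shift every landmark so the wrist sits at the origin, then divide by the distance from the wrist to a middle-finger joint. This is an illustration, not the package's implementation. In particular, using MediaPipe landmark 9 (MIDDLE_FINGER_MCP) as the reference joint is an assumption, and `normalize_wrist_to_middle` is a hypothetical name.

```python
import math

WRIST = 0
MIDDLE_FINGER_MCP = 9  # assumption: the package may use a different middle-finger joint


def normalize_wrist_to_middle(landmarks: list[dict]) -> list[dict]:
    """Translate so the wrist is the origin, then scale by wrist-to-middle distance."""
    if len(landmarks) != 21:
        raise ValueError(f"expected 21 landmarks, got {len(landmarks)}")
    wrist = landmarks[WRIST]
    centered = [
        {axis: lm[axis] - wrist[axis] for axis in ("x", "y", "z")}
        for lm in landmarks
    ]
    ref = centered[MIDDLE_FINGER_MCP]
    scale = math.sqrt(ref["x"] ** 2 + ref["y"] ** 2 + ref["z"] ** 2)
    if scale == 0:
        raise ValueError("wrist and middle-finger joint coincide; cannot scale")
    return [{axis: lm[axis] / scale for axis in ("x", "y", "z")} for lm in centered]


# Synthetic hand: landmark i is offset from (0.5, 0.5, 0.0) by (0.1 * i, 0.05 * i, 0.0)
hand = [{"x": 0.5 + 0.1 * i, "y": 0.5 + 0.05 * i, "z": 0.0} for i in range(21)]
result = normalize_wrist_to_middle(hand)
```

After this transform the features no longer depend on where the hand is in the frame or how large it appears, which is the usual motivation for normalizing before classification.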

Training a Model

ASL Model Training

from smart_gestures.alphabet.asl_model import (
    get_classes, 
    get_loaders,
    create_model,
    train_model,
    evaluate_model
)
from smart_gestures.alphabet.asl_model.data_utils import (
    load_and_preprocess_dataset,
    split_dataset,
    HAND_LANDMARKS_CSV
)
from smart_gestures.alphabet.asl_model.model_utils import save_model

# Load data
classes = get_classes()
dataset = load_and_preprocess_dataset(HAND_LANDMARKS_CSV)
train_dataset, val_dataset = split_dataset(dataset, val_ratio=0.2, random_seed=42)
train_loader, val_loader = get_loaders(train_dataset, val_dataset, batch_size=32)

# Create model
in_dim = 63  # 21 landmarks * 3 coordinates
num_classes = len(classes)
model = create_model(num_classes, in_dim)

# Train
train_model(model, train_loader, epochs=20, lr=1e-3)

# Evaluate
accuracy = evaluate_model(model, val_loader)
print(f"Validation Accuracy: {accuracy:.2f}%")

# Save
save_model(model, path="my_model.pth")
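
The `val_ratio` and `random_seed` arguments suggest a seeded, shuffled split. A minimal sketch of that idea on sample indices is given below; `split_indices` is a hypothetical helper, and the package's own `split_dataset` may shuffle or round differently.

```python
import random


def split_indices(n_samples: int, val_ratio: float = 0.2, random_seed: int = 42):
    """Shuffle indices 0..n_samples-1 with a fixed seed and cut off a validation slice."""
    if not 0.0 < val_ratio < 1.0:
        raise ValueError("val_ratio must be between 0 and 1")
    indices = list(range(n_samples))
    random.Random(random_seed).shuffle(indices)  # local RNG, global state untouched
    n_val = int(round(n_samples * val_ratio))
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(100, val_ratio=0.2, random_seed=42)
print(len(train_idx), len(val_idx))  # 80 20
```

Fixing the seed keeps the validation set identical between runs, so accuracy figures from different experiments stay comparable.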

VGT Model Training (Advanced)

from smart_gestures.alphabet.vgt_model import (
    create_model,
    train_model,
    evaluate_model
)
from smart_gestures.alphabet.vgt_model.data_utils import (
    load_dataset_normalized,
    split_dataset,
    get_loaders,
    get_classes,
    HAND_LANDMARKS_JSON
)
from smart_gestures.alphabet.vgt_model.model_utils import save_model

# Load normalized dataset with augmentation
dataset = load_dataset_normalized(
    HAND_LANDMARKS_JSON,
    as_sequence=False,
    scale_method="wrist_to_middle",
    augment=True,
    augment_prob=0.5,
    noise_std=0.02,
    rotate_deg=15
)

train_dataset, val_dataset = split_dataset(dataset, val_ratio=0.2, random_seed=42)
train_loader, val_loader = get_loaders(train_dataset, val_dataset, batch_size=32)

# Create model
classes = get_classes()
model = create_model(num_classes=len(classes), in_dim=63)

# Train with callbacks
train_model(
    model,
    train_loader,
    val_loader=val_loader,
    epochs=50,
    lr=1e-3,
    scheduler_type="plateau",
    scheduler_kwargs={"factor": 0.5, "patience": 5},
    early_stopping_kwargs={"patience": 10, "min_delta": 0.001},
    checkpoint_kwargs={"filepath": "checkpoints/best_model.pth"}
)

# Evaluate
accuracy = evaluate_model(model, val_loader)
print(f"Validation Accuracy: {accuracy:.2f}%")
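
The `augment_prob`, `noise_std`, and `rotate_deg` options above can be pictured with the following sketch: with some probability, rotate the hand in the image plane by a random angle and add Gaussian jitter to every coordinate. This is an assumption about what such options typically mean, not the package's code, and `augment_landmarks` is a hypothetical name. Rotation is about the origin, so it presumes landmarks that were already centered by normalization.

```python
import math
import random


def augment_landmarks(points, augment_prob=0.5, noise_std=0.02, rotate_deg=15, rng=None):
    """Randomly rotate a hand in the x-y plane and jitter it with Gaussian noise.

    points: list of 21 (x, y, z) tuples. Returns a new list; the input is not modified.
    """
    rng = rng or random.Random()
    if rng.random() >= augment_prob:
        return [tuple(p) for p in points]  # leave this sample untouched

    angle = math.radians(rng.uniform(-rotate_deg, rotate_deg))
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    augmented = []
    for x, y, z in points:
        # Rotate around the z axis (in-plane rotation), keep depth unchanged
        rx = x * cos_a - y * sin_a
        ry = x * sin_a + y * cos_a
        augmented.append((
            rx + rng.gauss(0.0, noise_std),
            ry + rng.gauss(0.0, noise_std),
            z + rng.gauss(0.0, noise_std),
        ))
    return augmented
```

Augmentation of this kind is normally applied to training samples only, so that validation accuracy is measured on unmodified data.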

Command-Line Training

Both ASL and VGT models include command-line training scripts:

ASL Training

python -m smart_gestures.alphabet.asl_model.run_training \
    --batch_size 32 \
    --epochs 20 \
    --lr 0.001 \
    --output models/hand_gesture_model.pth

VGT Training

python -m smart_gestures.alphabet.vgt_model.run_training \
    --batch_size 32 \
    --epochs 50 \
    --lr 0.001 \
    --augment \
    --augment_prob 0.5 \
    --scheduler plateau \
    --early_stopping \
    --output models/hand_gesture_model.pth
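
For readers who want to wrap their own training loop in a similar interface, the flags used in the two commands above map naturally onto an argparse parser. The sketch below mirrors only the flag names that appear in those commands; the defaults and help text are illustrative assumptions, and this is not the package's actual run_training script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flags in the commands above; defaults are illustrative."""
    parser = argparse.ArgumentParser(description="Train an alphabet model (sketch)")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--epochs", type=int, default=20)
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--augment", action="store_true")
    parser.add_argument("--augment_prob", type=float, default=0.5)
    parser.add_argument("--scheduler", default=None)  # e.g. "plateau"
    parser.add_argument("--early_stopping", action="store_true")
    parser.add_argument("--output", default="models/hand_gesture_model.pth")
    return parser


args = build_parser().parse_args(
    ["--epochs", "50", "--augment", "--scheduler", "plateau", "--early_stopping"]
)
print(args.epochs, args.augment, args.scheduler, args.early_stopping)
```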

Package Structure

smart_gestures/
├── __init__.py
├── alphabet/
│   ├── __init__.py
│   ├── const.py
│   ├── asl_model/
│   │   ├── __init__.py
│   │   ├── data_utils.py      # Data loading and preprocessing
│   │   ├── model_utils.py     # Model architecture and utilities
│   │   ├── train_utils.py     # Training and evaluation functions
│   │   ├── run_training.py    # CLI training script
│   │   ├── data/              # Dataset files
│   │   └── models/            # Saved model checkpoints
│   └── vgt_model/
│       ├── __init__.py
│       ├── data_utils.py      # Data loading with normalization
│       ├── model_utils.py     # Model architecture
│       ├── train_utils.py     # Training with callbacks
│       ├── callbacks.py       # Training callbacks
│       ├── run_training.py    # CLI training script
│       ├── test_camera.py     # Real-time testing utility
│       ├── data/              # Dataset files
│       ├── dataset/           # Raw dataset
│       └── models/            # Saved model checkpoints
└── gestures/
    └── lstm_model/            # LSTM-based gesture recognition (experimental)

API Reference

ASL Model

from smart_gestures.alphabet import asl_model

# Data utilities
classes = asl_model.get_classes()
train_loader, val_loader = asl_model.get_loaders(train_dataset, val_dataset, batch_size=32)

# Model utilities
model = asl_model.create_model(num_classes=26, in_dim=63)
model = asl_model.load_model(path="model.pth", num_classes=26, in_dim=63)

# Training utilities
asl_model.train_model(model, train_loader, epochs=20, lr=1e-3)
accuracy = asl_model.evaluate_model(model, val_loader)

# Device
device = asl_model.DEVICE  # 'cuda' if available, else 'cpu'

VGT Model

from smart_gestures.alphabet import vgt_model

# Data utilities
classes = vgt_model.get_classes()
train_loader, val_loader = vgt_model.get_loaders(train_dataset, val_dataset, batch_size=32)
normalized = vgt_model.normalize_landmarks(landmarks, method="wrist_to_middle")

# Model utilities
model = vgt_model.create_model(num_classes=26, in_dim=63)
model = vgt_model.load_model(path="model.pth", num_classes=26, in_dim=63)

# Training utilities (with callbacks)
vgt_model.train_model(
    model, train_loader, val_loader=val_loader, 
    epochs=50, lr=1e-3,
    scheduler_type="plateau",
    early_stopping_kwargs={"patience": 10}
)
accuracy = vgt_model.evaluate_model(model, val_loader)

# Device and paths
device = vgt_model.DEVICE
model_dir = vgt_model.MODEL_DIR
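
The `early_stopping_kwargs` shown above take a `patience` and, in the training example, a `min_delta`. How a callback with those two settings usually behaves can be sketched as follows; the class is hypothetical and is not taken from the package's callbacks.py.

```python
class EarlyStopping:
    """Stop training when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10, min_delta: float = 0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2, min_delta=0.01)
losses = [1.00, 0.80, 0.799, 0.798, 0.60]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at)
```

In this run the improvements after the second epoch are smaller than `min_delta`, so they count as stalled epochs and training stops before the final value is reached.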

Data Format

Input Format

Models expect hand landmarks in the following format:

# 21 hand landmarks with x, y, z coordinates
landmarks = [
    {"x": 0.5, "y": 0.3, "z": 0.1},
    {"x": 0.6, "y": 0.4, "z": 0.2},
    # ... 21 landmarks total
]

# Or as a NumPy array of shape (21, 3)
import numpy as np
landmarks_array = np.array([[lm["x"], lm["y"], lm["z"]] for lm in landmarks], dtype=np.float32)

# Flattened for model input: shape (1, 63)
model_input = landmarks_array.reshape(1, 63)
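
Because malformed input is a common source of shape errors, it can help to validate the landmark list before flattening it. The following sketch does this in plain Python and produces the same x0, y0, z0, x1, ... ordering as reshaping a (21, 3) array; `flatten_landmarks` is a hypothetical helper, not a package function.

```python
def flatten_landmarks(landmarks: list[dict]) -> list[float]:
    """Validate 21 {'x', 'y', 'z'} dicts and flatten them to 63 floats (x0, y0, z0, x1, ...)."""
    if len(landmarks) != 21:
        raise ValueError(f"expected 21 landmarks, got {len(landmarks)}")
    flat = []
    for index, lm in enumerate(landmarks):
        missing = {"x", "y", "z"} - set(lm)
        if missing:
            raise ValueError(f"landmark {index} is missing {sorted(missing)}")
        flat.extend(float(lm[axis]) for axis in ("x", "y", "z"))
    return flat


# Synthetic hand used only to exercise the helper
hand = [{"x": i, "y": i + 0.5, "z": 0} for i in range(21)]
flat = flatten_landmarks(hand)
print(len(flat))  # 63
```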

Dataset Files

ASL: CSV file at smart_gestures/alphabet/asl_model/data/hand_landmarks.csv
VGT: JSON file at smart_gestures/alphabet/vgt_model/data/hand_landmarks.json

Usage in Production

FastAPI Integration Example

from fastapi import APIRouter, HTTPException
import numpy as np
import torch
from smart_gestures.alphabet.asl_model import create_model, get_classes, DEVICE

classes = get_classes()
model = create_model(num_classes=len(classes), in_dim=63)
model.load_state_dict(torch.load("path/to/model.pth", map_location=DEVICE))
model.to(DEVICE)  # keep the model on the same device as the request tensors
model.eval()

router = APIRouter()

@router.post("/predict")
async def predict(landmarks: list[dict]):
    """
    Predict sign language letter from hand landmarks.
    
    Args:
        landmarks: List of 21 hand landmarks with x, y, z coordinates
        
    Returns:
        Predicted letter
    """
    pts = np.array([[lm["x"], lm["y"], lm["z"]] for lm in landmarks], dtype=np.float32)
    
    if pts.shape != (21, 3):
        raise HTTPException(status_code=400, detail="Expected 21 landmarks")
    
    x = torch.from_numpy(pts.reshape(1, 63)).to(DEVICE)
    
    with torch.no_grad():
        logits = model(x)
        pred_idx = int(torch.argmax(logits, dim=1).item())
        predicted_letter = classes[pred_idx]
    
    return {"prediction": predicted_letter}

Flask Integration Example

from flask import Flask, request, jsonify
import numpy as np
import torch
from smart_gestures.alphabet.vgt_model import create_model, get_classes, normalize_landmarks, DEVICE

app = Flask(__name__)

classes = get_classes()
model = create_model(num_classes=len(classes), in_dim=63)
model.load_state_dict(torch.load("path/to/model.pth", map_location=DEVICE))
model.to(DEVICE)  # keep the model on the same device as the request tensors
model.eval()

@app.route('/predict', methods=['POST'])
def predict():
    landmarks = request.json['landmarks']
    
    # Normalize landmarks
    normalized = normalize_landmarks(landmarks, method="wrist_to_middle")
    pts = np.array([[lm["x"], lm["y"], lm["z"]] for lm in normalized], dtype=np.float32)
    
    x = torch.from_numpy(pts.reshape(1, 63)).to(DEVICE)
    
    with torch.no_grad():
        logits = model(x)
        pred_idx = int(torch.argmax(logits, dim=1).item())
        
    return jsonify({'prediction': classes[pred_idx]})
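
The two endpoints above read their input differently, so the JSON a client sends is not the same. The sketch below builds both request bodies with placeholder landmark values. It reflects how the handlers are written here: a single `list[dict]` body parameter in FastAPI, and `request.json['landmarks']` in Flask.

```python
import json

# 21 placeholder landmarks; real values would come from MediaPipe
landmarks = [{"x": 0.5, "y": 0.3, "z": 0.1} for _ in range(21)]

# FastAPI example: `landmarks: list[dict]` is the only body parameter,
# so the request body is expected to be the bare JSON array.
fastapi_body = json.dumps(landmarks)

# Flask example: the handler reads request.json['landmarks'],
# so the array is wrapped in an object under the "landmarks" key.
flask_body = json.dumps({"landmarks": landmarks})

print(fastapi_body[:40])
print(flask_body[:40])
```

Either body would be sent as a POST with the `Content-Type: application/json` header; the host and route prefix depend on how the app is deployed.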

Model Architecture

Both ASL and VGT models use a feedforward neural network architecture:

Input (63 features: 21 landmarks × 3 coordinates)
    ↓
Linear(63 → 128) + ReLU + Dropout(0.3)
    ↓
Linear(128 → 64) + ReLU + Dropout(0.3)
    ↓
Linear(64 → num_classes)
    ↓
Output (class logits)
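
The size of this network follows directly from the layer widths. As a small worked example, the sketch below counts weights and biases for the stack as drawn, assuming 26 output classes; ReLU and Dropout add no parameters. `count_parameters` is a hypothetical helper.

```python
def count_parameters(layer_sizes: list[int]) -> int:
    """Total weights plus biases for a stack of fully connected layers."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total


# 63 inputs -> 128 -> 64 -> 26 classes (one per letter)
print(count_parameters([63, 128, 64, 26]))  # 18138
```

For a loaded PyTorch model the same figure can be read off with `sum(p.numel() for p in model.parameters())`.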

Dataset Format

The package expects hand landmark data in the following formats:

CSV Format (ASL)

class,x0,y0,z0,x1,y1,z1,...,x20,y20,z20
A,0.5,0.3,0.1,0.6,0.4,0.2,...
B,0.4,0.2,0.0,0.5,0.3,0.1,...
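
A file in this layout can be read with the standard csv module. The sketch below assumes the header follows the pattern above for all 21 landmarks (x0 through z20) and uses synthetic rows; `read_landmark_csv` is a hypothetical helper, separate from the package's `load_and_preprocess_dataset`.

```python
import csv
import io

# Column names follow the header pattern shown above: class, then x/y/z for landmarks 0..20
FIELDNAMES = ["class"] + [f"{axis}{i}" for i in range(21) for axis in ("x", "y", "z")]


def read_landmark_csv(handle) -> list[tuple[str, list[float]]]:
    """Read rows into (label, 63 floats) pairs, checking that the header matches."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != FIELDNAMES:
        raise ValueError("unexpected CSV header")
    samples = []
    for row in reader:
        features = [float(row[name]) for name in FIELDNAMES[1:]]
        samples.append((row["class"], features))
    return samples


# Build a tiny in-memory file with two synthetic rows
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(FIELDNAMES)
writer.writerow(["A"] + [0.5] * 63)
writer.writerow(["B"] + [0.25] * 63)
buffer.seek(0)

samples = read_landmark_csv(buffer)
print(len(samples), samples[0][0], len(samples[0][1]))
```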

JSON Format (VGT)

{
  "A": [
    [[x0,y0,z0], [x1,y1,z1], ..., [x20,y20,z20]],
    [[x0,y0,z0], [x1,y1,z1], ..., [x20,y20,z20]]
  ],
  "B": [...]
}
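
The nesting here is label, then samples, then 21 points of three coordinates each. A minimal sketch of flattening such a document into (label, 63-value) pairs is shown below with synthetic data; `samples_from_json` is a hypothetical helper and not the package's `load_dataset_normalized`.

```python
import json


def samples_from_json(text: str) -> list[tuple[str, list[float]]]:
    """Flatten {"A": [[[x, y, z], ...21 points], ...]} into (label, 63 floats) pairs."""
    data = json.loads(text)
    samples = []
    for label, hands in data.items():
        for hand in hands:
            if len(hand) != 21 or any(len(point) != 3 for point in hand):
                raise ValueError(f"sample for {label!r} is not 21 x 3")
            samples.append((label, [float(v) for point in hand for v in point]))
    return samples


# Synthetic document: two samples for "A", one for "B"
hand = [[0.1 * i, 0.2, 0.0] for i in range(21)]
text = json.dumps({"A": [hand, hand], "B": [hand]})
samples = samples_from_json(text)
print([label for label, _ in samples])  # ['A', 'A', 'B']
```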

Performance

Model   Classes   Accuracy   Parameters
ASL     26        ~95%       ~18K
VGT     26        ~93%       ~18K

Note: Actual performance depends on dataset quality and size.

Troubleshooting

CUDA Out of Memory

Reduce the batch size:

train_loader, val_loader = get_loaders(train_dataset, val_dataset, batch_size=16)

Model Not Learning

Try adjusting the learning rate, for example by lowering it:

train_model(model, train_loader, epochs=50, lr=1e-4)  # Lower LR

Poor Accuracy

Ensure landmarks are normalized correctly
Add data augmentation during training
Collect more training data
Verify dataset labels are correct
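
A quick way to act on the last point is to count samples per label and look for typos or empty classes. The sketch below assumes the classes are the uppercase letters A to Z unless another set is passed in; in practice the list returned by `get_classes()` would be the better reference. `label_report` is a hypothetical helper.

```python
from collections import Counter
import string


def label_report(labels: list[str], expected=string.ascii_uppercase):
    """Summarise class balance and flag labels that are unexpected or absent."""
    counts = Counter(labels)
    unexpected = sorted(set(counts) - set(expected))
    missing = sorted(set(expected) - set(counts))
    return counts, unexpected, missing


counts, unexpected, missing = label_report(["A", "A", "B", "a", "C"], expected="ABCD")
print(counts.most_common(1), unexpected, missing)
```

Here the lowercase "a" is reported as unexpected and "D" as missing, the kind of labelling slip that can quietly lower accuracy.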

Citation

If you use this package in your research, please cite:

@software{smart_gestures2025,
  title = {Smart Gestures: Sign Language Alphabet Recognition},
  author = {Stijnen, Simon and Deleare, Lynn and Westerman, Olivier},
  year = {2025},
  organization = {VIVES University of Applied Sciences},
  url = {https://github.com/vives-project-xp/SmartGlasses}
}

License

GNU General Public License v3.0 or later - see the LICENSE file for details.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Authors

Simon Stijnen
Lynn Deleare
Olivier Westerman

Maintained by VIVES University of Applied Sciences - Project XP

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Acknowledgments

MediaPipe for hand tracking
PyTorch for the deep learning framework
VIVES University of Applied Sciences for supporting this project

Release history

0.3.3 (Nov 29, 2025)
0.3.2 (Nov 27, 2025)
0.3.1 (Nov 27, 2025)
0.3.0 (Nov 24, 2025)
0.2.3 (Nov 20, 2025)
0.2.2 (Nov 17, 2025), yanked; this version
0.2.0 (Nov 14, 2025), yanked
0.1.1 (Nov 12, 2025), yanked
0.1.0 (Nov 12, 2025), yanked

Reason these releases were yanked: non-Python files were not included in the package, so the package did not work as expected.
