Utilities and models for ASL alphabet training used in the SmartGlasses project
Reason this release was yanked:
Non-Python files aren't included in the package, so the package doesn't work as expected.
Smart Gestures
A Python package for sign language alphabet recognition using PyTorch and MediaPipe hand tracking.
Overview
Smart Gestures provides PyTorch-based models and utilities for training and deploying sign language alphabet recognition systems. The package includes:
- ASL (American Sign Language) alphabet recognition
- VGT (Vlaamse Gebarentaal / Flemish Sign Language) alphabet recognition
- LSTM-based gesture recognition (experimental)
Features
- Pre-trained models for ASL and VGT alphabets
- Data loading utilities with built-in augmentation support
- Training utilities with callbacks (early stopping, model checkpoints, learning rate scheduling)
- Model utilities for creating, loading, and evaluating models
- Real-time inference support with MediaPipe hand landmarks
- Data augmentation (rotation, noise, scaling)
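The three augmentations named in the last item can be illustrated with a small standalone sketch. This is not the package's implementation: the function name `augment_landmarks` and the `scale_range` parameter are hypothetical, while `noise_std` and `rotate_deg` borrow their names from the VGT training options used elsewhere in this README. It assumes the points are already centred on the wrist, so that rotating about the origin is meaningful.

```python
import math
import random


def augment_landmarks(points, rotate_deg=15.0, noise_std=0.02, scale_range=0.1, rng=None):
    """Return a randomly rotated, scaled, and jittered copy of (x, y, z) points.

    Hypothetical helper for illustration; parameter semantics are assumptions.
    """
    rng = rng or random.Random()
    # Pick one random in-plane rotation angle and one uniform scale per sample.
    angle = math.radians(rng.uniform(-rotate_deg, rotate_deg))
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    scale = 1.0 + rng.uniform(-scale_range, scale_range)

    augmented = []
    for x, y, z in points:
        # Rotate in the image (x, y) plane, then scale all three axes.
        rx = (x * cos_a - y * sin_a) * scale
        ry = (x * sin_a + y * cos_a) * scale
        rz = z * scale
        # Add independent Gaussian noise to every coordinate.
        augmented.append((
            rx + rng.gauss(0.0, noise_std),
            ry + rng.gauss(0.0, noise_std),
            rz + rng.gauss(0.0, noise_std),
        ))
    return augmented


# Placeholder hand: 21 points on a line, standing in for real landmarks.
hand = [(0.1 * i, 0.05 * i, 0.0) for i in range(21)]
jittered = augment_landmarks(hand, rng=random.Random(0))
print(len(jittered), len(jittered[0]))
```

Passing a seeded `random.Random` keeps the augmentation reproducible, which can help when comparing training runs.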
Installation
From PyPI (Recommended)
pip install smart-gestures
From Source
pip install git+https://github.com/vives-project-xp/SmartGlasses.git#subdirectory=notebooks/package
Development Installation
git clone https://github.com/vives-project-xp/SmartGlasses.git
cd SmartGlasses/notebooks/package
pip install -e .
Requirements
- Python 3.12+
- PyTorch 2.9.0+
- MediaPipe 0.10.14+
- NumPy 2.3.4+
- Pandas 2.3.3+
- tqdm 4.67.1+
Quick Start
Import the Package
from smart_gestures.alphabet import asl_model, vgt_model
Get Available Classes
# ASL alphabet classes
asl_classes = asl_model.get_classes()
print(f"ASL classes: {asl_classes}")
# VGT alphabet classes
vgt_classes = vgt_model.get_classes()
print(f"VGT classes: {vgt_classes}")
Load a Pre-trained Model
import torch
from smart_gestures.alphabet.asl_model import create_model, load_model, get_classes, DEVICE
# Get classes
classes = get_classes()
num_classes = len(classes)
# Create model architecture
model = create_model(num_classes=num_classes, in_dim=63)
# Load weights
model_path = "path/to/hand_gesture_model.pth"
model.load_state_dict(torch.load(model_path, map_location=DEVICE))
model.to(DEVICE)  # Keep the model on the same device as the input tensors
model.eval()
Make Predictions
import numpy as np
import torch
# Prepare input: 21 landmarks with x, y, z coordinates
landmarks = np.random.rand(21, 3).astype(np.float32) # Replace with actual landmarks
input_tensor = torch.from_numpy(landmarks.reshape(1, 63)).to(DEVICE)
# Predict
with torch.no_grad():
    logits = model(input_tensor)
    pred_idx = int(torch.argmax(logits, dim=1).item())
predicted_class = classes[pred_idx]
print(f"Prediction: {predicted_class}")
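The `argmax` call above returns only the winning class. When a confidence value or a ranked shortlist is useful, the logits can be turned into probabilities with a softmax. The sketch below uses plain Python lists so that it runs without PyTorch; `softmax` and `top_k` are hypothetical helpers, not part of the package.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # Subtract the maximum for numerical stability.
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, classes, k=3):
    """Return the k most likely (class, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(classes, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for three classes, for illustration only.
example_logits = [2.0, 0.5, -1.0]
example_classes = ["A", "B", "C"]
best = top_k(example_logits, example_classes, k=2)
for label, prob in best:
    print(f"{label}: {prob:.3f}")
```

With a real model, `logits[0].tolist()` would supply the list of logits for a single input.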
Normalize Landmarks (VGT Model)
from smart_gestures.alphabet.vgt_model import normalize_landmarks
# Raw landmarks from MediaPipe (list of dicts with x, y, z)
raw_landmarks = [{"x": 0.5, "y": 0.3, "z": 0.1}, ...] # 21 landmarks
# Normalize (wrist-to-middle finger scaling)
normalized = normalize_landmarks(raw_landmarks, method="wrist_to_middle")
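To make the idea behind wrist-to-middle-finger scaling concrete, here is a minimal sketch of one plausible interpretation: translate the hand so the wrist sits at the origin, then divide by the distance from the wrist to a middle-finger landmark. MediaPipe numbers the wrist as landmark 0 and the middle-finger MCP joint as landmark 9; whether the package uses that joint (rather than, say, the fingertip) is an assumption, and `normalize_wrist_to_middle` is a hypothetical stand-in for the real `normalize_landmarks`.

```python
import math

WRIST = 0        # MediaPipe index of the wrist landmark
MIDDLE_MCP = 9   # MediaPipe index of the middle-finger MCP joint (assumed reference)
AXES = ("x", "y", "z")


def normalize_wrist_to_middle(landmarks):
    """Centre landmarks on the wrist and scale by the wrist-to-middle-MCP distance."""
    wrist = landmarks[WRIST]
    ref = landmarks[MIDDLE_MCP]
    scale = math.dist(
        [wrist[a] for a in AXES],
        [ref[a] for a in AXES],
    )
    if scale == 0:
        raise ValueError("Wrist and reference landmark coincide; cannot scale")
    return [
        {a: (lm[a] - wrist[a]) / scale for a in AXES}
        for lm in landmarks
    ]


# Placeholder input: 21 synthetic landmarks instead of real MediaPipe output.
fake_hand = [{"x": 0.5 + 0.01 * i, "y": 0.3 + 0.02 * i, "z": 0.0} for i in range(21)]
normalized_hand = normalize_wrist_to_middle(fake_hand)
print(normalized_hand[WRIST])
```

After this transform the result no longer depends on where the hand appears in the frame or how large it is, which is the usual motivation for this kind of normalization.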
Training a Model
ASL Model Training
from smart_gestures.alphabet.asl_model import (
    get_classes,
    get_loaders,
    create_model,
    train_model,
    evaluate_model,
)
from smart_gestures.alphabet.asl_model.data_utils import (
    load_and_preprocess_dataset,
    split_dataset,
    HAND_LANDMARKS_CSV,
)
from smart_gestures.alphabet.asl_model.model_utils import save_model
# Load data
classes = get_classes()
dataset = load_and_preprocess_dataset(HAND_LANDMARKS_CSV)
train_dataset, val_dataset = split_dataset(dataset, val_ratio=0.2, random_seed=42)
train_loader, val_loader = get_loaders(train_dataset, val_dataset, batch_size=32)
# Create model
in_dim = 63 # 21 landmarks * 3 coordinates
num_classes = len(classes)
model = create_model(num_classes, in_dim)
# Train
train_model(model, train_loader, epochs=20, lr=1e-3)
# Evaluate
accuracy = evaluate_model(model, val_loader)
print(f"Validation Accuracy: {accuracy:.2f}%")
# Save
save_model(model, path="my_model.pth")
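The `val_ratio` and `random_seed` arguments passed to `split_dataset` above suggest a seeded random split. The sketch below shows how such a split is commonly done on sample indices. It is an illustration under that assumption, not the package's code, and `split_indices` is a hypothetical name.

```python
import random


def split_indices(n_samples, val_ratio=0.2, random_seed=42):
    """Shuffle indices with a fixed seed and carve off a validation share."""
    indices = list(range(n_samples))
    # A dedicated Random instance avoids touching the global random state.
    random.Random(random_seed).shuffle(indices)
    n_val = int(round(n_samples * val_ratio))
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(100)
print(len(train_idx), len(val_idx))  # 80 20
```

Fixing the seed means the same samples land in the validation set on every run, so accuracy figures stay comparable across experiments.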
VGT Model Training (Advanced)
from smart_gestures.alphabet.vgt_model import (
    create_model,
    train_model,
    evaluate_model,
)
from smart_gestures.alphabet.vgt_model.data_utils import (
    load_dataset_normalized,
    split_dataset,
    get_loaders,
    get_classes,
    HAND_LANDMARKS_JSON,
)
from smart_gestures.alphabet.vgt_model.model_utils import save_model
# Load normalized dataset with augmentation
dataset = load_dataset_normalized(
    HAND_LANDMARKS_JSON,
    as_sequence=False,
    scale_method="wrist_to_middle",
    augment=True,
    augment_prob=0.5,
    noise_std=0.02,
    rotate_deg=15,
)
train_dataset, val_dataset = split_dataset(dataset, val_ratio=0.2, random_seed=42)
train_loader, val_loader = get_loaders(train_dataset, val_dataset, batch_size=32)
# Create model
classes = get_classes()
model = create_model(num_classes=len(classes), in_dim=63)
# Train with callbacks
train_model(
    model,
    train_loader,
    val_loader=val_loader,
    epochs=50,
    lr=1e-3,
    scheduler_type="plateau",
    scheduler_kwargs={"factor": 0.5, "patience": 5},
    early_stopping_kwargs={"patience": 10, "min_delta": 0.001},
    checkpoint_kwargs={"filepath": "checkpoints/best_model.pth"},
)
# Evaluate
accuracy = evaluate_model(model, val_loader)
print(f"Validation Accuracy: {accuracy:.2f}%")
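The `patience` and `min_delta` values in `early_stopping_kwargs` above follow the usual early-stopping convention: training stops once the validation loss has failed to improve by at least `min_delta` for `patience` consecutive epochs. The class below is a minimal sketch of that convention. It is hypothetical and may differ from what `callbacks.py` actually does, for example if the package monitors accuracy instead of loss.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch and return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Meaningful improvement: remember it and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, for illustration only.
stopper = EarlyStopping(patience=2, min_delta=0.01)
losses = [1.0, 0.8, 0.799, 0.798, 0.5]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at)  # 3
```

In this run the tiny improvements at epochs 2 and 3 are smaller than `min_delta`, so they count as stalled epochs and the loop exits before ever seeing the final value.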
Command-Line Training
Both ASL and VGT models include command-line training scripts:
ASL Training
python -m smart_gestures.alphabet.asl_model.run_training \
    --batch_size 32 \
    --epochs 20 \
    --lr 0.001 \
    --output models/hand_gesture_model.pth
VGT Training
python -m smart_gestures.alphabet.vgt_model.run_training \
    --batch_size 32 \
    --epochs 50 \
    --lr 0.001 \
    --augment \
    --augment_prob 0.5 \
    --scheduler plateau \
    --early_stopping \
    --output models/hand_gesture_model.pth
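For readers who want to wrap their own training loop in a similar interface, the flags shown above map naturally onto `argparse`. The parser below is a hypothetical reconstruction from those flags only: the defaults and types are assumptions, and the real `run_training.py` scripts may accept additional options or behave differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags used in the commands above (assumed types)."""
    parser = argparse.ArgumentParser(description="Hypothetical training CLI sketch")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--epochs", type=int, default=50)
    parser.add_argument("--lr", type=float, default=1e-3)
    # Boolean switches: present means enabled.
    parser.add_argument("--augment", action="store_true")
    parser.add_argument("--augment_prob", type=float, default=0.5)
    parser.add_argument("--scheduler", type=str, default=None)
    parser.add_argument("--early_stopping", action="store_true")
    parser.add_argument("--output", type=str, default="models/hand_gesture_model.pth")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
args = build_parser().parse_args(
    ["--epochs", "50", "--lr", "0.001", "--augment", "--scheduler", "plateau"]
)
print(args.epochs, args.lr, args.augment, args.scheduler)
```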
Package Structure
smart_gestures/
├── __init__.py
├── alphabet/
│   ├── __init__.py
│   ├── const.py
│   ├── asl_model/
│   │   ├── __init__.py
│   │   ├── data_utils.py      # Data loading and preprocessing
│   │   ├── model_utils.py     # Model architecture and utilities
│   │   ├── train_utils.py     # Training and evaluation functions
│   │   ├── run_training.py    # CLI training script
│   │   ├── data/              # Dataset files
│   │   └── models/            # Saved model checkpoints
│   └── vgt_model/
│       ├── __init__.py
│       ├── data_utils.py      # Data loading with normalization
│       ├── model_utils.py     # Model architecture
│       ├── train_utils.py     # Training with callbacks
│       ├── callbacks.py       # Training callbacks
│       ├── run_training.py    # CLI training script
│       ├── test_camera.py     # Real-time testing utility
│       ├── data/              # Dataset files
│       ├── dataset/           # Raw dataset
│       └── models/            # Saved model checkpoints
└── gestures/
    └── lstm_model/            # LSTM-based gesture recognition (experimental)
API Reference
ASL Model
from smart_gestures.alphabet import asl_model
# Data utilities
classes = asl_model.get_classes()
train_loader, val_loader = asl_model.get_loaders(train_dataset, val_dataset, batch_size=32)
# Model utilities
model = asl_model.create_model(num_classes=26, in_dim=63)
model = asl_model.load_model(path="model.pth", num_classes=26, in_dim=63)
# Training utilities
asl_model.train_model(model, train_loader, epochs=20, lr=1e-3)
accuracy = asl_model.evaluate_model(model, val_loader)
# Device
device = asl_model.DEVICE # 'cuda' if available, else 'cpu'
VGT Model
from smart_gestures.alphabet import vgt_model
# Data utilities
classes = vgt_model.get_classes()
train_loader, val_loader = vgt_model.get_loaders(train_dataset, val_dataset, batch_size=32)
normalized = vgt_model.normalize_landmarks(landmarks, method="wrist_to_middle")
# Model utilities
model = vgt_model.create_model(num_classes=26, in_dim=63)
model = vgt_model.load_model(path="model.pth", num_classes=26, in_dim=63)
# Training utilities (with callbacks)
vgt_model.train_model(
    model, train_loader, val_loader=val_loader,
    epochs=50, lr=1e-3,
    scheduler_type="plateau",
    early_stopping_kwargs={"patience": 10},
)
accuracy = vgt_model.evaluate_model(model, val_loader)
# Device and paths
device = vgt_model.DEVICE
model_dir = vgt_model.MODEL_DIR
Data Format
Input Format
Models expect hand landmarks in the following format:
# 21 hand landmarks with x, y, z coordinates
landmarks = [
    {"x": 0.5, "y": 0.3, "z": 0.1},
    {"x": 0.6, "y": 0.4, "z": 0.2},
    # ... 21 landmarks total
]
# Or as numpy array: shape (21, 3)
landmarks_array = np.array([[x1, y1, z1], [x2, y2, z2], ...]) # (21, 3)
# Flattened for model input: shape (1, 63)
model_input = landmarks_array.reshape(1, 63)
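The conversion from a list of landmark dictionaries to the flat 63-value vector can also be written without NumPy, which makes the expected ordering (x, y, z for landmark 0, then landmark 1, and so on) explicit. The helper name `flatten_landmarks` is hypothetical, and the ordering is an assumption based on the `reshape(1, 63)` call above.

```python
def flatten_landmarks(landmarks):
    """Turn 21 {"x", "y", "z"} dicts into a flat list of 63 floats."""
    if len(landmarks) != 21:
        raise ValueError(f"Expected 21 landmarks, got {len(landmarks)}")
    features = []
    for lm in landmarks:
        # Keep x, y, z adjacent per landmark, matching a row-major reshape of (21, 3).
        features.extend(float(lm[axis]) for axis in ("x", "y", "z"))
    return features


# Placeholder landmarks standing in for real MediaPipe output.
sample_hand = [{"x": 0.01 * i, "y": 0.02 * i, "z": 0.0} for i in range(21)]
flat = flatten_landmarks(sample_hand)
print(len(flat))  # 63
```

Validating the landmark count before building a tensor gives a clearer error than a failed reshape further down the pipeline.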
Dataset Files
- ASL: CSV file at smart_gestures/alphabet/asl_model/data/hand_landmarks.csv
- VGT: JSON file at smart_gestures/alphabet/vgt_model/data/hand_landmarks.json
Usage in Production
FastAPI Integration Example
from fastapi import APIRouter, HTTPException
import numpy as np
import torch
from smart_gestures.alphabet.asl_model import create_model, get_classes, DEVICE
classes = get_classes()
model = create_model(num_classes=len(classes), in_dim=63)
model.load_state_dict(torch.load("path/to/model.pth", map_location=DEVICE))
model.to(DEVICE)  # Keep the model on the same device as the input tensors
model.eval()
router = APIRouter()
@router.post("/predict")
async def predict(landmarks: list[dict]):
    """
    Predict sign language letter from hand landmarks.
    Args:
        landmarks: List of 21 hand landmarks with x, y, z coordinates
    Returns:
        Predicted letter
    """
    pts = np.array([[lm["x"], lm["y"], lm["z"]] for lm in landmarks], dtype=np.float32)
    if pts.shape != (21, 3):
        raise HTTPException(status_code=400, detail="Expected 21 landmarks")
    x = torch.from_numpy(pts.reshape(1, 63)).to(DEVICE)
    with torch.no_grad():
        logits = model(x)
        pred_idx = int(torch.argmax(logits, dim=1).item())
    predicted_letter = classes[pred_idx]
    return {"prediction": predicted_letter}
Flask Integration Example
from flask import Flask, request, jsonify
import numpy as np
import torch
from smart_gestures.alphabet.vgt_model import create_model, get_classes, normalize_landmarks, DEVICE
app = Flask(__name__)
classes = get_classes()
model = create_model(num_classes=len(classes), in_dim=63)
model.load_state_dict(torch.load("path/to/model.pth", map_location=DEVICE))
model.to(DEVICE)  # Keep the model on the same device as the input tensors
model.eval()
@app.route('/predict', methods=['POST'])
def predict():
    landmarks = request.json['landmarks']
    # Normalize landmarks
    normalized = normalize_landmarks(landmarks, method="wrist_to_middle")
    pts = np.array([[lm["x"], lm["y"], lm["z"]] for lm in normalized], dtype=np.float32)
    x = torch.from_numpy(pts.reshape(1, 63)).to(DEVICE)
    with torch.no_grad():
        logits = model(x)
        pred_idx = int(torch.argmax(logits, dim=1).item())
    return jsonify({'prediction': classes[pred_idx]})
Model Architecture
Both ASL and VGT models use a feedforward neural network architecture:
Input (63 features: 21 landmarks × 3 coordinates)
↓
Linear(63 → 128) + ReLU + Dropout(0.3)
↓
Linear(128 → 64) + ReLU + Dropout(0.3)
↓
Linear(64 → num_classes)
↓
Output (class logits)
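The number of trainable parameters implied by this diagram can be worked out by hand: each `Linear(in, out)` layer contributes `in × out` weights plus `out` biases, while ReLU and Dropout contribute none. The short sketch below does the arithmetic for 26 classes, assuming the layer sizes are exactly as drawn. If the shipped models use different hidden sizes, the total changes accordingly. The function names are hypothetical.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


def mlp_params(in_dim=63, hidden=(128, 64), num_classes=26):
    """Total trainable parameters of the feedforward stack in the diagram."""
    sizes = [in_dim, *hidden, num_classes]
    return sum(linear_params(a, b) for a, b in zip(sizes, sizes[1:]))


total = mlp_params()
print(total)  # 18138 = 8192 + 8256 + 1690
```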
Dataset Format
The package expects hand landmark data in the following formats:
CSV Format (ASL)
class,x0,y0,z0,x1,y1,z1,...,x20,y20,z20
A,0.5,0.3,0.1,0.6,0.4,0.2,...
B,0.4,0.2,0.0,0.5,0.3,0.1,...
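A file in this layout can be read with the standard `csv` module. The sketch below parses an in-memory sample that is shortened to two landmarks per row for brevity; a real file would carry 63 feature columns after `class`. `read_landmark_csv` is a hypothetical helper, not the package's `load_and_preprocess_dataset`.

```python
import csv
import io

# Shortened sample: only two landmarks (six features) per row, for illustration.
SAMPLE_CSV = (
    "class,x0,y0,z0,x1,y1,z1\n"
    "A,0.5,0.3,0.1,0.6,0.4,0.2\n"
    "B,0.4,0.2,0.0,0.5,0.3,0.1\n"
)


def read_landmark_csv(handle):
    """Return a list of (label, features) pairs from a landmark CSV."""
    reader = csv.reader(handle)
    header = next(reader)
    n_features = len(header) - 1  # Everything after the "class" column.
    rows = []
    for line_no, record in enumerate(reader, start=2):
        if len(record) != n_features + 1:
            raise ValueError(f"Line {line_no}: expected {n_features + 1} columns")
        rows.append((record[0], [float(value) for value in record[1:]]))
    return rows


rows = read_landmark_csv(io.StringIO(SAMPLE_CSV))
print(rows[0])
```

Checking the column count per row catches truncated lines early, which is one way to act on the advice to verify the dataset before training.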
JSON Format (VGT)
{
  "A": [
    [[x0,y0,z0], [x1,y1,z1], ..., [x20,y20,z20]],
    [[x0,y0,z0], [x1,y1,z1], ..., [x20,y20,z20]]
  ],
  "B": [...]
}
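The nested JSON layout maps each label to a list of samples, and each sample to a list of `[x, y, z]` points. Turning that into flat training pairs might look like the sketch below. It uses a shortened two-landmark sample, assigns class indices by sorted label order (an assumption, since the package may order classes differently), and `flatten_json_dataset` is a hypothetical name.

```python
import json

# Shortened sample: two landmarks per hand instead of 21, for illustration.
SAMPLE_JSON = """
{
  "A": [[[0.5, 0.3, 0.1], [0.6, 0.4, 0.2]]],
  "B": [[[0.4, 0.2, 0.0], [0.5, 0.3, 0.1]],
        [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]]]
}
"""


def flatten_json_dataset(text):
    """Return (labels, samples) where samples are (class_index, flat_features)."""
    data = json.loads(text)
    labels = sorted(data)  # Assumed ordering; class index = position in this list.
    samples = []
    for index, label in enumerate(labels):
        for landmarks in data[label]:
            # Flatten [[x, y, z], ...] into [x0, y0, z0, x1, y1, z1, ...].
            features = [coord for point in landmarks for coord in point]
            samples.append((index, features))
    return labels, samples


labels, samples = flatten_json_dataset(SAMPLE_JSON)
print(labels, len(samples))
```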
Performance
| Model | Classes | Accuracy | Parameters |
|---|---|---|---|
| ASL | 26 | ~95% | ~18K |
| VGT | 26 | ~93% | ~18K |
Note: Actual performance depends on dataset quality and size.
Troubleshooting
CUDA Out of Memory
Reduce the batch size:
train_loader, val_loader = get_loaders(train_dataset, val_dataset, batch_size=16)
Model Not Learning
Try adjusting the learning rate:
train_model(model, train_loader, epochs=50, lr=1e-4) # Lower LR
Poor Accuracy
- Ensure landmarks are normalized correctly
- Add data augmentation during training
- Collect more training data
- Verify dataset labels are correct
Citation
If you use this package in your research, please cite:
@software{smart_gestures2025,
  title = {Smart Gestures: Sign Language Alphabet Recognition},
  author = {Stijnen, Simon and Deleare, Lynn and Westerman, Olivier},
  year = {2025},
  organization = {VIVES University of Applied Sciences},
  url = {https://github.com/vives-project-xp/SmartGlasses}
}
License
GNU General Public License v3.0 or later - see the LICENSE file for details.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Authors
- Simon Stijnen
- Lynn Deleare
- Olivier Westerman
Maintained by VIVES University of Applied Sciences - Project XP
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Links
- Documentation: GitHub Repository
- Issue Tracker: GitHub Issues
- Source Code: GitHub