Sign Language Toolkit for sign language research

Sign Language Toolkit (SLTK)

A research toolkit for sign language video analysis: workspace management, pose extraction, automatic segmentation, ELAN annotation editing, and a REST API serving a React annotation workstation.

Installation

# Core (data loading, formats, ELAN I/O)
pip install -e .

# With pose extraction
pip install -e ".[mediapipe]"

# With web API + frontend
pip install -e ".[api]"

# Everything
pip install -e ".[all]"

# Development (includes pytest, black, ruff, mypy)
pip install -e ".[dev]"

Requires Python 3.10+.

Quick Start

Running the API

# Start FastAPI backend (port 8000) + Vite frontend (port 5173)
bash scripts/run_dev.sh

# Or run the backend only
uvicorn sltk.api.main:app --host 0.0.0.0 --port 8000

Interactive docs at http://localhost:8000/docs (Swagger UI).

Python Library

from sltk.data import PoseSequence, Segment, SegmentList
from sltk.io import read_eaf, write_eaf

# Load poses from H5 file
poses = PoseSequence.load("video_wilor.h5", format="wilor", fps=25)

# Load ELAN annotations
segments = read_eaf("annotations.eaf", tiers=["Gloss"])

# Create segments and export to ELAN
new_segments = SegmentList([
    Segment(start=0.0, end=1.5, label="HELLO", tier="Gloss"),
    Segment(start=1.5, end=3.0, label="WORLD", tier="Gloss"),
])
write_eaf(new_segments, "output.eaf", video_path="source.mp4")

CLI

sltk convert input.npy output.h5 --from mediapipe --to wilor --fps 25
sltk evaluate predictions.txt references.txt --task translation
sltk to-elan segments.json --video source.mp4 --output annotations.eaf
sltk from-elan annotations.eaf --output segments.json --tier Gloss

Processing Pipeline

SLTK provides a three-stage pipeline for processing sign language videos: pose extraction (video → H5), segmentation (H5 → sign boundaries), and spotting (segments → gloss labels). Each stage can be run independently.

Overview

Video (.mp4)
    │
    ├─► 1. Pose Extraction ──► {stem}_wilor.h5
    │       (WiLoR hand model: MANO params, 3D keypoints)
    │
    ├─► 2. Segmentation ──► {stem}_segments.eaf / .json
    │       (Transformer BIO labelling: OUT/IN/BEGIN)
    │
    └─► 3. Spotting ──► {stem}_spotted.eaf
            (SignRep: match segments to dictionary glosses)
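
The naming pattern in the diagram can be sketched with pathlib. The helper name stage_outputs is hypothetical and not part of SLTK, and placing every output next to the video is an assumption made only for illustration.

```python
from pathlib import Path


def stage_outputs(video_path: str) -> dict[str, Path]:
    """Return the per-stage output paths implied by the {stem} naming pattern.

    Assumption: outputs are written next to the source video.
    """
    video = Path(video_path)
    stem = video.stem  # "video" for "/data/video.mp4"
    parent = video.parent
    return {
        "poses": parent / f"{stem}_wilor.h5",        # Stage 1
        "segments": parent / f"{stem}_segments.eaf",  # Stage 2
        "spotted": parent / f"{stem}_spotted.eaf",    # Stage 3
    }


outputs = stage_outputs("/data/video.mp4")
print(outputs["poses"].name)     # video_wilor.h5
print(outputs["segments"].name)  # video_segments.eaf
print(outputs["spotted"].name)   # video_spotted.eaf
```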

Stage 1: Pose Extraction (Video → H5)

Extract hand poses from video using WiLoR. This produces an H5 file containing MANO rotation matrices and 3D keypoints per frame.

Python:

from sltk.extraction.wilor import WiLoRExtractor, WiLoRConfig

config = WiLoRConfig(
    checkpoint_path="path/to/wilor_final.ckpt",
    detector_path="path/to/detector.pt",
)
extractor = WiLoRExtractor(config)
extractor.load_model()
result = extractor.extract_from_video("video.mp4")
# Saves to video_wilor.h5

API:

# Start extraction job (runs in background)
curl -X POST http://localhost:8000/api/extraction/start \
  -H "Content-Type: application/json" \
  -d '{
    "video_path": "/data/video.mp4",
    "output_root": "/data/output",
    "config": {"enable_wilor": true, "device": "cuda"}
  }'

# Poll status
curl http://localhost:8000/api/extraction/status/{job_id}
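
Polling from Python might look like the sketch below. The response schema is not documented here, so the "status" field and the terminal values in TERMINAL_STATES are assumptions, and wait_for_job is a hypothetical helper rather than part of SLTK.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: the status payload carries a "status" field with these terminal values.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def fetch_status(job_id: str, base_url: str = "http://localhost:8000") -> dict:
    """GET /api/extraction/status/{job_id} and decode the JSON body."""
    url = f"{base_url}/api/extraction/status/{job_id}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict] = fetch_status,
    interval: float = 2.0,
    timeout: float = 3600.0,
) -> dict:
    """Poll until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch(job_id)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        time.sleep(interval)
```

Passing fetch as a parameter keeps the loop testable without a running server.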

H5 file structure (WiLoR):

video_wilor.h5
  attrs: fps, num_frames, resolution, extractor
  frame_idx:      (num_frames, 2)       # (start_idx, count) per frame
  kpts_3d:        (num_detections, 21, 3)
  right:          (num_detections,)      # True = right hand
  mano/
    hand_pose:    (num_detections, 15, 3, 3)   # rotation matrices
    global_orient:(num_detections, 1, 3, 3)
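
Because frame_idx stores a (start_idx, count) pair per frame, the detections for a frame are a contiguous slice of the per-detection arrays. The sketch below uses plain Python lists as stand-ins for the datasets; detections_for_frame is a hypothetical helper, and the start index used for an empty frame is an assumption.

```python
def detections_for_frame(frame_idx, right, frame):
    """Return (detection_index, is_right_hand) pairs for one video frame."""
    start, count = frame_idx[frame]
    return [(i, bool(right[i])) for i in range(start, start + count)]


# Toy data: frame 0 has two hands, frame 1 has none, frame 2 has one.
frame_idx = [(0, 2), (2, 0), (2, 1)]
right = [True, False, True]  # one flag per detection

print(detections_for_frame(frame_idx, right, 0))  # [(0, True), (1, False)]
print(detections_for_frame(frame_idx, right, 1))  # []
print(detections_for_frame(frame_idx, right, 2))  # [(2, True)]
```

The same indices select rows of kpts_3d and the mano/ datasets.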

Weight resolution — model checkpoints are found in this priority order:

  1. Explicit path argument
  2. Environment variable (SLTK_WILOR_CHECKPOINT, SLTK_WILOR_DETECTOR)
  3. Bundled at sltk/weights/wilor/
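
The priority order above can be sketched as a simple first-match lookup. This is an illustration, not SLTK's resolver: resolve_checkpoint is a hypothetical name, and skipping candidates that do not exist on disk (rather than raising immediately) is an assumption.

```python
import os
from pathlib import Path


def resolve_checkpoint(explicit, env_var, bundled):
    """Return the first existing candidate: explicit path, env var, bundled file."""
    candidates = [explicit, os.environ.get(env_var), bundled]
    for candidate in candidates:
        # Assumption: unset or missing candidates are skipped, not treated as errors.
        if candidate and Path(candidate).is_file():
            return Path(candidate)
    raise FileNotFoundError(
        f"no checkpoint found (checked argument, ${env_var}, {bundled})"
    )
```

A call such as resolve_checkpoint(None, "SLTK_WILOR_CHECKPOINT", bundled_path) would then prefer the environment variable over the bundled file; the bundled file name is left to the caller because it is not stated here.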

MediaPipe and NLF extractors are also available for body/face poses — see sltk/extraction/.

Stage 2: Segmentation (H5 → Segments)

Segmenter v2 is a Transformer model that reads a WiLoR H5 file and predicts one BIO label per frame (0=OUT, 1=IN, 2=BEGIN). BEGIN marks the first frame of a sign, IN marks its remaining frames, and OUT marks the frames between signs.

If you already have H5 files, this is where you start.

Python:

from sltk.segmentation.runner import segment_h5
from sltk.segmentation.output import OutputFormat

# Segment a single H5 file → JSON output
segment_h5(
    "video_wilor.h5",
    output_path="video_segments.json",
    output_format=OutputFormat.JSON,
    fps=25.0,
)

# Segment a single H5 file → ELAN output
segment_h5(
    "video_wilor.h5",
    output_path="video_segments.eaf",
    output_format=OutputFormat.ELAN,
    fps=25.0,
    media_path="video.mp4",  # links video in the EAF file
)

# Segment an entire directory of H5 files
segment_h5(
    "/data/poses/",
    output_path="/data/segments/output.json",
    output_format=OutputFormat.JSON,
    fps=25.0,
)

Lower-level control:

from sltk.segmentation.runner import get_runner
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments

# Load features from H5 (converts MANO rotations → 192-dim features)
features = h5_to_features("video_wilor.h5")  # shape: (num_frames, 192)

# Get the inference runner (singleton, loads checkpoint once)
runner = get_runner()

# Predict BIO labels
labels = runner.predict(features)  # shape: (num_frames,) with values 0/1/2

# Extract segment boundaries as (start_frame, end_frame) tuples
segments = extract_segments(labels)
# [(12, 45), (50, 82), (90, 120), ...]
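
To make the BIO decoding step concrete, here is a minimal sketch of turning labels into boundaries. It is not the library's extract_segments; bio_to_segments is a hypothetical name, and both the inclusive end frame and the handling of an IN label without a preceding BEGIN are assumptions.

```python
OUT, IN, BEGIN = 0, 1, 2


def bio_to_segments(labels):
    """Convert per-frame BIO labels into (start_frame, end_frame) tuples.

    Assumptions: end_frame is inclusive, and an IN label with no
    preceding BEGIN still opens a segment.
    """
    segments = []
    start = None
    for frame, label in enumerate(labels):
        if label == BEGIN:
            if start is not None:  # back-to-back signs with no OUT between them
                segments.append((start, frame - 1))
            start = frame
        elif label == IN:
            if start is None:
                start = frame
        else:  # OUT closes any open segment
            if start is not None:
                segments.append((start, frame - 1))
                start = None
    if start is not None:  # sign still open at the end of the sequence
        segments.append((start, len(labels) - 1))
    return segments


labels = [0, 2, 1, 1, 0, 0, 2, 1, 2, 1, 1]
print(bio_to_segments(labels))  # [(1, 3), (6, 7), (8, 10)]
```

The BEGIN label is what lets two adjacent signs be split even when no OUT frame separates them.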

API:

# Segment a single H5 file
curl -X POST http://localhost:8000/api/segmentation/segment \
  -H "Content-Type: application/json" \
  -d '{"h5_path": "/data/video_wilor.h5", "fps": 25.0}'

# Segment a directory (batch)
curl -X POST http://localhost:8000/api/segmentation/segment/batch \
  -H "Content-Type: application/json" \
  -d '{
    "directory": "/data/poses/",
    "fps": 25.0,
    "output_path": "/data/segments/",
    "output_format": "json"
  }'

JSON output format:

{
  "video_name": {
    "fps": 25.0,
    "num_frames": 3000,
    "segments": [
      {"start_frame": 12, "end_frame": 45, "start_sec": 0.48, "end_sec": 1.80},
      {"start_frame": 50, "end_frame": 82, "start_sec": 2.00, "end_sec": 3.28}
    ]
  }
}
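
The second-based fields follow from the frame indices as frame / fps (for example 12 / 25.0 = 0.48). A minimal sketch that rebuilds this layout from boundary tuples is shown below; segments_to_json is a hypothetical helper and rounding to two decimals is an assumption.

```python
import json


def segments_to_json(video_name, segments, fps, num_frames):
    """Build the documented JSON layout from (start_frame, end_frame) tuples."""
    return {
        video_name: {
            "fps": fps,
            "num_frames": num_frames,
            "segments": [
                {
                    "start_frame": start,
                    "end_frame": end,
                    "start_sec": round(start / fps, 2),  # assumption: 2 decimals
                    "end_sec": round(end / fps, 2),
                }
                for start, end in segments
            ],
        }
    }


doc = segments_to_json("video_name", [(12, 45), (50, 82)], fps=25.0, num_frames=3000)
print(json.dumps(doc, indent=2))
```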

ELAN output: creates a tier {video_name}_segmentation with each segment labelled "SIGN", authored by segmenter_v2.

Checkpoint resolution: set SLTK_SEGMENTOR_CHECKPOINT or place segmentor_v2.ckpt in sltk/weights/segmentor/.

Stage 3: Spotting (Segments → Gloss Labels)

Spotting uses SignRep to extract 768-dim visual features from video frames, then matches each detected segment against a dictionary of known sign features to produce ranked gloss predictions.

Prerequisites:

  • A segmented video (from Stage 2) with known segment boundaries
  • A dictionary of sign features — .npz files with key best_latent, one per sign, typically stored at /vol/research/SignFeaturePool/features2/{dataset}/{method}/

Python — full pipeline:

from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()

# Step 1: Extract dense features from the full video (sliding 16-frame windows)
continuous = pipeline.extract_continuous("video.mp4", stride=4)
# continuous.features: (L, 768) L2-normalized

# Step 2: Load dictionary features
dictionary = pipeline.load_dictionary(
    ["/data/dictionaries/bsldict/signrep/"],
    feature_key="best_latent",
)

# Step 3: Define segments (from Stage 2 output, or load from JSON/EAF)
segments = [
    {"segment_id": 0, "start_frame": 12, "end_frame": 45},
    {"segment_id": 1, "start_frame": 50, "end_frame": 82},
]

# Step 4: Spot — match each segment against dictionary
result = pipeline.spot(
    features=continuous,
    segments=segments,
    dictionary=dictionary,
    top_k=10,
    segment_pooling="max",  # "max", "mean", or "softmax_weighted"
)

# Each spotted segment has ranked gloss matches
for seg in result.segments:
    print(f"Segment {seg.start_ms}ms–{seg.end_ms}ms:")
    for gl in seg.top_glosses:
        print(f"  Rank {gl['rank']}: {gl['gloss']} ({gl['similarity']:.3f})")
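
The matching step can be illustrated with a small, dependency-free sketch: cosine similarity between each window in a segment and each dictionary entry, pooled over windows, then ranked. This is not SignRep's implementation. rank_glosses is a hypothetical name, pooling over per-window similarities is an assumption about what segment_pooling means, and only "max" and "mean" are covered.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def rank_glosses(segment_features, dictionary, top_k=10, pooling="max"):
    """Rank dictionary glosses for one segment by pooled cosine similarity.

    segment_features: list of feature vectors, one per window in the segment.
    dictionary: mapping of gloss -> feature vector.
    """
    if pooling not in ("max", "mean"):
        raise ValueError(f"unsupported pooling: {pooling}")
    windows = [l2_normalize(f) for f in segment_features]
    scored = []
    for gloss, entry in dictionary.items():
        entry = l2_normalize(entry)
        # Dot product of unit vectors equals cosine similarity.
        sims = [sum(a * b for a, b in zip(w, entry)) for w in windows]
        pooled = max(sims) if pooling == "max" else sum(sims) / len(sims)
        scored.append((gloss, pooled))
    scored.sort(key=lambda item: item[1], reverse=True)
    return [
        {"rank": rank, "gloss": gloss, "similarity": sim}
        for rank, (gloss, sim) in enumerate(scored[:top_k], start=1)
    ]


# Toy 3-dim features standing in for the 768-dim vectors.
segment = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
toy_dictionary = {"HELLO": [1.0, 0.0, 0.0], "WORLD": [0.0, 1.0, 0.0]}
for match in rank_glosses(segment, toy_dictionary, top_k=2):
    print(match["rank"], match["gloss"], round(match["similarity"], 3))
```

With "max" pooling a single well-matching window is enough to rank a gloss highly, while "mean" rewards glosses that match across the whole segment.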

Python — one-shot from video:

from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()
result = pipeline.spot_from_video(
    video_path="video.mp4",
    segments_json="video_segments.json",  # from Stage 2
    dictionary_dirs=["/data/dictionaries/bsldict/signrep/"],
    top_k=20,
    stride=4,
)

Save results as ELAN:

from sltk.segmentation.output import save_spotted_elan

save_spotted_elan(
    result,
    output_path="video_spotted.eaf",
    fps=25.0,
    media_path="video.mp4",
)

This creates an EAF file with tiers Rank-1 through Rank-N (gloss labels) and Score-1 through Score-N (similarity scores), authored by signrep_spotter.

API:

# Extract continuous features (cached server-side for 30 min)
curl -X POST http://localhost:8000/api/signrep/continuous/extract \
  -H "Content-Type: application/json" \
  -d '{"video_path": "/data/video.mp4", "stride": 4}'
# Returns: {"features_id": "abc123", ...}

# Spot glosses using cached features
curl -X POST http://localhost:8000/api/signrep/spot \
  -H "Content-Type: application/json" \
  -d '{
    "features_id": "abc123",
    "segments": [{"segment_id": 0, "start_frame": 12, "end_frame": 45}],
    "dictionary_dirs": ["/data/dictionaries/bsldict/signrep/"],
    "top_k": 10
  }'

Checkpoint: set SLTK_SIGNREP_CHECKPOINT or place ckpt.pt in sltk/weights/signrep/.

End-to-End: Processing API

The processing API combines segmentation and spotting into a single background job. It expects WiLoR H5 files to already exist alongside the videos.

API:

# Segmentation only
curl -X POST http://localhost:8000/api/processing/submit \
  -H "Content-Type: application/json" \
  -d '{
    "video_paths": ["/data/video1.mp4", "/data/video2.mp4"],
    "type": "segments",
    "fps": 25.0
  }'

# Segmentation + spotting
curl -X POST http://localhost:8000/api/processing/submit \
  -H "Content-Type: application/json" \
  -d '{
    "video_paths": ["/data/video1.mp4", "/data/video2.mp4"],
    "type": "spots",
    "dictionary_dirs": ["/data/dictionaries/bsldict/signrep/"],
    "top_k": 5,
    "fps": 25.0,
    "workspace": "my_workspace"
  }'

# Poll job status
curl http://localhost:8000/api/processing/status/{job_id}

# Download output EAF
curl -O http://localhost:8000/api/processing/output/{job_id}/video1_spotted.eaf

H5 file lookup: the processing API searches for {stem}_wilor.h5 next to the video, in a poses/ subdirectory, or in a {stem}/ subdirectory. If no H5 is found, the video is skipped.
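
A minimal sketch of that lookup, assuming the three locations are tried in the order listed and that both subdirectories are relative to the video's folder. find_wilor_h5 is a hypothetical helper, not the API's own function.

```python
from pathlib import Path
from typing import Optional


def find_wilor_h5(video_path: str) -> Optional[Path]:
    """Look for {stem}_wilor.h5 in the three documented locations."""
    video = Path(video_path)
    name = f"{video.stem}_wilor.h5"
    candidates = [
        video.parent / name,               # next to the video
        video.parent / "poses" / name,     # poses/ subdirectory
        video.parent / video.stem / name,  # {stem}/ subdirectory
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # no H5 found: the caller skips this video
```

Running this over video_paths before submitting a job shows which videos would be skipped.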

Output files:

Type      Output file                               Description
segments  {stem}_segments.eaf                       Sign boundaries (SIGN labels)
spots     {stem}_segments.eaf + {stem}_spotted.eaf  Boundaries + ranked gloss labels

When a workspace is specified, output EAF files are auto-ingested into the corpus database.

Building a Dictionary

Before spotting, you need a dictionary of sign features. Extract one feature per isolated sign video:

from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()

# Single sign video → 768-dim feature
result = pipeline.extract_dictionary("isolated_sign.mp4", method="middle")
result.save_npz("dictionary/HELLO.npz")

Batch extraction via the API:

curl -X POST http://localhost:8000/api/signrep/dictionary/batch/job \
  -H "Content-Type: application/json" \
  -d '{
    "video_dir": "/data/isolated_signs/",
    "output_dir": "/data/dictionary/",
    "method": "middle"
  }'

Each .npz file is named after the gloss (e.g., HELLO.npz) and contains a best_latent key with the 768-dim feature vector.


Architecture

sltk/
├── api/                  # FastAPI REST API (16 routers, 88+ endpoints)
│   ├── main.py           # App init, middleware, router registration
│   ├── models.py         # Pydantic request/response schemas
│   ├── routers/          # Route handlers (see API Reference below)
│   ├── dependencies.py   # Path validation, security
│   └── security.py       # Security headers, CORS
├── io/                   # File I/O
│   ├── elan.py           # ELAN .eaf read/write
│   ├── elan_roundtrip.py # XML-preserving ELAN editing (round-trip safe)
│   ├── h5.py             # HDF5 pose data I/O
│   └── safe_write.py     # Atomic file operations
├── extraction/           # Pose extraction (MediaPipe, WiLoR, NLF)
├── segmentation/         # Transformer-based sign segmentation
├── visualization/        # Skeleton overlay video generation
├── processing/           # Feature computation, normalization
├── analysis/             # Clustering, embeddings, statistics
├── data/                 # Core types (PoseSequence, Segment, Sample)
│   └── datasets/         # Dataset loaders (BOBSL, How2Sign, BSLCP, etc.)
└── config.py             # Configuration and environment
frontend/                 # React/Vite annotation workstation
scripts/
└── run_dev.sh            # Dev server launcher

API Reference

Base URL: http://localhost:8000

Workspace Management — /api/workspace

Multi-workspace system for organizing videos and annotation files. Persists to ~/.sltk/workspaces.json.

Method  Endpoint  Description
GET     /list     List all workspaces
POST    /create   Create a new workspace
POST    /switch   Switch active workspace
POST    /scan     Scan directory for videos + ELAN files
GET     /status   Current workspace status
PUT     /rename   Rename workspace
DELETE  /clear    Clear active workspace
DELETE  /{name}   Delete workspace by name
PATCH   /match    Override video-ELAN matching
POST    /rescan   Rescan for new files

Videos — /api/videos /api/video

Method  Endpoint          Description
POST    /videos/discover  Discover videos in directory (recursive)
GET     /videos/info      Video metadata (fps, resolution, duration)
GET     /video/stream     Stream video with HTTP Range support

Audio — /api/audio

Method  Endpoint   Description
GET     /waveform  Extract waveform peaks. Params: path, samples (default 8000)

Extraction — /api/extraction

Pose extraction jobs (MediaPipe, WiLoR, NLF/SMPL-X).

Method  Endpoint          Description
POST    /start            Start extraction job (runs in background)
GET     /status/{job_id}  Poll extraction progress
POST    /cancel/{job_id}  Cancel extraction
GET     /jobs             List all extraction jobs
GET     /logs/{job_id}    Job logs (last 1000 entries)

Poses — /api/poses

Method  Endpoint     Description
GET     /load        Load pose data from H5 file
GET     /metadata    Pose metadata (frames, keypoints, format)
GET     /frame       Single frame pose data
GET     /statistics  Pose statistics

Visualization — /api/visualization

Skeleton overlay video generation from H5 pose data.

Method  Endpoint          Description
POST    /generate         Generate overlay. Body: {video_path, h5_path, viz_type}
GET     /status/{job_id}  Poll generation progress
GET     /check            Check if cached overlay exists

viz_type: "mediapipe", "wilor", or "nlf"

Datasets — /api/datasets

Method  Endpoint             Description
POST    /connect             Register dataset connection
GET     /connections         List connected datasets
GET     /list                List available datasets
GET     /{name}/videos       Videos in dataset
DELETE  /connections/{name}  Remove connection

Features — /api/features

Method  Endpoint                           Description
POST    /detect                            Detect features in video
GET     /scan                              Scan for feature files
GET     /datasets/{name}/features/summary  Feature summary for dataset

Analysis — /api/analysis

Research-oriented endpoints for vocabulary, statistics, and linguistic analysis.

Method  Endpoint                       Description
GET     /vocabulary                    Extract vocabulary from dataset
POST    /batch/statistics              Batch statistical analysis
POST    /research/vocabulary-mapping   Map glosses across datasets
POST    /research/compare-datasets     Compare two datasets
POST    /research/find-gloss-examples  Find gloss examples
POST    /linguistic/concordance        Gloss concordance
POST    /linguistic/cooccurrence       Co-occurrence analysis
POST    /linguistic/ngrams             N-gram frequency
POST    /linguistic/duration-analysis  Duration statistics

Embeddings — /api/embeddings

Method  Endpoint           Description
GET     /status/{dataset}  Embedding generation status
DELETE  /cache/{dataset}   Clear embeddings
GET     /signrep/status    SignRep model status

Linguistics — /api/linguistics

Inter-rater reliability and phonological analysis.

Method  Endpoint                         Description
POST    /reliability/kappa               Cohen's kappa
POST    /reliability/krippendorff        Krippendorff's alpha
POST    /reliability/boundary-agreement  Boundary agreement
POST    /reliability/confusion-matrix    Confusion matrix
POST    /phonological-form               Extract phonological form
POST    /phonological-distance           Phonological distance
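
For readers unfamiliar with the first metric, Cohen's kappa compares observed agreement between two annotators with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a plain-Python illustration of that standard formula, not the endpoint's implementation, and cohens_kappa is a hypothetical name.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


a = ["HELLO", "HELLO", "WORLD", "WORLD"]
b = ["HELLO", "WORLD", "WORLD", "WORLD"]
print(cohens_kappa(a, b))  # 0.5
```

Here observed agreement is 0.75 and chance agreement is 0.5, giving kappa = 0.25 / 0.5 = 0.5.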

Jobs — /api/jobs

Method  Endpoint          Description
GET     /status           Job system status
GET     /gpu              GPU status and memory
GET     /list             List active jobs
POST    /{job_id}/cancel  Cancel job

Settings — /api/settings

Method  Endpoint         Description
GET     /                Get app settings
POST    /                Update settings
GET     /info            System info
GET     /system/weights  Model weights info

Middleware & Security

The API applies the following middleware (in order):

  1. GZip — compresses responses >500 bytes
  2. Security headers — X-Frame-Options: DENY, X-Content-Type-Options: nosniff, CSP, Permissions-Policy
  3. CORS — configurable via SLTK_CORS_ORIGINS env var (default: http://localhost:5173,http://localhost:3000)
  4. Path validation — whitelist check against SLTK_ALLOWED_PATHS, directory traversal prevention
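
The whitelist check in step 4 can be sketched as follows: resolve the requested path first, then test whether it lies under an allowed root, so that ".." segments cannot escape. is_path_allowed is a hypothetical helper that illustrates the idea rather than the code in dependencies.py.

```python
from pathlib import Path


def is_path_allowed(candidate: str, allowed_roots: list[str]) -> bool:
    """True if candidate resolves to a location inside one of the allowed roots."""
    resolved = Path(candidate).resolve()  # collapses ".." and follows symlinks
    return any(
        resolved.is_relative_to(Path(root).resolve()) for root in allowed_roots
    )


# Roots as they would appear in SLTK_ALLOWED_PATHS (comma-separated) after splitting.
allowed = ["/vol/research", "/home"]
print(is_path_allowed("/vol/research/data/video.mp4", allowed))    # True
print(is_path_allowed("/vol/research/../../etc/passwd", allowed))  # False
```

Comparing resolved paths, rather than checking string prefixes, is what blocks both traversal and look-alike directories such as /vol/research-other.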

Configuration

Environment variables (set in .env or shell):

Variable                 Description                              Default
SLTK_CORS_ORIGINS        Allowed CORS origins (comma-separated)   http://localhost:5173,http://localhost:3000
SLTK_ALLOWED_PATHS       Allowed filesystem paths for API access  /vol/research,/home
SLTK_RESEARCH_DATA_ROOT  Root for research data                   /vol/research
SLTK_DATASETS_ROOT       Root for raw datasets                    /vol/research/datasets
SLTK_FEATURE_ROOT        Root for extracted features              /vol/research/SignFeaturePool/features2
SLTK_NLF_MODEL_PATH      Path to NLF model weights
SLTK_WILOR_MODEL_PATH    Path to WiLoR model weights
SLTK_SIGNREP_CHECKPOINT  Path to SignRep checkpoint
SLTK_SEGMENTOR_PATH      Path to segmentor checkpoint
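
As an illustration of how a comma-separated variable such as SLTK_CORS_ORIGINS could be read with its documented default, here is a small sketch. get_cors_origins is a hypothetical helper, and trimming whitespace and dropping empty entries are assumptions, not documented behaviour of sltk/config.py.

```python
import os

DEFAULT_CORS_ORIGINS = "http://localhost:5173,http://localhost:3000"


def get_cors_origins(environ=os.environ):
    """Split SLTK_CORS_ORIGINS into a list, falling back to the documented default."""
    raw = environ.get("SLTK_CORS_ORIGINS", DEFAULT_CORS_ORIGINS)
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


print(get_cors_origins({}))
# ['http://localhost:5173', 'http://localhost:3000']
```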

Supported Pose Formats

Format      Joints                           Description
MediaPipe   33 body + 21x2 hands + 468 face  Holistic pose estimation
WiLoR       21 per hand                      MANO hand model with rotation matrices
NLF/SMPL-X  55 joints                        Full body with axis-angle rotations

All stored as HDF5 (.h5) files.

Testing

# Run full suite (1429 tests)
pytest

# With coverage
pytest --cov=sltk --cov-report=html

# Specific markers
pytest -m api          # API tests only
pytest -m "not slow"   # Skip slow tests
pytest -m gpu          # GPU tests only

CI

GitHub Actions runs on every push/PR to main:

  • Lint: black + ruff
  • Type check: mypy
  • Tests: pytest across Python 3.10, 3.11, 3.12 (coverage threshold: 40%)
  • Frontend: npm build

License

MIT
