TFDS Datasets for sign language

Project description

Sign Language Datasets

This repository includes TFDS data loaders for sign language datasets.

Installation

From Source

pip install git+https://github.com/sign-language-processing/datasets.git

PyPi

pip install sign-language-datasets

For Apple Silicon environments, you may need to use an ARM version of conda:

# Uninstall Anaconda
conda activate base
conda install anaconda-clean
anaconda-clean --yes
conda deactivate
brew uninstall anaconda

# Download and install Miniforge for ARM (Apple Silicon)
curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
# Follow the instructions

conda create --name=datasets python=3.11
conda activate datasets
pip install .

# Test the installation
python -c "import tensorflow"
python -c "import sign_language_datasets"
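
If the imports above fail on Apple Silicon, a common culprit is an x86_64 Python running under Rosetta, which pulls Intel wheels. A quick way to check which architecture your interpreter was built for (a generic check, not part of this package):

```python
import platform
import sys

# A native ARM build on Apple Silicon reports "arm64"; an x86_64 build
# running under Rosetta reports "x86_64".
print(f"Python {sys.version_info.major}.{sys.version_info.minor} on {platform.machine()}")
```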

Usage

We demonstrate a loading script for every dataset in examples/load.ipynb, which can also be opened in Colab.

Our config includes options to choose the resolution and FPS, for example:

import tensorflow_datasets as tfds
import sign_language_datasets.datasets
from sign_language_datasets.datasets.config import SignDatasetConfig

# Loading a dataset with default configuration
aslg_pc12 = tfds.load("aslg_pc12")

# Loading a dataset with custom configuration
config = SignDatasetConfig(name="videos_and_poses256x256:12",
                           version="3.0.0",  # Specific version
                           include_video=True,  # Download and load dataset videos
                           process_video=True,  # Process videos to tensors, or only save path to video
                           fps=12,  # Load videos at a constant 12 fps
                           resolution=(256, 256),  # Convert videos to a constant resolution, 256x256
                           include_pose="holistic")  # Download and load Holistic pose estimation
rwth_phoenix2014_t = tfds.load(name='rwth_phoenix2014_t', builder_kwargs=dict(config=config))

Datasets

Dataset             Videos   Poses                Versions
aslg_pc12           N/A      N/A                  0.0.1
asl-lex             No                            2.0.0
rwth_phoenix2014_t  Yes      Holistic             3.0.0
autsl               Yes      OpenPose, Holistic   1.0.0
dgs_corpus          Yes      OpenPose, Holistic   3.0.0
dgs_types           Yes                           3.0.0
how2sign            Yes      OpenPose             1.0.0
sign2mint           Yes                           1.0.0
signtyp             Links                         1.0.0
swojs_glossario     Yes                           1.0.0
SignBank            N/A                           1.0.0
wlasl               Failed   OpenPose             None
wmtslt              Yes      OpenPose, Holistic   1.2.0
signsuisse          Yes                           1.0.0
msasl                                             None
Video-Based CSL                                   None
RVL-SLLL ASL                                      None
ngt_corpus          Yes                           3.0.0
bsl_corpus          No       No                   3.0.0

Data Interface

Wherever possible, we follow this interface to make it easy to swap datasets:

{
    "id": tfds.features.Text(),
    "signer": tfds.features.Text() | tf.int32,
    "video": tfds.features.Video(shape=(None, HEIGHT, WIDTH, 3)),
    "depth_video": tfds.features.Video(shape=(None, HEIGHT, WIDTH, 1)),
    "fps": tf.int32,
    "pose": {
        "data": tfds.features.Tensor(shape=(None, 1, POINTS, CHANNELS), dtype=tf.float32),
        "conf": tfds.features.Tensor(shape=(None, 1, POINTS), dtype=tf.float32)
    },
    "gloss": tfds.features.Text(),
    "text": tfds.features.Text()
}
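
As a sketch of how the pose field is typically consumed, low-confidence keypoints are often masked out before training. The shapes, random data, and 0.5 threshold below are illustrative assumptions, not part of the interface (POINTS and CHANNELS depend on the pose estimator):

```python
import numpy as np

# Illustrative shapes: 10 frames, 1 person, 137 points, 2 channels (x, y)
frames, people, points, channels = 10, 1, 137, 2
rng = np.random.default_rng(0)
data = rng.normal(size=(frames, people, points, channels)).astype(np.float32)  # "pose"/"data"
conf = rng.uniform(size=(frames, people, points)).astype(np.float32)           # "pose"/"conf"

# Zero out keypoints whose confidence falls below an arbitrary threshold
mask = conf > 0.5
masked = data * mask[..., np.newaxis]
```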

Adding a new dataset

For general instructions, see the TFDS guide to writing custom datasets. Instructions below are specific to this repository.

Make a new folder inside sign_language_datasets/datasets with the same name as the dataset. As a convention, the name of the dataset should be lowercase and words should be separated by an underscore. Example:

cd sign_language_datasets/datasets
tfds new new_dataset

For our purposes, creating a custom TFDS dataset means writing a new class which inherits from tfds.core.GeneratorBasedBuilder. If you use tfds new to create a new dataset then the dataset class is stored in a file with the exact same name as the dataset, i.e. new_dataset.py. new_dataset.py must contain a line similar to:

class NewDataset(tfds.core.GeneratorBasedBuilder):

Registering a new dataset

The mechanism to add a custom dataset to TFDS' dataset registry is to import the class NewDataset. For this reason the folder sign_language_datasets/datasets/new_dataset must have an __init__.py file that imports the class NewDataset:

from .new_dataset import NewDataset

Even though the class is named NewDataset, it is registered for loading in lowercase; each uppercase character is interpreted as the start of a new word and is separated with an underscore. This means the dataset can be loaded as follows:

ds = tfds.load('new_dataset')
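
The CamelCase-to-snake_case conversion can be sketched as follows. This is a simplified version of the rule, not TFDS' own naming helper, and it does not cover every edge case (e.g. runs of consecutive capitals):

```python
import re

def class_name_to_dataset_name(name: str) -> str:
    """Insert an underscore before each non-initial uppercase letter, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

print(class_name_to_dataset_name("NewDataset"))  # → new_dataset
```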

Generating checksums

The folder for the new dataset should contain a file checksums.tsv with checksums for every file in the dataset. This allows the TFDS download manager to check the integrity of the data it downloads. Use the tfds build tool to generate the checksum file:

tfds build --register_checksums new_dataset.py

Use a dataset configuration that includes all files (e.g., one that does include the video files, if any) via the --config argument. By default, tfds build builds all configurations, which may be redundant.
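
Each entry in checksums.tsv pairs a download URL with the file size and a SHA-256 hex digest of its contents. To verify a downloaded file by hand, the digest can be recomputed with the standard library (a sketch; sha256_of is a hypothetical helper, not part of this repository):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```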

Why not Huggingface Datasets?

Huggingface datasets do not work well with videos: they lack native support for a video feature type as well as for arbitrary tensors. Furthermore, they currently have memory leaks that prevent saving even the smallest video datasets.

Cite

@misc{moryossef2021datasets, 
    title={Sign Language Datasets},
    author={Moryossef, Amit and M\"{u}ller, Mathias},
    howpublished={\url{https://github.com/sign-language-processing/datasets}},
    year={2021}
}

Download files

Download the file for your platform.

Source Distribution

sign_language_datasets-0.4.0.tar.gz (3.5 MB)

Uploaded Source

Built Distribution

sign_language_datasets-0.4.0-py3-none-any.whl (3.6 MB)

Uploaded Python 3

File details

Details for the file sign_language_datasets-0.4.0.tar.gz.

File metadata

  • Download URL: sign_language_datasets-0.4.0.tar.gz
  • Upload date:
  • Size: 3.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sign_language_datasets-0.4.0.tar.gz
Algorithm    Hash digest
SHA256       68c5c9a9150c044059f62c08b6e4b409bb04963b10b6372a58e017e8c7c022a0
MD5          aa9e3b9c497727ca1fd58ac7b667cac4
BLAKE2b-256  86d2896f4046c284d293f11f126c39747c225a4ab6298eebd19b2d9d94d0c9fb


File details

Details for the file sign_language_datasets-0.4.0-py3-none-any.whl.

File metadata

File hashes

Hashes for sign_language_datasets-0.4.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       d1d1d62f04f83d4704916aa776c5849f953ce01a9315e076ab9f0a3c3bbaaa07
MD5          4bf811db73e4aa22d887f070af51c65b
BLAKE2b-256  8c7901ef65716e7087b4dee6987313e0300b35b0e46da3299270697b00a2bbab

