Multilingual datasets for sign language translation.

SLT Datasets Downloader

Overview

SLT Datasets Downloader is a Python library for downloading and processing sign language translation (SLT) datasets covering several sign languages. It exposes the datasets through a single loader so that they can be used to train machine learning models for SLT tasks.

Features

  • Supports multiple sign language datasets.
  • Provides tools for downloading, preprocessing, and tokenizing datasets.
  • Compatible with PyTorch for model training.
  • Handles different input types (video, pose) and output types (text, gloss).
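
To illustrate what word-level tokenization of text targets involves, here is a minimal sketch. `ToyWordTokenizer` and its special tokens are hypothetical names introduced for illustration only; this is not the library's `WordLevelTokenizer`, whose interface is not documented in this README.

```python
class ToyWordTokenizer:
    """Hypothetical word-level tokenizer (illustration only)."""

    SPECIALS = ("<pad>", "<unk>", "<bos>", "<eos>")

    def __init__(self, texts):
        # Reserve the first ids for the special tokens.
        self.stoi = {tok: i for i, tok in enumerate(self.SPECIALS)}
        # Assign the next free id to each new lowercase, whitespace-split word.
        for text in texts:
            for word in text.lower().split():
                self.stoi.setdefault(word, len(self.stoi))
        self.itos = {i: tok for tok, i in self.stoi.items()}

    def encode(self, text):
        # Unknown words map to <unk>; the sentence is wrapped in <bos>/<eos>.
        unk = self.stoi["<unk>"]
        body = [self.stoi.get(w, unk) for w in text.lower().split()]
        return [self.stoi["<bos>"]] + body + [self.stoi["<eos>"]]

    def decode(self, ids):
        # Drop padding and sentence markers, keep <unk> visible.
        skip = {self.stoi["<pad>"], self.stoi["<bos>"], self.stoi["<eos>"]}
        return " ".join(self.itos[i] for i in ids if i not in skip)


tokenizer = ToyWordTokenizer(["rain in the north", "sun in the south"])
ids = tokenizer.encode("rain in the west")
text = tokenizer.decode(ids)
```

Words that were not seen when the vocabulary was built (here "west") fall back to `<unk>`, which is why datasets with many singletons are harder to model at word level.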

Supported Datasets

The following datasets are supported:

| Name | Input Language | Target Language | Status | # Samples | Hours | Video | Pose | Transcription | Gloss | Other data | # Words | # Singletons | # Signers | Source | BLEU (4) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RWTH-PHOENIX Weather 2014 T | German SL (DGS) | German | Downloaded | 7,096 | 10 | Yes | No | Yes | Yes | - | 2,887 | 1,077 | 9 | TV | 28.95 |
| LSA-T | LSA | Spanish | Downloaded | 14,880 | 22 | Yes | Yes (MP) | Yes | No | - | 14,239 | 7,150 | 103 | Web | 0.93 (WER) |
| How2Sign | ASL | English | Downloaded | 35,000 | 80 | Yes | Yes | Yes | Yes | Multiple angles, 3D pose, Speech | 16,000 | - | 11 | - | 8.03 |
| LSFB-CONT | French-Belgian SL | French-Belgian Glosses | Downloaded | 85,000 | 25 | Yes | Yes (MP) | No | Yes | - | 6,883 | - | 100 | Laboratory | - |
| ISLTranslate | Indian SL | English | Downloaded | 31,222 | - | Yes | Yes (MP) | Yes | No | - | 11,655 | - | - | - | 6.09 |
| GSL | Greek SL (GSL) | Greek | Downloaded | 40,826 | 10 | Yes | No | No | Yes | Depth | 310 | 0 | 7 | Laboratory | 20.62 (WER) |
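
The "# Words" and "# Singletons" columns can be read as vocabulary statistics. The sketch below assumes that "# Words" counts distinct words and "# Singletons" counts words that occur exactly once in the transcriptions; the README does not define either term, and `vocabulary_stats` is a hypothetical helper, not part of the library.

```python
from collections import Counter


def vocabulary_stats(transcriptions):
    """Return (distinct words, words seen exactly once) for a list of sentences."""
    # Lowercase and split on whitespace; real pipelines may normalize differently.
    counts = Counter(
        word for text in transcriptions for word in text.lower().split()
    )
    n_words = len(counts)
    n_singletons = sum(1 for c in counts.values() if c == 1)
    return n_words, n_singletons


n_words, n_singletons = vocabulary_stats(["rain in the north", "sun in the south"])
```

A high share of singletons, as in LSA-T, means that many target words are seen only once, which tends to make translation harder.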

Installation

To install the dependencies, run the following from the root of a cloned copy of the repository:

pip install -r requirements.txt

Usage

Loading a Dataset

from slt_datasets.SLTDataset import SLTDataset

dataset = SLTDataset(
    data_dir="/path/to/dataset",
    input_mode="video",
    output_mode="text",
    split="train"
)

Accessing Samples

sample_input, sample_output = dataset[0]
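
Since each sample is an (input, output) pair and target sentences differ in length, batching usually requires padding. The following is a minimal sketch, assuming the outputs have already been converted to lists of token ids; `pad_batch` and the placeholder inputs are hypothetical, and the library may handle batching differently.

```python
def pad_batch(samples, pad_id=0):
    """Split (input, token_ids) pairs and right-pad the token id lists."""
    inputs = [inp for inp, _ in samples]
    targets = [ids for _, ids in samples]
    lengths = [len(ids) for ids in targets]
    max_len = max(lengths)
    # Pad every target to the length of the longest one in the batch.
    padded = [ids + [pad_id] * (max_len - len(ids)) for ids in targets]
    return inputs, padded, lengths


# Placeholder strings stand in for video or pose inputs.
batch = [("clip_a", [2, 4, 5, 3]), ("clip_b", [2, 8, 3])]
inputs, padded, lengths = pad_batch(batch)
```

With PyTorch, a function of this shape could be adapted into the `collate_fn` argument of `torch.utils.data.DataLoader`, converting the padded lists to tensors.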

Visualizing Pose Data

dataset.visualize_pose(0)

Project Structure

 |-src
 | |-slt_datasets
 | | |-SLTDataset.py  # Main dataset loader
 | | |-WordLevelTokenizer.py  # Tokenizer for text processing
 | | |-dataset_comparison.ipynb  # Notebook for dataset comparison
 |-tests
 | |-test_methods.py  # Unit tests for the dataset loader
 |-docs  # Documentation files
 |-requirements.txt  # Dependency list
 |-pyproject.toml  # Project configuration
 |-README.md  # This file

Contributing

Contributions are welcome! Please follow the guidelines in CONTRIBUTING.md and ensure your code passes all tests before submitting a pull request.

License

This project is licensed under the terms of the LICENSE file.

Support

For any issues or questions, please refer to SUPPORT.md or open an issue in the repository.
