Multilanguage datasets for Sign Language Translation.
SLT Datasets Downloader
Overview
SLT Datasets Downloader is a Python library for downloading and processing sign language translation (SLT) datasets that cover several sign languages. It is designed to make it easier to train machine learning models for SLT tasks.
Features
- Supports multiple sign language datasets.
- Provides tools for downloading, preprocessing, and tokenizing datasets.
- Compatible with PyTorch for model training.
- Handles different input types (video, pose) and output types (text, gloss).
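To illustrate what word-level tokenization of the text targets involves, here is a minimal sketch in plain Python. The class name `ToyWordTokenizer`, its methods, and the `<pad>`/`<unk>` tokens are hypothetical and chosen for illustration only; they are not the API of the library's `WordLevelTokenizer`, which may behave differently.

```python
from collections import Counter


class ToyWordTokenizer:
    """Hypothetical word-level tokenizer; not the library's WordLevelTokenizer."""

    PAD, UNK = "<pad>", "<unk>"

    def __init__(self, texts, min_freq=1):
        # Count lowercased, whitespace-separated words across the corpus.
        counts = Counter(word for text in texts for word in text.lower().split())
        # Reserve ids 0 and 1 for padding and unknown words.
        self.vocab = {self.PAD: 0, self.UNK: 1}
        for word, freq in sorted(counts.items()):
            if freq >= min_freq:
                self.vocab[word] = len(self.vocab)
        self.inverse = {idx: word for word, idx in self.vocab.items()}

    def encode(self, text, max_len=None):
        # Map each word to its id, falling back to the unknown-word id.
        unk_id = self.vocab[self.UNK]
        ids = [self.vocab.get(word, unk_id) for word in text.lower().split()]
        if max_len is not None:
            # Truncate, then right-pad up to max_len.
            ids = ids[:max_len]
            ids = ids + [self.vocab[self.PAD]] * (max_len - len(ids))
        return ids

    def decode(self, ids):
        # Drop padding ids and join the remaining words with spaces.
        pad_id = self.vocab[self.PAD]
        return " ".join(self.inverse[i] for i in ids if i != pad_id)


tokenizer = ToyWordTokenizer(["rain in the north", "sun in the south"])
encoded = tokenizer.encode("rain in the west", max_len=6)
decoded = tokenizer.decode(encoded)
print(encoded)  # [4, 2, 7, 1, 0, 0]
print(decoded)  # rain in the <unk>
```

Sorting the vocabulary before assigning ids keeps the mapping deterministic across runs, which matters when a trained model is reloaded with a rebuilt tokenizer.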
Supported Datasets
The following datasets are supported:
| Name | Input Language | Target Language | Status | # Samples | Hours | Video | Pose | Transcription | Gloss | Other data | # Words | # Singletons | # Signers | Source | BLEU-4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RWTH-PHOENIX Weather 2014 T | German SL (DGS) | German | Downloaded | 7,096 | 10 | Yes | No | Yes | Yes | - | 2,887 | 1,077 | 9 | TV | 28.95 |
| LSA-T | LSA | Spanish | Downloaded | 14,880 | 22 | Yes | Yes (MP) | Yes | No | - | 14,239 | 7,150 | 103 | Web | 0.93 (WER) |
| How2Sign | ASL | English | Downloaded | 35,000 | 80 | Yes | Yes | Yes | Yes | Multiple angles, 3D pose, Speech | 16,000 | - | 11 | - | 8.03 |
| LSFB-CONT | French-Belgian SL | French-Belgian Glosses | Downloaded | 85,000 | 25 | Yes | Yes (MP) | No | Yes | - | 6,883 | - | 100 | Laboratory | - |
| ISLTranslate | Indian SL | English | Downloaded | 31,222 | - | Yes | Yes (MP) | Yes | No | - | 11,655 | - | - | - | 6.09 |
| GSL | Greek SL (GSL) | Greek | Downloaded | 40,826 | 10 | Yes | No | No | Yes | Depth | 310 | 0 | 7 | Laboratory | 20.62 (WER) |
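The "# Words" and "# Singletons" columns can be read as the vocabulary size and the number of words that occur exactly once. The sketch below shows one way such counts could be computed from a list of transcriptions. It assumes lowercasing and whitespace splitting; the tokenization actually used to produce the figures in the table is not stated here, so results on the real datasets may differ. The function name `vocabulary_stats` and the toy corpus are illustrative only.

```python
from collections import Counter


def vocabulary_stats(transcriptions):
    """Return (number of distinct words, number of words seen exactly once)."""
    counts = Counter(
        word for sentence in transcriptions for word in sentence.lower().split()
    )
    singletons = sum(1 for freq in counts.values() if freq == 1)
    return len(counts), singletons


# Toy corpus, not taken from any of the datasets above.
toy_corpus = [
    "rain in the north",
    "sun in the south",
    "rain tomorrow",
]
n_words, n_singletons = vocabulary_stats(toy_corpus)
print(n_words, n_singletons)  # 7 4
```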
Installation
To install the library's dependencies from a checkout of the repository, run:
pip install -r requirements.txt
Usage
Loading a Dataset
from slt_datasets.SLTDataset import SLTDataset

dataset = SLTDataset(
    data_dir="/path/to/dataset",
    input_mode="video",
    output_mode="text",
    split="train",
)
Accessing Samples
sample_input, sample_output = dataset[0]
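The tuple unpacking above suggests that the dataset is indexable and yields (input, output) pairs. As a rough sketch of that pattern, the stand-in class below implements `__len__` and `__getitem__` over toy data and pads the variable-length outputs of a batch to a common length. `ToyPairDataset` and `pad_batch` are hypothetical names, and the element types returned by the real `SLTDataset` are not documented here, so this is an illustration of the access pattern rather than of the library's behavior.

```python
class ToyPairDataset:
    """Hypothetical stand-in for an indexable dataset of (input, output) pairs."""

    def __init__(self, pairs):
        self.pairs = list(pairs)

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        return self.pairs[index]


def pad_batch(sequences, pad_value=0):
    """Right-pad variable-length lists to the length of the longest one."""
    longest = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (longest - len(seq)) for seq in sequences]


# Inputs are toy per-frame features; outputs are toy token ids.
toy = ToyPairDataset([
    ([0.1, 0.2, 0.3], [4, 2]),
    ([0.5], [7, 1, 3]),
])

sample_input, sample_output = toy[0]

# Gather all samples into a batch and pad the target sequences.
inputs, outputs = zip(*(toy[i] for i in range(len(toy))))
padded_outputs = pad_batch(list(outputs))
print(padded_outputs)  # [[4, 2, 0], [7, 1, 3]]
```

Padding of this kind is typically done in a collate step when batching, since sign language clips and their translations rarely share a length.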
Visualizing Pose Data
dataset.visualize_pose(0)
Project Structure
|-src
| |-slt_datasets
| | |-SLTDataset.py # Main dataset loader
| | |-WordLevelTokenizer.py # Tokenizer for text processing
| | |-dataset_comparison.ipynb # Notebook for dataset comparison
|-tests
| |-test_methods.py # Unit tests for the dataset loader
|-docs # Documentation files
|-requirements.txt # Dependency list
|-pyproject.toml # Project configuration
|-README.md # This file
Contributing
Contributions are welcome! Please follow the guidelines in CONTRIBUTING.md and ensure your code passes all tests before submitting a pull request.
License
This project is licensed under the terms of the LICENSE file.
Support
For any issues or questions, please refer to SUPPORT.md or open an issue in the repository.