LSA-T: The first continuous LSA dataset

LSA-T is the first continuous Argentinian Sign Language (LSA) dataset. It contains 14,880 sentence-level videos of LSA extracted from the CN Sordos YouTube channel, with labels and keypoint annotations for each signer. Videos are full HD (1920x1080) at 30 FPS.

Format

Samples are organized in directories according to the playlist and video they belong to. For each sample i there are four files:

  • i.mp4: the clip corresponding to the ith line of subtitles.
  • i.json contains:
    • label: the line of subtitles corresponding to the clip.
    • start: time in seconds at which the subtitle starts.
    • end: time in seconds at which the subtitle ends.
    • video: title of the video to which the clip belongs.
    • playlist: title of the playlist to which the clip belongs.
  • i_ap.json: the raw AlphaPose results for the clip, using Halpe keypoints in AlphaPose's default output format.
  • i_signer.json contains:
    • scores: for each person in the clip, the amount of "movement" in their hands. It is used to infer who the signer is.
    • roi: the region of interest considered for the clip (bounding box of the inferred signer).
    • keypoints: list of the inferred signer's keypoints for each frame, in the same format as in i_ap.json.
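The file layout above can be read with the standard library alone. The following is a minimal sketch, not part of the lsat package: the names load_sample and to_triples are hypothetical, and to_triples assumes that each keypoint entry is a flat [x1, y1, c1, x2, y2, c2, ...] list, as in AlphaPose's default JSON output.

```python
import json
from pathlib import Path


def load_sample(sample_dir, i):
    """Read the i.json and i_signer.json files of sample `i` in `sample_dir`."""
    sample_dir = Path(sample_dir)
    with open(sample_dir / f"{i}.json", encoding="utf-8") as f:
        meta = json.load(f)
    with open(sample_dir / f"{i}_signer.json", encoding="utf-8") as f:
        signer = json.load(f)
    return {
        "clip": sample_dir / f"{i}.mp4",
        "label": meta["label"],
        # start and end are given in seconds, so this is the clip length.
        "duration": meta["end"] - meta["start"],
        "roi": signer["roi"],
        "keypoints": signer["keypoints"],
    }


def to_triples(flat):
    """Group a flat [x1, y1, c1, x2, y2, c2, ...] list into (x, y, c) tuples.

    Assumption: keypoints are stored flat, with a confidence after each (x, y).
    """
    if len(flat) % 3 != 0:
        raise ValueError("expected a multiple of 3 values")
    return [tuple(flat[k:k + 3]) for k in range(0, len(flat), 3)]
```

For example, load_sample("some_playlist/some_video", 3) would return the label, duration, and signer data of the clip 3.mp4 in that directory; the directory names here are placeholders.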

Usage

This repository can be installed via pip and contains the LSA_Dataset class (in the lsat.dataset.LSA_Dataset module). This class inherits from the PyTorch Dataset class and implements all the methods needed to use it with a PyTorch DataLoader. It also manages downloading and extracting the dataset.

Useful transforms for the clips and keypoints are also provided in lsat.dataset.transforms.
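The constructor arguments of LSA_Dataset are not documented here, so the sketch below does not call it. Instead it illustrates the map-style dataset protocol (`__len__` and `__getitem__`) that a PyTorch Dataset implements, using a hypothetical stand-in class named ClipIndex that walks the directory layout described under Format. The class name, its parameters, and the assumption that sample indices are plain integers are all illustrative, not the package's API.

```python
import json
from pathlib import Path


class ClipIndex:
    """Hypothetical stand-in for a map-style dataset; not the real LSA_Dataset."""

    def __init__(self, root, transform=None):
        # Keep only the i.json metadata files; i_ap.json and i_signer.json
        # have non-numeric stems and are skipped.
        self.meta_files = sorted(
            p for p in Path(root).rglob("*.json") if p.stem.isdigit()
        )
        self.transform = transform

    def __len__(self):
        return len(self.meta_files)

    def __getitem__(self, idx):
        meta_path = self.meta_files[idx]
        with open(meta_path, encoding="utf-8") as f:
            label = json.load(f)["label"]
        # The clip sits next to its metadata file: i.json -> i.mp4.
        clip = meta_path.with_suffix(".mp4")
        if self.transform is not None:
            clip = self.transform(clip)
        return clip, label
```

Here the item is a (clip path, label) pair; decoding the video into tensors is left to the transform, which is roughly the role the transforms in lsat.dataset.transforms would play for the real class.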

Statistics and comparison with other DBs

| | LSA-T | PHOENIX* | SIGNUM | CSL | GSL | KETI |
|---|---|---|---|---|---|---|
| language | Spanish | German | German | Chinese | Greek | Korean |
| sign language | LSA | GSL | GSL | CSL | GSL | KLS |
| real life | Yes | Yes | No | No | No | No |
| signers | 103 | 9 | 25 | 50 | 7 | 14 |
| duration (h) | 21.78 | 10.71 | 55.3 | 100+ | 9.51 | 28 |
| # samples | 14,880 | 7096 | 33,210 | 25,000 | 10,295 | 14,672 |
| # unique sentences | 14,254 | 5672 | 780 | 100 | 331 | 105 |
| % unique sentences | 95.79% | 79.93% | 2.35% | 0.4% | 3.21% | 0.71% |
| vocab. size (w) | 14,239 | 2887 | N/A | 178 | N/A | 419 |
| # singletons (w) | 7150 | 1077 | 0 | 0 | 0 | 0 |
| % singletons (w) | 50.21% | 37.3% | 0% | 0% | 0% | 0% |
| vocab. size (gl) | - | 1066 | 450 | - | 310 | 524 |
| # singletons (gl) | - | 337 | 0 | - | 0 | 0 |
| % singletons (gl) | - | 31.61% | 0% | - | 0% | 0% |
| resolution | 1920x1080 | 210x260 | 776x578 | 1920x1080 | 848x480 | 1920x1080 |
| fps | 30 | 25 | 30 | 30 | 30 | 30 |

*Data was not available for the whole PHOENIX dataset, so the table shows its train set statistics.
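The percentage rows follow from the count rows: for LSA-T, 14,254 unique sentences out of 14,880 samples gives 95.79%, and 7150 singletons out of a 14,239-word vocabulary gives 50.21%. The sketch below shows one way such statistics could be computed from a list of subtitle lines. The function name is hypothetical, and the tokenization (lowercasing and splitting on whitespace) is an assumption; the exact preprocessing behind the table is not stated here.

```python
from collections import Counter


def sentence_stats(labels):
    """Compute sentence- and word-level statistics for a list of subtitle lines."""
    if not labels:
        raise ValueError("labels must not be empty")
    unique_sentences = set(labels)
    # Assumption: words are lowercased, whitespace-separated tokens.
    word_counts = Counter(w for s in labels for w in s.lower().split())
    # A singleton is a word that occurs exactly once in the whole list.
    singletons = [w for w, c in word_counts.items() if c == 1]
    return {
        "samples": len(labels),
        "unique_sentences": len(unique_sentences),
        "pct_unique_sentences": 100 * len(unique_sentences) / len(labels),
        "vocab_size": len(word_counts),
        "singletons": len(singletons),
        "pct_singletons": 100 * len(singletons) / len(word_counts),
    }
```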

Evaluation splits

| | LSA-T | Full version (train) | Full version (test) | Reduced version (train) | Reduced version (test) |
|---|---|---|---|---|---|
| signers | 103 | X | X | X | X |
| duration [h] | 21.78 | 17.49 | 4.29 | 15.85 | 3.89 |
| # sentences | 14,880 | 11,065 | 2735 | 3767 | 910 |
| % unique sentences | 95.79% | 96.64% | 92.78% | 96.88% | 98.35% |
| vocab. size | 14,239 | 12,385 | 5546 | 2694 | 1579 |
| % singletons | 50.21% | 52.01% | 61.9% | 23.2% | 48.83% |
| % sentences with singletons | 34.97% | 40.98% | 67.97% | 14.36% | 54.29% |
| % sentences with words not in train vocabulary | - | - | 59.2% | - | 84.5% |
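The last two rows measure how hard a split is at the sentence level. As a rough sketch of how they could be computed, assuming lowercased whitespace tokenization and that singletons are counted within the split being measured (neither is confirmed here, and the function names are hypothetical):

```python
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercased, whitespace-separated tokens.
    return sentence.lower().split()


def pct_sentences_with_oov(train, test):
    """Percentage of test sentences with at least one word absent from train."""
    train_vocab = {w for s in train for w in tokenize(s)}
    hits = sum(any(w not in train_vocab for w in tokenize(s)) for s in test)
    return 100 * hits / len(test)


def pct_sentences_with_singletons(sentences):
    """Percentage of sentences containing a word that occurs only once overall."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    hits = sum(any(counts[w] == 1 for w in tokenize(s)) for s in sentences)
    return 100 * hits / len(sentences)
```

A high out-of-vocabulary percentage, such as the 84.5% reported for the reduced test set, means most test sentences contain at least one word the model never saw during training.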

Citation

TO-DO
