Skip to main content

Python utilities for SignWriting.

Project description

SignWriting

Python utilities for SignWriting.

Installation

pip install git+https://github.com/sign-language-processing/signwriting

Or with Docker:

docker build --platform linux/amd64 --tag signwriting:python .
docker run --platform linux/amd64 --rm -p 9090:8080 -e PORT=8080 signwriting:python

Utilities

signwriting.formats

This module provides utilities for converting between different formats of SignWriting. We include a few examples:

  1. To parse an FSW string into a Sign object, representing the sign as a dictionary:
from signwriting.formats.fsw_to_sign import fsw_to_sign

fsw_to_sign("M123x456S1f720487x492")
# {'box': {'symbol': 'M', 'position': (123, 456)}, 'symbols': [{'symbol': 'S1f720', 'position': (487, 492)}]}
  1. To convert a SignWriting string in SWU format to FSW format:
from signwriting.formats.swu_to_fsw import swu2fsw

swu2fsw('𝠃𝤟𝤩񋛩𝣵𝤐񀀒𝤇𝣤񋚥𝤐𝤆񀀚𝣮𝣭')
# M525x535S2e748483x510S10011501x466S2e704510x500S10019476x475

signwriting.tokenizer

This module provides utilities for tokenizing SignWriting strings for use in NLP tasks[^1]. We include a few usage non-exhaustive examples:

  1. To tokenize a SignWriting string into a list of tokens:
from signwriting.tokenizer import SignWritingTokenizer

tokenizer = SignWritingTokenizer()

fsw = 'M123x456S1f720487x492S1f720487x492'
tokens = list(tokenizer.text_to_tokens(fsw, box_position=True))
# ['M', 'p123', 'p456', 'S1f7', 'c2', 'r0', 'p487', 'p492', 'S1f7', 'c2', 'r0', 'p487', 'p492'])
  1. To convert a list of tokens back to a SignWriting string:
tokenizer.tokens_to_text(tokens)
# M123x456S1f720487x492S1f720487x492
  1. For machine learning purposes, we can convert the tokens to a list of integers:
tokenizer.tokenize(fsw, bos=False, eos=False)
# [6, 932, 932, 255, 678, 660, 919, 924, 255, 678, 660, 919, 924]
  1. Or to remove 'A' information, and separate signs by spaces, we can use:
from signwriting.tokenizer import normalize_signwriting

normalize_signwriting(fsw)

signwriting.visualizer

This module is used to visualize SignWriting strings as images. Unlike sutton-signwriting/font-db which it is based on, this module does not support custom styling. Benchmarks show that this module is ~5000x faster than the original implementation.

from signwriting.visualizer.visualize import signwriting_to_image

fsw = "AS10011S10019S2e704S2e748M525x535S2e748483x510S10011501x466S20544510x500S10019476x475"
signwriting_to_image(fsw)

AS10011S10019S2e704S2e748M525x535S2e748483x510S10011501x466S20544510x500S10019476x475

To use the visualizer with the server, you can hit: https://signwriting-sxie2r74ua-uc.a.run.app/visualizer?fsw=M525x535S2e748483x510S10011501x466S2e704510x500S10019476x475

signwriting.utils

This module includes general utilities that were not covered in the other modules.

  1. join_signs joins a list of signs into a single sign. This is useful for example for fingerspelling words out of individual character signs.
from signwriting.utils.join_signs import join_signs_vertical

char_a = 'M507x507S1f720487x492'
char_b = 'M507x507S14720493x485'
result_sign = join_signs_vertical(char_a, char_b)
# M510x518S1f720490x481S14720496x496

signwriting.fingerspelling

This module is used to generate spelling data from a list of characters.

from signwriting.fingerspelling.fingerspelling import spell

word = "Hello"  # any string of characters
language = "en-us-ase-asl"  # long language code, as defined in the fingerspelling README
spell(word, language)
# M515x563S11502477x437S14a20492x457S1dc20484x477S1dc20484x512S17620492x547

To use the fingerspelling with the server, you can hit: https://signwriting-sxie2r74ua-uc.a.run.app/fingerspelling?text=hello&signed_language=ase

signwriting.mouthing

This module is used to generate SpeechWriting from spoken words.

from signwriting.mouthing.mouthing import mouth

word = "Hello"  # any string of characters, preferably valid words
language = "eng-Latn"  # supported languages under "Language Support" at https://pypi.org/project/epitran/
mouth(word, language)
# M557x518S34700443x482S35c00469x482S34400495x482S34d00521x482

Note: Installing English support for epitran requires extra steps, see "Install flite" at mouthing/README.md.

To use the mouthing with the server, you can hit: https://signwriting-sxie2r74ua-uc.a.run.app/mouthing?text=hello&spoken_language=eng-Latn

Cite

@misc{moryossef2024-signwriting, 
    title={Utilities for SignWriting},
    author={Moryossef, Amit},
    howpublished={\url{https://github.com/sign-language-processing/signwriting}},
    year={2024}
}

References

[^1]: Amit Moryossef, Zifan Jiang.

  1. SignBank+: Preparing a Multilingual Sign Language Dataset for Machine Translation Using Large Language Models.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

signwriting-0.1.7.tar.gz (6.5 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

signwriting-0.1.7-py3-none-any.whl (6.6 MB view details)

Uploaded Python 3

File details

Details for the file signwriting-0.1.7.tar.gz.

File metadata

  • Download URL: signwriting-0.1.7.tar.gz
  • Upload date:
  • Size: 6.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for signwriting-0.1.7.tar.gz
Algorithm Hash digest
SHA256 27895bde1adf6a78cdb6956cc50a2257728bdf37771a7d6a034be791a4b7f302
MD5 8d0705f2ef29f85ff701863d634c57db
BLAKE2b-256 ca17f79b00fec67b7585a7b23ed72c91253c4ea99e5bf676ed35335fd79fa0eb

See more details on using hashes here.

Provenance

The following attestation bundles were made for signwriting-0.1.7.tar.gz:

Publisher: release.yaml on sign-language-processing/signwriting

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file signwriting-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: signwriting-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 6.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for signwriting-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 e81e339bf0c8f18d8f1f892605d0c6f977f6f555d5bb0c0e075d79689ff6d378
MD5 799abf87245bae689a67594296502a94
BLAKE2b-256 cbac0e6b37dc28baa5f2c65fc14d1d30c289fb0f0e5378725628f446cc246dab

See more details on using hashes here.

Provenance

The following attestation bundles were made for signwriting-0.1.7-py3-none-any.whl:

Publisher: release.yaml on sign-language-processing/signwriting

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page