SignWriting

Python utilities for SignWriting.

Installation

pip install git+https://github.com/sign-language-processing/signwriting

Or with Docker:

docker build --platform linux/amd64 --tag signwriting:python .
docker run --platform linux/amd64 --rm -p 9090:8080 -e PORT=8080 signwriting:python

Utilities

signwriting.formats

This module provides utilities for converting between different formats of SignWriting. We include a few examples:

  1. To parse an FSW string into a Sign object, representing the sign as a dictionary:
from signwriting.formats.fsw_to_sign import fsw_to_sign

fsw_to_sign("M123x456S1f720487x492")
# {'box': {'symbol': 'M', 'position': (123, 456)}, 'symbols': [{'symbol': 'S1f720', 'position': (487, 492)}]}
  2. To convert a SignWriting string in SWU format to FSW format:
from signwriting.formats.swu_to_fsw import swu2fsw

swu2fsw('𝠃𝤟𝤩񋛩𝣵𝤐񀀒𝤇𝣤񋚥𝤐𝤆񀀚𝣮𝣭')
# M525x535S2e748483x510S10011501x466S2e704510x500S10019476x475
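
To make the two formats concrete, here is a minimal, self-contained sketch of both conversions. It is not the library's implementation: `parse_fsw` and `decode_swu` are hypothetical names, the regular expressions cover only the box-plus-symbols part of a sign, and the Unicode ranges are assumptions based on the published SWU encoding (box markers from U+1D800, coordinates from U+1D80C, symbols from U+40001).

```python
import re

# Assumed FSW grammar: a box marker with a position, then symbols with positions.
BOX_RE = re.compile(r"([BLMR])(\d{3})x(\d{3})")
SYMBOL_RE = re.compile(r"(S[123][0-9a-f]{2}[0-5][0-9a-f])(\d{3})x(\d{3})")


def parse_fsw(fsw):
    """Parse an FSW string into a dictionary shaped like the Sign shown above."""
    box = BOX_RE.search(fsw)
    if box is None:
        raise ValueError(f"no sign box found in {fsw!r}")
    symbols = [
        {"symbol": m.group(1), "position": (int(m.group(2)), int(m.group(3)))}
        for m in SYMBOL_RE.finditer(fsw)
    ]
    return {
        "box": {"symbol": box.group(1), "position": (int(box.group(2)), int(box.group(3)))},
        "symbols": symbols,
    }


def decode_swu(swu):
    """Convert an SWU string to FSW, one code point at a time."""
    parts = []
    numbers_seen = 0
    for char in swu:
        code_point = ord(char)
        if 0x1D800 <= code_point <= 0x1D804:
            # Markers A, B, L, M, R occupy five consecutive code points.
            parts.append("ABLMR"[code_point - 0x1D800])
        elif 0x1D80C <= code_point <= 0x1DFFF:
            # Coordinates come in pairs; the first of each pair is followed by 'x'.
            number = code_point - 0x1D80C + 250
            parts.append(f"{number}x" if numbers_seen % 2 == 0 else str(number))
            numbers_seen += 1
        elif 0x40001 <= code_point <= 0x4F480:
            # Each base symbol owns 96 code points: 6 fills x 16 rotations.
            base, rest = divmod(code_point - 0x40001, 96)
            fill, rotation = divmod(rest, 16)
            parts.append(f"S{base + 0x100:x}{fill:x}{rotation:x}")
        else:
            parts.append(char)  # pass through anything unrecognized (e.g. spaces)
    return "".join(parts)


print(parse_fsw("M123x456S1f720487x492"))
# {'box': {'symbol': 'M', 'position': (123, 456)}, 'symbols': [{'symbol': 'S1f720', 'position': (487, 492)}]}
print(decode_swu("\U0001D803\U0001D91F\U0001D929"))
# M525x535
```

For real use, prefer `fsw_to_sign` and `swu2fsw` from the package; the sketch only illustrates how the encodings relate.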

signwriting.tokenizer

This module provides utilities for tokenizing SignWriting strings for use in NLP tasks[^1]. We include a few non-exhaustive usage examples:

  1. To tokenize a SignWriting string into a list of tokens:
from signwriting.tokenizer import SignWritingTokenizer

tokenizer = SignWritingTokenizer()

fsw = 'M123x456S1f720487x492S1f720487x492'
tokens = list(tokenizer.text_to_tokens(fsw, box_position=True))
# ['M', 'p123', 'p456', 'S1f7', 'c2', 'r0', 'p487', 'p492', 'S1f7', 'c2', 'r0', 'p487', 'p492']
  2. To convert a list of tokens back to a SignWriting string:
tokenizer.tokens_to_text(tokens)
# M123x456S1f720487x492S1f720487x492
  3. For machine learning purposes, we can convert the tokens to a list of integers:
tokenizer.tokenize(fsw, bos=False, eos=False)
# [6, 932, 932, 255, 678, 660, 919, 924, 255, 678, 660, 919, 924]
  4. Or, to remove 'A' information and separate signs by spaces, we can use:
from signwriting.tokenizer import normalize_signwriting

normalize_signwriting(fsw)
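
The token layout shown in the first example can be reproduced with a short standalone sketch. This is not the `SignWritingTokenizer` implementation: `fsw_to_tokens` and `tokens_to_fsw` are hypothetical helpers, and the assumption that the `c` and `r` tokens carry the two trailing hex digits of a symbol unchanged is inferred only from the example output above.

```python
import re

# Either a box marker with a position, or a symbol split into base, two digits, and a position.
PART_RE = re.compile(
    r"([BLMR])(\d{3})x(\d{3})"
    r"|S([123][0-9a-f]{2})([0-5])([0-9a-f])(\d{3})x(\d{3})"
)


def fsw_to_tokens(fsw):
    """Split an FSW string into box, symbol, and position tokens."""
    tokens = []
    for match in PART_RE.finditer(fsw):
        if match.group(1):
            tokens += [match.group(1), f"p{match.group(2)}", f"p{match.group(3)}"]
        else:
            tokens += [
                f"S{match.group(4)}",  # base symbol
                f"c{match.group(5)}",  # first trailing digit
                f"r{match.group(6)}",  # second trailing digit
                f"p{match.group(7)}",  # x position
                f"p{match.group(8)}",  # y position
            ]
    return tokens


def tokens_to_fsw(tokens):
    """Rebuild the FSW string; position tokens alternate between x and y."""
    parts = []
    positions_seen = 0
    for token in tokens:
        if token[0] == "p":
            parts.append(token[1:] + ("x" if positions_seen % 2 == 0 else ""))
            positions_seen += 1
        elif token[0] in "cr":
            parts.append(token[1:])
        else:
            parts.append(token)
    return "".join(parts)


fsw = "M123x456S1f720487x492S1f720487x492"
tokens = fsw_to_tokens(fsw)
print(tokens)
# ['M', 'p123', 'p456', 'S1f7', 'c2', 'r0', 'p487', 'p492', 'S1f7', 'c2', 'r0', 'p487', 'p492']
print(tokens_to_fsw(tokens))
# M123x456S1f720487x492S1f720487x492
```

Splitting each symbol into separate tokens keeps the vocabulary small, since base symbols and positions are shared across signs instead of every full symbol key being its own token.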

signwriting.visualizer

This module is used to visualize SignWriting strings as images. Unlike sutton-signwriting/font-db, on which it is based, this module does not support custom styling. Benchmarks show that this module is ~5000x faster than the original implementation.

from signwriting.visualizer.visualize import signwriting_to_image

fsw = "AS10011S10019S2e704S2e748M525x535S2e748483x510S10011501x466S20544510x500S10019476x475"
signwriting_to_image(fsw)

To use the visualizer with the server, you can hit: https://signwriting-sxie2r74ua-uc.a.run.app/visualizer?fsw=M525x535S2e748483x510S10011501x466S2e704510x500S10019476x475
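
When calling the server from code, the request URL can be assembled with the standard library. The sketch below only builds the URL and does not send a request; `server_url` is a hypothetical helper, and the base URL and `fsw` parameter are taken from the link above.

```python
from urllib.parse import urlencode

# Base URL of the hosted server, as given in the link above.
SERVER = "https://signwriting-sxie2r74ua-uc.a.run.app"


def server_url(endpoint, **params):
    """Build a request URL for a server endpoint, escaping the query parameters."""
    return f"{SERVER}/{endpoint}?{urlencode(params)}"


url = server_url(
    "visualizer",
    fsw="M525x535S2e748483x510S10011501x466S2e704510x500S10019476x475",
)
print(url)
# https://signwriting-sxie2r74ua-uc.a.run.app/visualizer?fsw=M525x535S2e748483x510S10011501x466S2e704510x500S10019476x475
```

Using `urlencode` matters mostly for inputs that are not plain alphanumerics, such as SWU strings or text with spaces, which must be percent-encoded in a query string.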

signwriting.utils

This module includes general utilities that were not covered in the other modules.

  1. join_signs joins a list of signs into a single sign. This is useful, for example, for fingerspelling words out of individual character signs.
from signwriting.utils.join_signs import join_signs_vertical

char_a = 'M507x507S1f720487x492'
char_b = 'M507x507S14720493x485'
result_sign = join_signs_vertical(char_a, char_b)
# M510x518S1f720490x481S14720496x496

signwriting.fingerspelling

This module is used to generate spelling data from a list of characters.

from signwriting.fingerspelling.fingerspelling import spell

word = "Hello"  # any string of characters
language = "en-us-ase-asl"  # long language code, as defined in the fingerspelling README
spell(word, language)
# M515x563S11502477x437S14a20492x457S1dc20484x477S1dc20484x512S17620492x547

To use the fingerspelling with the server, you can hit: https://signwriting-sxie2r74ua-uc.a.run.app/fingerspelling?text=hello&signed_language=ase

signwriting.mouthing

This module is used to generate SpeechWriting from spoken words.

from signwriting.mouthing.mouthing import mouth

word = "Hello"  # any string of characters, preferably valid words
language = "eng-Latn"  # supported languages under "Language Support" at https://pypi.org/project/epitran/
mouth(word, language)
# M557x518S34700443x482S35c00469x482S34400495x482S34d00521x482

Note: Installing English support for epitran requires extra steps, see "Install flite" at mouthing/README.md.

To use the mouthing with the server, you can hit: https://signwriting-sxie2r74ua-uc.a.run.app/mouthing?text=hello&spoken_language=eng-Latn

Cite

@misc{moryossef2024-signwriting, 
    title={Utilities for SignWriting},
    author={Moryossef, Amit},
    howpublished={\url{https://github.com/sign-language-processing/signwriting}},
    year={2024}
}

References

[^1]: Amit Moryossef, Zifan Jiang. 2023. SignBank+: Preparing a Multilingual Sign Language Dataset for Machine Translation Using Large Language Models.
