SignWriting

Python utilities for SignWriting.

Installation

pip install git+https://github.com/sign-language-processing/signwriting

Or with Docker:

docker build --platform linux/amd64 --tag signwriting:python .
docker run --platform linux/amd64 --rm -p 9090:8080 -e PORT=8080 signwriting:python

Utilities

signwriting.formats

This module provides utilities for converting between different formats of SignWriting. We include a few examples:

  1. To parse an FSW string into a Sign object, representing the sign as a dictionary:
from signwriting.formats.fsw_to_sign import fsw_to_sign

fsw_to_sign("M123x456S1f720487x492")
# {'box': {'symbol': 'M', 'position': (123, 456)}, 'symbols': [{'symbol': 'S1f720', 'position': (487, 492)}]}
  2. To convert a SignWriting string in SWU format to FSW format:
from signwriting.formats.swu_to_fsw import swu2fsw

swu2fsw('𝠃𝤟𝤩񋛩𝣵𝤐񀀒𝤇𝣤񋚥𝤐𝤆񀀚𝣮𝣭')
# M525x535S2e748483x510S10011501x466S2e704510x500S10019476x475
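To illustrate what the SWU-to-FSW conversion involves, here is a minimal standalone sketch. It is not the library's `swu2fsw` implementation; the function names and the code point arithmetic below are assumptions based on the Formal SignWriting encoding as we understand it (box markers at U+1D800 to U+1D804, coordinates starting at U+1D80C for the value 250, and symbols starting at U+40001 with 96 code points per base symbol).

```python
import re

# Assumed SWU layout (not taken from this package's source):
MARKS = {"\U0001D800": "A", "\U0001D801": "B", "\U0001D802": "L",
         "\U0001D803": "M", "\U0001D804": "R"}
NUMBER = "[\U0001D80C-\U0001D9FF]"   # coordinates 250..749
SYMBOL = "[\U00040001-\U0004F480]"   # symbol keys S10000..S38b5f


def swu_number(char):
    """Map one SWU coordinate character to its integer value."""
    return ord(char) - 0x1D80C + 250


def swu_symbol_key(char):
    """Map one SWU symbol character to an FSW key such as 'S1f720'."""
    base, rest = divmod(ord(char) - 0x40001, 96)
    fill, rotation = divmod(rest, 16)
    return f"S{base + 0x100:x}{fill:x}{rotation:x}"


def swu_to_fsw_sketch(swu):
    """Hypothetical helper: convert an SWU string to FSW, character by character."""
    # Coordinates come in pairs; FSW writes each pair as 'XXXxYYY'.
    text = re.sub(
        f"({NUMBER})({NUMBER})",
        lambda m: f"{swu_number(m.group(1))}x{swu_number(m.group(2))}",
        swu,
    )
    text = re.sub(SYMBOL, lambda m: swu_symbol_key(m.group(0)), text)
    return "".join(MARKS.get(ch, ch) for ch in text)


# Build a small SWU string from code points: box M at 525x535, symbol S10011 at 501x466.
example = ("\U0001D803" + chr(0x1D80C + 275) + chr(0x1D80C + 285)
           + "\U00040012" + chr(0x1D80C + 251) + chr(0x1D80C + 216))
print(swu_to_fsw_sketch(example))  # M525x535S10011501x466
```

For real use, prefer `swu2fsw` from the package, which is the supported entry point.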

signwriting.tokenizer

This module provides utilities for tokenizing SignWriting strings for use in NLP tasks[^1]. We include a few non-exhaustive usage examples:

  1. To tokenize a SignWriting string into a list of tokens:
from signwriting.tokenizer import SignWritingTokenizer

tokenizer = SignWritingTokenizer()

fsw = 'M123x456S1f720487x492S1f720487x492'
tokens = list(tokenizer.text_to_tokens(fsw, box_position=True))
# ['M', 'p123', 'p456', 'S1f7', 'c2', 'r0', 'p487', 'p492', 'S1f7', 'c2', 'r0', 'p487', 'p492']
  2. To convert a list of tokens back to a SignWriting string:
tokenizer.tokens_to_text(tokens)
# M123x456S1f720487x492S1f720487x492
  3. For machine learning purposes, we can convert the tokens to a list of integers:
tokenizer.tokenize(fsw, bos=False, eos=False)
# [6, 932, 932, 255, 678, 660, 919, 924, 255, 678, 660, 919, 924]
  4. Or to remove 'A' information, and separate signs by spaces, we can use:
from signwriting.tokenizer import normalize_signwriting

normalize_signwriting(fsw)
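The token layout shown in the examples above can be reproduced with a short standalone sketch. This is not the `SignWritingTokenizer` implementation; the regular expression and the two function names are hypothetical, and the sketch always emits the box position (which corresponds to `box_position=True` in the example). It only mirrors the visible pattern: each symbol key is split into a base shape (`S1f7`), a fill (`c2`), and a rotation (`r0`), and each coordinate becomes a `p` token.

```python
import re

# One FSW part is either a box marker with a position, or a symbol with a position.
FSW_PART = re.compile(
    r"([BLMR])(\d{3})x(\d{3})"
    r"|S([123][0-9a-f]{2})([0-5])([0-9a-f])(\d{3})x(\d{3})"
)


def text_to_tokens_sketch(fsw):
    """Hypothetical helper: split an FSW string into tokens like the ones shown above."""
    tokens = []
    for m in FSW_PART.finditer(fsw):
        if m.group(1):  # box marker, e.g. 'M123x456'
            tokens += [m.group(1), f"p{m.group(2)}", f"p{m.group(3)}"]
        else:  # symbol, e.g. 'S1f720487x492'
            tokens += [f"S{m.group(4)}", f"c{m.group(5)}", f"r{m.group(6)}",
                       f"p{m.group(7)}", f"p{m.group(8)}"]
    return tokens


def tokens_to_text_sketch(tokens):
    """Hypothetical helper: rebuild the FSW string from the tokens."""
    parts = []
    previous_was_x = False  # True right after the first coordinate of a pair
    for token in tokens:
        if token.startswith("p"):
            parts.append(("x" if previous_was_x else "") + token[1:])
            previous_was_x = not previous_was_x
        else:
            # 'c' and 'r' tokens drop their prefix; box markers and 'S...' stay as-is.
            parts.append(token[1:] if token[0] in "cr" else token)
            previous_was_x = False
    return "".join(parts)


fsw = "M123x456S1f720487x492S1f720487x492"
tokens = text_to_tokens_sketch(fsw)
print(tokens)
print(tokens_to_text_sketch(tokens) == fsw)  # True
```

Splitting a symbol into shape, fill, and rotation keeps the vocabulary small, since the three parts are shared across many symbols instead of each full key needing its own token.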

signwriting.visualizer

This module is used to visualize SignWriting strings as images. Unlike sutton-signwriting/font-db, on which it is based, this module does not support custom styling. Benchmarks show that this module is ~5000x faster than the original implementation.

from signwriting.visualizer.visualize import signwriting_to_image

fsw = "AS10011S10019S2e704S2e748M525x535S2e748483x510S10011501x466S20544510x500S10019476x475"
signwriting_to_image(fsw)

To use the visualizer with the server, you can hit: https://signwriting-sxie2r74ua-uc.a.run.app/visualizer?fsw=M525x535S2e748483x510S10011501x466S2e704510x500S10019476x475
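As a small convenience, the request URL can be assembled with the standard library instead of by hand. The sketch below is an assumption-laden helper, not part of the package: `build_server_url` is a hypothetical name, and the only facts it relies on are the server address and the query parameter names that appear in this README.

```python
from urllib.parse import urlencode

# Hosted server address as given in this README.
SERVER = "https://signwriting-sxie2r74ua-uc.a.run.app"


def build_server_url(endpoint, base_url=SERVER, **params):
    """Hypothetical helper: build a request URL for one of the server endpoints."""
    return f"{base_url}/{endpoint}?{urlencode(params)}"


url = build_server_url(
    "visualizer",
    fsw="M525x535S2e748483x510S10011501x466S2e704510x500S10019476x475",
)
print(url)
```

The same helper covers the other endpoints mentioned in this README, for example `build_server_url("fingerspelling", text="hello", signed_language="ase")`. Using `urlencode` matters once the text contains spaces or non-ASCII characters, which must be escaped in a query string. Passing a different `base_url`, such as a locally running Docker container, should also work, assuming it exposes the same routes.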

signwriting.utils

This module includes general utilities that were not covered in the other modules.

  1. join_signs joins a list of signs into a single sign. This is useful, for example, for fingerspelling words out of individual character signs.
from signwriting.utils.join_signs import join_signs_vertical

char_a = 'M507x507S1f720487x492'
char_b = 'M507x507S14720493x485'
result_sign = join_signs_vertical(char_a, char_b)
# M510x518S1f720490x481S14720496x496

signwriting.fingerspelling

This module is used to generate spelling data from a list of characters.

from signwriting.fingerspelling.fingerspelling import spell

word = "Hello"  # any string of characters
language = "en-us-ase-asl"  # long language code, as defined in the fingerspelling README
spell(word, language)
# M515x563S11502477x437S14a20492x457S1dc20484x477S1dc20484x512S17620492x547

To use the fingerspelling with the server, you can hit: https://signwriting-sxie2r74ua-uc.a.run.app/fingerspelling?text=hello&signed_language=ase

signwriting.mouthing

This module is used to generate SpeechWriting from spoken words.

from signwriting.mouthing.mouthing import mouth

word = "Hello"  # any string of characters, preferably valid words
language = "eng-Latn"  # supported languages under "Language Support" at https://pypi.org/project/epitran/
mouth(word, language)
# M557x518S34700443x482S35c00469x482S34400495x482S34d00521x482

Note: Installing English support for epitran requires extra steps; see "Install flite" at mouthing/README.md.

To use the mouthing with the server, you can hit: https://signwriting-sxie2r74ua-uc.a.run.app/mouthing?text=hello&spoken_language=eng-Latn

Cite

@misc{moryossef2024-signwriting, 
    title={Utilities for SignWriting},
    author={Moryossef, Amit},
    howpublished={\url{https://github.com/sign-language-processing/signwriting}},
    year={2024}
}

References

[^1]: Amit Moryossef, Zifan Jiang. SignBank+: Preparing a Multilingual Sign Language Dataset for Machine Translation Using Large Language Models.
