
arabic-rt

Arabic shaping, BiDi, and un-baking for games, TTS, and real-time clients.

Most Arabic libraries can turn logical Arabic into correctly shaped, right-to-left text. arabic-rt does that too, but it also handles a step few libraries offer: reversing the process, turning baked presentation-form text back into clean logical Arabic. That round trip is what makes Arabic work in places where it normally breaks: multiplayer game chat, naive text renderers, and text-to-speech.

  • 🔁 Bake and un-bake. fix() produces text that renders correctly even on clients that do no Arabic processing; unfix() recovers logical Arabic for TTS, search, or logging.
  • 🎮 Built for real-time clients. A GAME preset handles word-by-word chat readers (joins words so they aren't split, keeps the first words on top when wrapping).
  • 🧩 Zero dependencies. Pure Python. Drop it in anywhere.
  • Validated. Forward output matches arabic_reshaper + python-bidi byte-for-byte; unfix(fix(x)) == x is covered by tests.

Pure shaping/BiDi is well served by existing tools. arabic-rt's reason to exist is the real-time / game niche and the un-baking capability built for it.

Install

pip install arabic-rt

Quick start

import arabic_rt as ar

baked = ar.fix("مرحبا بالعالم")     # visual-order presentation forms (renders anywhere)
ar.unfix(baked)                      # -> "مرحبا بالعالم"  (back to logical, for TTS/search)
ar.shape("سلم")                      # -> "ﺳﻠﻢ"  (contextual shaping only, no reorder)

ar.contains_arabic("hi مرحبا")       # True
ar.is_shaped(baked)                  # True
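The two checks above can be approximated by testing Unicode block ranges. This is a minimal sketch under that assumption, not arabic-rt's implementation: `has_arabic` and `has_presentation_forms` are hypothetical names, and the ranges covered (Arabic, Arabic Supplement, Presentation Forms-A and -B) are a simplification.

```python
# Hypothetical helpers approximating contains_arabic() / is_shaped().
# The block ranges are an assumption for this sketch, not arabic-rt's code.
ARABIC_BASE = ((0x0600, 0x06FF), (0x0750, 0x077F))   # Arabic, Arabic Supplement
PRESENTATION = ((0xFB50, 0xFDFF), (0xFE70, 0xFEFC))  # Forms-A, Forms-B (stops before U+FEFF, the BOM)


def _in_ranges(ch: str, ranges) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def has_arabic(text: str) -> bool:
    """True if any character is Arabic, either a base letter or a presentation form."""
    return any(_in_ranges(ch, ARABIC_BASE + PRESENTATION) for ch in text)


def has_presentation_forms(text: str) -> bool:
    """True if the text already contains baked presentation-form glyphs."""
    return any(_in_ranges(ch, PRESENTATION) for ch in text)


print(has_arabic("hi مرحبا"))                         # True
print(has_presentation_forms("مرحبا"))               # False: logical base letters
print(has_presentation_forms("\ufeb3\ufee0\ufee2"))  # True: "ﺳﻠﻢ"
```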

Game chat (word-by-word readers)

ar.fix("مرحبا بالعالم", ar.GAME)     # words joined so a naive reader shows the whole phrase
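Assuming the reader splits incoming chat on ordinary spaces (U+0020), a phrase arrives as separate words. A minimal sketch of the joining idea: replacing those spaces with a no-break space (U+00A0, the word_joiner value in the Options example) leaves a single token. `join_words` and `naive_reader_tokens` are hypothetical helpers, not arabic-rt API, and the sketch shows only the joining step, without shaping or reordering.

```python
# Assumption: the naive reader splits chat on U+0020 only.
NBSP = "\u00a0"  # no-break space


def join_words(text: str, joiner: str = NBSP) -> str:
    """Hypothetical helper: swap ordinary spaces for a joiner the reader won't split on."""
    return text.replace(" ", joiner)


def naive_reader_tokens(text: str) -> list[str]:
    """Stand-in for a chat client that renders each space-separated word on its own."""
    return text.split(" ")


phrase = "مرحبا بالعالم"
print(len(naive_reader_tokens(phrase)))              # 2: the phrase is split
print(len(naive_reader_tokens(join_words(phrase))))  # 1: shown as one unit
```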

Tune it yourself

from arabic_rt import Options, fix

opts = Options(
    combine_allah=True,      # collapse الله -> ﷲ
    reverse_word_order=True, # full RTL line (False = shape per word, keep typed order)
    word_joiner="\u00A0",    # separator for naive word-by-word readers
    prevent_word_split=True,
    max_line_chars=18,       # wrap long lines ourselves (first words on top, each line RTL)
)
fix("نص عربي طويل", opts)
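One way to get "first words on top" is to wrap while the text is still in logical order and only then convert each line to visual order; reversing the whole string first and wrapping afterwards puts the last words on the top line. The sketch below illustrates that ordering with the standard-library textwrap module. Plain character reversal stands in for real shaping and BiDi, and `wrap_rtl_lines` is a hypothetical name, not arabic-rt's code.

```python
import textwrap


def wrap_rtl_lines(logical: str, max_line_chars: int) -> list[str]:
    """Hypothetical sketch: wrap in reading order, then flip each line separately."""
    # 1. Wrap while the text is still logical, so line 1 holds the first words.
    lines = textwrap.wrap(logical, width=max_line_chars)
    # 2. Convert each line to visual order on its own. Plain reversal stands in
    #    for real shaping + BiDi (assumption: Arabic-only text, no digits/Latin).
    return [line[::-1] for line in lines]


text = "نص عربي طويل"
good = wrap_rtl_lines(text, 7)
naive = textwrap.wrap(text[::-1], width=7)  # reverse first, then wrap

print(good[0] == "نص عربي"[::-1])  # True: first words on the top line
print(naive[0] == "طويل"[::-1])    # True: last word ended up on top
```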

Why "un-baking" matters

To make Arabic display correctly on a client that does no shaping, you "bake" it into final presentation glyphs in visual (reversed) order. The catch: once baked, the text is no longer made of base Arabic letters but of presentation-form code points (Arabic Presentation Forms-A and -B, U+FB50–U+FDFF and U+FE70–U+FEFF), so a text-to-speech engine reads gibberish and search and logging break. unfix() reverses the bake (presentation forms → base letters, ligatures expanded, order restored), so the display can stay baked while the voice and data see clean Arabic.
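The un-bake steps can be sketched with the standard-library unicodedata module, since most presentation-form code points have a Unicode compatibility decomposition back to their base letters. This is a minimal sketch assuming a single line of Arabic-only text (no digits, Latin runs, or diacritics, which need real BiDi handling); `unbake_line` is a hypothetical helper, not arabic-rt's unfix().

```python
import unicodedata

# Arabic Presentation Forms-A and -B (U+FEFF, the BOM, is excluded).
PRESENTATION = ((0xFB50, 0xFDFF), (0xFE70, 0xFEFC))


def is_presentation_form(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PRESENTATION)


def unbake_line(visual: str) -> str:
    """Hypothetical un-bake for one line of Arabic-only baked text."""
    # 1. Restore logical order first, so ligatures expand in the right direction.
    logical = visual[::-1]
    # 2. Map each presentation glyph to base letters via its compatibility
    #    decomposition (e.g. U+FEFB lam-alef -> U+0644 U+0627).
    return "".join(
        unicodedata.normalize("NFKC", ch) if is_presentation_form(ch) else ch
        for ch in logical
    )


baked = "\ufefb \ufee2\ufee0\ufeb3"  # visual order of "سلم لا"
print(unbake_line(baked) == "سلم لا")  # True
```

Reversing before expanding matters: a ligature such as ﻻ is a single code point in the baked string, and its decomposition is already in logical order.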

API

  • fix(text, opts=None, **overrides): Logical Arabic → baked visual presentation forms. No-op on non-Arabic or already-shaped text.
  • unfix(text): Baked Arabic → logical Arabic. No-op on text that isn't baked.
  • shape(text, *, combine_allah=False): Contextual shaping only; order preserved.
  • contains_arabic(text) / is_shaped(text): Fast checks.
  • Options / GAME: Config dataclass and a ready-made preset for game chat.
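Contextual shaping chooses one of up to four glyphs per letter (isolated, final, initial, medial) depending on whether the neighboring letters connect. The sketch below illustrates that rule for just four letters; it omits the lam-alef ligature, diacritics, and tatweel, and `shape_sketch` and its `FORMS` table are hypothetical, not arabic-rt's shape().

```python
# Glyph table: base letter -> (isolated, final, initial, medial).
# Alef joins only to the preceding letter, so it has no initial/medial forms.
FORMS = {
    "\u0627": ("\ufe8d", "\ufe8e", None, None),          # alef
    "\u0633": ("\ufeb1", "\ufeb2", "\ufeb3", "\ufeb4"),  # seen
    "\u0644": ("\ufedd", "\ufede", "\ufedf", "\ufee0"),  # lam
    "\u0645": ("\ufee1", "\ufee2", "\ufee3", "\ufee4"),  # meem
}


def joins_forward(ch: str) -> bool:
    """True if ch can connect to the letter that follows it (dual-joining)."""
    return ch in FORMS and FORMS[ch][2] is not None


def shape_sketch(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        if ch not in FORMS:
            out.append(ch)  # unknown characters pass through unchanged
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        joined_prev = joins_forward(prev)
        joined_next = joins_forward(ch) and nxt in FORMS
        isolated, final, initial, medial = FORMS[ch]
        if joined_prev and joined_next:
            out.append(medial)
        elif joined_prev:
            out.append(final)
        elif joined_next:
            out.append(initial)
        else:
            out.append(isolated)
    return "".join(out)


print(shape_sketch("سلم") == "\ufeb3\ufee0\ufee2")  # True: "ﺳﻠﻢ", order unchanged
```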

A note on display fonts

arabic-rt produces correct text; how it looks is your font's job. For rendering shaped Arabic (e.g. in the demo or a UI), a quality Naskh face such as Noto Naskh Arabic or Amiri (both SIL OFL) looks far better than a generic system font.

Validation

Run the test suite (the dev extras install the reference libraries used for comparison):

pip install -e ".[dev]"
pytest -q

License & author

Licensed under the Mozilla Public License 2.0 (MPL-2.0) — see LICENSE. Use it freely, including in closed-source games and apps; modifications to arabic-rt's own files stay open.

Created by Bandar AlSwyan.


Arabic: a quick overview

arabic-rt is a library for processing Arabic text: it shapes letters into their correct connected forms and orders them right to left. More importantly, it can reverse the process, turning "baked" text (reversed presentation forms) back into clean, logical Arabic.

This "un-baking" capability (unfix) is what makes Arabic work where it usually breaks, such as multiplayer game chat, engines that do no Arabic processing, and text-to-speech (TTS) systems. The text displays correctly for everyone, while the speech or search engine reads a clean, logical copy.

  • fix(): converts logical Arabic into ready-made presentation forms that display correctly on any client, even one with no Arabic processing.
  • unfix(): reverses the process to recover logical Arabic for speech, search, and logs.
  • GAME: a ready-made preset for game chats that read words one at a time.
  • No dependencies, and validated against arabic_reshaper and python-bidi character for character.

Licensed under MPL-2.0. Created by Bandar AlSwyan.


Download files

Source Distribution

arabic_rt-0.1.0.tar.gz (15.5 kB)

Uploaded Source

Built Distribution

arabic_rt-0.1.0-py3-none-any.whl (15.2 kB)

Uploaded Python 3

File details

Details for the file arabic_rt-0.1.0.tar.gz.

File metadata

  • Download URL: arabic_rt-0.1.0.tar.gz
  • Upload date:
  • Size: 15.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for arabic_rt-0.1.0.tar.gz
  • SHA256: 6f2eb81048507af61f776bd972b0f2b3a32c5541be1de34c5863d3537d38b838
  • MD5: 84f43c4713e1e47c73943f1ef074aae9
  • BLAKE2b-256: f45cb4d85384daf6c77ee511a0c170cc6af030fbca325c9a29e0936a990ea30c

Provenance

The following attestation bundles were made for arabic_rt-0.1.0.tar.gz:

Publisher: publish.yml on balswyan/arabic-rt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file arabic_rt-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: arabic_rt-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 15.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for arabic_rt-0.1.0-py3-none-any.whl
  • SHA256: c0d19899d40385d685ed56c5ec42377eb5fd19801b2ee5c964b214cda4303420
  • MD5: b11e4120f9eb0f4bc9153ca218c5b421
  • BLAKE2b-256: 50728a2c7ab584531239139d9d5d687f3e58dca57d9dd0f6b281a5f01ff22b46

Provenance

The following attestation bundles were made for arabic_rt-0.1.0-py3-none-any.whl:

Publisher: publish.yml on balswyan/arabic-rt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
