Arabic shaping, BiDi, and un-baking for games, TTS, and real-time clients.

arabic-rt

Most Arabic libraries can turn logical Arabic into correctly shaped, right-to-left text. arabic-rt does that too — but it also does the part almost nothing else does: it can reverse the process, turning baked presentation-form text back into clean logical Arabic. That round-trip is what makes Arabic work in places it normally breaks: multiplayer game chat, naive text renderers, and text-to-speech.

  • 🔁 Bake and un-bake. fix() → renders correctly even on clients that do zero Arabic processing. unfix() → recovers logical Arabic for TTS, search, or logging.
  • 🎮 Built for real-time clients. A GAME preset handles word-by-word chat readers (joins words so they aren't split, keeps the first words on top when wrapping).
  • 🧩 Zero dependencies. Pure Python. Drop it in anywhere.
  • Validated. Forward output matches arabic_reshaper + python-bidi byte-for-byte; unfix(fix(x)) == x is covered by tests. The C# port produces byte-identical output, so text baked in Unity reads back in Python and vice-versa.

Pure shaping/BiDi is well served by existing tools. arabic-rt's reason to exist is the real-time / game niche and the un-baking capability built for it.

Try it

A live, no-install demo — type Arabic and watch it shaped, baked, and un-baked in real time: https://huggingface.co/spaces/balswyan/arabic-rt

Install

pip install arabic-rt

Quick start

import arabic_rt as ar

baked = ar.fix("مرحبا بالعالم")     # visual-order presentation forms (renders anywhere)
ar.unfix(baked)                      # -> "مرحبا بالعالم"  (back to logical, for TTS/search)
ar.shape("سلم")                      # -> "ﺳﻠﻢ"  (contextual shaping only, no reorder)

ar.contains_arabic("hi مرحبا")       # True
ar.is_shaped(baked)                  # True
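
To make "contextual shaping" concrete, here is a minimal standard-library sketch of the idea. It is not arabic-rt's implementation: FORMS is a hypothetical four-letter table and shape_tiny a hypothetical helper, and the sketch ignores ligatures such as lam-alef, diacritics, and every letter outside the table. Each letter is swapped for its isolated, final, initial, or medial presentation form depending on whether it connects to its neighbours.

```python
# Hypothetical mini-table: base letter -> (isolated, final, initial, medial).
# None means the letter has no such form (it never connects to the next letter).
FORMS = {
    "\u0633": ("\uFEB1", "\uFEB2", "\uFEB3", "\uFEB4"),  # seen
    "\u0644": ("\uFEDD", "\uFEDE", "\uFEDF", "\uFEE0"),  # lam
    "\u0645": ("\uFEE1", "\uFEE2", "\uFEE3", "\uFEE4"),  # meem
    "\u0627": ("\uFE8D", "\uFE8E", None, None),          # alef
}

# (joins previous letter, joins next letter) -> index into a FORMS tuple
FORM_INDEX = {
    (False, False): 0,  # isolated
    (True, False): 1,   # final
    (False, True): 2,   # initial
    (True, True): 3,    # medial
}


def shape_tiny(text: str) -> str:
    """Shape letters from FORMS by context; leave everything else untouched."""
    out = []
    for i, ch in enumerate(text):
        forms = FORMS.get(ch)
        if forms is None:
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # The previous letter connects forward only if it has an initial form.
        joins_prev = prev in FORMS and FORMS[prev][2] is not None
        # This letter connects forward only if it has an initial form itself.
        joins_next = forms[2] is not None and nxt in FORMS
        out.append(forms[FORM_INDEX[(joins_prev, joins_next)]])
    return "".join(out)


shaped = shape_tiny("\u0633\u0644\u0645")  # seen, lam, meem
print([f"U+{ord(c):04X}" for c in shaped])  # ['U+FEB3', 'U+FEE0', 'U+FEE2']
```

For seen, lam, meem this picks the initial, medial, and final forms in turn, in line with the ar.shape result shown above. The text stays in typed order; no reordering happens at this stage.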

Game chat (word-by-word readers)

ar.fix("مرحبا بالعالم", ar.GAME)     # words joined so a naive reader shows the whole phrase
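
The word-joining idea can be illustrated without the library. The sketch below assumes a hypothetical naive client that splits chat text on the ASCII space (naive_reader_tokens) and a hypothetical join_words helper; neither is part of arabic-rt, and the real GAME preset may work differently. Replacing spaces with a no-break space (U+00A0, the word_joiner value used in the Options example) leaves such a reader with a single token to display.

```python
NBSP = "\u00A0"  # no-break space, assumed here as the joiner


def join_words(text: str, joiner: str = NBSP) -> str:
    """Replace ASCII spaces so a naive splitter sees one token."""
    return text.replace(" ", joiner)


def naive_reader_tokens(text: str) -> list[str]:
    """Hypothetical client that reads chat word by word, splitting on ASCII space only."""
    return text.split(" ")


phrase = "\u0645\u0631\u062D\u0628\u0627 \u0628\u0627\u0644\u0639\u0627\u0644\u0645"  # "مرحبا بالعالم"
print(len(naive_reader_tokens(phrase)))              # 2
print(len(naive_reader_tokens(join_words(phrase))))  # 1
```

One caveat when testing this in Python: str.split() with no argument treats U+00A0 as whitespace, so a reader built on it would still split the phrase. The sketch models a reader that splits on " " only.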

Tune it yourself

from arabic_rt import Options, fix

opts = Options(
    combine_allah=True,      # collapse الله -> ﷲ
    reverse_word_order=True, # full RTL line (False = shape per word, keep typed order)
    word_joiner="\u00A0",    # separator for naive word-by-word readers
    prevent_word_split=True,
    max_line_chars=18,       # wrap long lines ourselves (first words on top, each line RTL)
)
fix("نص عربي طويل", opts)
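
As a rough picture of what max_line_chars implies, the sketch below wraps words greedily so that the first words land on the top line. wrap_words is a hypothetical helper written for this illustration; how the library itself counts characters (joiners, ligatures) or breaks over-long words is an assumption not covered here.

```python
def wrap_words(words: list[str], max_chars: int) -> list[str]:
    """Greedily pack words into lines of at most max_chars characters.

    Words stay in typed (logical) order, so the first words end up on the
    first line. A single word longer than max_chars gets a line to itself.
    """
    lines: list[str] = []
    current = ""
    for word in words:
        candidate = word if not current else current + " " + word
        if current and len(candidate) > max_chars:
            lines.append(current)  # current line is full; start a new one
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


for line in wrap_words(["aa", "bb", "cc", "dd"], max_chars=5):
    print(line)
# aa bb
# cc dd
```

In a baking pipeline, each wrapped line would then be shaped and put into RTL visual order on its own, which is what keeps the first words on top.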

Why "un-baking" matters

To make Arabic display correctly on a client that does no shaping, you "bake" it into final presentation glyphs in visual (reversed) order. The catch: once baked, the text is made of presentation-form code points in reversed order rather than base Arabic letters in reading order, so a text-to-speech engine reads gibberish, and search and logging break. unfix() reverses the bake (presentation forms → base letters, ligatures expanded, order restored) so the display can stay baked while the voice and data see clean Arabic.
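
The two halves of that reversal can be sketched with the standard library alone. This is an illustration under simplifying assumptions, not how unfix() is implemented: naive_unbake is a hypothetical helper that only handles a single line of pure Arabic, with no Latin text, digits, or joined words. It first restores reading order by reversing the string, then relies on Unicode NFKC normalization, which maps Arabic presentation forms to base letters and expands ligatures.

```python
import unicodedata


def naive_unbake(baked: str) -> str:
    """Very rough un-bake for one line of pure Arabic (assumption)."""
    # Step 1: visual order -> logical order. Reversing first means a ligature
    # expanded in step 2 comes out with its letters in reading order.
    logical_order = baked[::-1]
    # Step 2: presentation forms -> base letters, ligatures expanded.
    return unicodedata.normalize("NFKC", logical_order)


baked = "\uFEE2\uFEE0\uFEB3"  # final meem, medial lam, initial seen (visual order)
logical = naive_unbake(baked)
print([f"U+{ord(c):04X}" for c in logical])  # ['U+0633', 'U+0644', 'U+0645']
```

NFKC is a blunt tool: it also rewrites unrelated compatibility characters (U+00A0 becomes a plain space, for example), and a plain string reversal mishandles mixed-direction text, which is why a dedicated un-baker is needed for real input.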

API

| Function | Purpose |
| --- | --- |
| fix(text, opts=None, **overrides) | Logical Arabic → baked visual presentation forms. No-op on non-Arabic or already-shaped text. |
| unfix(text) | Baked Arabic → logical Arabic. No-op on text that isn't baked. |
| shape(text, *, combine_allah=False) | Contextual shaping only; order preserved. |
| contains_arabic(text) / is_shaped(text) | Fast checks. |
| Options / GAME | Config dataclass and a ready preset for game chat. |
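
Fast checks of this kind usually come down to Unicode block membership. The sketch below is one plausible way to write them, using hypothetical names (has_arabic, has_presentation_forms) so as not to imply this is arabic-rt's code; the library's own rules for what counts as Arabic or shaped may be broader or narrower. It uses the Arabic block (U+0600 to U+06FF) and the two Arabic Presentation Forms blocks, stopping at U+FEFC because U+FEFF is the byte order mark.

```python
ARABIC_BLOCK = (0x0600, 0x06FF)
PRESENTATION_FORMS = (
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFC),  # Arabic Presentation Forms-B, excluding U+FEFF (BOM)
)


def _in_any(ch: str, ranges) -> bool:
    """True if the character's code point falls in any (low, high) range."""
    return any(low <= ord(ch) <= high for low, high in ranges)


def has_presentation_forms(text: str) -> bool:
    """Hypothetical check: does the text contain any presentation-form character?"""
    return any(_in_any(ch, PRESENTATION_FORMS) for ch in text)


def has_arabic(text: str) -> bool:
    """Hypothetical check: any base Arabic letter or presentation form?"""
    ranges = (ARABIC_BLOCK,) + PRESENTATION_FORMS
    return any(_in_any(ch, ranges) for ch in text)


print(has_arabic("hi \u0645\u0631\u062D\u0628\u0627"))  # True
print(has_presentation_forms("\u0633\u0644\u0645"))      # False
print(has_presentation_forms("\uFEB3\uFEE0\uFEE2"))      # True
```

Both helpers stop at the first matching character, so they stay cheap even on long chat logs.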

Also available for .NET & Unity

The same engine, ported to C# with byte-for-byte identical output, targeting netstandard2.0/2.1 (Unity-compatible): github.com/balswyan/arabic-rt-dotnet · dotnet add package ArabicRt

A note on display fonts

arabic-rt produces correct text; how it looks is your font's job. For rendering shaped Arabic (e.g. in the demo or a UI), a quality Naskh face such as Noto Naskh Arabic or Amiri (both SIL OFL) looks far better than a generic system font.

Validation

Run the suite (installs the reference libraries as dev extras):

pip install -e ".[dev]"
pytest -q

License & author

Licensed under the Mozilla Public License 2.0 (MPL-2.0) — see LICENSE. Use it freely, including in closed-source games and apps; modifications to arabic-rt's own files stay open.

Created by Bandar AlSwyan.


Arabic: quick overview

arabic-rt is a library for processing Arabic text: it shapes letters (joining them in their correct forms), orders them right to left, and, most importantly, can reverse the process, turning "baked" text (reversed presentation forms) back into clean logical Arabic.

This "un-baking" capability (unfix) is what makes Arabic work in places where it usually breaks: multiplayer game chat, engines that do not process Arabic, and text-to-speech (TTS) systems. The text displays correctly for everyone, while the speech or search engine reads a clean logical copy.

🤗 Try the live demo: huggingface.co/spaces/balswyan/arabic-rt

  • fix(): logical Arabic → ready-made presentation forms that display correctly on any client, even one that does no Arabic processing.
  • unfix(): reverses the process to recover logical Arabic (for speech, search, and logs).
  • GAME: a ready preset for game chat clients that read words one at a time.
  • No dependencies, and validated character for character against arabic_reshaper and python-bidi.

Also available for .NET and Unity: arabic-rt-dotnet. Licensed under MPL-2.0. Created by Bandar AlSwyan.
