Arabic shaping, BiDi, and un-baking for games, TTS, and real-time clients.

arabic-rt

Most Arabic libraries can turn logical Arabic into correctly shaped, right-to-left text. arabic-rt does that too — but it also does the part almost nothing else does: it can reverse the process, turning baked presentation-form text back into clean logical Arabic. That round-trip is what makes Arabic work in places it normally breaks: multiplayer game chat, naive text renderers, and text-to-speech.

  • 🔁 Bake and un-bake. fix() bakes text so that it renders correctly even on clients that do no Arabic processing; unfix() recovers the logical Arabic for TTS, search, or logging.
  • 🎮 Built for real-time clients. A GAME preset handles word-by-word chat readers (joins words so they aren't split, keeps the first words on top when wrapping).
  • 🧩 Zero dependencies. Pure Python. Drop it in anywhere.
  • Validated. Forward output matches arabic_reshaper + python-bidi byte-for-byte; unfix(fix(x)) == x is covered by tests. The C# port produces byte-identical output, so text baked in Unity reads back in Python and vice-versa.

Pure shaping/BiDi is well served by existing tools. arabic-rt's reason to exist is the real-time / game niche and the un-baking capability built for it.

Try it

A live, no-install demo — type Arabic and watch it shaped, baked, and un-baked in real time: https://huggingface.co/spaces/balswyan/arabic-rt

Install

pip install arabic-rt

Quick start

import arabic_rt as ar

baked = ar.fix("مرحبا بالعالم")     # visual-order presentation forms (renders anywhere)
ar.unfix(baked)                      # -> "مرحبا بالعالم"  (back to logical, for TTS/search)
ar.shape("سلم")                      # -> "ﺳﻠﻢ"  (contextual shaping only, no reorder)

ar.contains_arabic("hi مرحبا")       # True
ar.is_shaped(baked)                  # True
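
As a rough illustration of what checks like these can look at, the sketch below tests code point ranges using only the standard library. It is not arabic-rt's implementation: the helper names has_arabic_letters and has_presentation_forms are hypothetical, and the ranges are assumptions taken from the Unicode blocks (the Arabic block, plus Arabic Presentation Forms-A and -B).

```python
# Hypothetical helpers, not part of arabic-rt: a range-based sketch only.
ARABIC_BLOCK = (0x0600, 0x06FF)                           # base (logical) Arabic letters
PRESENTATION_FORMS = ((0xFB50, 0xFDFF), (0xFE70, 0xFEFC))  # Forms-A block, Forms-B letters


def _in_range(ch: str, lo: int, hi: int) -> bool:
    return lo <= ord(ch) <= hi


def has_arabic_letters(text: str) -> bool:
    """True if any character is a base Arabic letter or mark."""
    return any(_in_range(ch, *ARABIC_BLOCK) for ch in text)


def has_presentation_forms(text: str) -> bool:
    """True if any character is an Arabic presentation form (already shaped)."""
    return any(_in_range(ch, lo, hi) for ch in text for lo, hi in PRESENTATION_FORMS)


print(has_arabic_letters("hi \u0645\u0631\u062D\u0628\u0627"))   # True
print(has_presentation_forms("\uFEB3\uFEE0\uFEE2"))              # True
print(has_presentation_forms("\u0633\u0644\u0645"))              # False
```

Range checks like these are cheap enough to run on every chat message before deciding whether any Arabic handling is needed at all.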

Game chat (word-by-word readers)

ar.fix("مرحبا بالعالم", ar.GAME)     # words joined so a naive reader shows the whole phrase

Tune it yourself

from arabic_rt import Options, fix

opts = Options(
    combine_allah=True,      # collapse الله -> ﷲ
    reverse_word_order=True, # full RTL line (False = shape per word, keep typed order)
    word_joiner="\u00A0",    # separator for naive word-by-word readers
    prevent_word_split=True,
    max_line_chars=18,       # wrap long lines ourselves (first words on top, each line RTL)
)
fix("نص عربي طويل", opts)
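
To make the wrapping and word-joining options more concrete, here is a minimal standard-library sketch of the general idea: pack words greedily into lines in logical order (so the first words land on the top line), then put each line into visual order and replace spaces with a joiner. This is an assumption about the technique, not arabic-rt's actual code; wrap_logical and to_visual_line are hypothetical names, shaping is skipped entirely, and the plain character reversal only holds for text that is purely right-to-left.

```python
def wrap_logical(text: str, max_line_chars: int) -> list[str]:
    """Greedy word wrap in logical order; the first words stay on the first line."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and len(candidate) > max_line_chars:
            lines.append(current)   # line is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def to_visual_line(line: str, joiner: str = "\u00A0") -> str:
    """Reverse a pure-RTL line into visual order and glue its words with `joiner`."""
    return line[::-1].replace(" ", joiner)


lines = wrap_logical("نص عربي طويل", max_line_chars=7)
visual = [to_visual_line(line) for line in lines]
print(len(lines))  # 2
```

Wrapping before reversing is the design point: if the renderer wrapped the already-reversed text itself, the last logical words would end up on the top line.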

Why "un-baking" matters

To make Arabic show up correctly on a client that does no shaping, you "bake" it into final presentation glyphs in visual (reversed) order. The catch: once baked, the text is made of presentation-form code points (the Unicode Arabic Presentation Forms blocks) rather than base Arabic letters, so a text-to-speech engine reads gibberish, and search and logging break. unfix() reverses the bake (presentation forms → base letters, ligatures expanded, order restored) so the display can stay baked while the voice and data see clean Arabic.
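
The three steps named above (presentation forms → base letters, ligatures expanded, order restored) can be approximated for the simplest case with the standard library alone. The sketch below is an illustration under stated assumptions, not how unfix() is implemented: naive_unbake is a hypothetical name, it handles only a single run of pure Arabic with no digits, Latin text, or line wrapping, and it relies on Unicode NFKC normalization, which maps Arabic presentation forms and ligatures to their base letters.

```python
import unicodedata


def naive_unbake(baked: str) -> str:
    """Toy un-bake for one pure-Arabic run: restore order, then map forms to base letters."""
    logical_order = baked[::-1]  # visual (reversed) order back to logical order
    # NFKC replaces presentation forms with base letters and expands ligatures.
    return unicodedata.normalize("NFKC", logical_order)


# Visual order, left to right: meem (final), lam (medial), seen (initial).
baked_slm = "\uFEE2\uFEE0\uFEB3"
print(naive_unbake(baked_slm) == "\u0633\u0644\u0645")  # True

# Visual order: meem (isolated), lam-alef ligature (final), seen (initial).
baked_salam = "\uFEE1\uFEFC\uFEB3"
print(naive_unbake(baked_salam) == "\u0633\u0644\u0627\u0645")  # True
```

Reversing before normalizing matters: a ligature expands into its letters in logical order, so expanding it inside a still-reversed string would leave those letters in the wrong order. Mixed-direction text needs real BiDi handling, which this sketch does not attempt.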

API

Function | Purpose
-------- | -------
fix(text, opts=None, **overrides) | Logical Arabic → baked visual presentation forms. No-op on non-Arabic or already-shaped text.
unfix(text) | Baked Arabic → logical Arabic. No-op on text that isn't baked.
shape(text, *, combine_allah=False) | Contextual shaping only; order preserved.
contains_arabic(text) / is_shaped(text) | Fast checks.
Options / GAME | Config dataclass and a ready preset for game chat.
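
For readers unfamiliar with contextual shaping, the toy sketch below shows the core rule: each letter takes its isolated, final, initial, or medial form depending on whether its neighbours connect to it. It is not arabic-rt's shape(): toy_shape and the FORMS table are hypothetical, the table covers only five letters, and ligatures, diacritics, and every other letter are ignored.

```python
# (isolated, final, initial, medial); None means the letter never connects forward.
FORMS = {
    "\u0628": ("\uFE8F", "\uFE90", "\uFE91", "\uFE92"),  # ba
    "\u062F": ("\uFEA9", "\uFEAA", None, None),          # dal (right-joining only)
    "\u0633": ("\uFEB1", "\uFEB2", "\uFEB3", "\uFEB4"),  # seen
    "\u0644": ("\uFEDD", "\uFEDE", "\uFEDF", "\uFEE0"),  # lam
    "\u0645": ("\uFEE1", "\uFEE2", "\uFEE3", "\uFEE4"),  # meem
}


def toy_shape(text: str) -> str:
    """Pick a contextual form per letter; logical order is preserved."""
    out = []
    for i, ch in enumerate(text):
        forms = FORMS.get(ch)
        if forms is None:
            out.append(ch)  # unknown characters pass through unchanged
            continue
        isolated, final, initial, medial = forms
        prev = FORMS.get(text[i - 1]) if i > 0 else None
        nxt = FORMS.get(text[i + 1]) if i + 1 < len(text) else None
        joins_prev = prev is not None and prev[2] is not None  # previous letter connects forward
        joins_next = nxt is not None and initial is not None   # this letter connects forward
        if joins_prev and joins_next:
            out.append(medial)
        elif joins_prev:
            out.append(final)
        elif joins_next:
            out.append(initial)
        else:
            out.append(isolated)
    return "".join(out)


print(toy_shape("\u0633\u0644\u0645") == "\uFEB3\uFEE0\uFEE2")  # True
```

The first example reproduces the shaped form of سلم shown in the quick start. A word such as بدل also exercises the right-joining case: dal takes its final form, and the lam after it falls back to isolated because dal never connects forward.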

Also available for .NET & Unity

The same engine, ported to C# with byte-for-byte identical output, targeting netstandard2.0/2.1 (Unity-compatible): github.com/balswyan/arabic-rt-dotnet · dotnet add package ArabicRt

A note on display fonts

arabic-rt produces correct text; how it looks is your font's job. For rendering shaped Arabic (e.g. in the demo or a UI), a quality Naskh face such as Noto Naskh Arabic or Amiri (both SIL OFL) looks far better than a generic system font.

Validation

Run the suite (installs the reference libraries as dev extras):

pip install -e ".[dev]"
pytest -q

License & author

Licensed under the Mozilla Public License 2.0 (MPL-2.0) — see LICENSE. Use it freely, including in closed-source games and apps; modifications to arabic-rt's own files stay open.

Created by Bandar AlSwyan.


Arabic: quick overview

arabic-rt is a library for processing Arabic text: it shapes letters (joining them in their correct contextual forms), orders them right to left, and, most importantly, can reverse the process, converting "baked" text (reversed presentation forms) back into clean logical Arabic.

This "un-baking" capability (unfix) is what makes Arabic work in places where it usually breaks: multiplayer game chat, engines that do not process Arabic, and text-to-speech (TTS) systems. The text displays correctly for everyone, while the speech or search engine reads a clean logical copy.

🤗 Try the live demo: huggingface.co/spaces/balswyan/arabic-rt

  • fix(): logical Arabic → ready-made presentation forms that display correctly on any client, even one with no Arabic processing.
  • unfix(): reverses the process to recover logical Arabic (for speech, search, and logs).
  • GAME: a ready preset for game chat that reads words one at a time.
  • No dependencies, and validated against arabic_reshaper and python-bidi character for character.

Also available for .NET and Unity: arabic-rt-dotnet. Licensed under MPL-2.0. Created by Bandar AlSwyan.
