adyghe-latin-utils

Python utilities for the Adyghe (Western Circassian) language — Cyrillic↔Latin alphabet conversion and number-to-words conversion.

About Adyghe

Adyghe (адыгабзэ / adıǵabze) is a Northwest Caucasian language spoken by approximately 600,000 people, primarily in the Republic of Adygea (Russia), Turkey, Jordan, Syria, and diaspora communities worldwide. Its ISO 639-3 language code is ady.

Adyghe has been written in the Cyrillic script since 1938. A Latin-based Adyghe alphabet also exists as an official writing system. This package converts text between these two official alphabets and converts numbers into Adyghe words.

Features

  • Cyrillic → Latin conversion — context-aware conversion between the official Cyrillic and Latin Adyghe alphabets, handling compound characters (гу, гъ, дж, дз, жь, кӀ, ку, шъ, etc.)
  • Latin → Cyrillic conversion — reverse conversion with vowel insertion and palochka (Ӏ) rules
  • Number to words — converts integers (0 to 10¹⁵) into Adyghe words using the modern decimal (base-10) system
  • Numbers in text — detects and converts 12 types of numeric patterns in mixed text: phone numbers, currencies ($), percentages (%), ranges (7-12), decimals (5.11), Roman numerals (IV), signed numbers (+14, -32), and more
  • Case utilities — uppercase, lowercase, and capitalize with proper handling of special Latin characters (İ/ı) and Cyrillic palochka (Ӏ)
  • Script detection — detect whether text is written in Cyrillic Adyghe
  • CLI tools — command-line utilities for batch file conversion with multiprocessing support

Installation

pip install adyghe-latin-utils

Or with uv:

uv add adyghe-latin-utils

Quick Start

Alphabet Conversion

from adyghe_latin_utils import AdigaCharacterUtils

utils = AdigaCharacterUtils()

# Cyrillic to Latin
utils.cyrillic_to_latin("гупшысэ")        # → "ǵupşıśé"
utils.cyrillic_to_latin("лъэхъаным")      # → "ĺéḣáním"
utils.cyrillic_to_latin("къещхы")          # → "kéşḣı"

# Latin to Cyrillic
utils.latin_to_cyrillic("selam")           # → "сэлам"
utils.latin_to_cyrillic("adıǵe")          # → "адыгэ"

# Script detection
utils.is_cyrillic_adyghe("гупшысэ")       # → True
utils.is_cyrillic_adyghe("ǵupşıśé")      # → False
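
The compound characters named under Features (гу, гъ, дж, дз, жь, кӀ, ку, шъ) are the reason a character-by-character mapping is not enough: a converter first has to decide where each letter unit begins and ends. One common way to do that is greedy longest-match segmentation, sketched below. This is an illustration only. `COMPOUNDS` holds just the examples listed in this README, `segment` is a hypothetical helper, and nothing here is taken from the package's source.

```python
# Compound units copied from the Features list; the real alphabet has more.
COMPOUNDS = {"гу", "гъ", "дж", "дз", "жь", "кӀ", "ку", "шъ"}
MAX_LEN = max(len(unit) for unit in COMPOUNDS)


def segment(word: str) -> list[str]:
    """Split a lowercase Cyrillic word into units, preferring the longest match."""
    units = []
    i = 0
    while i < len(word):
        # Try the longest candidate first and fall back to a single character.
        for size in range(MAX_LEN, 0, -1):
            chunk = word[i:i + size]
            if size == 1 or chunk in COMPOUNDS:
                units.append(chunk)
                i += len(chunk)
                break
    return units


print(segment("гупшысэ"))  # → ['гу', 'п', 'ш', 'ы', 'с', 'э']
```

Once the text is split this way, each unit can be looked up as a whole, so that гу is treated as one letter rather than as г followed by у.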

Number to Words

Adyghe traditionally counts in a vigesimal (base-20) system, in which "72" is ṫoćişıre ṫure (roughly "three-twenties-and-twelve"). French has a comparable pattern: soixante-douze ("72") is literally 60 + 12. Modern usage has shifted towards a simpler decimal (base-10) system, and that is the only system this library implements:

| Number | Modern decimal (this library) | Traditional vigesimal (not supported) |
|--------|-------------------------------|---------------------------------------|
| 72 | blıć ṫu (7-tens and 2) | ṫoćişıre ṫure (3×20 + 12) |

from adyghe_latin_utils import AdigaNumberUtils

AdigaNumberUtils.number_to_words(5)        # → "tfı"
AdigaNumberUtils.number_to_words(42)       # → "pĺ'ıć ṫu"
AdigaNumberUtils.number_to_words(100)      # → "şe"
AdigaNumberUtils.number_to_words(1000)     # → "min"
AdigaNumberUtils.number_to_words(2025)     # → "ṫu min ṫuć tfı"
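
The difference between the two counting systems for a number such as 72 comes down to which base the number is divided by. The sketch below shows only that arithmetic for values from 0 to 99. `decimal_parts` and `vigesimal_parts` are hypothetical names used for illustration; they are not part of the library, which produces words rather than tuples.

```python
def decimal_parts(n: int) -> tuple[int, int]:
    """Return (tens, units), the split used by the modern decimal system."""
    return divmod(n, 10)


def vigesimal_parts(n: int) -> tuple[int, int]:
    """Return (twenties, remainder), the split used by the traditional system."""
    return divmod(n, 20)


print(decimal_parts(72))    # → (7, 2)   7 tens and 2
print(vigesimal_parts(72))  # → (3, 12)  3 twenties and 12
```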

Numbers in Mixed Text

from adyghe_latin_utils import AdigaNumberUtils

AdigaNumberUtils.convert_numbers_in_text("chapter 3")
# → "chapter şı"

AdigaNumberUtils.convert_numbers_in_text("agent 007")
# → "agent ziy ziy blı"

AdigaNumberUtils.convert_numbers_in_text("the year 2025")
# → "the year ṫu min ṫuć tfı"
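
The "agent 007" example suggests that a number with leading zeros is read digit by digit rather than as a quantity. A minimal sketch of that behaviour follows, assuming this interpretation is right. `KNOWN_DIGITS` contains only the digit words that appear in this README's examples (0, 2, 3, 5 and 7), and `read_digits` is a hypothetical helper, not a library function.

```python
# Digit words taken from the examples in this README; deliberately incomplete.
KNOWN_DIGITS = {"0": "ziy", "2": "ṫu", "3": "şı", "5": "tfı", "7": "blı"}


def read_digits(token: str) -> str:
    """Spell out each digit separately; raises KeyError for digits not listed above."""
    return " ".join(KNOWN_DIGITS[digit] for digit in token)


print(read_digits("007"))
```

The same digit-by-digit reading is what the pattern table describes for phone numbers.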

Case Utilities

from adyghe_latin_utils import AdigaCharacterUtils

utils = AdigaCharacterUtils()

# Latin text
utils.to_lowercase("ADIGE", is_latin=True, is_cyrillic=False)
utils.to_uppercase("adıǵe", is_latin=True, is_cyrillic=False)
utils.capitalize("adıǵe", is_latin=True, is_cyrillic=False)

# Simplify special Latin chars to basic English
utils.special_chars_to_english_chars("ǵupşıśé")  # → "gupsise"
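
Script-aware casing matters because Python's built-in `str.lower()` and `str.upper()` follow the default Unicode rules, which pair `I` with `i`. The sketch below shows one way to get dotless and dotted i right, assuming the Latin Adyghe alphabet pairs these letters the way Turkish does (I with ı, İ with i). That pairing is an assumption based on the İ/ı mention in Features, and `latin_lower` and `latin_upper` are hypothetical helpers rather than the library's implementation.

```python
DOTLESS_LOWER = "\u0131"  # ı
DOTTED_UPPER = "\u0130"   # İ

# Remap the i-letters first, then let the built-in methods handle the rest.
_TO_LOWER = str.maketrans({"I": DOTLESS_LOWER, DOTTED_UPPER: "i"})
_TO_UPPER = str.maketrans({DOTLESS_LOWER: "I", "i": DOTTED_UPPER})


def latin_lower(text: str) -> str:
    return text.translate(_TO_LOWER).lower()


def latin_upper(text: str) -> str:
    return text.translate(_TO_UPPER).upper()


print("ADIGE".lower())       # → adige (default rules, dotted i)
print(latin_lower("ADIGE"))  # → adıge (dotless ı preserved)
```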

CLI Usage

Two command-line tools are installed with the package:

Script Conversion

# Cyrillic to Latin (file to file)
adyghe-char-convert -i input.txt -o output.txt -d c2l

# Latin to Cyrillic (file to stdout)
adyghe-char-convert -i input.txt -d l2c

# Convert a string passed directly on the command line
adyghe-char-convert -t "гупшысэ" -d c2l

# Options:
#   -t, --text       Input text string (mutually exclusive with -i)
#   -i, --input      Path to input text file (mutually exclusive with -t)
#   -o, --output     Path to output file (default: stdout)
#   -d, --direction  c2l (Cyrillic→Latin) or l2c (Latin→Cyrillic) (required)

The script conversion CLI supports multiprocessing for large files and displays a progress bar.
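
For readers who want to wrap or mirror this interface, the documented flags map naturally onto `argparse`. The sketch below declares the same four options, with `-t` and `-i` in a mutually exclusive group. It is not the package's actual source: `build_parser` is a hypothetical name, and making one of `-t`/`-i` mandatory is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="adyghe-char-convert")
    # Assumption: exactly one input source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-t", "--text", help="input text string")
    source.add_argument("-i", "--input", help="path to input text file")
    parser.add_argument("-o", "--output", default=None,
                        help="path to output file (default: stdout)")
    parser.add_argument("-d", "--direction", choices=["c2l", "l2c"],
                        required=True, help="conversion direction")
    return parser


args = build_parser().parse_args(["-t", "гупшысэ", "-d", "c2l"])
print(args.direction)  # → c2l
```

Passing both `-t` and `-i` to this parser makes `argparse` exit with a usage error, which matches the "mutually exclusive" note in the options list.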

Number Conversion

# Convert numbers in a text string
adyghe-num-convert -t "chapter 3"

# Convert numbers in a file
adyghe-num-convert -i input.txt -o output.txt

# Options:
#   -t, --text    Input text string (mutually exclusive with -i)
#   -i, --input   Path to input text file (mutually exclusive with -t)
#   -o, --output  Path to output file (default: stdout)

API Reference

AdigaCharacterUtils

| Method | Description |
|--------|-------------|
| cyrillic_to_latin(text: str) -> str | Convert Cyrillic Adyghe text to Latin script |
| latin_to_cyrillic(text: str) -> str | Convert Latin Adyghe text to Cyrillic script |
| is_cyrillic_adyghe(text: str, threshold: float = 0.5) -> bool | Detect if text is Cyrillic Adyghe |
| to_lowercase(text, is_latin, is_cyrillic) -> str | Lowercase with script-aware rules |
| to_uppercase(text, is_latin, is_cyrillic) -> str | Uppercase with script-aware rules |
| capitalize(text, is_latin, is_cyrillic) -> str | Capitalize first character |
| special_chars_to_english_chars(text: str) -> str | Simplify accented Latin chars to ASCII |
| cyrillic_extra_chars_to_basic_chars(text: str) -> str | Normalize Cyrillic character variants |
| sanitize_latin_text(text: str) -> str | Strip characters outside the Latin Adyghe alphabet, collapse whitespace, and normalize stray punctuation |
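
The `threshold` parameter of `is_cyrillic_adyghe` is not explained further in this README. One plausible reading is that it is the minimum share of letters that must be Cyrillic, and the sketch below illustrates that reading only. `cyrillic_ratio` and `looks_cyrillic` are hypothetical names, and the library's real heuristic may differ, for example by checking Adyghe-specific letters.

```python
def cyrillic_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the Unicode Cyrillic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04ff")
    return cyrillic / len(letters)


def looks_cyrillic(text: str, threshold: float = 0.5) -> bool:
    return cyrillic_ratio(text) >= threshold


print(looks_cyrillic("гупшысэ"))                   # → True
print(looks_cyrillic("адыгэ abc"))                 # → True  (5 of 8 letters)
print(looks_cyrillic("адыгэ abc", threshold=0.7))  # → False
```

The palochka (Ӏ, U+04C0) lies inside the same Unicode block, so it counts as a Cyrillic letter in this sketch.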

AdigaNumberUtils

| Method | Description |
|--------|-------------|
| number_to_words(number: int) -> str | Convert integer (0–10¹⁵) to Adyghe words |
| convert_numbers_in_text(text: str) -> str | Find and convert all numeric patterns in text |

Supported Numeric Patterns

convert_numbers_in_text() recognizes and converts these patterns:

| Pattern | Example | Description |
|---------|---------|-------------|
| International phone | +972-58-206-2315 | Digits read individually |
| Local phone | 058-206-2315 | Digits read individually |
| Prefix-dash | ya-20 | Number converted, prefix preserved |
| Postfix-dash | 13-re | Number converted, postfix preserved |
| Dollar amount | $16,918 | Full number conversion |
| Signed number | +14, -32 | Sign preserved, number converted |
| Range | 1042-1814 | Each number converted separately |
| Decimal | 5.11, 50.06% | Integer and fractional parts converted |
| Slash-separated | 2010/11 | Each part converted |
| Symbol postfix | 4%, 804+ | Number converted, symbol preserved |
| Roman numeral | III, IV | Converted to Arabic then to words |
| Plain number | 42, 1,000,000 | Full number conversion |
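
Several of these patterns overlap (a phone number also looks like a range, a signed number like the start of an international phone number), so the order in which they are tried matters. The sketch below shows one way to handle that: an ordered list of regular expressions where the first full match wins, plus the "converted to Arabic" step for Roman numerals. The regexes, `classify` and `roman_to_int` are hypothetical and cover only some of the rows; they are not the library's own patterns.

```python
import re

# Ordered from most to least specific; the first full match wins.
PATTERNS = [
    ("international phone", re.compile(r"\+\d{1,3}(?:-\d+){2,}")),
    ("local phone", re.compile(r"0\d+(?:-\d+){2,}")),
    ("dollar amount", re.compile(r"\$\d{1,3}(?:,\d{3})*")),
    ("decimal", re.compile(r"\d+\.\d+%?")),
    ("range", re.compile(r"\d+-\d+")),
    ("signed number", re.compile(r"[+-]\d+")),
    ("symbol postfix", re.compile(r"\d+[%+]")),
    # Caveat: this also matches ordinary words such as "MIX".
    ("roman numeral", re.compile(r"[IVXLCDM]+")),
    ("plain number", re.compile(r"\d{1,3}(?:,\d{3})+|\d+")),
]

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def classify(token: str) -> str:
    for name, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return name
    return "not numeric"


def roman_to_int(numeral: str) -> int:
    """Subtract a symbol when a larger one follows it, otherwise add it."""
    total = 0
    for symbol, following in zip(numeral, numeral[1:] + " "):
        value = ROMAN_VALUES[symbol]
        total += -value if ROMAN_VALUES.get(following, 0) > value else value
    return total


print(classify("058-206-2315"))  # → local phone
print(classify("1042-1814"))     # → range
print(roman_to_int("IV"))        # → 4
```

Swapping the order of "local phone" and "range" would not change the result here, because `fullmatch` requires the whole token to match, but placing "plain number" last keeps it from shadowing any more specific rule that is added later.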

Stability

This project follows Semantic Versioning. From 1.0.0 onward, the following are considered the public, stable API:

  • The AdigaCharacterUtils and AdigaNumberUtils classes re-exported from the adyghe_latin_utils package (see __all__ in src/adyghe_latin_utils/__init__.py).
  • The adyghe-char-convert and adyghe-num-convert command-line tools and their documented flags.

Breaking changes to any of the above will require a major version bump. Anything not listed here (internal modules, helper functions, private attributes prefixed with _, and exact conversion output for previously unhandled edge cases) is considered internal and may change in a minor or patch release. Known lossy conversions between the Cyrillic and Latin alphabets are documented in LIMITATIONS.md.

Development

# Clone the repository
git clone https://github.com/showgan/adyghe-latin-utils.git
cd adyghe-latin-utils

# Create and activate a virtual environment
uv venv
source .venv/bin/activate        # bash/zsh
# source .venv/bin/activate.csh  # tcsh

# Install the package in editable mode with dev dependencies
uv pip install -e ".[dev]"

# Run tests
pytest tests/ -v

License

This project is licensed under the MIT License — see the LICENSE file for details.
