Karakalpak language toolkit for Python — Latin/Cyrillic script conversion, number-to-words, and string utilities

Kaalin

License: MIT

A Python toolkit for the Karakalpak language: Latin-Cyrillic script conversion, number-to-words, and locale-aware string operations. Zero dependencies.

Quick Start

Install from PyPI:

pip install kaalin

Then convert between scripts:

from kaalin.converter import latin2cyrillic, cyrillic2latin

print(latin2cyrillic("Assalawma áleykum"))  # Ассалаўма әлейкум
print(cyrillic2latin("Ассалаўма әлейкум"))  # Assalawma áleykum

Supported Features

| Feature | Description |
|---|---|
| Script Conversion | Bidirectional Latin ↔ Cyrillic conversion with multi-character mapping (sh → ш, ch → ч) and special Cyrillic rules (ьи → yi, ьо → yo, ъе → ye) |
| Number to Words | Converts integers and floats to Karakalpak words in Latin or Cyrillic script. Handles magnitudes up to 10³⁰, negative numbers, and decimal fractions |
| Word Syllabification | Splits Karakalpak words into syllables, works with both Latin and Cyrillic scripts, preserves letter case, and recognises digraphs like sh, ch, yu, ya, aw, ew |
| String Utilities | Karakalpak-aware upper() / lower() that correctly handle the dotless ı / Í character pair |
| CLI Tools | cyr2lat and lat2cyr commands for converting text files from the terminal |

API Reference

Script Conversion

from kaalin.converter import latin2cyrillic, cyrillic2latin

latin2cyrillic("Qaraqalpaqstan")    # Қарақалпақстан
cyrillic2latin("Қарақалпақстан")    # Qaraqalpaqstan

Both functions accept a str and return a str. The converter handles uppercase, lowercase, and mixed-case text.
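
To illustrate what multi-character mapping involves, here is a minimal sketch of greedy longest-match transliteration. It is not kaalin's implementation: `transliterate` and `LATIN_TO_CYRILLIC` are hypothetical names, and the table is a tiny subset built only from the digraphs and the Qaraqalpaqstan example shown above.

```python
# Illustrative subset only; NOT kaalin's real mapping table.
LATIN_TO_CYRILLIC = {
    "sh": "ш", "ch": "ч",          # digraphs named in the feature list
    "q": "қ", "a": "а", "r": "р",  # single letters taken from the
    "l": "л", "p": "п", "s": "с",  # Qaraqalpaqstan example
    "t": "т", "n": "н",
}


def transliterate(text, table=LATIN_TO_CYRILLIC):
    """Map Latin text to Cyrillic, trying the longest key first."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try longer chunks first so "sh" wins over "s" followed by "h".
        for size in range(max_len, 0, -1):
            chunk = text[i:i + size]
            mapped = table.get(chunk.lower())
            if mapped is not None:
                # Carry capitalisation over from the first source letter.
                out.append(mapped.capitalize() if chunk[0].isupper() else mapped)
                i += len(chunk)
                break
        else:
            # No key matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate("Qaraqalpaqstan"))  # Қарақалпақстан
```

Trying the longest key first is what keeps a digraph from being split into two single-letter mappings. For real text, use `latin2cyrillic`, which covers the full alphabet and the special rules.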

Number to Words

from kaalin.number import to_word, NumberRangeError

to_word(123)                     # bir júz jigirma úsh
to_word(999, num_type="cyr")     # тоғыз жүз тоқсан тоғыз
to_word(12.75)                   # on eki pútin júzden jetpis bes
to_word(-42)                     # minus qırıq eki

Parameters:

  • number (int | float) — the number to convert
  • num_type (str) — output script: "lat" (default) or "cyr"

Raises: NumberRangeError if number exceeds 10³⁰.
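
In the 12.75 example above, the output appears to follow the pattern: whole part, pútin, a denominator word (júzden, for hundredths), then the fractional digits read as an integer. The sketch below shows only the arithmetic behind such a reading. `split_decimal` is a hypothetical helper, not part of kaalin, and it ignores the sign, which the -42 example suggests is emitted separately as a "minus" prefix.

```python
from decimal import Decimal


def split_decimal(number):
    """Return (whole, numerator, denominator) for the magnitude of number."""
    # Going through str() avoids binary float noise such as 12.75000000001.
    value = abs(Decimal(str(number)))
    whole = int(value)
    fraction = value - whole
    # Number of digits after the decimal point (0 for integers).
    digits = max(-fraction.as_tuple().exponent, 0)
    denominator = 10 ** digits
    numerator = int(fraction * denominator)
    return whole, numerator, denominator


print(split_decimal(12.75))  # (12, 75, 100)
```

Each of the three integers can then be converted to words on its own, with the denominator choosing the fraction word.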

Word Syllabification

from kaalin.syllable import syllabify

syllabify("qaraqalpaqstan")   # ['qa', 'ra', 'qal', 'paq', 'stan']
syllabify("kompyuter")        # ['kom', 'pyu', 'ter']
syllabify("Шарапат")          # ['Ша', 'ра', 'пат']
syllabify("Adam")             # ['A', 'dam']

"-".join(syllabify("úydegiler"))   # 'úy-de-gi-ler'

Parameters:

  • word (str) — the word to split. Accepts Latin or Cyrillic input.

Returns: A list[str] of syllables in the same script as the input. Words with fewer than two vowels are returned as a single-element list unchanged.

Raises: TypeError if word is not a string.
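
For the hyphenation use case, a syllable list can drive line breaking. The following is a rough sketch under that assumption: `break_at_syllable` is a hypothetical helper, not part of kaalin, and it works on any list of strings, such as the output of `syllabify`.

```python
def break_at_syllable(syllables, width):
    """Split a word at a syllable boundary so the head plus a hyphen fits width.

    Returns (head, tail). The tail is empty when the whole word fits; the
    head is empty when not even the first syllable fits.
    """
    word = "".join(syllables)
    if len(word) <= width:
        return word, ""
    head = ""
    for index, syllable in enumerate(syllables):
        # Reserve one column for the trailing hyphen.
        if len(head) + len(syllable) + 1 > width:
            return head, "".join(syllables[index:])
        head += syllable
    return head, ""


# Syllable list copied from the syllabify("qaraqalpaqstan") example.
parts = ["qa", "ra", "qal", "paq", "stan"]
head, tail = break_at_syllable(parts, 8)
print(head + "-", tail)  # qaraqal- paqstan
```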

String Utilities

from kaalin.string import upper, lower

upper("Assalawma áleykum")   # ASSALAWMA ÁLEYKUM
lower("ASSALAWMA ÁLEYKUM")   # assalawma áleykum

Python's built-in str.upper() and str.lower() follow Unicode's default case mapping, which pairs ı with I and Í with í, so they do not handle the Karakalpak dotless ı correctly. These functions fix that.
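
The snippet below shows the built-in behaviour and one possible way to work around it with `str.translate`. It is a sketch, not kaalin's implementation: `kk_upper` and `kk_lower` are hypothetical names, the text is assumed to be NFC-normalised, and every Í is assumed to be the uppercase form of ı.

```python
# Built-in case mapping uses Unicode defaults, not Karakalpak pairing.
assert "ı".upper() == "I"
assert "Í".lower() == "í"

# Swap the Karakalpak pair explicitly before the generic mapping runs.
_TO_UPPER = str.maketrans({"ı": "Í"})
_TO_LOWER = str.maketrans({"Í": "ı"})


def kk_upper(text):
    """Uppercase text, mapping dotless ı to Í instead of I."""
    return text.translate(_TO_UPPER).upper()


def kk_lower(text):
    """Lowercase text, mapping Í to dotless ı instead of í."""
    return text.translate(_TO_LOWER).lower()


print(kk_upper("qırıq"))  # QÍRÍQ
print(kk_lower("QÍRÍQ"))  # qırıq
```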

CLI Usage

Convert text files between scripts directly from the terminal:

# Cyrillic → Latin
cyr2lat input.txt              # writes input-lat.txt
cyr2lat input.txt output.txt   # writes output.txt

# Latin → Cyrillic
lat2cyr input.txt              # writes input-cyr.txt
lat2cyr input.txt output.txt   # writes output.txt
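
The same file conversion can be scripted from Python. The sketch below mirrors the default output naming shown above (input.txt becomes input-lat.txt) but is not the CLI's own code: `default_output_path` and `convert_file` are hypothetical helpers, UTF-8 encoding is an assumption, and the converter is passed in as a callable such as `cyrillic2latin`.

```python
from pathlib import Path


def default_output_path(input_path, tag):
    """Insert '-<tag>' before the file extension, e.g. input.txt -> input-lat.txt."""
    path = Path(input_path)
    return path.with_name(f"{path.stem}-{tag}{path.suffix}")


def convert_file(input_path, convert, tag, output_path=None):
    """Read a text file, apply convert() to its contents, and write the result."""
    source = Path(input_path)
    target = Path(output_path) if output_path else default_output_path(source, tag)
    # UTF-8 is assumed here; adjust if your files use another encoding.
    target.write_text(convert(source.read_text(encoding="utf-8")), encoding="utf-8")
    return target


print(default_output_path("input.txt", "lat").name)  # input-lat.txt
```

With kaalin installed, a call such as `convert_file("input.txt", cyrillic2latin, "lat")` would write the converted text next to the input file.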

When to Use Kaalin

  • Converting Karakalpak text between Latin and Cyrillic scripts
  • Displaying numbers as Karakalpak words (invoices, checks, education)
  • Splitting words into syllables for hyphenation, typesetting, or language learning
  • NLP preprocessing for Karakalpak text (script normalization)
  • Building Karakalpak-language applications that need locale-aware string operations
  • Batch-converting text files via CLI

When NOT to Use Kaalin

  • Not a translator — it converts scripts (Latin ↔ Cyrillic), not languages
  • Not a spell-checker — it does not validate or correct Karakalpak text
  • Not for other Turkic languages — Kazakh, Uzbek, Turkish, etc. have different alphabets and rules
  • Not an OCR tool — it works with digital text, not images
