ICU toolkit - library and CLI for Unicode and internationalization
icukit
A comprehensive Python toolkit for Unicode and internationalization, built on ICU (International Components for Unicode).
icukit provides a Pythonic interface to ICU's powerful text processing, localization, and internationalization capabilities. It includes both a library API and a command-line interface.
Installation
pip install icukit
icukit bundles ICU libraries via icukit-pyicu, so no system ICU installation is required.
Features
Text Processing
- Transliteration: Convert between scripts (Latin to Cyrillic, Hangul to Latin, etc.)
- Normalization: NFC, NFD, NFKC, NFKD Unicode normalization forms
- Text Segmentation: Break text into words, sentences, lines, or grapheme clusters
- Unicode Regex: Full Unicode-aware regular expressions with script and property support
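The four normalization forms are easiest to tell apart on a concrete string. The sketch below uses only Python's standard `unicodedata` module, not icukit's own API (whose normalization function names are not shown in this README), to illustrate what composing, decomposing, and compatibility folding do.

```python
import unicodedata

# "é" written two ways: one precomposed code point vs. "e" + combining acute.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# NFC composes where possible; NFD decomposes fully.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

# The compatibility forms (NFKC/NFKD) additionally fold characters
# such as the "fi" ligature U+FB01 into their plain equivalents.
nfkc = unicodedata.normalize("NFKC", "\ufb01le")

print(composed == decomposed)     # False: same rendering, different code points
print(nfc == composed)            # True
print(len(nfc), len(nfd), nfkc)   # 4 5 file
```

Normalizing both sides to the same form before comparing or hashing is the usual way to avoid treating visually identical strings as different.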
Localization
- Number Formatting: Decimal, currency, percent, scientific, spelled-out numbers
- Date/Time Formatting: Locale-aware date and time formatting with multiple styles
- Duration Formatting: Human-readable time durations ("2 hours, 30 minutes")
- List Formatting: Locale-aware list formatting ("A, B, and C")
- Plural Rules: Determine the CLDR plural category (zero, one, two, few, many, other) for any locale
- Message Formatting: ICU MessageFormat for complex localized strings
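To make the idea of plural categories concrete, here is a minimal hand-written sketch of the CLDR cardinal rule for Russian, restricted to non-negative integers. It is an illustration only and does not use icukit; `plural_category_ru`, `FORMS`, and `count_files` are hypothetical names introduced for this example. A real application would let ICU select the category for any locale.

```python
def plural_category_ru(n: int) -> str:
    """Return the CLDR cardinal plural category for a non-negative integer in Russian."""
    mod10, mod100 = n % 10, n % 100
    if mod10 == 1 and mod100 != 11:
        return "one"
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"
    # Every other integer (0, 5-20, 25-30, ...) falls into "many".
    return "many"


# One word form per category; the category picks the right form.
FORMS = {"one": "файл", "few": "файла", "many": "файлов"}


def count_files(n: int) -> str:
    """Build a short Russian phrase such as '5 файлов'."""
    return f"{n} {FORMS[plural_category_ru(n)]}"


for n in (1, 2, 5, 11, 21, 22):
    print(count_files(n))
```

The category depends on the last one or two digits rather than on the magnitude, which is why 21 behaves like 1 and 11 does not.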
Internationalization Utilities
- Collation: Locale-aware string sorting and comparison
- Locale Information: Parse, validate, and query locale data
- Script Detection: Identify writing scripts in text
- Bidirectional Text: Detect and handle RTL/LTR text
- IDNA: Internationalized domain name encoding/decoding
- Spoof Detection: Detect confusable characters and homograph attacks
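Script detection, spoof detection, and IDNA are closely related: a domain label that mixes scripts is a common homograph trick, and IDNA is how such labels are encoded on the wire. The sketch below approximates the idea with the standard library only. It is not icukit's API: `rough_scripts` and `looks_mixed` are hypothetical helpers, the first word of a character's Unicode name is only a crude stand-in for ICU's real script property, and Python's built-in `idna` codec implements the older IDNA 2003 rules.

```python
import unicodedata


def rough_scripts(text: str) -> set:
    """Collect a rough script label for every alphabetic character in text."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # e.g. "LATIN CAPITAL LETTER H" -> "LATIN"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(rough_scripts(label)) > 1


print(sorted(rough_scripts("Hello Мир 世界")))  # ['CJK', 'CYRILLIC', 'LATIN']
print(looks_mixed("paypal"))                    # False
print(looks_mixed("p\u0430ypal"))               # True: U+0430 is Cyrillic "а"

# IDNA: non-ASCII labels are carried as ASCII "xn--" Punycode labels.
encoded = "bücher.example".encode("idna")
print(encoded)                                  # b'xn--bcher-kva.example'
print(encoded.decode("idna"))                   # bücher.example
```

A name-based heuristic like this misses many cases (for example, confusables within a single script), which is the gap a dedicated spoof checker is meant to cover.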
Reference Data
- Regions: Country and region codes with containment relationships
- Scripts: Writing system information and properties
- Timezones: Timezone data with offsets and equivalents
- Calendars: Calendar system information (Gregorian, Hebrew, Islamic, etc.)
Quick Start
Library API
from icukit import (
    transliterate,
    sort_strings,
    format_number,
    format_datetime,
    get_plural_category,
    break_words,
)
# Transliterate text between scripts
transliterate("Привет мир", "Russian-Latin/BGN") # "Privet mir"
transliterate("hello", "Latin-Cyrillic") # "хелло"
# Sort strings with locale-aware collation
sort_strings(["cafe", "café", "CAFE"], "en_US") # ['cafe', 'CAFE', 'café'] (accent differences outrank case differences)
sort_strings(["Öl", "Ol", "öl"], "de_DE") # ['Ol', 'öl', 'Öl']
# Format numbers for different locales
format_number(1234567.89, "en_US") # "1,234,567.89"
format_number(1234567.89, "de_DE") # "1.234.567,89"
format_number(1234567.89, "hi_IN") # "12,34,567.89"
# Format dates (the output depends on the current time and local timezone)
from datetime import datetime
now = datetime.now()
format_datetime(now, "en_US", style="LONG") # e.g. "January 19, 2026 at 4:00:00 PM PST"
format_datetime(now, "ja_JP", style="LONG") # e.g. "2026年1月19日 16:00:00 PST"
# Determine plural category
get_plural_category(1, "en") # "one"
get_plural_category(2, "en") # "other"
get_plural_category(2, "ru") # "few"
get_plural_category(5, "ru") # "many"
# Break text into words
break_words("Hello, world!") # ["Hello", ",", " ", "world", "!"]
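The `hi_IN` result above uses the Indian grouping convention: the last three digits form one group and every group to the left has two digits. The following stand-alone sketch reproduces just that grouping rule in plain Python to show what the locale data encodes. `group_indian` is a hypothetical helper, not part of icukit, and it ignores everything else a real formatter handles (native digits, rounding modes, negative numbers).

```python
def group_indian(value: float, decimals: int = 2) -> str:
    """Format a non-negative number with 3-then-2 digit grouping."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative numbers")
    whole, _, frac = f"{value:.{decimals}f}".partition(".")
    head, tail = whole[:-3], whole[-3:]
    groups = []
    # Peel two digits at a time off the right end of the remaining prefix.
    while head:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.append(tail)
    grouped = ",".join(groups)
    return f"{grouped}.{frac}" if frac else grouped


print(group_indian(1234567.89))    # 12,34,567.89
print(group_indian(100000, 0))     # 1,00,000
print(f"{1234567.89:,.2f}")        # 1,234,567.89 (built-in 3-digit grouping)
```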
Command-Line Interface
icukit includes a full-featured CLI accessible via icukit or ik:
# Transliterate text
ik transliterate "Москва" Russian-Latin/BGN
# Output: Moskva
# Format numbers
ik number 1234567.89 --locale de_DE
# Output: 1.234.567,89
# Get locale information
ik locale info en_US
# List available transliterators
ik transliterate --list
# Sort lines with locale collation
cat names.txt | ik sort --locale sv_SE
# Detect scripts in text
ik script detect "Hello Мир 世界"
# Get Unicode character information
ik unicode info "A"
Run ik help or ik <command> --help for detailed usage information.
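For a sense of the kind of per-character data a command such as `ik unicode info` draws on, the sketch below looks up a few basic properties with Python's standard `unicodedata` module. It is not the CLI's implementation and does not reproduce its output format; `char_info` is a hypothetical helper, and the standard library exposes only a small subset of what ICU provides.

```python
import unicodedata


def char_info(ch: str) -> dict:
    """Return a few basic Unicode properties for a single character."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),    # e.g. "Lu" = uppercase letter
        "bidi": unicodedata.bidirectional(ch),   # e.g. "L" = left-to-right
    }


info = char_info("A")
print(info["codepoint"], info["name"])   # U+0041 LATIN CAPITAL LETTER A
print(info["category"], info["bidi"])    # Lu L

# A right-to-left letter reports bidi class "R".
print(char_info("\u05d0")["bidi"])       # R (HEBREW LETTER ALEF)
```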
Supported Python Versions
- Python 3.9+
- Tested on Linux and macOS
License
MIT License