ICU toolkit - library and CLI for Unicode and internationalization

icukit

A comprehensive Python toolkit for Unicode and internationalization, built on ICU (International Components for Unicode).

icukit provides a Pythonic interface to ICU's powerful text processing, localization, and internationalization capabilities. It includes both a library API and a command-line interface.

Installation

pip install icukit

icukit bundles ICU libraries via icukit-pyicu, so no system ICU installation is required.

Features

Text Processing

  • Transliteration: Convert between scripts (Latin to Cyrillic, Hangul to Latin, etc.)
  • Normalization: NFC, NFD, NFKC, NFKD Unicode normalization forms
  • Text Segmentation: Break text into words, sentences, lines, or grapheme clusters
  • Unicode Regex: Full Unicode-aware regular expressions with script and property support
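
To make the four normalization forms concrete, here is a minimal sketch. It uses Python's standard-library unicodedata module rather than icukit, because this README does not show icukit's normalization function names; the forms themselves are defined by Unicode and behave the same way.

```python
import unicodedata

# Two spellings of the same visible word "café".
composed = "caf\u00e9"      # 'é' as one precomposed code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# NFC composes, NFD decomposes; both preserve canonical equivalence.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

# NFKC additionally folds compatibility characters such as ligatures.
ligature = "\ufb01"         # the single-character 'fi' ligature
nfkc = unicodedata.normalize("NFKC", ligature)

print(nfc == composed, nfd == decomposed, nfkc)
```

Normalizing both sides to the same form before comparing or indexing strings avoids treating visually identical text as different.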

Localization

  • Number Formatting: Decimal, currency, percent, scientific, spelled-out numbers
  • Date/Time Formatting: Locale-aware date and time formatting with multiple styles
  • Duration Formatting: Human-readable time durations ("2 hours, 30 minutes")
  • List Formatting: Locale-aware list formatting ("A, B, and C")
  • Plural Rules: Determine plural categories (one, few, many, other) for any locale
  • Message Formatting: ICU MessageFormat for complex localized strings
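
As an illustration of what a plural rule decides, the sketch below hand-codes the CLDR Russian rule for non-negative integers only. The function name russian_plural_category is hypothetical and is not part of icukit; ICU derives these categories from CLDR data and also handles fractions, which this sketch does not.

```python
def russian_plural_category(n: int) -> str:
    """Return the plural category of a non-negative integer in Russian.

    Simplified from the CLDR rule; fractional numbers (category
    "other" in Russian) are deliberately out of scope.
    """
    mod10, mod100 = n % 10, n % 100
    if mod10 == 1 and mod100 != 11:
        return "one"    # 1, 21, 31, ... but not 11
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"    # 2-4, 22-24, ... but not 12-14
    return "many"       # 0, 5-20, 25-30, ...


categories = {n: russian_plural_category(n) for n in (1, 2, 5, 11, 21, 22)}
print(categories)
```

The category is then used to pick the right message variant, which is the job of MessageFormat-style plural arguments.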

Internationalization Utilities

  • Collation: Locale-aware string sorting and comparison
  • Locale Information: Parse, validate, and query locale data
  • Script Detection: Identify writing scripts in text
  • Bidirectional Text: Detect and handle RTL/LTR text
  • IDNA: Internationalized domain name encoding/decoding
  • Spoof Detection: Detect confusable characters and homograph attacks
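
For readers unfamiliar with IDNA, the following sketch shows the round trip between a Unicode domain name and its ASCII (Punycode) form. It relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules, and not on icukit, whose IDNA function names are not shown in this README; results for some edge-case characters may differ from ICU's UTS #46 processing.

```python
# A domain with a non-ASCII label (written with an escape for 'ü').
unicode_domain = "b\u00fccher.example"

# Encode each label to its ASCII-compatible form (the "xn--" prefix
# marks a Punycode-encoded label).
ascii_domain = unicode_domain.encode("idna").decode("ascii")

# Decode back to the Unicode form for display.
round_tripped = ascii_domain.encode("ascii").decode("idna")

print(ascii_domain)
print(round_tripped == unicode_domain)
```

The ASCII form is what goes on the wire in DNS queries, while the Unicode form is what users see, which is why spoof detection on the displayed form matters.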

Reference Data

  • Regions: Country and region codes with containment relationships
  • Scripts: Writing system information and properties
  • Timezones: Timezone data with offsets and equivalents
  • Calendars: Calendar system information (Gregorian, Hebrew, Islamic, etc.)

Quick Start

Library API

from icukit import (
    transliterate,
    sort_strings,
    format_number,
    format_datetime,
    get_plural_category,
    break_words,
)

# Transliterate text between scripts
transliterate("Привет мир", "Russian-Latin/BGN")  # "Privet mir"
transliterate("hello", "Latin-Cyrillic")  # "хелло"

# Sort strings with locale-aware collation
sort_strings(["cafe", "café", "CAFE"], "en_US")  # ['cafe', 'CAFE', 'café']
sort_strings(["Öl", "Ol", "öl"], "de_DE")  # ['Ol', 'öl', 'Öl']

# Format numbers for different locales
format_number(1234567.89, "en_US")  # "1,234,567.89"
format_number(1234567.89, "de_DE")  # "1.234.567,89"
format_number(1234567.89, "hi_IN")  # "12,34,567.89"

# Format dates (output depends on the current time and time zone)
from datetime import datetime
now = datetime.now()
format_datetime(now, "en_US", style="LONG")  # e.g. "January 19, 2026 at 4:00:00 PM PST"
format_datetime(now, "ja_JP", style="LONG")  # e.g. "2026年1月19日 16:00:00 PST"

# Determine plural category
get_plural_category(1, "en")  # "one"
get_plural_category(2, "en")  # "other"
get_plural_category(2, "ru")  # "few"
get_plural_category(5, "ru")  # "many"

# Break text into words
break_words("Hello, world!")  # ["Hello", ",", " ", "world", "!"]
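
The format_number outputs above differ mainly in separator characters and digit grouping. The sketch below reproduces just the grouping step for the integer part in plain Python; group_digits and its parameters are hypothetical names for illustration, not icukit's API, and a real formatter takes these settings from locale data instead of arguments.

```python
def group_digits(digits: str, primary: int = 3, secondary=None, sep: str = ",") -> str:
    """Insert grouping separators into a string of integer digits.

    primary is the size of the rightmost group; secondary is the size
    of every group to its left (defaults to primary).
    """
    secondary = secondary or primary
    if len(digits) <= primary:
        return digits
    head, tail = digits[:-primary], digits[-primary:]
    groups = [tail]
    while head:
        groups.append(head[-secondary:])
        head = head[:-secondary]
    return sep.join(reversed(groups))


western = group_digits("1234567")               # groups of three
german = group_digits("1234567", sep=".")       # same groups, '.' separator
indian = group_digits("1234567", secondary=2)   # 3 digits, then groups of 2
print(western, german, indian)
```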

Command-Line Interface

icukit includes a full-featured CLI accessible via icukit or ik:

# Transliterate text
ik transliterate "Москва" Russian-Latin/BGN
# Output: Moskva

# Format numbers
ik number 1234567.89 --locale de_DE
# Output: 1.234.567,89

# Get locale information
ik locale info en_US

# List available transliterators
ik transliterate --list

# Sort lines with locale collation
cat names.txt | ik sort --locale sv_SE

# Detect scripts in text
ik script detect "Hello Мир 世界"

# Get Unicode character information
ik unicode info "A"

Run ik help or ik <command> --help for detailed usage information.

Supported Python Versions

  • Python 3.9+
  • Tested on Linux and macOS

License

MIT License

