
Uzbek text preprocessing library for converting numbers, dates, times, and currency to words


UzPreprocessor


UzPreprocessor is a Python library that converts numbers, dates, times, and currency amounts to Uzbek (Latin script) words. It is intended for legal documents, invoices, receipts, and general text preprocessing.

Features

Number Conversion

  • Integers (arbitrary size)
  • Decimal numbers (up to 12 digits of precision)
  • Negative numbers
  • Ordinal numbers

💰 Currency Conversion

  • Uzbek so'm and tiyin
  • Automatic handling of decimal places

📅 Date Conversion

  • Multiple input formats (ISO, European, US, text)
  • Supports English and Uzbek month names
  • Legal date format support

Time Conversion

  • 24-hour and 12-hour (AM/PM) formats
  • Spoken Uzbek time periods (ertalab, tushlikdan keyin, kechqurun, etc.)
  • Multiple time formats with flexible parsing

🔗 DateTime Conversion

  • Combined date and time conversion
  • ISO datetime format support

Installation

pip install uzpreprocessor

Quick Start

Basic Usage

from uzpreprocessor import UzPreprocessor

# Initialize the processor
processor = UzPreprocessor()

# Convert numbers
print(processor.number.number(123))
# Output: bir yuz yigirma uch

print(processor.number.number(123.456))
# Output: bir yuz yigirma uch butun to'rt yuz ellik olti mingdan

# Convert currency
print(processor.number.money(12345.67))
# Output: o'n ikki ming uch yuz qirq besh so'm oltmish yetti tiyin

# Convert percentages
print(processor.number.percent(12.345))
# Output: o'n ikki butun uch yuz qirq besh mingdan foiz

# Convert dates
print(processor.date.date("2025-09-18"))
# Output: ikki ming yigirma beshinchi yil o'n sakkizinchi sentabr

# Convert time
print(processor.time.time("14:35:08"))
# Output: o'n to'rt soat o'ttiz besh daqiqa sakkiz soniya

# Convert datetime
print(processor.datetime.datetime("2025-09-18T14:35:08"))
# Output: ikki ming yigirma beshinchi yil o'n sakkizinchi sentabr o'n to'rt soat o'ttiz besh daqiqa sakkiz soniya

Advanced Usage

Direct Class Usage

from uzpreprocessor import UzNumberToWords, UzDateToWords, UzTimeToWords, UzDateAndTimeToWords

# Create converters
number_converter = UzNumberToWords()
date_converter = UzDateToWords(number_converter)
time_converter = UzTimeToWords(number_converter)
datetime_converter = UzDateAndTimeToWords(date_converter, time_converter)

# Use individual converters
print(date_converter.date("18 September 2025"))
# Output: ikki ming yigirma beshinchi yil o'n sakkizinchi sentabr

print(time_converter.time("2 PM"))
# Output: tushlikdan keyin soat o'n to'rt

Detailed Examples

Number Conversion

from uzpreprocessor import UzNumberToWords

conv = UzNumberToWords()

# Integers
print(conv.number(0))          # nol
print(conv.number(5))          # besh
print(conv.number(42))         # qirq ikki
print(conv.number(123))        # bir yuz yigirma uch
print(conv.number(1000000))    # bir million

# Decimals
print(conv.number(123.456))    # bir yuz yigirma uch butun to'rt yuz ellik olti mingdan
print(conv.number(0.5))        # nol butun besh o'ndan

# Negative numbers
print(conv.number(-42))        # minus qirq ikki

# Ordinal numbers
print(conv.ordinal(5))         # beshinchi
print(conv.ordinal(123))       # bir yuz yigirma uchinchi
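
For readers curious how such a conversion can work, here is a minimal standard-library sketch. It is not the library's source: the names int_to_words and number_to_words and the lookup tables are hypothetical, and the sketch only handles values below 10**12 with at most three fractional digits.

```python
# Illustrative sketch only; uzpreprocessor's real implementation may differ.
from decimal import Decimal

ONES = ("", "bir", "ikki", "uch", "to'rt", "besh", "olti", "yetti", "sakkiz", "to'qqiz")
TENS = ("", "o'n", "yigirma", "o'ttiz", "qirq", "ellik", "oltmish", "yetmish", "sakson", "to'qson")
SCALES = ("", "ming", "million", "milliard")
# Denominator words for one to three fractional digits.
FRACTION_UNITS = {1: "o'ndan", 2: "yuzdan", 3: "mingdan"}


def int_to_words(n: int) -> str:
    """Spell an integer by splitting it into groups of three digits."""
    if n == 0:
        return "nol"
    if n < 0:
        return "minus " + int_to_words(-n)
    if n >= 1000 ** len(SCALES):
        raise ValueError("this sketch only covers values below 10**12")
    groups = []
    while n:
        n, group = divmod(n, 1000)
        groups.append(group)  # least significant group first
    words = []
    for index in range(len(groups) - 1, -1, -1):
        group = groups[index]
        if group == 0:
            continue
        hundreds, rest = divmod(group, 100)
        tens, ones = divmod(rest, 10)
        if hundreds:
            words += [ONES[hundreds], "yuz"]
        if tens:
            words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])
        if SCALES[index]:
            words.append(SCALES[index])
    return " ".join(words)


def number_to_words(value) -> str:
    """Spell an int, float, or numeric string, including a short fraction."""
    text = format(Decimal(str(value)), "f")  # avoids scientific notation
    sign = "minus " if text.startswith("-") else ""
    whole, _, fraction = text.lstrip("-").partition(".")
    fraction = fraction.rstrip("0")
    if not fraction:
        return sign + int_to_words(int(whole))
    if len(fraction) not in FRACTION_UNITS:
        raise ValueError("this sketch supports at most three fractional digits")
    unit = FRACTION_UNITS[len(fraction)]
    return f"{sign}{int_to_words(int(whole))} butun {int_to_words(int(fraction))} {unit}"


print(number_to_words(123.456))
```

The fraction is read from the decimal string rather than from float arithmetic, so 123.456 keeps exactly the digits 456.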

Currency Conversion

from uzpreprocessor import UzNumberToWords

conv = UzNumberToWords()

print(conv.money(1000))        # bir ming so'm
print(conv.money(12345.67))    # o'n ikki ming uch yuz qirq besh so'm oltmish yetti tiyin
print(conv.money(0.50))        # nol so'm ellik tiyin
print(conv.money(-100))        # minus bir yuz so'm
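
Splitting an amount into so'm and tiyin is easy to get wrong with binary floats, because multiplying a float by 100 and truncating can lose a tiyin. The sketch below shows one way to do the split with the standard decimal module. The function name split_som_tiyin and the half-up rounding rule are assumptions for illustration, not the library's documented behaviour.

```python
# Illustrative sketch only; the rounding rule used by uzpreprocessor is not documented here.
from decimal import Decimal, ROUND_HALF_UP


def split_som_tiyin(amount):
    """Return (is_negative, som, tiyin) for an int, float, or numeric string."""
    # str() gives the shortest decimal text for a float, so 12345.67 stays "12345.67".
    value = Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    total_tiyin = int(abs(value) * 100)
    som, tiyin = divmod(total_tiyin, 100)
    return value < 0, som, tiyin


print(split_som_tiyin(12345.67))
print(split_som_tiyin(-100))
```

The two integers can then be spelled separately and joined with the so'm and tiyin unit words.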

Date Conversion

The library supports multiple date formats:

from uzpreprocessor import UzPreprocessor

processor = UzPreprocessor()

# ISO format
print(processor.date.date("2025-09-18"))

# European format
print(processor.date.date("18.09.2025"))
print(processor.date.date("18/09/2025"))

# US format
print(processor.date.date("09/18/2025"))

# Text format (English)
print(processor.date.date("18 September 2025"))
print(processor.date.date("September 18, 2025"))

# Text format (Uzbek)
print(processor.date.date("18 sentabr 2025"))

# Legal format
print(processor.date.date("2025-yil 18-sentabr"))

# Python date objects
from datetime import date
print(processor.date.date(date(2025, 9, 18)))
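
Accepting several date formats usually comes down to trying patterns in a fixed order. The following sketch does that with datetime.strptime from the standard library. The function parse_date, the format list, and its ordering are assumptions made for illustration; the library's actual precedence between European and US slash dates is not stated above, so check ambiguous inputs such as "03/04/2025" before relying on them.

```python
# Illustrative sketch only; not the parsing code used by uzpreprocessor.
from datetime import date, datetime

# Tried in order. Day-first comes before month-first, so an ambiguous
# slash date is read as European in this sketch.
DATE_FORMATS = (
    "%Y-%m-%d",   # ISO: 2025-09-18
    "%d.%m.%Y",   # European: 18.09.2025
    "%d/%m/%Y",   # European: 18/09/2025
    "%m/%d/%Y",   # US: 09/18/2025
    "%d %B %Y",   # 18 September 2025 (English names under the default C locale)
    "%B %d, %Y",  # September 18, 2025
)


def parse_date(text: str) -> date:
    """Return the first successful parse, or raise ValueError."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


print(parse_date("09/18/2025"))
```

"09/18/2025" only parses as a US date because 18 is not a valid month, which is why it falls through to the month-first pattern.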

Time Conversion

from uzpreprocessor import UzPreprocessor

processor = UzPreprocessor()

# 24-hour format (formal mode)
print(processor.time.time("14:35"))        # o'n to'rt soat o'ttiz besh daqiqa
print(processor.time.time("14:35:08"))     # o'n to'rt soat o'ttiz besh daqiqa sakkiz soniya
print(processor.time.time("00:00"))        # nol soat

# 12-hour format with AM/PM (spoken mode)
print(processor.time.time("2 PM"))         # tushlikdan keyin soat o'n to'rt
print(processor.time.time("2:35 PM"))      # tushlikdan keyin soat o'n to'rt o'ttiz besh daqiqa
print(processor.time.time("7 AM"))         # ertalab soat yetti

# Various formats
print(processor.time.time("14.35"))        # o'n to'rt soat o'ttiz besh daqiqa
print(processor.time.time("14 35"))        # o'n to'rt soat o'ttiz besh daqiqa
print(processor.time.time("14:35:08Z"))    # o'n to'rt soat o'ttiz besh daqiqa sakkiz soniya

# Python time objects
from datetime import time
print(processor.time.time(time(14, 35, 8)))
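
The time strings shown above differ only in separator, optional seconds, and an optional AM/PM or Z suffix, so a single regular expression can cover them. This is a hedged sketch of that idea; parse_clock and the CLOCK pattern are hypothetical names, and the library may parse these inputs differently.

```python
# Illustrative sketch only; not the parser used by uzpreprocessor.
import re

CLOCK = re.compile(
    r"^\s*(\d{1,2})"          # hour
    r"(?:[:. ](\d{2}))?"      # optional minutes after ':', '.' or a space
    r"(?:[:. ](\d{2}))?"      # optional seconds
    r"(?:\s*(AM|PM)|Z)?\s*$",  # optional AM/PM marker or trailing Z
    re.IGNORECASE,
)


def parse_clock(text: str):
    """Return (hour, minute, second, has_meridiem) on a 24-hour clock."""
    match = CLOCK.match(text)
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    hour, minute, second = (int(part) if part else 0 for part in match.group(1, 2, 3))
    meridiem = match.group(4)
    if meridiem:
        if not 1 <= hour <= 12:
            raise ValueError("12-hour times need an hour from 1 to 12")
        # 12 AM becomes 0 and 12 PM stays 12.
        hour = hour % 12 + (12 if meridiem.upper() == "PM" else 0)
    if hour > 23 or minute > 59 or second > 59:
        raise ValueError(f"time out of range: {text!r}")
    return hour, minute, second, meridiem is not None


print(parse_clock("2:35 PM"))
```

The final flag records whether an AM/PM marker was present, which is the distinction the examples above label as formal and spoken mode.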

Time Periods (for AM/PM format):

  • ertalab - 5:00-10:59
  • tushlikdan oldin - 11:00-12:59
  • tushlikdan keyin - 13:00-17:59
  • kechqurun - 18:00-22:59
  • tun - 23:00-4:59
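
The ranges above can be expressed as a small lookup keyed on the 24-hour value. The sketch below mirrors the table; period_for_hour and PERIODS are hypothetical names, not part of the library's API.

```python
# Illustrative sketch of the period table above; not library code.
PERIODS = (
    (range(5, 11), "ertalab"),            # 5:00-10:59
    (range(11, 13), "tushlikdan oldin"),  # 11:00-12:59
    (range(13, 18), "tushlikdan keyin"),  # 13:00-17:59
    (range(18, 23), "kechqurun"),         # 18:00-22:59
)


def period_for_hour(hour: int) -> str:
    """Map a 24-hour clock hour to its spoken Uzbek period."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    for hours, name in PERIODS:
        if hour in hours:
            return name
    return "tun"  # 23:00-4:59 wraps around midnight


print(period_for_hour(14))
```

Treating tun as the fallback avoids having to encode a range that wraps past midnight.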

DateTime Conversion

from uzpreprocessor import UzPreprocessor

processor = UzPreprocessor()

# ISO datetime format
print(processor.datetime.datetime("2025-09-18T14:35:08"))
# Output: ikki ming yigirma beshinchi yil o'n sakkizinchi sentabr o'n to'rt soat o'ttiz besh daqiqa sakkiz soniya

# Python datetime objects
from datetime import datetime
dt = datetime(2025, 9, 18, 14, 35, 8)
print(processor.datetime.datetime(dt))
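
A combined converter can delegate to separate date and time converters once the input has been split. The sketch below shows only that split step using the standard library; split_iso_datetime is a hypothetical helper, and how UzDateAndTimeToWords actually parses its input is not described here.

```python
# Illustrative sketch only; not the internals of UzDateAndTimeToWords.
from datetime import datetime


def split_iso_datetime(text: str):
    """Parse an ISO 8601 datetime string and return its date and time parts."""
    moment = datetime.fromisoformat(text)
    return moment.date(), moment.time()


date_part, time_part = split_iso_datetime("2025-09-18T14:35:08")
print(date_part, time_part)
```

Each part can then be passed to the matching converter and the two results joined with a space, which matches the shape of the output shown above.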

API Reference

UzPreprocessor

Main convenience class that provides all conversion functionality.

Attributes

  • number - Number converter (UzNumberToWords instance)
  • date - Date converter (UzDateToWords instance)
  • time - Time converter (UzTimeToWords instance)
  • datetime - Datetime converter (UzDateAndTimeToWords instance)

UzNumberToWords

Converts numbers, currency, and percentages to Uzbek words.

Methods

  • number(value) - Convert number to words
  • money(amount) - Convert currency to words (so'm/tiyin)
  • percent(value) - Convert percentage to words
  • ordinal(value) - Convert number to ordinal form
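
Uzbek ordinals are formed by adding a suffix to the last word of the cardinal: -nchi after a vowel and -inchi after a consonant. The sketch below applies that rule to an already spelled cardinal. The helper to_ordinal is hypothetical and is not claimed to be how ordinal(value) is implemented.

```python
# Illustrative sketch of the ordinal suffix rule; not library code.
VOWELS = frozenset("aeiou")


def to_ordinal(cardinal_words: str) -> str:
    """Append the ordinal suffix to a cardinal written in Uzbek (Latin) words."""
    words = cardinal_words.strip()
    if not words:
        raise ValueError("expected a non-empty cardinal")
    suffix = "nchi" if words[-1].lower() in VOWELS else "inchi"
    return words + suffix


print(to_ordinal("bir yuz yigirma uch"))
```

Only the final word changes, so "bir yuz yigirma uch" becomes "bir yuz yigirma uchinchi" while "ikki" becomes "ikkinchi".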

UzDateToWords

Converts dates to Uzbek words.

Methods

  • date(value) - Convert date to words

Supported input types:

  • String (various formats)
  • datetime.date object
  • datetime.datetime object
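
Accepting strings, date objects, and datetime objects in one method needs some care, because datetime.datetime is a subclass of datetime.date in Python. The sketch below shows one way to normalise the three input types. The helper to_date is hypothetical, and for brevity its string branch accepts only ISO dates.

```python
# Illustrative sketch only; not the dispatch logic used by UzDateToWords.
from datetime import date, datetime


def to_date(value) -> date:
    """Normalise a str, date, or datetime to a plain date."""
    # Check datetime first: every datetime is also an instance of date.
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, str):
        return date.fromisoformat(value.strip())
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(to_date(datetime(2025, 9, 18, 14, 35, 8)))
```

If the date check came first, a datetime would pass through unchanged and carry its time component along.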

UzTimeToWords

Converts time to Uzbek words.

Methods

  • time(value) - Convert time to words

Supported input types:

  • String (various formats)
  • datetime.time object
  • datetime.datetime object

Modes:

  • Formal mode: Standard 24-hour format (e.g., "14:35")
  • Spoken mode: 12-hour format with AM/PM (e.g., "2 PM")

UzDateAndTimeToWords

Combines date and time conversion.

Methods

  • datetime(value) - Convert datetime to words

Supported input types:

  • String (ISO format)
  • datetime.datetime object

Performance

The library is optimized for performance:

  • Compiled regex patterns for faster parsing
  • Efficient string operations with minimal allocations
  • Optimized data structures (tuples for immutable data, dicts for O(1) lookups)
  • No external dependencies (uses only the Python standard library)
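
As a rough picture of what these points mean in practice, the sketch below compiles a pattern once at import time, keeps month names in a tuple, and derives a dict from it for constant-time lookup. It parses the legal date format as an example. All names here (UZ_MONTHS, MONTH_NUMBER, LEGAL_DATE, parse_legal_date) are hypothetical and are not taken from the library's source.

```python
# Illustrative sketch only; not taken from uzpreprocessor's source.
import re
from datetime import date

# Immutable tuple of Uzbek (Latin) month names, in calendar order.
UZ_MONTHS = (
    "yanvar", "fevral", "mart", "aprel", "may", "iyun",
    "iyul", "avgust", "sentabr", "oktabr", "noyabr", "dekabr",
)
# Dict built once for O(1) name-to-number lookup.
MONTH_NUMBER = {name: number for number, name in enumerate(UZ_MONTHS, start=1)}
# Compiled once at import time instead of on every call.
LEGAL_DATE = re.compile(r"^\s*(\d{4})-yil\s+(\d{1,2})-([a-z']+)\s*$", re.IGNORECASE)


def parse_legal_date(text: str) -> date:
    """Parse strings such as '2025-yil 18-sentabr'."""
    match = LEGAL_DATE.match(text)
    if match is None:
        raise ValueError(f"not a legal-format date: {text!r}")
    year, day, month_name = match.groups()
    try:
        month = MONTH_NUMBER[month_name.lower()]
    except KeyError:
        raise ValueError(f"unknown month name: {month_name!r}") from None
    return date(int(year), month, int(day))


print(parse_legal_date("2025-yil 18-sentabr"))
```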

Requirements

  • Python 3.8 or higher
  • No external dependencies (uses only the Python standard library)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Inspired by the need for Uzbek text preprocessing in legal and financial documents
  • Built with attention to accuracy and performance

Changelog

1.0.0 (2025-01-XX)

  • Initial release
  • Number to words conversion
  • Date to words conversion
  • Time to words conversion
  • Currency conversion
  • Percentage conversion
  • Support for multiple input formats
  • Optimized performance

Support

For issues, questions, or contributions, please visit:


Made with ❤️ for the Uzbek developer community

