Uzbek text preprocessing library for converting numbers, dates, times, and currency to words
UzPreprocessor
UzPreprocessor is a comprehensive Python library for converting numbers, dates, times, and currency amounts to Uzbek (Latin) words. It is suited to legal documents, invoices, receipts, and general text preprocessing tasks.
Features
✨ Number Conversion
- Integers (arbitrary size)
- Decimal numbers (up to 12 digits of precision)
- Negative numbers
- Ordinal numbers
💰 Currency Conversion
- Uzbek so'm and tiyin
- Automatic handling of decimal places
📅 Date Conversion
- Multiple input formats (ISO, European, US, text)
- Supports English and Uzbek month names
- Legal date format support
⏰ Time Conversion
- 24-hour and 12-hour (AM/PM) formats
- Spoken Uzbek time periods (ertalab, tushlikdan keyin, kechqurun, etc.)
- Multiple time formats with flexible parsing
🔗 DateTime Conversion
- Combined date and time conversion
- ISO datetime format support
Installation
pip install uzpreprocessor
Quick Start
Basic Usage
from uzpreprocessor import UzPreprocessor
# Initialize the processor
processor = UzPreprocessor()
# Convert numbers
print(processor.number.number(123))
# Output: bir yuz yigirma uch
print(processor.number.number(123.456))
# Output: bir yuz yigirma uch butun to'rt yuz ellik olti mingdan
# Convert currency
print(processor.number.money(12345.67))
# Output: o'n ikki ming uch yuz qirq besh so'm oltmish yetti tiyin
# Convert percentages
print(processor.number.percent(12.345))
# Output: o'n ikki butun uch yuz qirq besh mingdan foiz
# Convert dates
print(processor.date.date("2025-09-18"))
# Output: ikki ming yigirma beshinchi yil o'n sakkizinchi sentabr
# Convert time
print(processor.time.time("14:35:08"))
# Output: o'n to'rt soat o'ttiz besh daqiqa sakkiz soniya
# Convert datetime
print(processor.datetime.datetime("2025-09-18T14:35:08"))
# Output: ikki ming yigirma beshinchi yil o'n sakkizinchi sentabr o'n to'rt soat o'ttiz besh daqiqa sakkiz soniya
Advanced Usage
Direct Class Usage
from uzpreprocessor import UzNumberToWords, UzDateToWords, UzTimeToWords, UzDateAndTimeToWords
# Create converters
number_converter = UzNumberToWords()
date_converter = UzDateToWords(number_converter)
time_converter = UzTimeToWords(number_converter)
datetime_converter = UzDateAndTimeToWords(date_converter, time_converter)
# Use individual converters
print(date_converter.date("18 September 2025"))
# Output: ikki ming yigirma beshinchi yil o'n sakkizinchi sentabr
print(time_converter.time("2 PM"))
# Output: tushlikdan keyin soat o'n to'rt
Detailed Examples
Number Conversion
from uzpreprocessor import UzNumberToWords
conv = UzNumberToWords()
# Integers
print(conv.number(0)) # nol
print(conv.number(5)) # besh
print(conv.number(42)) # qirq ikki
print(conv.number(123)) # bir yuz yigirma uch
print(conv.number(1000000)) # bir million
# Decimals
print(conv.number(123.456)) # bir yuz yigirma uch butun to'rt yuz ellik olti mingdan
print(conv.number(0.5)) # nol butun besh o'ndan
# Negative numbers
print(conv.number(-42)) # minus qirq ikki
# Ordinal numbers
print(conv.ordinal(5)) # beshinchi
print(conv.ordinal(123)) # bir yuz yigirma uchinchi
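To make the conversion rules behind the outputs above concrete, here is a minimal standalone sketch of integer-to-words conversion. It is not the library's implementation: the names `int_to_words` and `ordinal`, the grouping by thousands, and the vowel-based ordinal suffix rule are assumptions chosen to reproduce the outputs listed in this section.

```python
from typing import List

# Hypothetical sketch, not UzPreprocessor's code. Word tables are standard Uzbek numerals.
ONES = ("", "bir", "ikki", "uch", "to'rt", "besh", "olti", "yetti", "sakkiz", "to'qqiz")
TENS = ("", "o'n", "yigirma", "o'ttiz", "qirq", "ellik", "oltmish", "yetmish", "sakson", "to'qson")
SCALES = ("", "ming", "million", "milliard")


def _below_thousand(n: int) -> List[str]:
    """Words for 1..999, e.g. 123 -> ['bir', 'yuz', 'yigirma', 'uch']."""
    words: List[str] = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "yuz"]
    tens, ones = divmod(rest, 10)
    if tens:
        words.append(TENS[tens])
    if ones:
        words.append(ONES[ones])
    return words


def int_to_words(n: int) -> str:
    """Spell out an integer with magnitude below 10**12."""
    if n == 0:
        return "nol"
    if n < 0:
        return "minus " + int_to_words(-n)
    if n >= 1000 ** len(SCALES):
        raise ValueError("this sketch supports magnitudes below 10**12 only")
    groups: List[str] = []
    scale = 0
    while n:
        # Peel off three digits at a time, lowest group first.
        n, chunk = divmod(n, 1000)
        if chunk:
            words = _below_thousand(chunk)
            if SCALES[scale]:
                words.append(SCALES[scale])
            groups.append(" ".join(words))
        scale += 1
    return " ".join(reversed(groups))


def ordinal(n: int) -> str:
    """Append -nchi after a vowel and -inchi after a consonant (assumed rule)."""
    words = int_to_words(n)
    suffix = "nchi" if words[-1] in "aeiou" else "inchi"
    return words + suffix


print(int_to_words(123))  # bir yuz yigirma uch
print(ordinal(5))         # beshinchi
```

Unlike the library, which documents arbitrary-size integers, this sketch stops at 10**12 to keep the scale table short.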
Currency Conversion
from uzpreprocessor import UzNumberToWords
conv = UzNumberToWords()
print(conv.money(1000)) # bir ming so'm
print(conv.money(12345.67)) # o'n ikki ming uch yuz qirq besh so'm oltmish yetti tiyin
print(conv.money(0.50)) # nol so'm ellik tiyin
print(conv.money(-100)) # minus bir yuz so'm
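The so'm/tiyin split shown above can be sketched with the standard `decimal` module. This is an illustration only: the helper name `split_som_tiyin` is hypothetical, and rounding half up to two decimal places is an assumption, since the rounding mode the library uses is not stated here.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Tuple


def split_som_tiyin(amount) -> Tuple[bool, int, int]:
    """Return (is_negative, som, tiyin) for a numeric amount.

    Hypothetical helper, not part of UzPreprocessor. Going through str()
    and Decimal avoids binary floating-point artifacts when scaling by 100.
    """
    value = Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    negative = value < 0
    total_tiyin = int(abs(value) * 100)  # whole number of tiyin
    som, tiyin = divmod(total_tiyin, 100)
    return negative, som, tiyin


print(split_som_tiyin(12345.67))  # (False, 12345, 67)
print(split_som_tiyin(0.50))      # (False, 0, 50)
print(split_som_tiyin(-100))      # (True, 100, 0)
```

Each part can then be spelled out separately and joined with the unit names, which matches the shape of the outputs listed above.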
Date Conversion
The library supports multiple date formats:
from uzpreprocessor import UzPreprocessor
processor = UzPreprocessor()
# ISO format
print(processor.date.date("2025-09-18"))
# European format
print(processor.date.date("18.09.2025"))
print(processor.date.date("18/09/2025"))
# US format
print(processor.date.date("09/18/2025"))
# Text format (English)
print(processor.date.date("18 September 2025"))
print(processor.date.date("September 18, 2025"))
# Text format (Uzbek)
print(processor.date.date("18 sentabr 2025"))
# Legal format
print(processor.date.date("2025-yil 18-sentabr"))
# Python date objects
from datetime import date
print(processor.date.date(date(2025, 9, 18)))
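One common way to accept several numeric date layouts is to try a list of `strptime` formats in order. The sketch below shows that approach under stated assumptions: `parse_numeric_date` and the format order are hypothetical, and trying day-first before month-first is a choice made for this example. How the library itself resolves an ambiguous string such as "03/04/2025" is not documented in this README.

```python
from datetime import date, datetime

# Hypothetical format list: ISO, European (dots), European (slashes), then US.
FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y", "%m/%d/%Y")


def parse_numeric_date(text: str) -> date:
    """Return the first successful parse; raise ValueError if none matches."""
    cleaned = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"unrecognized date format: {text!r}")


print(parse_numeric_date("18/09/2025"))  # 2025-09-18
print(parse_numeric_date("09/18/2025"))  # 2025-09-18 (day-first fails, US format matches)
```

With this ordering, "09/18/2025" only parses as US format because 18 is not a valid month, while "03/04/2025" would be read as 3 April.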
Time Conversion
from uzpreprocessor import UzPreprocessor
processor = UzPreprocessor()
# 24-hour format (formal mode)
print(processor.time.time("14:35")) # o'n to'rt soat o'ttiz besh daqiqa
print(processor.time.time("14:35:08")) # o'n to'rt soat o'ttiz besh daqiqa sakkiz soniya
print(processor.time.time("00:00")) # nol soat
# 12-hour format with AM/PM (spoken mode)
print(processor.time.time("2 PM")) # tushlikdan keyin soat o'n to'rt
print(processor.time.time("2:35 PM")) # tushlikdan keyin soat o'n to'rt o'ttiz besh daqiqa
print(processor.time.time("7 AM")) # ertalab soat yetti
# Various formats
print(processor.time.time("14.35")) # o'n to'rt soat o'ttiz besh daqiqa
print(processor.time.time("14 35")) # o'n to'rt soat o'ttiz besh daqiqa
print(processor.time.time("14:35:08Z")) # o'n to'rt soat o'ttiz besh daqiqa sakkiz soniya
# Python time objects
from datetime import time
print(processor.time.time(time(14, 35, 8)))
Time Periods (for AM/PM format):
- ertalab - 5:00-10:59
- tushlikdan oldin - 11:00-12:59
- tushlikdan keyin - 13:00-17:59
- kechqurun - 18:00-22:59
- tun - 23:00-4:59
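The period ranges above can be expressed as a small lookup on the 24-hour value. The following is a sketch, not the library's code: `to_24h` and `time_period` are hypothetical names, and the hour boundaries are taken directly from the ranges listed in this section.

```python
def to_24h(hour12: int, meridiem: str) -> int:
    """Convert a 12-hour clock value plus AM/PM to a 24-hour hour (0-23)."""
    if not 1 <= hour12 <= 12:
        raise ValueError("hour must be in 1..12")
    marker = meridiem.strip().upper()
    if marker not in ("AM", "PM"):
        raise ValueError("meridiem must be AM or PM")
    hour = hour12 % 12  # 12 AM -> 0, 12 PM -> 0 before the PM offset
    return hour + 12 if marker == "PM" else hour


def time_period(hour: int) -> str:
    """Map a 24-hour hour to the spoken Uzbek period name."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if 5 <= hour <= 10:
        return "ertalab"
    if 11 <= hour <= 12:
        return "tushlikdan oldin"
    if 13 <= hour <= 17:
        return "tushlikdan keyin"
    if 18 <= hour <= 22:
        return "kechqurun"
    return "tun"  # 23:00-4:59 wraps around midnight


print(time_period(to_24h(2, "PM")))  # tushlikdan keyin
print(time_period(to_24h(7, "AM")))  # ertalab
```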
DateTime Conversion
from uzpreprocessor import UzPreprocessor
processor = UzPreprocessor()
# ISO datetime format
print(processor.datetime.datetime("2025-09-18T14:35:08"))
# Output: ikki ming yigirma beshinchi yil o'n sakkizinchi sentabr o'n to'rt soat o'ttiz besh daqiqa sakkiz soniya
# Python datetime objects
from datetime import datetime
dt = datetime(2025, 9, 18, 14, 35, 8)
print(processor.datetime.datetime(dt))
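Since the datetime output above is the date phrase followed by the time phrase, an ISO string can be split into the two parts with the standard library before each is converted. This is a sketch under that assumption; `split_iso_datetime` is a hypothetical helper, not a library function.

```python
from datetime import date, datetime, time
from typing import Tuple


def split_iso_datetime(text: str) -> Tuple[date, time]:
    """Parse an ISO datetime string and return its date and time parts."""
    # datetime.fromisoformat accepts a trailing "Z" only on Python 3.11+.
    parsed = datetime.fromisoformat(text)
    return parsed.date(), parsed.time()


d, t = split_iso_datetime("2025-09-18T14:35:08")
print(d)  # 2025-09-18
print(t)  # 14:35:08
```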
API Reference
UzPreprocessor
Main convenience class that provides all conversion functionality.
Methods
- number - Access number converter (UzNumberToWords)
- date - Access date converter (UzDateToWords)
- time - Access time converter (UzTimeToWords)
- datetime - Access datetime converter (UzDateAndTimeToWords)
UzNumberToWords
Converts numbers, currency, and percentages to Uzbek words.
Methods
- number(value) - Convert number to words
- money(amount) - Convert currency to words (so'm/tiyin)
- percent(value) - Convert percentage to words
- ordinal(value) - Convert number to ordinal form
UzDateToWords
Converts dates to Uzbek words.
Methods
date(value)- Convert date to words
Supported input types:
- String (various formats)
- datetime.date object
- datetime.datetime object
UzTimeToWords
Converts time to Uzbek words.
Methods
time(value)- Convert time to words
Supported input types:
- String (various formats)
- datetime.time object
- datetime.datetime object
Modes:
- Formal mode: Standard 24-hour format (e.g., "14:35")
- Spoken mode: 12-hour format with AM/PM (e.g., "2 PM")
UzDateAndTimeToWords
Combines date and time conversion.
Methods
datetime(value)- Convert datetime to words
Supported input types:
- String (ISO format)
- datetime.datetime object
Performance
The library is optimized for performance:
- Compiled regex patterns for faster parsing
- Efficient string operations with minimal allocations
- Optimized data structures (tuples for immutable data, dicts for O(1) lookups)
- No external dependencies (uses only Python standard library)
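The first and third points can be illustrated with a short standalone sketch: a regex compiled once at import time, a tuple of month names, and a dict built from it for constant-time lookup. The pattern, the names `LEGAL_DATE`, `MONTHS`, `MONTH_NUMBER`, and `parse_legal_date` are all hypothetical and are not taken from the library's source; the input shape follows the legal date format shown in the Date Conversion examples.

```python
import re
from datetime import date

# Immutable month table; the dict gives O(1) name-to-number lookup.
MONTHS = (
    "yanvar", "fevral", "mart", "aprel", "may", "iyun",
    "iyul", "avgust", "sentabr", "oktabr", "noyabr", "dekabr",
)
MONTH_NUMBER = {name: index for index, name in enumerate(MONTHS, start=1)}

# Compiled once at module load, reused on every call.
LEGAL_DATE = re.compile(r"^(\d{4})-yil\s+(\d{1,2})-([a-z]+)$")


def parse_legal_date(text: str) -> date:
    """Parse strings shaped like '2025-yil 18-sentabr'."""
    match = LEGAL_DATE.match(text.strip().lower())
    if match is None:
        raise ValueError(f"not a legal-format date: {text!r}")
    year, day, month_name = match.groups()
    try:
        month = MONTH_NUMBER[month_name]
    except KeyError:
        raise ValueError(f"unknown month name: {month_name!r}") from None
    return date(int(year), month, int(day))


print(parse_legal_date("2025-yil 18-sentabr"))  # 2025-09-18
```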
Requirements
- Python 3.8 or higher
- No external dependencies (uses only standard library)
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Inspired by the need for Uzbek text preprocessing in legal and financial documents
- Built with attention to accuracy and performance
Changelog
1.0.0 (2025-01-XX)
- Initial release
- Number to words conversion
- Date to words conversion
- Time to words conversion
- Currency conversion
- Percentage conversion
- Support for multiple input formats
- Optimized performance
Documentation
- Installation Guide - Detailed installation instructions
- Deployment Guide - Complete guide for publishing to PyPI
- Quick Deploy - Quick reference for deployment
- Project Structure - Project organization
- Optimizations - Performance optimizations
Support
For issues, questions, or contributions, please visit:
Made with ❤️ for the Uzbek developer community