A Python library for normalizing Dhivehi text and converting numbers to Dhivehi text format, supporting written, spoken and year forms
dv-normalize
A Python library for normalizing Dhivehi text by converting numbers to Dhivehi and standardizing sentence endings.
Features
- Converts numbers to Dhivehi text (both written and spoken forms)
- Handles years (when followed by ވަނަ)
- Handles decimal numbers
- Normalizes formal sentence endings to colloquial form
- Preserves proper spacing and punctuation
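To make the number-related features concrete, here is a minimal sketch of how numeric tokens in a sentence might be sorted into years, decimals, and regular integers before conversion. It is not the library's implementation: `classify_numbers`, `NUMBER_RE`, and `YEAR_MARKER` are hypothetical names, and the four-digit rule for years is an assumption based on the 1000 to 9999 range used in the usage example.

```python
import re

# Assumption: a number followed by this word is treated as a year (see Features).
YEAR_MARKER = "ވަނަ"

# A run of digits with an optional fractional part, e.g. "1000" or "12.5".
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def classify_numbers(text):
    """Return (token, kind) pairs; kind is 'year', 'decimal' or 'integer'."""
    results = []
    for match in NUMBER_RE.finditer(text):
        token = match.group()
        # Text that follows the number, ignoring the separating whitespace.
        rest = text[match.end():].lstrip()
        if "." in token:
            kind = "decimal"
        elif len(token) == 4 and rest.startswith(YEAR_MARKER):
            kind = "year"
        else:
            kind = "integer"
        results.append((token, kind))
    return results
```

Each kind would then be handed to the matching conversion; the library itself may detect these cases differently.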
Installation
pip install dv-normalize
Usage
There are two main functions in this library:
- int_to_dv: converts numbers to Dhivehi text. It produces the written form, or the spoken and year forms when called with is_spoken=True or is_year=True.
- spoken_dv: converts Dhivehi text to spoken form.
Written form
# Test case for int_to_dv
from dv_normalize.dv_num import int_to_dv

def main():
    while True:
        try:
            num = input("Enter a number (0 to trillion) or 'q' to exit: ")
            if num.lower() == 'q':
                break
            num = int(num)
            if num < 0:
                print("Please enter a non-negative number")
                continue
            print(f"{num:,} in Dhivehi:")
            written = int_to_dv(num, is_spoken=False)
            spoken = int_to_dv(num, is_spoken=True)
            # The year form is only requested for four-digit numbers.
            year = "Not a valid year format" if num < 1000 or num > 9999 else int_to_dv(num, is_year=True)
            print(f"Written form: {written}")
            print(f"Spoken form: {spoken}")
            print(f"Year form: {year}")
        except ValueError:
            print("Please enter a valid number")

if __name__ == "__main__":
    main()
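For non-interactive use, the same three calls can be wrapped in a small helper. The sketch below is illustrative only: `number_forms` is a hypothetical name, and the converter is passed in as an argument on the assumption that it accepts the `is_spoken` and `is_year` keywords exactly as `int_to_dv` is called above.

```python
def number_forms(num, convert):
    """Collect the written, spoken and (when applicable) year form of num.

    `convert` is assumed to be called like int_to_dv in the example above:
    convert(num, is_spoken=...) and convert(num, is_year=True).
    """
    if num < 0:
        raise ValueError("num must be non-negative")
    forms = {
        "written": convert(num, is_spoken=False),
        "spoken": convert(num, is_spoken=True),
    }
    # Mirror the interactive example: only four-digit numbers get a year form.
    if 1000 <= num <= 9999:
        forms["year"] = convert(num, is_year=True)
    return forms
```

With the library installed, `number_forms(2024, int_to_dv)` would return a dictionary holding all three forms, while `number_forms(64, int_to_dv)` would omit the `year` key.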
Spoken form
from dv_normalize.dv_sentence import spoken_dv
# Test cases
test_cases = [
"މިއަދު ވަރަށް ފިނިވެއެވެ.", # Verb ending
"މިއީ ރީތި ފޮތެކެވެ.", # Noun ending
"އޭނާ ދަނީ ސްކޫލަށެވެ.", # Direction ending
"1955 މީހުން ތިބެއެވެ.", # Number with ending
"2024 ވަނަ އަހަރު", # Year
"12.5 ރުފިޔާ", # Decimal
"1000 މީހުން", # Regular number
"މިއީ ރީތި ފޮތެކެވެ.", # Sentence ending
"އޭނާ ގެއަށެވެ.", # Sentence ending
"ހާއްސަ އެއްބަސްވުމުގެ ދަށުން އިންޑިއާއިން ރާއްޖެއަށް ވިއްކާ ހަކުރު އޮޅުވާލައިގެން ލަންކާއަށް!", # test sentence
"އެ އިދާރާއިން ބަލަމުން އަންނަނީ މިދިޔަ މަހުގެ 25 ގައި އެގައުމުން ބޭރު ކުރި 64 ހާސް ޓަނުގެ ހަކުރުގެ ޝިޕްމެންޓެއްގެ މައްސަލަ އެވެ. އެ ޝިޕްމެންޓް އެގައުމުން ބޭރުކުރީ ރާއްޖެ އާއި އިންޑިއާ އާ ދެމެދު ވެފައިވާ ވިޔަފާރީގެ ހާއްސަ އެއްބަސްވުމުގެ ދަށުން ކަނޑައަޅާފައިވާ އަގުތަކުގައި ނަމަވެސް، އެއިން ބައެއް ލަންކާއަށް އެތެރެކުރިން ފަޅާއަރާފައިވާ ކަމަށް އިންޑިއާގެ ބައެއް ނޫސްތަކުގައި ރިފޯޓުކޮށްފައިވެ އެވެ." # test long sentence
]
for test in test_cases:
    print(f"Original: {test}")
    print(f"Normalized: {spoken_dv(test)}\n")
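When normalizing a whole document rather than a list of test strings, it can help to apply the function line by line and leave blank lines alone. This is a minimal sketch under that assumption; `normalize_lines` is a hypothetical helper, and the normalizer is passed in as an argument so that `spoken_dv` can be supplied once the library is installed.

```python
def normalize_lines(lines, normalize):
    """Apply `normalize` to each non-blank line; blank lines pass through unchanged."""
    return [normalize(line) if line.strip() else line for line in lines]
```

For example, `normalize_lines(text.splitlines(), spoken_dv)` would return the normalized lines in their original order.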
License
This project is licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file dv-normalizer-0.1.0.tar.gz.
File metadata
- Download URL: dv-normalizer-0.1.0.tar.gz
- Upload date:
- Size: 6.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a37d49c0e2349d50e8e5b0b9b27b978ee37f5962d43ff2e83ab9da0598fcdab4 |
| MD5 | 20c57895d114427c34578b52da0e5d0a |
| BLAKE2b-256 | d82a3b4a1e2f1545a98d011f6fba98c3c1d0240a6ec1eaee46b22dc194bd1493 |
File details
Details for the file dv_normalizer-0.1.0-py3-none-any.whl.
File metadata
- Download URL: dv_normalizer-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 89967ad23b9803b910ae49c271d3321246c1cc0c5e340e8ee500f8be4d2c0c57 |
| MD5 | b94380dea14dcfeae0d3216cafaefa2d |
| BLAKE2b-256 | ea7e3a0462c3ea7c024f17325142e6ceb805be399653ebf55d88b57023c9d3c7 |