Dhivehi text normalization for TTS frontends
dv-normalize
A Dhivehi text normalizer for TTS frontends. Converts numbers, dates, times, fractions, scores, abbreviations, money, percentages, and other non-Thaana input into spoken-form Dhivehi.
Status: 0.1.8. The v1 rewrite ships under the existing 0.1.x line. The new public API (normalize, Normalizer, NormalizerConfig) is the supported entry point. The legacy 0.1.x classes are still exported but emit a DeprecationWarning and will be removed in a future major release.
Installation
```shell
pip install dv-normalize
```
Quick start
```python
from dv_normalize import normalize

normalize("ވަކި ލާރިން ވެސް 232.23 ލާރި ހޯދައެވެ")
# 'ވަކި ލާރިން ވެސް ދުއިސައްތަ ތިރީސް ދޭއް ޕޮއިންޓް ދޭއް ތިނެއް ލާރި ހޯދައޭ'

normalize("ޑރ. އިބްރާހިމް 14:30 ގައި އައި")  # abbreviation + time
normalize("ކ.އަތޮޅު ވިލިނގިލިން 120 ކިލޯ މީޓަރު")  # atoll abbreviation + cardinal
normalize("ފޭސް2ގެ")  # digit attached directly to Thaana letters
```
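The first quick-start example also rewrites the written sentence ending ހޯދައެވެ as the spoken ހޯދައޭ. The sketch below shows one simple way a suffix rule of that kind could be applied to the end of a sentence. It is not the library's implementation: the rule table and the function name rewrite_sentence_ending are hypothetical, and the single mapping is taken from that example, not from the library's 113 rules.

```python
# Hypothetical rule table for illustration only. The README reports 113
# context-sensitive rules; this single mapping comes from its quick-start
# example (ހޯދައެވެ -> ހޯދައޭ), and real rules may depend on more context.
SENTENCE_END_RULES: list[tuple[str, str]] = [
    ("އެވެ", "އޭ"),  # written ending -> spoken ending
]


def rewrite_sentence_ending(sentence: str) -> str:
    """Apply the first matching rule to the sentence end, keeping trailing whitespace."""
    body = sentence.rstrip()
    trailing = sentence[len(body):]
    for written, spoken in SENTENCE_END_RULES:
        if body.endswith(written):
            return body[: -len(written)] + spoken + trailing
    # No rule matched: return the input unchanged.
    return sentence
```

This sketch ignores context entirely and leaves sentences that end in punctuation untouched, both of which a real rule set would have to handle.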
For repeated calls, create one Normalizer instance and reuse it:

```python
from dv_normalize import Normalizer, NormalizerConfig

n = Normalizer(NormalizerConfig(keep_punctuation=False))
n("ހެލޯ، ދުނިޔެ")  # → 'ހެލޯ ދުނިޔެ' (the comma is dropped)
```
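With keep_punctuation=False, the comma in the example disappears and the words are rejoined with a single space. Below is a minimal sketch of that kind of cleanup. It assumes a punctuation set made up of ASCII punctuation plus the Arabic comma, semicolon and question mark used in Thaana text; the library's actual set is not documented here. strip_punctuation is a hypothetical name.

```python
import re

# Assumed punctuation set: ASCII punctuation ranges plus Arabic comma (U+060C),
# Arabic semicolon (U+061B) and Arabic question mark (U+061F).
_PUNCT = re.compile(r"[!-/:-@\[-`{-~\u060C\u061B\u061F]")
_SPACES = re.compile(r"\s+")


def strip_punctuation(text: str) -> str:
    """Drop punctuation characters, then collapse the whitespace left behind."""
    return _SPACES.sub(" ", _PUNCT.sub("", text)).strip()
```

The character class includes '.', '/', ':' and '%', so a step like this is only safe after numbers, times and fractions have already been verbalized.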
What it handles
| Class | Example input | Example output |
|---|---|---|
| Cardinal | 232 | ދުއިސައްތަ ތިރީސް ދޭއް |
| Comma-grouped | 104,880 | (single cardinal, not per-digit) |
| Per-digit | 9982711 | spelled digit-by-digit (7+ digit identifier) |
| Decimal | 232.23 | ދުއިސައްތަ ތިރީސް ދޭއް ޕޮއިންޓް ދޭއް ތިނެއް |
| Year | 2024 | ދެހާސް ސައުވީސް |
| Year range | 1982 - 2024 | … ން … އަށް |
| Time | 14:30 | ސާދަ ގަޑި ތިރީސް |
| Ordinal | 11ވަނަ | adnominal head form |
| Fraction | 1/2 | ދެބައިކުޅަ އެއްބައި |
| Mixed fraction | 1 1/2 | … އަދި … |
| Percent | 25% | ފަންސަވީސް ޕަސެންޓް |
| Oblique ref | 2024/3 | … ޚާއްސަ `<denom-ordinal>` |
| Score | 3-2, 0-0, 5-0 | compact draw / shutout forms |
| Money | 52 ރ. | Rufiyaa, context-sensitive |
| Abbreviation | ޑރ., ހއ. | ޑޮކްޓަރު, ހާ އަލިފު |
| Compound abbrev | ސ.ޢ.ވ. | ޞައްލަﷲ ޢަލައިހި ވަސައްލަމް |
| Calendar marker | 2026 މ., 1447 ހ. | … މީލާދީ, … ހިޖުރީ |
| Sentence ending | ހޯދައެވެ | ހޯދައޭ (113 rules, context-sensitive) |
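The abbreviation rows point to a lookup in which longer entries are tried first. That way, a compound abbreviation such as ސ.ޢ.ވ. is matched as a unit before any shorter entry that happens to be its prefix. Below is a minimal longest-match sketch. It assumes a table holding only the three abbreviations shown above, and that an abbreviation must start at a whitespace boundary. expand_abbreviations is a hypothetical name.

```python
# Hypothetical lookup table: only the entries from the table above.
ABBREVIATIONS: dict[str, str] = {
    "ޑރ.": "ޑޮކްޓަރު",
    "ހއ.": "ހާ އަލިފު",
    "ސ.ޢ.ވ.": "ޞައްލަﷲ ޢަލައިހި ވަސައްލަމް",
}

# Longest keys first, so a compound abbreviation wins over a shorter prefix.
_KEYS = sorted(ABBREVIATIONS, key=len, reverse=True)


def expand_abbreviations(text: str) -> str:
    out: list[str] = []
    i = 0
    while i < len(text):
        # Only try a match where a new token starts.
        at_token_start = i == 0 or text[i - 1].isspace()
        candidates = _KEYS if at_token_start else []
        for key in candidates:
            if text.startswith(key, i):
                out.append(ABBREVIATIONS[key])
                i += len(key)
                break
        else:
            # No abbreviation here: copy one character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```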
The classifier is priority-ranked, so more specific patterns (calendar markers, multi-letter compound abbreviations, year ranges) shadow the generic ones. Tokens that don't match any rule pass through unchanged.
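Below is a minimal sketch of priority-ranked classification. Rules are tried in order and the first full match wins. For example, 9982711 is claimed by the 7+ digit per-digit rule before the generic cardinal rule can match it. The class names follow the table, but the regexes, their order and the classify function are assumptions. Year, ordinal, money and oblique-reference detection are left out because their patterns are not specified here.

```python
import re

# Hypothetical, simplified rules listed from highest to lowest priority.
RULES = [
    ("calendar_marker", re.compile(r"\d{1,4} [މހ]\.")),
    ("year_range", re.compile(r"\d{4} ?- ?\d{4}")),
    ("time", re.compile(r"\d{1,2}:\d{2}")),
    ("score", re.compile(r"\d{1,2}-\d{1,2}")),
    ("mixed_fraction", re.compile(r"\d+ \d+/\d+")),
    ("fraction", re.compile(r"\d+/\d+")),
    ("percent", re.compile(r"\d+(?:\.\d+)?%")),
    ("decimal", re.compile(r"\d+\.\d+")),
    ("comma_grouped", re.compile(r"\d{1,3}(?:,\d{3})+")),
    ("per_digit", re.compile(r"\d{7,}")),  # must precede the generic cardinal rule
    ("cardinal", re.compile(r"\d+")),
]


def classify(span: str) -> str:
    """Return the first rule whose pattern matches the whole span, else 'passthrough'."""
    for name, pattern in RULES:
        if pattern.fullmatch(span):
            return name
    return "passthrough"
```

Moving the per_digit rule below cardinal would make it unreachable, which is the kind of shadowing the ordering is meant to control.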
Configuration
```python
from dv_normalize import NormalizerConfig

config = NormalizerConfig(
    dialect="spoken",             # only option for now
    unknown_latin="passthrough",  # "passthrough" | "drop" | "spell"
    decimal_separator="auto",     # "auto" | "dot" | "comma"
    time_system="auto",           # "auto" | "12" | "24"
    currency_default="MVR",       # ISO 4217 code for the Maldivian rufiyaa
    keep_punctuation=True,
    diagnostic=False,
    strict=False,
)
```
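The enumerated options above accept only the listed values. The following sketch shows one way a config object could reject anything else at construction time. ConfigSketch is a hypothetical stand-in that mirrors the documented option names. It is not dv_normalize.NormalizerConfig, and this README does not say whether the real class validates its arguments this way.

```python
from dataclasses import dataclass

# Allowed values copied from the option comments above.
_ALLOWED = {
    "dialect": {"spoken"},
    "unknown_latin": {"passthrough", "drop", "spell"},
    "decimal_separator": {"auto", "dot", "comma"},
    "time_system": {"auto", "12", "24"},
}


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical stand-in; NOT the library's NormalizerConfig."""
    dialect: str = "spoken"
    unknown_latin: str = "passthrough"
    decimal_separator: str = "auto"
    time_system: str = "auto"
    currency_default: str = "MVR"
    keep_punctuation: bool = True
    diagnostic: bool = False
    strict: bool = False

    def __post_init__(self) -> None:
        # Reject any enumerated option outside its documented set.
        for name, allowed in _ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r}; expected one of {sorted(allowed)}")
```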
Diagnostic mode
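Normalizer.trace(text) returns the list of classified tokens instead of the joined output text, which helps when debugging which rule fired:

```python
from dv_normalize import Normalizer

for tok in Normalizer().trace("ޑރ. އިބްރާހިމް 2024ގައި"):
    # rule class, original text, spoken form, and any extracted fields
    print(tok.cls, tok.text, tok.spoken, tok.fields)
```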
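Trace output can also be summarized in code. The helpers below assume only what the example prints, namely that each token exposes cls, text and spoken attributes. SketchToken, class_histogram and unchanged_tokens are hypothetical names. The class strings used in testing are placeholders, not the library's real class labels.

```python
from collections import Counter
from typing import Any, Iterable, NamedTuple


class SketchToken(NamedTuple):
    """Hypothetical stand-in exposing the four attributes printed in the trace example."""
    cls: str
    text: str
    spoken: str
    fields: Any


def class_histogram(tokens: Iterable[Any]) -> Counter:
    """Count how many tokens each class claimed; needs only a .cls attribute."""
    return Counter(tok.cls for tok in tokens)


def unchanged_tokens(tokens: Iterable[Any]) -> list[str]:
    """Texts whose spoken form equals the input, e.g. tokens no rule rewrote."""
    return [tok.text for tok in tokens if tok.spoken == tok.text]
```

With the library installed, class_histogram(Normalizer().trace(text)) would give a per-class count for a whole input.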
Legacy API
The original 0.1.x classes (DhivehiNumberConverter, DhivehiTimeConverter, DhivehiYearConverter, DhivehiTextProcessor) are still importable from dv_normalize but emit a DeprecationWarning. They are scheduled for removal in a future major release; migrate to normalize() / Normalizer.
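During migration, it can help to surface every DeprecationWarning, for example by running the test suite with python -W error::DeprecationWarning. The sketch below checks a single call site using the standard warnings module. LegacyConverter is a hypothetical stand-in for a deprecated class. This README does not say whether the real classes warn on import or on instantiation, so wrap whichever step triggers the warning.

```python
import warnings


class LegacyConverter:
    """Hypothetical stand-in for a deprecated 0.1.x class; not part of dv_normalize."""

    def __init__(self) -> None:
        warnings.warn(
            "LegacyConverter is deprecated; use normalize() / Normalizer instead",
            DeprecationWarning,
            stacklevel=2,
        )


def uses_deprecated_api(factory) -> bool:
    """Return True if calling factory() emits a DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every DeprecationWarning, even ones already shown once.
        warnings.simplefilter("always", DeprecationWarning)
        factory()
    return any(issubclass(w.category, DeprecationWarning) for w in caught)
```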
License
MIT — see LICENSE.
Download files
File details
Details for the file dv_normalizer-0.1.8.tar.gz.
File metadata
- Download URL: dv_normalizer-0.1.8.tar.gz
- Size: 49.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ffd3f6550a20c109fb28ed084655a58820f07c61524d265bd2b69bb59433e591 |
| MD5 | 9ed5eccd96c5c7d7a75e94fbfc621903 |
| BLAKE2b-256 | 0b0cfad80406d2973b3aa2982966d87480912575fd92d1f6af312e195c2ccf02 |
File details
Details for the file dv_normalizer-0.1.8-py3-none-any.whl.
File metadata
- Download URL: dv_normalizer-0.1.8-py3-none-any.whl
- Size: 62.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b38b76f7f54dfa1a96a23c166741f6d446465ebb43eeed8f7ddaccc030ef4f8f |
| MD5 | ddddb8dc9596fb99a4fca2563ff41ac6 |
| BLAKE2b-256 | a51b639bd668fe249d122e7f4798c6f89088cc9a7c16c7d62481065a711004f6 |