Dhivehi text normalization for TTS frontends

dv-normalize

A Dhivehi text normalizer for TTS frontends. Converts numbers, dates, times, fractions, scores, abbreviations, money, percentages, and other non-Thaana input into spoken-form Dhivehi.

Status: 0.1.8. The v1 rewrite ships under the existing 0.1.x line. The new public API (normalize, Normalizer, NormalizerConfig) is the supported entry point. The legacy 0.1.x classes are still exported but emit a DeprecationWarning and will be removed in a future major release.

Installation

pip install dv-normalizer

Quick start

from dv_normalize import normalize

normalize("ވަކި ލާރިން ވެސް 232.23 ލާރި ހޯދައެވެ")
# 'ވަކި ލާރިން ވެސް ދުއިސައްތަ ތިރީސް ދޭއް ޕޮއިންޓް ދޭއް ތިނެއް ލާރި ހޯދައޭ'

normalize("ޑރ. އިބްރާހިމް 14:30 ގައި އައި")
normalize("ކ.އަތޮޅު ވިލިނގިލިން 120 ކިލޯ މީޓަރު")
normalize("ފޭސް2ގެ")

For repeated use, hold onto a Normalizer instance:

from dv_normalize import Normalizer, NormalizerConfig

n = Normalizer(NormalizerConfig(keep_punctuation=False))
n("ހެލޯ، ދުނިޔެ")  # → 'ހެލޯ ދުނިޔެ'

What it handles

| Class | Example input | Example output |
| --- | --- | --- |
| Cardinal | 232 | ދުއިސައްތަ ތިރީސް ދޭއް |
| Comma-grouped | 104,880 | (single cardinal, not per-digit) |
| Per-digit | 9982711 | spelled digit-by-digit (7+ digit identifier) |
| Decimal | 232.23 | ދުއިސައްތަ ތިރީސް ދޭއް ޕޮއިންޓް ދޭއް ތިނެއް |
| Year | 2024 | ދެހާސް ސައުވީސް |
| Year range | 1982 - 2024 | … ން … އަށް |
| Time | 14:30 | ސާދަ ގަޑި ތިރީސް |
| Ordinal | 11ވަނަ | adnominal head form |
| Fraction | 1/2 | ދެބައިކުޅަ އެއްބައި |
| Mixed fraction | 1 1/2 | … އަދި … |
| Percent | 25% | ފަންސަވީސް ޕަސެންޓް |
| Oblique ref | 2024/3 | … ޚާއްސަ <denom-ordinal> |
| Score | 3-2, 0-0, 5-0 | compact draw / shutout forms |
| Money | 52 ރ. | Rufiyaa context-sensitive |
| Abbreviation | ޑރ., ހއ. | ޑޮކްޓަރު, ހާ އަލިފު |
| Compound abbrev | ސ.ޢ.ވ. | ޞައްލަﷲ ޢަލައިހި ވަސައްލަމް |
| Calendar marker | 2026 މ., 1447 ހ. | … މީލާދީ, … ހިޖުރީ |
| Sentence ending | ހޯދައެވެ | ހޯދައޭ (113 rules, context-sensitive) |

The classifier is priority-ranked, so more specific patterns (calendar markers, multi-letter compound abbreviations, year ranges) shadow the generic ones. Tokens that don't match any rule pass through unchanged.
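
To make the shadowing behaviour concrete, here is a minimal sketch of a priority-ranked classifier. It is not the library's rule set: the `Rule` type, the rule names, the priority numbers, and the regexes are all illustrative assumptions, chosen only to show how a specific pattern can win over a generic one and how unmatched tokens fall through.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    priority: int        # lower number is tried first (assumption of this sketch)
    name: str            # hypothetical class label
    pattern: re.Pattern  # must match the whole token


# Illustrative rules only; the real classifier's patterns are not shown here.
RULES = sorted(
    [
        Rule(90, "cardinal", re.compile(r"\d+")),
        Rule(50, "per_digit", re.compile(r"\d{7,}")),
        Rule(10, "year_range", re.compile(r"\d{4}\s*-\s*\d{4}")),
        Rule(20, "time", re.compile(r"\d{1,2}:\d{2}")),
        Rule(30, "percent", re.compile(r"\d+%")),
        Rule(40, "decimal", re.compile(r"\d+\.\d+")),
        Rule(45, "comma_grouped", re.compile(r"\d{1,3}(?:,\d{3})+")),
    ],
    key=lambda rule: rule.priority,
)


def classify(token: str) -> Optional[str]:
    """Return the name of the first rule that fully matches, else None."""
    for rule in RULES:
        if rule.pattern.fullmatch(token):
            return rule.name
    return None  # no rule matched: the caller passes the token through unchanged


if __name__ == "__main__":
    for tok in ["1982 - 2024", "9982711", "232", "104,880", "ހެލޯ"]:
        print(repr(tok), classify(tok))
```

In this sketch `9982711` matches both the `per_digit` and `cardinal` patterns, and the lower priority number decides the outcome, which is the same idea as specific patterns shadowing generic ones.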

Configuration

NormalizerConfig(
    dialect="spoken",            # only option for now
    unknown_latin="passthrough", # "passthrough" | "drop" | "spell"
    decimal_separator="auto",    # "auto" | "dot" | "comma"
    time_system="auto",          # "auto" | "12" | "24"
    currency_default="MVR",
    keep_punctuation=True,
    diagnostic=False,
    strict=False,
)

Diagnostic mode

Normalizer.trace(text) returns the classified token list instead of joined text. Useful for debugging which rule fired:

for tok in Normalizer().trace("ޑރ. އިބްރާހިމް 2024ގައި"):
    print(tok.cls, tok.text, tok.spoken, tok.fields)
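
When a trace is long, a small summary can be easier to scan than the raw loop. The sketch below relies only on the four attributes printed above (cls, text, spoken, fields). `FakeToken`, `summarize`, the class labels, and the English placeholder strings are hypothetical stand-ins so the snippet runs without the library; real tokens come from Normalizer().trace(...).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FakeToken:
    """Hypothetical stand-in exposing the attributes the trace loop prints."""
    cls: str
    text: str
    spoken: str
    fields: dict = field(default_factory=dict)


def summarize(tokens):
    """Count tokens per class label and list the tokens whose text was rewritten."""
    counts = Counter(tok.cls for tok in tokens)
    changed = [(tok.text, tok.spoken) for tok in tokens if tok.spoken != tok.text]
    return counts, changed


# Made-up labels and placeholder strings, not real library output.
sample = [
    FakeToken("abbreviation", "Dr.", "Doctor"),
    FakeToken("word", "Ibrahim", "Ibrahim"),
    FakeToken("year", "2024", "twenty twenty-four"),
]

counts, changed = summarize(sample)

if __name__ == "__main__":
    print(dict(counts))
    print(changed)
```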

Legacy API

The original 0.1.x classes (DhivehiNumberConverter, DhivehiTimeConverter, DhivehiYearConverter, DhivehiTextProcessor) are still importable from dv_normalize but emit a DeprecationWarning. They are scheduled for removal in a future major release — migrate to normalize() / Normalizer.
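
To find leftover uses of deprecated classes during a migration, the standard `warnings` module can record what a call emits. The sketch below uses a hypothetical `LegacyConverter` stand-in rather than the real legacy classes so it runs on its own; `uses_deprecated_api` is likewise an illustrative helper, not part of dv_normalize.

```python
import warnings


class LegacyConverter:
    """Hypothetical stand-in for a deprecated class; not part of dv_normalize."""

    def __init__(self):
        warnings.warn(
            "LegacyConverter is deprecated; use normalize() instead",
            DeprecationWarning,
            stacklevel=2,
        )


def uses_deprecated_api(factory) -> bool:
    """Call factory() and report whether it emitted a DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record even normally hidden warnings
        factory()
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


if __name__ == "__main__":
    print(uses_deprecated_api(LegacyConverter))  # stand-in warns on construction
    print(uses_deprecated_api(lambda: None))     # nothing deprecated here
```

For a whole test suite, running Python with `-W error::DeprecationWarning` turns these warnings into exceptions, which makes remaining legacy call sites fail loudly.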

License

MIT — see LICENSE.
