A Python package for Roman to Nepali (Devanagari) transliteration

Nepali Unicoder

A Python package that converts Romanized Nepali text and Preeti font text into Unicode Devanagari script. Roman transliteration uses a greedy matching algorithm that prefers longer phonetic matches; Preeti conversion runs in two phases and applies contextual rules.

Features

  • Accurate Transliteration: Uses a greedy matching algorithm to prioritize longer phonetic matches (e.g., 'kha' is matched before 'k' and 'h').
  • Preeti Font Support: Converts Preeti font text to Unicode using 30+ contextual rules.
  • Smart Vowel Handling: Distinguishes between independent vowels (e.g., 'aa' -> 'आ') and vowel signs/matras (e.g., 'ka' -> 'क', 'kaa' -> 'का').
  • Contextual Rules: Handles complex Devanagari rules like reph positioning, matra reordering, and special character combinations.
  • Mixed Content Support: Allows keeping English words or specific text in Roman script using {} blocks.
  • Customizable: Supports custom word-level overrides via word_maps.json.
  • CLI Support: Can be used directly from the command line.
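
The greedy matching mentioned in the feature list can be illustrated with a minimal sketch. This is not the package's implementation: ROMAN_MAP is a small hypothetical subset of a mapping table and greedy_transliterate is an illustrative name. The idea is to try the longest candidate at each position first, so 'kha' wins over 'k' followed by 'h'.

```python
# Illustrative subset of a Roman -> Devanagari table
# (hypothetical; not the package's real mapping).
ROMAN_MAP = {
    "kha": "ख",
    "kh": "ख्",
    "ka": "क",
    "k": "क्",
    "ma": "म",
    "m": "म्",
}
MAX_KEY_LEN = max(len(key) for key in ROMAN_MAP)


def greedy_transliterate(text: str) -> str:
    """Convert text by always consuming the longest matching key first."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in ROMAN_MAP:
                out.append(ROMAN_MAP[chunk])
                i += length
                break
        else:
            # No rule matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(greedy_transliterate("kha"))   # ख
print(greedy_transliterate("kama"))  # कम
```

Checking shorter keys first would turn "kha" into a half-form followed by unmatched letters, which is why the loop counts down from the longest key length.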

Installation

Install the package from PyPI:

pip install nepali-unicoder

Usage

Command Line Interface (CLI)

You can use the converter directly from the terminal:

# Direct argument
python -m nepali_unicoder "namaste"
# Output: नमस्ते

# Pipe input
echo "mero naam sanjeev ho" | python -m nepali_unicoder
# Output: मेरो नाम सन्जीव् हो

Python API

from nepali_unicoder.convert import Converter

converter = Converter()

# Basic conversion
text = "namaste nepal"
print(converter.convert(text))
# Output: नमस्ते नेपाल

# Using 'as-is' blocks for English text
mixed_text = "mero naam {Sanjeev} ho"
print(converter.convert(mixed_text))
# Output: मेरो नाम Sanjeev हो

Preeti Mode

Convert Preeti font text to Unicode with full support for contextual rules:

from nepali_unicoder.convert import Converter

# Create converter in Preeti mode
preeti_converter = Converter(mode="preeti")

# Basic conversion
preeti_text = "s{sf"  # Preeti characters
print(preeti_converter.convert(preeti_text))
# Output: र्कर्का

# The converter handles:
# - Reph positioning: { → र् (moves before consonant)
# - Matra reordering: l (ि) moves after consonant
# - Special m transformations
# - Vowel combinations

Preeti Character Examples

Preeti    Unicode    Description
s         क          Consonant ka
s{        र्क         Reph + ka (contextual)
sl        कि         ka + short i (reordered)
qm        क्र         Special m transformation
!@#       १२३        Nepali numbers
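
As a rough illustration of a two-phase conversion, the sketch below first maps characters one by one and then applies a single contextual rule, reph positioning, reproducing the s{ → र्क example. It is only a sketch under stated assumptions: PREETI_MAP is a tiny hypothetical subset, preeti_to_unicode is an illustrative name, and the package's real rule set is larger and may order its phases differently.

```python
# Tiny illustrative subset of a Preeti table
# (hypothetical; the package's real table and rules are larger).
PREETI_MAP = {"s": "क", "f": "ा"}
REPH_KEY = "{"   # Preeti key typed AFTER the consonant it belongs to
REPH = "र्"       # Unicode sequence that must come BEFORE that consonant


def preeti_to_unicode(text: str) -> str:
    # Phase 1: character-by-character mapping; the reph key is left as-is.
    mapped = [PREETI_MAP.get(ch, ch) for ch in text]

    # Phase 2: contextual rule; move the reph in front of the previous token.
    out = []
    for token in mapped:
        if token == REPH_KEY and out:
            out.insert(len(out) - 1, REPH)
        else:
            out.append(token)
    return "".join(out)


print(preeti_to_unicode("s{"))  # र्क
```

Separating the mapping from the reordering keeps each contextual rule a small, testable step over already-mapped tokens.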

CLI for Preeti

python -m nepali_unicoder --preeti "s{sf"
# Output: र्कर्का

Transliteration Rules

  • Consonants: k -> क्, ka -> क, kh -> ख्, kha -> ख
  • Vowels: a -> अ, aa -> आ, i -> इ, u -> उ
  • Matras: ki -> कि, ko -> को
  • Special: . -> ।, .. ->
  • Numbers: 0-9 -> ०-९ (Decimal points are preserved: 1.5 -> १.५)
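
The number and period rules interact: a period becomes । at the end of a sentence but must survive inside a decimal. One way to sketch this, assuming a regex-based approach that the package may not actually use, is shown below. The names DIGITS, SENTENCE_PERIOD and convert_numbers_and_periods are illustrative, and the sketch does not handle the '..' rule.

```python
import re

# ASCII digits -> Devanagari digits.
DIGITS = str.maketrans("0123456789", "०१२३४५६७८९")

# A period counts as a decimal point only when it sits between two digits;
# any other period is treated as a sentence end.
SENTENCE_PERIOD = re.compile(r"(?<!\d)\.|\.(?!\d)")


def convert_numbers_and_periods(text: str) -> str:
    """Replace sentence-ending periods with a danda and convert digits."""
    text = SENTENCE_PERIOD.sub("।", text)
    return text.translate(DIGITS)


print(convert_numbers_and_periods("1.5"))    # १.५
print(convert_numbers_and_periods("2024."))  # २०२४।
```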

Advanced Usage

Handling Complex Text

The converter handles mixed content gracefully. You can use {} to keep text as-is (e.g., for English words or code snippets).

text = "mero naam {Sanjeev} ho ra ma 12.5 barsa ko bhaye."
print(converter.convert(text))
# Output: मेरो नाम Sanjeev हो र म १२.५ बर्स को भए।
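
To show how {} blocks can be kept out of conversion, here is a minimal sketch that splits the input on brace blocks and converts only the text outside them. It is an assumption about one possible approach, not the package's code; convert_with_as_is and AS_IS_BLOCK are hypothetical names, and str.upper stands in for the real transliteration step.

```python
import re

# Matches one {...} block; the capture group keeps the inner text.
AS_IS_BLOCK = re.compile(r"\{([^{}]*)\}")


def convert_with_as_is(text, convert_segment):
    """Apply convert_segment outside {...} blocks; copy block contents verbatim."""
    # re.split with one capture group alternates: outside, inside, outside, ...
    parts = AS_IS_BLOCK.split(text)
    out = []
    for index, part in enumerate(parts):
        out.append(part if index % 2 else convert_segment(part))
    return "".join(out)


# str.upper is a placeholder for the transliteration function.
print(convert_with_as_is("mero naam {Sanjeev} ho", str.upper))
# MERO NAAM Sanjeev HO
```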

Configuration

Custom word-level overrides live in word_maps.json, located in the src/nepali_unicoder directory. Use it for words that don't follow the standard phonetic rules.

Example word_maps.json:

{
    "nepal": "नेपाल",
    "kathamandu": "काठमाडौँ"
}
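
Conceptually, a word-level override is a dictionary lookup that takes precedence over rule-based conversion. The sketch below assumes whitespace-separated, exact-match lookup, which may differ from how the package applies word_maps.json; convert_words is a hypothetical helper and str.upper is a placeholder for the rule-based step.

```python
import json

# Same shape as the example word_maps.json: Roman word -> Devanagari spelling.
WORD_MAPS_JSON = '{"nepal": "नेपाल", "kathamandu": "काठमाडौँ"}'


def convert_words(text, word_map, fallback):
    """Use the override for words listed in word_map; otherwise call fallback."""
    converted = []
    for word in text.split(" "):
        if word in word_map:
            converted.append(word_map[word])
        else:
            converted.append(fallback(word))
    return " ".join(converted)


word_map = json.loads(WORD_MAPS_JSON)
# str.upper is a placeholder for the rule-based transliteration step.
print(convert_words("nepal ra kathamandu", word_map, str.upper))
```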

Contribution

We welcome contributions! Here's how you can help:

  1. Clone the repository:

    git clone https://github.com/realsanjeev/nepali_unicoder.git
    cd nepali_unicoder
    
  2. Set up a virtual environment:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -e .
    
  3. Run tests:

    python -m unittest discover tests
    
  4. Submit a Pull Request: Create a new branch, make your changes, and submit a PR.

Development

To run tests:

python -m unittest discover tests

License

MIT
