
Convert Nepali text to roman English


nepali-roman

Package to convert Nepali text to romanized English.

When working with data from the government or private sector, we often encounter Nepali names of VDCs, municipalities, local-level units, districts and places. To merge such data with another datasheet that lists the same units in English, the Nepali names must first be romanized so that text similarity algorithms can be used for string matching.

This package romanizes Nepali names using a rudimentary script. The romanization of Nepali text in everyday use is inconsistent. For example, the popular romanization of "नेपाल" is "Nepal". This may look correct, but it cannot distinguish "नेपाल" from "नेपल". The package therefore uses a unique, standard romanization scheme so that no two Nepali words have the same romanization.
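To make the ambiguity concrete, here is a minimal sketch of a scheme that keeps long and short vowels apart. The tables and the function name `toy_romanize` are hypothetical and cover only the handful of letters used below; they are not the package's actual tables or code, and the halant and independent vowels are not handled.

```python
# Hypothetical, deliberately tiny tables: just enough letters for the demo words.
CONSONANTS = {"न": "n", "ग": "g", "र": "r", "प": "p", "ल": "l", "क": "k"}
VOWEL_SIGNS = {
    "\u093e": "aa",  # DEVANAGARI VOWEL SIGN AA
    "\u093f": "i",   # DEVANAGARI VOWEL SIGN I
    "\u0947": "e",   # DEVANAGARI VOWEL SIGN E
}


def toy_romanize(word):
    """Romanize a word letter by letter, keeping the inherent 'a' explicit."""
    out = []
    for i, ch in enumerate(word):
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = word[i + 1] if i + 1 < len(word) else ""
            # A consonant keeps its inherent 'a' unless a vowel sign follows it.
            if nxt not in VOWEL_SIGNS:
                out.append("a")
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        else:
            # Anything outside the toy tables is passed through unchanged.
            out.append(ch)
    return "".join(out)


print(toy_romanize("नेपाल"))  # nepaala
print(toy_romanize("नेपल"))   # nepala
```

Because the long vowel is written "aa" and the inherent vowel "a", the two words end up with different romanizations, which the popular spelling "Nepal" cannot express.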

Installation

The nepali-roman package is available for Python 3 and can be installed using pip.

First, make sure pip is installed.

Then install the package with this command:

pip install nepali-roman

Usage

Import the nepali_roman module as follows.

import nepali_roman as nr

The nepali_roman module has three functions: is_devanagari, romanize_text and romanize.

is_devanagari

This function checks whether the text is written in Devanagari script.

Detailed description: The function ignores all punctuation, whitespace and other non-alphanumeric characters in the text, then counts the Devanagari characters. If they make up at least 50% of the stripped text, the function deems the text Devanagari; otherwise it does not.

Syntax:

>>> nr.is_devanagari(text)

Example:

>>> import nepali_roman as nr
>>> nr.is_devanagari("नगरपालिका")
    True

>>> nr.is_devanagari("surajपालिक")
    False

>>> nr.is_devanagari("suraj")
    False
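The 50% rule described for is_devanagari can be sketched roughly as follows. This is an illustration only, not the package's code: the name `looks_devanagari`, the use of Unicode categories to strip characters, and the U+0900 to U+097F range check are all assumptions. The text does not say how the package counts combining vowel signs, so borderline inputs such as "surajपालिक" may be classified differently by the real function.

```python
import unicodedata


def looks_devanagari(text, threshold=0.5):
    """Return True if at least `threshold` of the kept characters are Devanagari."""
    # Assumption: "stripping" keeps letters (L*), combining marks (M*) and digits (N*).
    kept = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M", "N")]
    if not kept:
        return False
    # Assumption: a Devanagari character is anything in the block U+0900..U+097F.
    deva = sum(1 for ch in kept if "\u0900" <= ch <= "\u097f")
    return deva / len(kept) >= threshold


print(looks_devanagari("नगरपालिका"))  # True
print(looks_devanagari("suraj"))      # False
```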

romanize_text

This function romanizes Nepali text to English.

Syntax:

>>> nr.romanize_text(nepali_text)

Example:

>>> import nepali_roman as nr
>>> nr.romanize_text("नगरपालिका")
    nagarapaalikaa
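Once a name is romanized, it can be compared against an English list, as described in the motivation above. The following is a minimal sketch using the standard library's difflib; the helper `best_match`, the candidate list and the 0.6 cutoff are illustrative assumptions, and the romanized string is hard-coded from the example above so the snippet runs without the package installed.

```python
import difflib


def best_match(romanized_name, english_names, cutoff=0.6):
    """Return the English name closest to the romanized name, or None."""
    # Compare in lower case, but return the name as originally spelled.
    lowered = {name.lower(): name for name in english_names}
    hits = difflib.get_close_matches(
        romanized_name.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None


# Hypothetical English-side names from another datasheet.
candidates = ["Nagarpalika", "Gaunpalika", "Jilla"]
print(best_match("nagarapaalikaa", candidates))  # Nagarpalika
```

The cutoff trades recall for precision: a higher value rejects more near misses, while a lower one accepts looser spellings.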

romanize

This function takes a Nepali text file as input and saves the romanized text in the specified output file.

Syntax:

>>> nr.romanize(input_file_path, output_file_path)

Example:

>>> import nepali_roman as nr
>>> nr.romanize('nepali.txt', 'romanized_english.txt')
# This reads the Nepali text file nepali.txt and stores the romanized English text in romanized_english.txt.
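For readers who want control over encoding or per-line handling, a file-to-file conversion can also be written by hand. The sketch below is not how the package implements romanize; `romanize_file` and its `convert` parameter are hypothetical, and UTF-8 input is assumed.

```python
def romanize_file(input_path, output_path, convert):
    """Apply `convert` to each line of the input file and write the result."""
    # Assumption: both files are UTF-8 encoded text.
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(convert(line))


# Possible usage with the package installed (passing romanize_text as the converter):
# import nepali_roman as nr
# romanize_file("nepali.txt", "romanized_english.txt", nr.romanize_text)
```

Passing the converter as an argument keeps the file handling independent of the romanization scheme, so the same loop can be tried with any string-to-string function.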

Contributions

The package is licensed under the MIT License (MIT); details can be found in the LICENSE file. As the package is open source and needs many improvements and extensions, contributions are welcome. Feedback of any other kind is also highly welcome.

