A graph-based transliteration tool

Project description

A graph-based transliteration tool that lets you convert the symbols of one language or script to those of another using rules that you define.

Features

  • Provides a transliteration tool that can be configured to convert the tokens of an input string into an output string using:

    • user-defined types of input tokens and token classes

    • transliteration rules based on:

      • a sequence of input tokens

      • specific input tokens that precede or follow the token sequence

      • classes of input tokens preceding or following specified tokens

    • “on match” rules for output to be inserted between transliteration rules involving particular token classes

    • defined rules for whitespace, including its optional consolidation

  • Can be set up using:

    • an “easy reading” YAML format that lets you quickly craft settings for the transliteration tool

    • “direct” settings, perhaps passed programmatically, using a dictionary

  • Automatically orders rules by the number of tokens in a transliteration rule

  • Checks for ambiguity in transliteration rules

  • Can provide details about each transliteration rule match

  • Allows optional matching of all possible rules in a particular location

  • Permits pruning of rules with certain productions

  • Validates, as well as serializes to and deserializes from JSON and Python data types, using accessible marshmallow schemas

  • Provides full support for Unicode, including Unicode character names in the “easy reading” YAML format

  • Constructs and uses a directed tree and performs a best-first search to find the most specific transliteration rule in a given context
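
The rule features listed above can be sketched in plain Python. This is a minimal illustration, not Graph Transliterator's own code: every name in it (`RULES`, `ONMATCH_RULES`, `tokenize`, `transliterate`, and so on) is hypothetical, it uses a simple sorted list instead of a graph, and it assumes that the input is padded with one whitespace token on each side. It uses the same four rules as the sample configuration in the next section.

```python
# Illustrative sketch only: plain Python, not Graph Transliterator's own code.
TOKEN_CLASSES = {"h": {"consonant"}, "i": {"vowel"}, " ": {"whitespace"}}

# (previous context, token sequence, production). A context item is either
# a literal token or a token class written as "<class>".
RULES = [
    ((), ("h",), "\N{LATIN SMALL LETTER TURNED I}"),
    ((), ("i",), "\N{LATIN SMALL LETTER TURNED H}"),
    (("<whitespace>",), ("i",), "\N{LATIN CAPITAL LETTER TURNED H}"),
    (("<whitespace>", "h"), ("i",), "\N{LATIN SMALL LETTER TURNED H}!"),
]

# (classes before the match, classes at the start of the match, output)
ONMATCH_RULES = [(("whitespace",), ("consonant",), "¡")]

# Rules involving more tokens are tried first, so the most specific one wins.
ORDERED_RULES = sorted(RULES, key=lambda r: len(r[0]) + len(r[1]), reverse=True)


def item_matches(item, token):
    """Match a context item (literal token or "<class>") against a token."""
    if item.startswith("<") and item.endswith(">"):
        return item[1:-1] in TOKEN_CLASSES[token]
    return item == token


def tokenize(text):
    """Split text into single-character tokens, padded with whitespace."""
    tokens = [" "]  # assumed leading padding
    for char in text:
        if char not in TOKEN_CLASSES:
            raise ValueError(f"unrecognized token: {char!r}")
        if char == " " and tokens[-1] == " ":
            continue  # consolidate runs of whitespace
        tokens.append(char)
    if tokens[-1] != " ":
        tokens.append(" ")  # assumed trailing padding
    return tokens


def rule_applies(rule, tokens, pos):
    """Check the rule's token sequence at pos and its context before pos."""
    prev, seq, _ = rule
    start = pos - len(prev)
    if start < 0 or tuple(tokens[pos:pos + len(seq)]) != seq:
        return False
    return all(item_matches(i, t) for i, t in zip(prev, tokens[start:pos]))


def onmatch_output(tokens, pos):
    """Return the output of the first "on match" rule that fits at pos."""
    for before, after, production in ONMATCH_RULES:
        start = pos - len(before)
        next_tokens = tokens[pos:pos + len(after)]
        if start < 0 or len(next_tokens) != len(after):
            continue
        prev_ok = all(c in TOKEN_CLASSES[t] for c, t in zip(before, tokens[start:pos]))
        next_ok = all(c in TOKEN_CLASSES[t] for c, t in zip(after, next_tokens))
        if prev_ok and next_ok:
            return production
    return ""


def transliterate(text):
    tokens = tokenize(text)
    output = []
    pos = 1  # skip the leading padding token
    while pos < len(tokens) - 1:
        rule = next((r for r in ORDERED_RULES if rule_applies(r, tokens, pos)), None)
        if rule is None:
            raise ValueError(f"no rule matches at token {pos}: {tokens[pos]!r}")
        output.append(onmatch_output(tokens, pos))
        output.append(rule[2])
        pos += len(rule[1])
    return "".join(output)


print(transliterate("hi"))  # ¡ᴉɥ!
```

Because the rule set has no rule for a whitespace token, this sketch raises `ValueError` for input with a space between letters. Leading and trailing spaces are absorbed by the consolidation step.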

Sample Code and Graph

>>> from graphtransliterator import GraphTransliterator
>>> GraphTransliterator.from_yaml("""
...     tokens:
...       h: [consonant]
...       i: [vowel]
...       " ": [whitespace]
...     rules:
...       h: \N{LATIN SMALL LETTER TURNED I}
...       i: \N{LATIN SMALL LETTER TURNED H}
...       <whitespace> i: \N{LATIN CAPITAL LETTER TURNED H}
...       (<whitespace> h) i: \N{LATIN SMALL LETTER TURNED H}!
...     onmatch_rules:
...       - <whitespace> + <consonant>: "¡"
...     whitespace:
...       default: " "
...       consolidate: true
...       token_class: whitespace
...     metadata:
...       title: "Upside Down Greeting Transliterator"
...       version: "1.0"
... """).transliterate("hi")
'¡ᴉɥ!'

[Figure: sample graph]

Sample directed tree created by Graph Transliterator. The rule nodes are in double circles, and token nodes are single circles. The numbers are the cost of the particular edge, and less costly edges are searched first. Previous token class (prev_classes) and previous token (prev_tokens) constraints are found on edges before leaf rule nodes.
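
The search described in this caption can be sketched with a priority queue. This is a minimal illustration under stated assumptions, not the library's implementation: the `TREE` layout, the node names, the edge costs, and the helper functions are all hypothetical, chosen only so that rules with more context have cheaper edges. Edges from the root consume an input token; edges into rule nodes carry the previous-token and previous-class constraints.

```python
import heapq
from itertools import count

# Illustrative sketch only: a hand-built tree with made-up edge costs.
TOKEN_CLASSES = {"h": {"consonant"}, "i": {"vowel"}, " ": {"whitespace"}}

# node -> list of (edge cost, child node, constraints on the edge)
TREE = {
    "root": [
        (1.0, "token: h", {"token": "h"}),
        (1.0, "token: i", {"token": "i"}),
    ],
    "token: h": [(1.0, "rule: h", {})],
    "token: i": [
        (0.25, "rule: (<whitespace> h) i", {"prev": ["<whitespace>", "h"]}),
        (0.5, "rule: <whitespace> i", {"prev": ["<whitespace>"]}),
        (1.0, "rule: i", {}),
    ],
}


def item_matches(item, token):
    """Match a literal token or a "<class>" item against a token."""
    if item.startswith("<") and item.endswith(">"):
        return item[1:-1] in TOKEN_CLASSES[token]
    return item == token


def edge_allowed(constraints, tokens, pos, depth):
    """Check an edge's token and previous-context constraints."""
    if "token" in constraints:
        index = pos + depth
        if index >= len(tokens) or tokens[index] != constraints["token"]:
            return False
    prev = constraints.get("prev", [])
    start = pos - len(prev)
    if start < 0:
        return False
    return all(item_matches(i, t) for i, t in zip(prev, tokens[start:pos]))


def best_rule(tokens, pos):
    """Best-first search: always expand the cheapest path found so far."""
    tie_breaker = count()  # keeps heap entries comparable when costs tie
    frontier = [(0.0, next(tie_breaker), "root", 0)]
    while frontier:
        cost, _, node, depth = heapq.heappop(frontier)
        if node.startswith("rule:"):
            return node, cost  # first rule node popped is the cheapest valid one
        for edge_cost, child, constraints in TREE[node]:
            if edge_allowed(constraints, tokens, pos, depth):
                consumed = depth + (1 if "token" in constraints else 0)
                heapq.heappush(
                    frontier, (cost + edge_cost, next(tie_breaker), child, consumed)
                )
    return None


tokens = [" ", "h", "i", " "]
print(best_rule(tokens, 1))  # ('rule: h', 2.0)
print(best_rule(tokens, 2))  # ('rule: (<whitespace> h) i', 1.25)
```

Since constraints are checked before an edge is pushed onto the queue, the first rule node popped is the lowest-cost rule whose context is satisfied at that position.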
