A graph-based transliteration tool
Project description
A graph-based transliteration tool that lets you convert the symbols of one language or script to those of another using rules that you define.
Free software: MIT license
Documentation: https://graphtransliterator.readthedocs.io.
Features
- Provides a transliteration tool that can be configured to convert the tokens of an input string into an output string using:
  - user-defined types of input tokens and token classes
  - transliteration rules based on:
    - a sequence of input tokens
    - specific input tokens that precede or follow the token sequence
    - classes of input tokens preceding or following specified tokens
  - “on match” rules for output to be inserted between transliteration rules involving particular token classes
  - defined rules for whitespace, including its optional consolidation
- Can be set up using:
  - an “easy reading” YAML format that lets you quickly craft settings for the transliteration tool
  - “direct” settings, perhaps passed programmatically, using a dictionary
- Automatically orders rules by the number of tokens in a transliteration rule
- Checks for ambiguity in transliteration rules
- Can provide details about each transliteration rule match
- Allows optional matching of all possible rules in a particular location
- Permits pruning of rules with certain productions
- Validates, as well as serializes to and deserializes from JSON and Python data types, using accessible marshmallow schemas
- Provides full support for Unicode, including Unicode character names in the “easy reading” YAML format
- Constructs and uses a directed tree and performs a best-first search to find the most specific transliteration rule in a given context
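To make the rule-ordering, context, and ambiguity features above more concrete, here is a minimal standard-library sketch. It is not the library's implementation: it replaces the directed tree and best-first search with a plain linear scan over rules sorted by specificity, and it uses a simplified notion of ambiguity (two rules with identical tokens and context). The names `Rule`, `TOKEN_CLASSES`, `RULES`, `find_ambiguous`, `rule_matches`, and `transliterate` are all hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical token classes for this sketch (not the library's data model).
TOKEN_CLASSES = {"a": "vowel", "k": "consonant", "h": "consonant", " ": "whitespace"}


@dataclass(frozen=True)
class Rule:
    tokens: Tuple[str, ...]           # input tokens the rule consumes
    production: str                   # output text
    prev_class: Optional[str] = None  # class required just before the tokens
    next_class: Optional[str] = None  # class required just after the tokens

    @property
    def specificity(self) -> int:
        # Consumed tokens plus context constraints; higher is more specific.
        return len(self.tokens) + (self.prev_class is not None) + (self.next_class is not None)


RULES = [
    Rule(("a",), "a"),
    Rule(("k",), "k"),
    Rule(("h",), "h"),
    Rule(("k", "h"), "x"),                       # two-token sequence
    Rule(("a",), "A", prev_class="whitespace"),  # vowel at the start of a word
]


def find_ambiguous(rules):
    """Return pairs of rules that share the same tokens and context."""
    seen = {}
    clashes = []
    for rule in rules:
        key = (rule.tokens, rule.prev_class, rule.next_class)
        if key in seen:
            clashes.append((seen[key], rule))
        else:
            seen[key] = rule
    return clashes


def rule_matches(rule, tokens, pos):
    """Check the rule's tokens and its optional context at position pos."""
    end = pos + len(rule.tokens)
    if tuple(tokens[pos:end]) != rule.tokens:
        return False
    if rule.prev_class is not None and TOKEN_CLASSES.get(tokens[pos - 1]) != rule.prev_class:
        return False
    if rule.next_class is not None:
        if end >= len(tokens) or TOKEN_CLASSES.get(tokens[end]) != rule.next_class:
            return False
    return True


def transliterate(text, rules):
    # Pad with whitespace so word-boundary context can match at either end.
    tokens = [" "] + list(text) + [" "]
    # Try the most specific rules first (linear scan, not a graph search).
    ordered = sorted(rules, key=lambda r: r.specificity, reverse=True)
    out = []
    pos = 1
    while pos < len(tokens) - 1:
        if tokens[pos] == " ":
            out.append(" ")
            pos += 1
            continue
        rule = next((r for r in ordered if rule_matches(r, tokens, pos)), None)
        if rule is None:
            raise ValueError(f"no rule matches at position {pos - 1}")
        out.append(rule.production)
        pos += len(rule.tokens)
    return "".join(out)


print(transliterate("akha ak", RULES))  # prints "Axa Ak"
```

Sorting by specificity is what lets the two-token `k h` rule win over the single-token `k` rule, and the word-initial `a` rule win over the plain `a` rule, without the rule author having to order them by hand.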
Sample Code and Graph
>>> from graphtransliterator import GraphTransliterator
>>> GraphTransliterator.from_yaml("""
...   tokens:
...     h: [consonant]
...     i: [vowel]
...     " ": [whitespace]
...   rules:
...     h: \N{LATIN SMALL LETTER TURNED I}
...     i: \N{LATIN SMALL LETTER TURNED H}
...     <whitespace> i: \N{LATIN CAPITAL LETTER TURNED H}
...     (<whitespace> h) i: \N{LATIN SMALL LETTER TURNED H}!
...   onmatch_rules:
...     - <whitespace> + <consonant>: ¡
...   whitespace:
...     default: " "
...     consolidate: true
...     token_class: whitespace
...   metadata:
...     title: "Upside Down Greeting Transliterator"
...     version: "1.0"
... """).transliterate("hi")
'¡ᴉɥ!'
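The Unicode character names in the sample can be checked with Python's standard `unicodedata` module, without installing the library. The sketch below resolves the three names and then assembles the output by hand, on the assumption (suggested by the sample's result) that the start of the input is treated as whitespace: the on-match rule contributes `¡`, the `h` rule contributes the turned i, and the `(<whitespace> h) i` rule contributes the turned h followed by `!`. The variable names are illustrative only.

```python
import unicodedata

# Character names used in the sample's rules.
names = [
    "LATIN SMALL LETTER TURNED I",
    "LATIN SMALL LETTER TURNED H",
    "LATIN CAPITAL LETTER TURNED H",
]

# Show the code point and character each name resolves to.
for name in names:
    char = unicodedata.lookup(name)
    print(f"U+{ord(char):04X} {char} {name}")

turned_i = unicodedata.lookup("LATIN SMALL LETTER TURNED I")
turned_h = unicodedata.lookup("LATIN SMALL LETTER TURNED H")

# Hand trace of transliterate("hi"), assuming the input start counts as whitespace:
#   on-match rule (<whitespace> + <consonant>) -> inverted exclamation mark
#   rule "h"                                   -> turned i
#   rule "(<whitespace> h) i"                  -> turned h followed by "!"
traced_output = "\u00a1" + turned_i + turned_h + "!"
print(traced_output)
```

The capital turned H never appears in the result because its rule requires whitespace directly before `i`, and in "hi" the `i` is preceded by `h`.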
Hashes for graphtransliterator-0.3.8.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 95833e98b1689a06bf4eb12a2dff6fc600e8af48700c31b4289a0bbc3baed19e
MD5 | 546daa22ebbe7c8f7d473c30c45a0dc2
BLAKE2b-256 | 81634a4edf1c6c5f72590383807a0e32e6b297f870e2319b98a11d3484cde276
Hashes for graphtransliterator-0.3.8-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ebf4c0e2c935bb571a17b62a881c2822926305e16fa8738085b2fca6ad985e77
MD5 | 156f76f828bebc6ad9e3132e8a7cd70f
BLAKE2b-256 | cfb57de19d6a8d74d7231950fa28177d413f669bcab37672b5e4528d19c91867
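A downloaded file can be compared against the published SHA256 digests with Python's standard `hashlib` module. The following is a minimal sketch: the helper names `sha256_hex` and `verify` are hypothetical, and the local file path in the closing comment is an assumption about where the download was saved.

```python
import hashlib
import io

# Published SHA256 digest of the source distribution, copied from the listing above.
PUBLISHED_SHA256 = "95833e98b1689a06bf4eb12a2dff6fc600e8af48700c31b4289a0bbc3baed19e"


def sha256_hex(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify(stream, expected_hex):
    """Return True if the stream's SHA256 digest equals the expected hex string."""
    return sha256_hex(stream) == expected_hex.lower()


# Self-check on in-memory data; this does not touch the real download.
demo_digest = sha256_hex(io.BytesIO(b"abc"))
print(demo_digest)

# Usage against a real download (the path is an assumption):
# with open("graphtransliterator-0.3.8.tar.gz", "rb") as handle:
#     print(verify(handle, PUBLISHED_SHA256))
```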