graphtransliterator

A graph-based transliteration tool

These details have not been verified by PyPI

Project links

Project description

https://img.shields.io/badge/Linter-Ruff-brightgreen?style=flat-square

A graph-based transliteration tool that lets you convert the symbols of one language or script to those of another using rules that you define.

Free software: MIT license
Documentation: https://graphtransliterator.readthedocs.io
Repository: https://github.com/seanpue/graphtransliterator

Transliteration… What? Why?

Moving text or data from one script or encoding to another is a common problem:

Many languages are written in multiple scripts, and many people can only read one of them. Moving between them can be a complex but necessary task in order to make texts accessible.
The identification of names and locations, as well as machine translation, benefit from transliteration.
Library systems often require metadata be in particular forms of romanization in addition to the original script.
Linguists need to move between different methods of phonetic transcription.
Documents in legacy fonts must now be converted to contemporary Unicode ones.
Complex-script languages are frequently approached in natural language processing and in digital humanities research through transliteration, as it provides disambiguating information about pronunciation, morphological boundaries, and unwritten elements not present in the original script.

Graph Transliterator abstracts transliteration, offering an “easy reading” method for developing transliterators that does not require writing a complex program. It also contains bundled transliterators that are rigorously tested. These can be expanded to handle many transliteration tasks.

Contributions are very welcome!

Features

Provides a transliteration tool that can be configured to convert the tokens of an input string into an output string using:
- user-defined types of input tokens and token classes
- transliteration rules based on:
  - a sequence of input tokens
  - specific input tokens that precede or follow the token sequence
  - classes of input tokens preceding or following specified tokens
- “on match” rules for output to be inserted between transliteration rules involving particular token classes
- defined rules for whitespace, including its optional consolidation
Can be setup using:
- an “easy reading” YAML format that lets you quickly craft settings for the transliteration tool
- a JSON dump of a transliterator (quicker!)
- “direct” settings, perhaps passed programmatically, using a dictionary
Automatically orders rules by the number of tokens in a transliteration rule
Checks for ambiguity in transliteration rules
Can provide details about each transliteration rule match
Allows optional matching of all possible rules in a particular location
Permits pruning of rules with certain productions
Validates, as well as serializes to and deserializes from JSON and Python data types, using accessible marshmallow schemas
Provides full support for Unicode, including Unicode character names in the “easy reading” YAML format
Constructs and uses a directed tree and performs a best-first search to find the most specific transliteration rule in a given context
Includes bundled transliterators that you can add to hat check for full test coverage of the nodes and edges of the internal graph and any “on match” rules
Includes a command-line interface to perform transliteration and other tasks

Sample Code and Graph

from graphtransliterator import GraphTransliterator
GraphTransliterator.from_yaml("""
    tokens:
      h: [consonant]
      i: [vowel]
      " ": [whitespace]
    rules:
      h: \N{LATIN SMALL LETTER TURNED I}
      i: \N{LATIN SMALL LETTER TURNED H}
      <whitespace> i: \N{LATIN CAPITAL LETTER TURNED H}
      (<whitespace> h) i: \N{LATIN SMALL LETTER TURNED H}!
    onmatch_rules:
      - <whitespace> + <consonant>: ¡
    whitespace:
      default: " "
      consolidate: true
      token_class: whitespace
    metadata:
      title: "Upside Down Greeting Transliterator"
      version: "1.0.0"
""").transliterate("hi")

'¡ᴉɥ!'

sample graph — Sample directed tree created by Graph Transliterator. The rule nodes are in double circles, and token nodes are single circles. The numbers are the cost of the particular edge, and less costly edges are searched first. Previous token classes and previous tokens that must be present are found as constraints on the edges incident to the terminal leaf rule nodes.

Get It Now

$ pip install -U graphtransliterator

Citation

To cite Graph Transliterator, please use:

Pue, A. Sean (2019). Graph Transliterator: A graph-based transliteration tool. Journal of Open Source Software, 4(44), 1717, https://doi.org/10.21105/joss.01717

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

1.3.2

Jul 14, 2026

1.3.1

Jul 11, 2026

1.3.0

Jul 11, 2026

1.2.4

Oct 16, 2023

1.2.3

Oct 9, 2023

1.2.2

Aug 12, 2021

1.2.0

May 13, 2020

1.1.2

Apr 29, 2020

1.1.1

Apr 21, 2020

1.0.7

Dec 22, 2019

1.0.6

Dec 15, 2019

1.0.5

Dec 14, 2019

1.0.4

Nov 30, 2019

1.0.2

Nov 30, 2019

1.0.1

Nov 30, 2019

1.0.0

Nov 26, 2019

0.4.10

Nov 5, 2019

0.4.8

Nov 5, 2019

0.4.7

Nov 5, 2019

0.4.6

Nov 4, 2019

0.4.5

Nov 1, 2019

0.4.4

Oct 24, 2019

0.4.3

Oct 24, 2019

0.4.2

Oct 24, 2019

0.3.8

Sep 20, 2019

0.3.7

Sep 17, 2019

0.3.6

Sep 17, 2019

0.3.5

Sep 16, 2019

0.3.4

Sep 15, 2019

0.3.2

Aug 30, 2019

0.3.1

Aug 29, 2019

0.3.0

Aug 23, 2019

0.2.14

Aug 15, 2019

0.2.13

Aug 3, 2019

0.2.8

Jul 30, 2019

0.2.7

Jul 28, 2019

0.2.6

Jul 28, 2019

0.2.5

Jul 26, 2019

0.2.4

Jul 23, 2019

0.2.3

Jul 23, 2019

0.2.1

Jul 23, 2019

0.1.0

May 30, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

graphtransliterator-1.3.2.tar.gz (32.4 kB view details)

Uploaded Jul 14, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

graphtransliterator-1.3.2-py3-none-any.whl (39.7 kB view details)

Uploaded Jul 14, 2026 Python 3

File details

Details for the file graphtransliterator-1.3.2.tar.gz.

File metadata

Download URL: graphtransliterator-1.3.2.tar.gz
Upload date: Jul 14, 2026
Size: 32.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.12.3 Linux/6.17.0-1018-azure

File hashes

Hashes for graphtransliterator-1.3.2.tar.gz
Algorithm	Hash digest
SHA256	`af5db58b02c2c2f8fdd498c8509aeab56e70ed23b95a7f5f52d184a544e22670`
MD5	`e6514601aa1b5c604db4ed163aba6d01`
BLAKE2b-256	`d257b1f3bec0f351553089ef879bc1cdceca7caca1fa2d7806dbee1decdc7c43`

See more details on using hashes here.

File details

Details for the file graphtransliterator-1.3.2-py3-none-any.whl.

File metadata

Download URL: graphtransliterator-1.3.2-py3-none-any.whl
Upload date: Jul 14, 2026
Size: 39.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.12.3 Linux/6.17.0-1018-azure

File hashes

Hashes for graphtransliterator-1.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a0ca4428d4ab5af136eeb81eaa9e21fd20a32898b85e005f256c1afe48b7f269`
MD5	`e19b86135e111da39d2c54b6200c1c55`
BLAKE2b-256	`a0ba3a38a057c228b6affaac4835fba993c74ebde3b9f054c8ae2f44c8e33b4f`

See more details on using hashes here.

graphtransliterator 1.3.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Transliteration… What? Why?

Features

Sample Code and Graph

Get It Now

Citation

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes