Graph Transliterator
A graph-based transliteration tool that lets you convert the symbols of one language or script to those of another using rules that you define.
Free software: MIT license
Documentation: https://graphtransliterator.readthedocs.io.
Features
Provides a transliteration tool that can be configured to convert the tokens of an input string into an output string using:
user-defined types of input tokens and token classes
transliteration rules based on:
a sequence of input tokens
specific input tokens that precede or follow the token sequence
classes of input tokens preceding or following specified tokens
“on match” rules for output to be inserted between transliteration rules involving particular token classes
defined rules for whitespace, including its optional consolidation
Can be set up using:
an “easy reading” YAML format that lets you quickly craft settings for the transliteration tool
“direct” settings, perhaps passed programmatically, using a dictionary
Automatically orders rules by the number of tokens in a transliteration rule
Checks for ambiguity in transliteration rules
Can provide details about each transliteration rule match
Allows optional matching of all possible rules in a particular location
Permits pruning of rules with certain productions
Validates settings, and serializes to and deserializes from JSON and Python data types, using accessible marshmallow schemas
Provides full support for Unicode, including Unicode character names in the “easy reading” YAML format
Constructs and uses a directed tree and performs a best-first search to find the most specific transliteration rule in a given context
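The rule features listed above (token classes, rules that depend on neighboring token classes, and ordering by the number of tokens in a rule) can be illustrated with a small standard-library sketch. This is not Graph Transliterator's implementation or API: the names `TOKENS`, `Rule`, `RULES`, `tokenize`, and `transliterate`, and the toy token set, are hypothetical. A linear scan over rules sorted by specificity stands in for the library's graph search.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical token inventory: each token maps to the classes it belongs to.
TOKENS = {
    "a": {"vowel"},
    "n": {"consonant"},
    "g": {"consonant"},
    " ": {"whitespace"},
}

# Try longer tokens first so multi-character tokens win over their prefixes.
TOKEN_RE = re.compile(
    "|".join(re.escape(t) for t in sorted(TOKENS, key=len, reverse=True))
)


@dataclass(frozen=True)
class Rule:
    tokens: tuple                      # sequence of input tokens to match
    production: str                    # output emitted when the rule matches
    prev_class: Optional[str] = None   # class required just before the match
    next_class: Optional[str] = None   # class required just after the match

    @property
    def specificity(self) -> int:
        # Assumption for this sketch: more tokens and more context
        # constraints make a rule more specific.
        return (
            len(self.tokens)
            + (self.prev_class is not None)
            + (self.next_class is not None)
        )


# Most specific rules first, so they are tried before general ones.
RULES = sorted(
    [
        Rule(("a",), "a"),
        Rule(("n",), "n"),
        Rule(("g",), "g"),
        Rule((" ",), " "),
        Rule(("n", "g"), "N"),                       # two-token sequence
        Rule(("n",), "m", next_class="whitespace"),  # context-dependent rule
    ],
    key=lambda r: r.specificity,
    reverse=True,
)


def tokenize(text):
    """Split text into known tokens, failing on unrecognizable input."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unrecognizable input at position {pos}: {text[pos]!r}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


def has_class(tokens, index, cls):
    """True if a token exists at index and belongs to the given class."""
    return 0 <= index < len(tokens) and cls in TOKENS[tokens[index]]


def matches(rule, tokens, i):
    """Check the rule's token sequence and its context constraints at i."""
    end = i + len(rule.tokens)
    if tuple(tokens[i:end]) != rule.tokens:
        return False
    if rule.prev_class and not has_class(tokens, i - 1, rule.prev_class):
        return False
    if rule.next_class and not has_class(tokens, end, rule.next_class):
        return False
    return True


def transliterate(text):
    tokens = tokenize(text)
    out, i = [], 0
    while i < len(tokens):
        rule = next((r for r in RULES if matches(r, tokens, i)), None)
        if rule is None:
            raise ValueError(f"no matching rule at token {i}: {tokens[i]!r}")
        out.append(rule.production)
        i += len(rule.tokens)
    return "".join(out)


print(transliterate("an ga"))  # prints "am ga"
```

In this toy setup, "n" before whitespace becomes "m" because the context-dependent rule is tried before the plain "n" rule, and "ng" becomes "N" because the two-token rule outranks both single-token rules.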
History
[Unreleased - Maybe]
Add CLI
Add metadata guidelines
Save match location in tokenize
Add tests directly to YAML files
Allow insertion of transliteration error messages into output
Fix Devanagari output in doc PDF
Add translated messages using Transifex
[Unreleased-TODO]
0.3.2 (2019-08-30)
fixed error in README.rst
0.3.1 (2019-08-29)
adjustments to README.rst
cleanup in initialize.py and core.py
fix to docs/api.rst
adjusted setup.cfg for bumpversion of core.py
adjusted requirements.txt
removed note about namedtuple in dump docs
adjusted docs (api.rst, etc.)
0.3.0 (2019-08-23)
Removed _tokens_of() from init
Removed serialize()
Added load() to GraphTransliterator, without ambiguity checking
Added dump() and dumps() to GraphTransliterator to export configuration
renamed _tokenizer_from() to _tokenizer_pattern_from(), so that the regex is compiled on load and passed as a pattern string (tokenizer_pattern)
added settings parameters to DirectedGraph
added OnMatchRule as namedtuple for consistency
added new GraphTransliterator.from_dict(), which validates from_yaml()
renamed GraphTransliterator.from_dict() to GraphTransliterator.from_easyreading_dict()
added schemas.py
removed validate.py
removed cerberus and added marshmallow to validate.py
adjusted tests
Removed check_settings parameter
0.2.14 (2019-08-15)
minor code cleanup
removed yaml from validate.py
0.2.13 (2019-08-03)
changed setup.cfg for double quotes in bumpversion due to Black formatting of setup.py
added version test
0.2.12 (2019-08-03)
fixed version error in setup.py
0.2.11 (2019-08-03)
travis issue
0.2.10 (2019-08-03)
fixed test for version not working on travis
0.2.9 (2019-08-03)
Used Black code formatter
Adjusted tox.ini, contributing.rst
Set development status to Beta in setup.py
Added black badge to README.rst
Fixed comments and minor changes in initialize.py
0.2.8 (2019-07-30)
Fixed ambiguity check if no rules present
Updates to README.rst
0.2.7 (2019-07-28)
Modified docs/conf.py
Modified equation in docs/usage.rst and paper/paper.md to fix doc build
0.2.6 (2019-07-28)
Fixes to README.rst, usage.rst, paper.md, and tutorial.rst
Modifications to core.py documentation
0.2.5 (2019-07-24)
Fixes to HISTORY.rst and README.rst
100% test coverage.
Added draft of paper.
Added graphtransliterator_version to serialize().
0.2.4 (2019-07-23)
minor changes to readme
0.2.3 (2019-07-23)
added xenial to travis.yml
0.2.2 (2019-07-23)
added CI
0.2.1 (2019-07-23)
fixed HISTORY.rst for PyPI
0.2.0 (2019-07-23)
Fixed module naming in docs using __module__.
Converted DirectedGraph nodes to a list.
Added Code of Conduct.
Added GraphTransliterator class.
Updated module dependencies.
Added requirements.txt
Added check_settings parameter to skip validating settings.
Added tests for ambiguity and check_ambiguity parameter.
Changed name to Graph Transliterator in docs.
Created core.py, validate.py, process.py, rules.py, initialize.py, exceptions.py, graphs.py
Added ignore_errors property and setter for transliteration exceptions (UnrecognizableInputToken, NoMatchingTransliterationRule)
Added logging to graphtransliterator
Added positive cost function based on number of matched tokens in rule
added metadata field
added documentation
0.1.1 (2019-05-30)
Adjusted copyright in docs.
Removed Python 2 support.
0.1.0 (2019-05-30)
First release on PyPI.
Hashes for graphtransliterator-0.3.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | f7c7ebd81d5f1963e6b801dab4b3a87cd0e94091b36fc0bf034230c7eb08ab58
MD5 | 5cb4e8cdd52e03bff54381693a3cec14
BLAKE2b-256 | 9bdc7ff8ab67dc5405c6cb4d9eb2413148554071c0630a8355aaa9867b86fe01
Hashes for graphtransliterator-0.3.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b684a7e984afeed16f535e97dc370b0f364d75fd11c0480dfd00c75f9d7223e6
MD5 | 97ee65ddfb218e914ea0c1d5544d6619
BLAKE2b-256 | 2d0f7f080c109a0121b6d66920b8bcfd710a1b7dfafb586bc15153187b7e3b84