Classes and functions for performing pseudo-localization on strings and PO files.
Project description
``pseudol10nutil``
==================
Python module for performing pseudo-localization on strings. Tested against Python 2, Python3, PyPy and PyPy3.
Installation
------------
The module is available on `PyPI <https://pypi.org/project/pseudol10nutil/>`_ and is installable via ``pip``:
``pip install pseudol10nutil``
Dependencies
------------
This package has the following external dependencies:
* `six <https://pythonhosted.org/six/>`_ - for Python 2 to 3 compatibility
``PseudoL10nUtil`` class
------------------------
Class for pseudo-localizing strings. The class currently has the following members:
- ``transforms`` - field that contains the list of transforms to apply to the string. The transforms will be applied in order. Default is ``[transliterate_diacritic, pad_length, square_brackets]``
- ``pseudolocalize(s)`` - method that returns a new string where the transforms to the input string ``s`` have been applied.
``pseudol10nutil.transforms`` module
------------------------------------
The following transforms are currently available:
- ``transliterate_diacritic`` - Takes the input string and returns a copy with diacritics added e.g. ``Hello`` -> ``Ȟêĺĺø``.
- ``transliterate_circled`` - Takes the input string and returns a copy with circled versions of the letters e.g. ``Hello`` -> ``Ⓗⓔⓛⓛⓞ``
- ``transliterate_fullwidth`` - Takes the input string and returns a copy with the letters converted to their fullwidth counterparts e.g. ``Hello`` -> ``Hello``
- ``pad_length`` - Appends a series of characters to the end of the input string to increase the string length per `IBM Globalization Design Guideline A3: UI Expansion <https://www-01.ibm.com/software/globalization/guidelines/a3.html>`_.
- ``angle_brackets`` - Surrounds the input string with '《' and '》' characters.
- ``curly_brackets`` - Surrounds the input string with '❴' and '❵' characters.
- ``square_brackets`` - Surrounds the input string with '⟦' and '⟧' characters.
Format string support
---------------------
When performing pseudo-localization on a string, the process will skip performing pseudo-localization on format strings. Python style format strings (e.g. ``{foo}``) and printf style format strings (e.g. ``%s``) are supported. For example::
Input [1]: Source {source1} returned 0 rows.
Output [1]: '⟦Șøüȓċê {source1} ȓêťüȓñêđ 0 ȓøẁš.﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹⟧
Input [2]: Source %(source2)s returned 1 row.
Output [2]: ⟦Șøüȓċê %(source2)s ȓêťüȓñêđ 1 ȓøẁ.﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹퓛⟧
Input [3]: Source %s returned %d rows.
Output [3]: ⟦Șøüȓċê %s ȓêťüȓñêđ %d ȓøẁš.﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ⟧
Example usage
^^^^^^^^^^^^^
Python 3 example::
>>> from pseudol10nutil import PseudoL10nUtil
>>> util = PseudoL10nUtil()
>>> s = u"The quick brown fox jumps over the lazy dog."
>>> util.pseudolocalize(s)
'⟦Ťȟê ʠüıċǩ ƀȓøẁñ ƒøẋ ǰüɱƥš øṽêȓ ťȟê ĺàźÿ đøğ.﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎Ѝא⟧'
>>> import pseudolocalize.transforms
>>> util.transforms = [pseudol10nutil.transforms.transliterate_fullwidth, pseudol10nutil.transforms.curly_brackets]
>>> util.pseudolocalize(s)
'❴The quick brown fox jumps over the lazy dog.❵'
>>> util.transforms = [pseudol10nutil.transforms.transliterate_circled, pseudol10nutil.transforms.pad_length, pseudol10nutil.transforms.angle_brackets]
>>> util.pseudolocalize(s)
'《Ⓣⓗⓔ ⓠⓤⓘⓒⓚ ⓑⓡⓞⓦⓝ ⓕⓞⓧ ⓙⓤⓜⓟⓢ ⓞⓥⓔⓡ ⓣⓗⓔ ⓛⓐⓩⓨ ⓓⓞⓖ.﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎Ѝא》'
Example web app
---------------
There is an example web app in the ``examples/webapp/`` directory that provides a web UI and a REST endpoint for pseudo-localizing strings. This example is also available on Docker hub: `https://hub.docker.com/r/leonidessaguisagjr/pseudol10nutil/`_.
Once the docker container is running, the web UI could be accessed via the following URL:
`http://localhost:8080/pseudol10nutil/`_.
The REST endpoint could be accessed as follows::
>>> import pprint
>>> import requests
>>> strings = { "s1": "The quick brown {0} jumps over the lazy {1}.", }
>>> data = { "strings": strings }
>>> headers = { "Accept": "application/json", "Content-Type": "application/json" }
>>> api_url = "http://localhost:8080/pseudol10nutil/api/v1.0/pseudo"
>>> resp = requests.post(api_url, headers=headers, json=data)
>>> resp.status_code
200
>>> pprint.pprint(resp.json())
{'strings': {'s1': '⟦Ťȟê ʠüıċǩ ƀȓøẁñ {0} ǰüɱƥš øṽêȓ ťȟê ĺàźÿ '
'{1}.﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎Ѝא⟧'}}
``POFileUtil`` class
--------------------
Class for performing pseudo-localization on .po (Portable Object) message catalogs. Currently the class has a single method, ``pseudolocalizefile(input_file, output_file, input_encoding='UTF-8', output_encoding='UTF-8', overwrite_existing=True)``.
The default transforms will be applied to the strings in the input file. To override this behavior, create an instance of the ``PseudoL10nUtil`` class with the desired behavior and assign it to the ``l10nutil`` field prior to calling the ``pseudolocalizefile()`` method.
Example usage
^^^^^^^^^^^^^
Using pypy3::
>>>> from pseudol10nutil import POFileUtil
>>>> pofileutil = POFileUtil()
>>>> input_file = "./testdata/locales/helloworld.pot"
>>>> output_file = "./testdata/locales/eo/LC_MESSAGES/helloworld_pseudo.po"
>>>> pofileutil.pseudolocalizefile(input_file, output_file)
>>>> with open(input_file, mode="r") as fileobj:
.... for line in fileobj:
.... if line.startswith("msgstr"):
.... print(line)
....
msgstr ""
msgstr ""
msgstr ""
>>>> with open(output_file, mode="r") as fileobj:
.... for line in fileobj:
.... if line.startswith("msgstr"):
.... print(line)
....
msgstr ""
msgstr "⟦Ẃȟàť ıš ÿøüȓ ñàɱê?: ﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹⟧"
msgstr "⟦Ȟêĺĺø {0}!﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹⟧"
>>>> from pseudol10nutil import PseudoL10nUtil
>>>> util = PseudoL10nUtil()
>>>> import pseudol10nutil.transforms
>>>> util.transforms = [pseudol10nutil.transforms.transliterate_circled, pseudol10nutil.transforms.pad_length]
>>>> pofileutil.l10nutil = util
>>>> pofileutil.pseudolocalizefile(input_file, output_file)
>>>> with open(output_file, mode="r") as fileobj:
.... for line in fileobj:
.... if line.startswith("msgstr"):
.... print(line)
....
msgstr ""
msgstr "Ⓦⓗⓐⓣ ⓘⓢ ⓨⓞⓤⓡ ⓝⓐⓜⓔ?: ﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹"
msgstr "Ⓗⓔⓛⓛⓞ {0}!﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹"
>>>>
License
-------
This is released under an MIT license. See the ``LICENSE`` file in this repository for more information.
==================
Python module for performing pseudo-localization on strings. Tested against Python 2, Python3, PyPy and PyPy3.
Installation
------------
The module is available on `PyPI <https://pypi.org/project/pseudol10nutil/>`_ and is installable via ``pip``:
``pip install pseudol10nutil``
Dependencies
------------
This package has the following external dependencies:
* `six <https://pythonhosted.org/six/>`_ - for Python 2 to 3 compatibility
``PseudoL10nUtil`` class
------------------------
Class for pseudo-localizing strings. The class currently has the following members:
- ``transforms`` - field that contains the list of transforms to apply to the string. The transforms will be applied in order. Default is ``[transliterate_diacritic, pad_length, square_brackets]``
- ``pseudolocalize(s)`` - method that returns a new string where the transforms to the input string ``s`` have been applied.
``pseudol10nutil.transforms`` module
------------------------------------
The following transforms are currently available:
- ``transliterate_diacritic`` - Takes the input string and returns a copy with diacritics added e.g. ``Hello`` -> ``Ȟêĺĺø``.
- ``transliterate_circled`` - Takes the input string and returns a copy with circled versions of the letters e.g. ``Hello`` -> ``Ⓗⓔⓛⓛⓞ``
- ``transliterate_fullwidth`` - Takes the input string and returns a copy with the letters converted to their fullwidth counterparts e.g. ``Hello`` -> ``Hello``
- ``pad_length`` - Appends a series of characters to the end of the input string to increase the string length per `IBM Globalization Design Guideline A3: UI Expansion <https://www-01.ibm.com/software/globalization/guidelines/a3.html>`_.
- ``angle_brackets`` - Surrounds the input string with '《' and '》' characters.
- ``curly_brackets`` - Surrounds the input string with '❴' and '❵' characters.
- ``square_brackets`` - Surrounds the input string with '⟦' and '⟧' characters.
Format string support
---------------------
When performing pseudo-localization on a string, the process will skip performing pseudo-localization on format strings. Python style format strings (e.g. ``{foo}``) and printf style format strings (e.g. ``%s``) are supported. For example::
Input [1]: Source {source1} returned 0 rows.
Output [1]: '⟦Șøüȓċê {source1} ȓêťüȓñêđ 0 ȓøẁš.﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹⟧
Input [2]: Source %(source2)s returned 1 row.
Output [2]: ⟦Șøüȓċê %(source2)s ȓêťüȓñêđ 1 ȓøẁ.﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹퓛⟧
Input [3]: Source %s returned %d rows.
Output [3]: ⟦Șøüȓċê %s ȓêťüȓñêđ %d ȓøẁš.﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ⟧
Example usage
^^^^^^^^^^^^^
Python 3 example::
>>> from pseudol10nutil import PseudoL10nUtil
>>> util = PseudoL10nUtil()
>>> s = u"The quick brown fox jumps over the lazy dog."
>>> util.pseudolocalize(s)
'⟦Ťȟê ʠüıċǩ ƀȓøẁñ ƒøẋ ǰüɱƥš øṽêȓ ťȟê ĺàźÿ đøğ.﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎Ѝא⟧'
>>> import pseudolocalize.transforms
>>> util.transforms = [pseudol10nutil.transforms.transliterate_fullwidth, pseudol10nutil.transforms.curly_brackets]
>>> util.pseudolocalize(s)
'❴The quick brown fox jumps over the lazy dog.❵'
>>> util.transforms = [pseudol10nutil.transforms.transliterate_circled, pseudol10nutil.transforms.pad_length, pseudol10nutil.transforms.angle_brackets]
>>> util.pseudolocalize(s)
'《Ⓣⓗⓔ ⓠⓤⓘⓒⓚ ⓑⓡⓞⓦⓝ ⓕⓞⓧ ⓙⓤⓜⓟⓢ ⓞⓥⓔⓡ ⓣⓗⓔ ⓛⓐⓩⓨ ⓓⓞⓖ.﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎Ѝא》'
Example web app
---------------
There is an example web app in the ``examples/webapp/`` directory that provides a web UI and a REST endpoint for pseudo-localizing strings. This example is also available on Docker hub: `https://hub.docker.com/r/leonidessaguisagjr/pseudol10nutil/`_.
Once the docker container is running, the web UI could be accessed via the following URL:
`http://localhost:8080/pseudol10nutil/`_.
The REST endpoint could be accessed as follows::
>>> import pprint
>>> import requests
>>> strings = { "s1": "The quick brown {0} jumps over the lazy {1}.", }
>>> data = { "strings": strings }
>>> headers = { "Accept": "application/json", "Content-Type": "application/json" }
>>> api_url = "http://localhost:8080/pseudol10nutil/api/v1.0/pseudo"
>>> resp = requests.post(api_url, headers=headers, json=data)
>>> resp.status_code
200
>>> pprint.pprint(resp.json())
{'strings': {'s1': '⟦Ťȟê ʠüıċǩ ƀȓøẁñ {0} ǰüɱƥš øṽêȓ ťȟê ĺàźÿ '
'{1}.﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎Ѝא⟧'}}
``POFileUtil`` class
--------------------
Class for performing pseudo-localization on .po (Portable Object) message catalogs. Currently the class has a single method, ``pseudolocalizefile(input_file, output_file, input_encoding='UTF-8', output_encoding='UTF-8', overwrite_existing=True)``.
The default transforms will be applied to the strings in the input file. To override this behavior, create an instance of the ``PseudoL10nUtil`` class with the desired behavior and assign it to the ``l10nutil`` field prior to calling the ``pseudolocalizefile()`` method.
Example usage
^^^^^^^^^^^^^
Using pypy3::
>>>> from pseudol10nutil import POFileUtil
>>>> pofileutil = POFileUtil()
>>>> input_file = "./testdata/locales/helloworld.pot"
>>>> output_file = "./testdata/locales/eo/LC_MESSAGES/helloworld_pseudo.po"
>>>> pofileutil.pseudolocalizefile(input_file, output_file)
>>>> with open(input_file, mode="r") as fileobj:
.... for line in fileobj:
.... if line.startswith("msgstr"):
.... print(line)
....
msgstr ""
msgstr ""
msgstr ""
>>>> with open(output_file, mode="r") as fileobj:
.... for line in fileobj:
.... if line.startswith("msgstr"):
.... print(line)
....
msgstr ""
msgstr "⟦Ẃȟàť ıš ÿøüȓ ñàɱê?: ﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹⟧"
msgstr "⟦Ȟêĺĺø {0}!﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹⟧"
>>>> from pseudol10nutil import PseudoL10nUtil
>>>> util = PseudoL10nUtil()
>>>> import pseudol10nutil.transforms
>>>> util.transforms = [pseudol10nutil.transforms.transliterate_circled, pseudol10nutil.transforms.pad_length]
>>>> pofileutil.l10nutil = util
>>>> pofileutil.pseudolocalizefile(input_file, output_file)
>>>> with open(output_file, mode="r") as fileobj:
.... for line in fileobj:
.... if line.startswith("msgstr"):
.... print(line)
....
msgstr ""
msgstr "Ⓦⓗⓐⓣ ⓘⓢ ⓨⓞⓤⓡ ⓝⓐⓜⓔ?: ﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹"
msgstr "Ⓗⓔⓛⓛⓞ {0}!﹎ЍאdžᾏⅧ㈴㋹퓛ﺏ𝟘🚦﹎ЍאdžᾏⅧ㈴㋹"
>>>>
License
-------
This is released under an MIT license. See the ``LICENSE`` file in this repository for more information.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
pseudol10nutil-0.1.dev4.tar.gz
(12.4 kB
view hashes)
Built Distribution
Close
Hashes for pseudol10nutil-0.1.dev4-py2.py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | cafbfb7b61193253a5acad1d4dc642a76ba9d7f9aaee95dee5fb4e4b43a59395 |
|
MD5 | 527e40ae7691c23952e6e8214566c53d |
|
BLAKE2b-256 | 9c90ff860ba41fba11d105e08399725510506a4527f54b6b4bd2d1f54b54a71b |