A simple IPA tokeniser, as simple as in:
>>> from ipatok import tokenise
>>> tokenise('ˈtiːt͡ʃə')
['t', 'iː', 't͡ʃ', 'ə']
>>> tokenise('ʃːjeq͡χːʼjer')
['ʃː', 'j', 'e', 'q͡χːʼ', 'j', 'e', 'r']
tokenise(string, strict=True) takes an IPA string and returns a list of tokens. A token usually consists of a single letter together with its accompanying diacritics. If two letters are connected by a tie bar, they are also considered a single token. Except for length markers, suprasegmentals are excluded from the output. Whitespace is also ignored.
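The grouping rules described above can be illustrated with a small standalone sketch. This is not ipatok's implementation: `sketch_tokenise`, `TIE_BARS`, and `DROPPED` are hypothetical names, the set of dropped suprasegmentals is a rough assumption rather than the full IPA inventory, and no validation against the IPA chart is attempted.

```python
import unicodedata

# Assumption: the two combining tie bars (above and below).
TIE_BARS = {'\u0361', '\u035c'}

# Assumption: a small, incomplete set of suprasegmentals to drop:
# primary stress, secondary stress, syllable break, minor group,
# major group, linking. Length markers are deliberately not listed.
DROPPED = {'\u02c8', '\u02cc', '.', '|', '\u2016', '\u203f'}


def sketch_tokenise(string):
    """Group each letter with its diacritics; join letters across a tie bar."""
    tokens = []
    join_next = False  # True right after a tie bar has been seen

    # NFD splits precomposed letters into base letter + combining marks.
    for char in unicodedata.normalize('NFD', string):
        if char.isspace() or char in DROPPED:
            continue

        if char in TIE_BARS:
            if tokens:
                tokens[-1] += char
                join_next = True
            continue

        # Combining marks (Mn) and modifier letters (Lm, e.g. the length
        # marker) are treated as diacritics of the preceding letter.
        is_diacritic = unicodedata.category(char) in ('Mn', 'Lm')

        if tokens and is_diacritic:
            tokens[-1] += char
        elif tokens and join_next:
            tokens[-1] += char
            join_next = False
        else:
            tokens.append(char)
            join_next = False

    return tokens


print(sketch_tokenise('ˈtiːt͡ʃə'))      # ['t', 'iː', 't͡ʃ', 'ə']
print(sketch_tokenise('ʃːjeq͡χːʼjer'))  # ['ʃː', 'j', 'e', 'q͡χːʼ', 'j', 'e', 'r']
```

Unlike the real `tokenise`, this sketch never raises an error and will happily group characters that are not part of the IPA, so it is useful only for seeing how letters, diacritics, and tie bars combine into tokens.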
By default the function raises a ValueError if the string does not conform to the IPA spec (the 2015 revision). Invoking it with strict=False makes it accept some common replacements such as g and ɫ and non-standard characters such as ˀ.
tokenize(string, strict=True) is an alias for tokenise.
This is a standard Python 3 package without dependencies. It is available from PyPI (the Cheese Shop), so you can install it with pip:
pip install ipatok
or, alternatively, you can clone this repo (safe to delete afterwards) and do:
python setup.py test
python setup.py install
Either approach also works inside a virtualenv/venv.
MIT. Do as you please and praise the snake gods.