
IPA tokeniser


A simple IPA tokeniser, as simple as:

>>> from ipatok import tokenise
>>> tokenise('ˈtiːt͡ʃə')
['t', 'iː', 't͡ʃ', 'ə']
>>> tokenise('ʃːjeq͡χːʼjer')
['ʃː', 'j', 'e', 'q͡χːʼ', 'j', 'e', 'r']

api

tokenise(string, strict=True) takes an IPA string and returns a list of tokens. A token usually consists of a single letter together with its accompanying diacritics. If two letters are connected by a tie bar, they are also considered a single token. Except for length markers, suprasegmentals are excluded from the output. Whitespace is also ignored.
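The grouping rules in the paragraph above can be illustrated with a small stand-alone sketch. This is not ipatok's implementation. sketch_tokenise is a hypothetical name. The character sets (tie bars, length marks, suprasegmentals, and spacing modifier letters treated as diacritics) are an assumed, incomplete subset of the IPA, and the sketch does no validation at all.

```python
import unicodedata

# Assumed character classes: a small, incomplete subset of the IPA.
TIE_BARS = {'\u0361', '\u035c'}            # tie bar above and tie bar below
LENGTH_MARKS = {'\u02d0', '\u02d1'}        # ː (long) and ˑ (half-long)
SUPRASEGMENTALS = {'\u02c8', '\u02cc',     # ˈ primary and ˌ secondary stress
                   '.', '|', '\u2016',     # syllable break, minor and major group
                   '\u203f'}               # ‿ linking
# Assumed: spacing modifier letters treated as diacritics (superscripts, ejective mark).
MODIFIER_DIACRITICS = set('ʰʲʷˠˤʼⁿˡ')


def sketch_tokenise(string):
    """Group an IPA string into letter-plus-diacritics tokens (no validation)."""
    tokens = []
    join_next = False  # True right after a tie bar: the next letter joins the token
    for char in string:
        if char.isspace() or char in SUPRASEGMENTALS:
            continue  # whitespace and suprasegmentals (other than length) are dropped
        if char in TIE_BARS:
            if tokens:
                tokens[-1] += char
                join_next = True
            continue
        if (char in LENGTH_MARKS or char in MODIFIER_DIACRITICS
                or unicodedata.category(char).startswith('M')):
            if tokens:
                tokens[-1] += char  # diacritics attach to the preceding letter
            continue
        if join_next and tokens:
            tokens[-1] += char  # second letter of a tied pair
        else:
            tokens.append(char)  # any other character starts a new token
        join_next = False
    return tokens
```

With these assumed sets, sketch_tokenise('ˈtiːt͡ʃə') returns ['t', 'iː', 't͡ʃ', 'ə'], matching the first example above. Combining marks are recognised by their Unicode category, so a letter written as a base character plus a combining diacritic is grouped the same way as a precomposed one.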

By default the function raises a ValueError if the string does not conform to the IPA spec (the 2015 revision). Invoking it with strict=False makes it accept some common replacements, such as the ASCII g (instead of the IPA ɡ) and ɫ.
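The exact inventory that ipatok checks against is not described here, so the following is only a toy sketch of the strict/lenient pattern. sketch_validate, IPA_CHARS and COMMON_REPLACEMENTS are hypothetical names, and the inventory is a tiny assumed subset of the 2015 IPA chart, not the real one.

```python
# Hypothetical, deliberately tiny inventory: a real checker would need the
# full 2015 IPA chart (letters, diacritics and suprasegmentals).
IPA_CHARS = set('ptkbdmnfvszhjlrwaeiou') | {
    '\u0261',  # ɡ (IPA voiced velar plosive, not ASCII g)
    '\u014b',  # ŋ
    '\u0283',  # ʃ
    '\u0292',  # ʒ
    '\u0259',  # ə
    '\u02d0',  # ː length mark
    '\u02c8',  # ˈ primary stress
    '\u0361',  # tie bar
}
# Assumed stand-ins that are tolerated only when strict=False.
COMMON_REPLACEMENTS = {'g', '\u026b'}  # ASCII g and ɫ


def sketch_validate(string, strict=True):
    """Raise ValueError on the first character outside the toy inventory."""
    allowed = IPA_CHARS if strict else IPA_CHARS | COMMON_REPLACEMENTS
    for position, char in enumerate(string):
        if char.isspace():
            continue  # whitespace is ignored
        if char not in allowed:
            raise ValueError(
                'non-IPA character {!r} at position {}'.format(char, position))
```

Keeping the tolerated stand-ins in their own set makes the lenient mode a plain union with the strict inventory: sketch_validate('ga') raises ValueError, while sketch_validate('ga', strict=False) passes.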

tokenize(string, strict=True) is an alias for tokenise.

installation

This is a standard Python 3 package without dependencies. It is offered at the Cheese Shop, so you can install it with pip:

pip install ipatok

or, alternatively, you can clone this repo (safe to delete afterwards) and do:

python setup.py test
python setup.py install

Of course, this could be happening within a virtualenv/venv as well.

similar projects

licence

MIT. Do as you please and praise the snake gods.



Download files


Files for ipatok, version 0.0.1:

ipatok-0.0.1-py3-none-any.whl (8.7 kB): wheel, Python py3
ipatok-0.0.1.tar.gz (5.7 kB): source distribution
