
simple linter that checks datasets for IPA errors and inconsistencies

Checks linguistic datasets for IPA errors and inconsistencies. Usage:

ipalint mydataset

This will either (1) print the IPA errors found in the dataset; (2) print nothing, meaning it found no errors; or (3) print an error message if it fails to read the file. In no case will the input file be modified.

The linter should be able to read any well-formed csv/tsv/tab dataset, provided that it has a column of IPA data. It also reads lines that are not part of a table and accepts input piped through standard input. So even if your data comes in a less common format, such as that of KSL.qlc, you can still lint it by doing something like:

cat KSL.qlc | grep "^[[:digit:]]" | cut -f 6 | ipalint
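In this pipeline, grep keeps only the lines that begin with a digit. cut then passes on the sixth tab-separated field of each line (cut numbers fields from 1), and ipalint reads the result from standard input. If you would rather do the extraction in Python, here is a rough equivalent. extract_field is a hypothetical helper, not part of ipalint, and the sketch assumes that the data lines are tab-separated, which is cut's default.

```python
import re
from typing import Iterable, Iterator

# Like grep "^[[:digit:]]" in the C locale: the line must start with 0-9.
STARTS_WITH_DIGIT = re.compile(r"[0-9]")


def extract_field(lines: Iterable[str], field: int = 6) -> Iterator[str]:
    """Yield the `field`-th (1-based) tab-separated field of each line that
    starts with a digit.

    Unlike cut, lines with fewer than `field` fields are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not STARTS_WITH_DIGIT.match(line):
            continue
        fields = line.split("\t")
        if len(fields) >= field:
            yield fields[field - 1]
```

To replace the grep and cut stages, print each yielded value to standard output and pipe that into ipalint.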

optional arguments

--col COL specifies the column containing the IPA data; this can be either the column name or the column index (starting from 0). If this option is not set, ipalint will try to guess the column by looking at the column names.
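The sketch below shows one way --col might resolve its argument. It is only an illustration. resolve_column is a hypothetical helper, and its guessing rule is an assumption, not necessarily the heuristic ipalint uses: it first looks for a header that is exactly "IPA" in any case, and otherwise takes the first header containing "ipa".

```python
from typing import Optional, Sequence, Union


def resolve_column(header: Sequence[str],
                   col: Optional[Union[str, int]] = None) -> int:
    """Return the 0-based index of the IPA column in a header row.

    `col` may be a column name or a 0-based index (an int or a numeric string).
    When `col` is None, guess from the header names (hypothetical heuristic).
    """
    names = [name.strip() for name in header]
    if col is not None:
        # A column name takes precedence over a numeric reading of `col`.
        if isinstance(col, str) and col in names:
            return names.index(col)
        if isinstance(col, int) or col.isdecimal():
            index = int(col)
            if 0 <= index < len(names):
                return index
            raise ValueError(f"column index out of range: {index}")
        raise ValueError(f"no column named {col!r}")
    lowered = [name.lower() for name in names]
    # Prefer a header that is exactly "IPA" (in any case)...
    if "ipa" in lowered:
        return lowered.index("ipa")
    # ...then fall back to the first header that merely contains "ipa".
    for index, name in enumerate(lowered):
        if "ipa" in name:
            return index
    raise ValueError("could not guess the IPA column; pass it explicitly")
```

Because names are checked before indices, a column literally named "2" is found by name. The substring fallback is deliberately crude; for example, it would pick a column called "Participant". That is one reason to set --col explicitly if the guess is wrong.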

--no-header treats the first row as data. By default, the first row is treated as a header and is not linted.
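Here is one way to picture the header handling, using Python's standard csv module. The reader skips the first row unless told otherwise and keeps track of row numbers so that errors can be reported against them. iter_ipa_cells and its parameters are hypothetical. The sketch also assumes that the delimiter is known in advance, whereas ipalint may work it out itself.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def iter_ipa_cells(stream: TextIO, column: int, delimiter: str = "\t",
                   no_header: bool = False) -> Iterator[Tuple[int, str]]:
    """Yield (row number, cell) pairs for `column`, numbering rows from 1.

    Row numbers match line numbers unless a quoted field spans several lines.
    By default the first row is taken to be a header and skipped.
    """
    reader = csv.reader(stream, delimiter=delimiter)
    for row_no, row in enumerate(reader, start=1):
        if row_no == 1 and not no_header:
            continue  # header row: not linted
        if column < len(row):  # rows too short for the column are skipped
            yield row_no, row[column]


# A small TSV with a header row; the IPA column has index 1.
sample = io.StringIO("ID\tIPA\n1\tpa\n2\t ta\n")
print(list(iter_ipa_cells(sample, column=1)))  # [(2, 'pa'), (3, ' ta')]
```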

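--ignore-nfd ignores errors about IPA strings that are not in Unicode's Normalisation Form D (NFD). With very few exceptions, IPA diacritics should be combining characters. For example, in NFD the nasalised vowel ã is not the single precomposed character U+00E3 but a (U+0061) followed by the combining tilde U+0303. However, this might be irrelevant for your purposes, in which case you can ignore these errors.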

--ignore-ws ignores errors about leading or trailing whitespace in IPA strings. If combined with --ignore-nfd, ipalint will only report errors about symbols that are not part of the IPA chart.

--linewise outputs (line number, error message) tuples, one such tuple per line of output. The default is to output the set of errors and include the list of line numbers to the right of each error.

--no-lines only outputs the set of errors found in the data, without line numbers. This is useful when you want a quick glimpse of what might be wrong. This flag is ignored if --linewise is set.
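The three output modes differ only in how they present the same (line number, error message) pairs. The sketch below assumes that the errors have already been collected as such pairs. format_report and the tab-separated layout are hypothetical and do not reproduce ipalint's actual output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Error = Tuple[int, str]  # (line number, error message)


def group_errors(errors: Iterable[Error]) -> Dict[str, List[int]]:
    """Map each distinct message to the sorted line numbers where it occurs."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line_no, message in errors:
        grouped[message].append(line_no)
    return {message: sorted(lines) for message, lines in grouped.items()}


def format_report(errors: Iterable[Error], linewise: bool = False,
                  no_lines: bool = False) -> List[str]:
    """Render errors as output lines in one of the three modes."""
    pairs = list(errors)
    if linewise:
        # One (line number, message) tuple per line; no_lines is ignored here.
        return [f"{line_no}\t{message}" for line_no, message in sorted(pairs)]
    grouped = group_errors(pairs)
    if no_lines:
        # Just the set of distinct errors.
        return sorted(grouped)
    # Default: each distinct error followed by its line numbers.
    return [f"{message}\t{', '.join(map(str, lines))}"
            for message, lines in sorted(grouped.items())]


sample = [(3, "non-IPA symbol: !"), (1, "trailing whitespace"), (2, "non-IPA symbol: !")]
for mode in ({}, {"linewise": True}, {"no_lines": True}):
    print(mode, format_report(sample, **mode))
```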

what is checked

  • Ensures that all the characters of the dataset’s IPA strings are in the IPA chart (the 2015 revision). The only accepted non-IPA character is space.

  • Ensures that the strings conform to Unicode’s Normalisation Form D (NFD).

  • Ensures that the strings do not start or end with unnecessary whitespace.
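The three checks above fit in a few lines of Python using the standard unicodedata module. This is a toy sketch, not ipalint's implementation:

* IPA_SUBSET is a small hypothetical stand-in for the full 2015 chart.
* lint_string and its error messages are made up.
* The ignore_nfd and ignore_ws parameters merely mirror the --ignore-nfd and --ignore-ws flags.

```python
import unicodedata
from typing import FrozenSet, List

# Hypothetical, tiny stand-in for the 2015 IPA chart; a real list would cover
# every letter, diacritic, suprasegmental and tone mark on the chart.
IPA_SUBSET: FrozenSet[str] = frozenset(
    "pbtdkaeiou"
    "\u014b"  # ŋ
    "\u0261"  # ɡ (the IPA's single-storey g)
    "\u0259"  # ə
    "\u0283"  # ʃ
    "\u02b0"  # ʰ aspiration
    "\u02d0"  # ː length mark
    "\u0303"  # combining tilde (nasalisation)
)


def lint_string(ipa: str, ignore_nfd: bool = False, ignore_ws: bool = False,
                chart: FrozenSet[str] = IPA_SUBSET) -> List[str]:
    """Return error messages for one IPA string (messages are made up)."""
    errors: List[str] = []
    decomposed = unicodedata.normalize("NFD", ipa)
    # 1. Chart check on the decomposed string, so that a precomposed letter
    #    such as U+00E3 (ã) is judged by its parts, a + U+0303.
    #    Space is the only non-IPA character accepted.
    unknown: List[str] = []
    for ch in decomposed:
        if ch != " " and ch not in chart and ch not in unknown:
            unknown.append(ch)
    errors.extend(f"not in the IPA chart: {ch!r}" for ch in unknown)
    # 2. NFD check: normalising must leave the string unchanged.
    if not ignore_nfd and decomposed != ipa:
        errors.append("not in NFD")
    # 3. Leading/trailing whitespace; inner spaces are allowed.
    if not ignore_ws:
        if ipa[:1].isspace():
            errors.append("leading whitespace")
        if ipa[-1:].isspace():
            errors.append("trailing whitespace")
    return errors


print(lint_string("p\u02b0a\u02d0"))  # []
print(lint_string(" b\u00e3!"))
```

The chart lookup runs on the decomposed string so that a precomposed ã is reported once, as an NFD error, rather than also as an unknown symbol; that is a choice made for this sketch. Comparing whole strings also catches a subtler NFD violation. The string "a\u0301\u0323" (acute, then dot below) contains no precomposed character, yet it is not in NFD, because canonical ordering puts the dot below (combining class 220) before the acute (230).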

installation

This is a standard Python 3 package without dependencies. It is available on the Cheese Shop (PyPI), so you can install it through pip:

pip install ipalint

Alternatively, you can clone this repo (it is safe to delete it afterwards) and run:

python setup.py test
python setup.py install

Of course, you can also do this within a virtualenv/venv.

similar projects

  • ipapy checks and cleans IPA strings.

  • lingpy includes some tools for analysing IPA strings.

  • ipatok is a library for tokenising IPA strings.

licence

MIT. Do as you please and praise the snake gods.



Download files

Source Distribution

ipalint-0.0.1.tar.gz (23.9 kB)

Uploaded Source

Built Distribution

ipalint-0.0.1-py3-none-any.whl (31.3 kB)

Uploaded Python 3

File details

Details for the file ipalint-0.0.1.tar.gz.

File metadata

  • Download URL: ipalint-0.0.1.tar.gz
  • Upload date:
  • Size: 23.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for ipalint-0.0.1.tar.gz
Algorithm Hash digest
SHA256 9d0c19ca107f7a53becceabac41ba6085a293729b3a3b15e55df30b7e77978ac
MD5 d45bcc82b66c75faafda164c761643c0
BLAKE2b-256 31f6b3039e9454073ee8f28f2739e0b41a810e2ed1d1f42a49452a5b87a27757


File details

Details for the file ipalint-0.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for ipalint-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 66dfd8bd1a7af0afecfb268b33afb15c7c12489a0a4d44c33148346d757eb8ab
MD5 b93633f2fca51d083f4de28bc75f9b71
BLAKE2b-256 ddc605cae32575e9e59bf2f3edb4ccd9f5709ae6eb37ed0fff40068aa75285a8

