ipasymbols: Properties of IPA symbols for data analysis
A simple JSON database to look up the properties of IPA symbols.
Warning: Under Development! (25.Nov.2021)
Version 0.0.* is not ready to use. Non-pulmonic consonants, affricates, co-articulated consonants, and diphthongs are not implemented yet. This kind of software is very prone to human error, and the required unit tests have not been implemented so far.
Usage
Get lists of IPA phons
import ipasymbols
# all vowels
all_vowels = ipasymbols.phonlist(query={'type': 'vowel'})
# front vowels
front_vowels = ipasymbols.phonlist(query={'type': 'vowel', 'backness': 'front'})
# diphthongs (2 char vowels)
diphthongs = ipasymbols.phonlist(query={'type': 'diphthong'})
# different types of consonants
consonants = ipasymbols.phonlist(query={'type': ["pulmonic", "non-pulmonic"]})
# consonants = ['m̥', 'm', 'ɱ', 'n̼', ...]
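The query semantics are not spelled out above. Judging from the examples, the keys of a query dict seem to be combined with AND, while a list value such as ["pulmonic", "non-pulmonic"] seems to mean "any of these". The following is a minimal sketch of that filtering logic over a hypothetical in-memory table. TOY_DB, matches and toy_phonlist are illustrative names and are not part of ipasymbols. The sketch assumes that each symbol maps to a flat dict of properties.

```python
# Hypothetical miniature table: symbol -> properties. The field names
# follow the README examples; the entries are NOT taken from ipasymbols.
TOY_DB = {
    "i": {"type": "vowel", "height": "close", "backness": "front"},
    "ɪ": {"type": "vowel", "height": "near-close", "backness": "near-front"},
    "u": {"type": "vowel", "height": "close", "backness": "back"},
    "m": {"type": "pulmonic", "manner": "nasal"},
    "ʘ": {"type": "non-pulmonic", "manner": "click"},
}


def matches(props, query):
    """True if every query key matches; a list value means 'any of these'."""
    for key, wanted in query.items():
        allowed = wanted if isinstance(wanted, list) else [wanted]
        if props.get(key) not in allowed:
            return False
    return True


def toy_phonlist(db, query):
    """Return all symbols whose properties satisfy the query (assumed AND/OR semantics)."""
    return [sym for sym, props in db.items() if matches(props, query)]


print(toy_phonlist(TOY_DB, {"type": "vowel", "backness": "front"}))  # ['i']
print(toy_phonlist(TOY_DB, {"type": ["pulmonic", "non-pulmonic"]}))  # ['m', 'ʘ']
```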
Get properties of an IPA phon
import ipasymbols
phon = 'ɪ'
props = ipasymbols.props(phon=phon, keys=["height"])
# props = {'height': 'near-close'}
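A property lookup with a keys filter can be sketched in the same way. The sketch assumes a symbol-to-properties mapping like the one used for the query example. toy_props is a hypothetical helper. It silently skips keys that are missing from an entry, and that choice is an assumption, not documented ipasymbols behaviour.

```python
# Hypothetical one-entry table; the values follow the README example.
TOY_DB = {
    "ɪ": {"type": "vowel", "height": "near-close", "backness": "near-front"},
}


def toy_props(db, phon, keys=None):
    """Return the properties of phon, optionally restricted to keys."""
    entry = db[phon]  # raises KeyError for symbols not in the table
    if keys is None:
        return dict(entry)
    # keys missing from the entry are skipped (an assumption)
    return {key: entry[key] for key in keys if key in entry}


print(toy_props(TOY_DB, "ɪ", keys=["height"]))  # {'height': 'near-close'}
```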
Count certain kinds of IPA symbols
import ipasymbols
ipatext = "de:ɐ̯ kɔʊd ɪst fɔl blø:t abɐ aʊ̯x tɔl"
# vowels
all_vowels = ipasymbols.count(ipatext, query={'type': 'vowel'})
# front vowels
front_vowels = ipasymbols.count(ipatext, query={'type': 'vowel', 'backness': 'front'})
# diphthongs (2 char vowels)
diphthongs = ipasymbols.count(ipatext, query={'type': 'diphthong'})
# different types of consonants
consonants = ipasymbols.count(ipatext, query={'type': ["pulmonic", "non-pulmonic"]})
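Counting is harder than it looks because many IPA symbols span several Unicode code points. For example, ɐ̯ is ɐ plus a combining inverted breve below, and the diphthong aʊ̯ is three code points. One common approach is greedy longest-match tokenization: at each position, the longest candidate is tried first, and characters that match nothing are skipped. The sketch below uses this approach, but it is not the actual ipasymbols implementation. The max_len limit is assumed here to play a role similar to the phonlen parameter of count_clusters. The symbol sets are hand-picked for the example sentence only.

```python
import unicodedata

# Sketch of greedy longest-match counting; not the ipasymbols implementation.


def tokenize(text, symbols, max_len=3):
    """Split text into known symbols by greedy longest match.

    Characters that start no known symbol (spaces, ':', consonants not in
    the set, ...) are skipped.
    """
    text = unicodedata.normalize("NFC", text)
    symbols = {unicodedata.normalize("NFC", s) for s in symbols}
    tokens, i = [], 0
    while i < len(text):
        # try the longest candidate first, e.g. 'aʊ̯' before 'a'
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in symbols:
                tokens.append(text[i:i + n])
                i += n
                break
        else:
            i += 1
    return tokens


def toy_count(text, symbols, max_len=3):
    """Number of symbols from the set found in text."""
    return len(tokenize(text, symbols, max_len))


IPATEXT = "de:ɐ̯ kɔʊd ɪst fɔl blø:t abɐ aʊ̯x tɔl"
# hand-picked for this sentence; not the ipasymbols vowel inventory
VOWELS = {"a", "e", "ø", "ɐ", "ɐ̯", "ɔ", "ʊ", "ɪ", "aʊ̯"}
DIPHTHONGS = {"aʊ̯"}

print(toy_count(IPATEXT, VOWELS))      # 11
print(toy_count(IPATEXT, DIPHTHONGS))  # 1
```

Because aʊ̯ is matched before a, the diphthong counts as one vowel, not two.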
Count consonant clusters
import ipasymbols
ipatext = "de:ɐ̯ kɔʊd ɪst fɔl blø:t abɐ aʊ̯x tɔl"
types = ["pulmonic", "non-pulmonic", "affricate", "co-articulated"]
clusters = ipasymbols.count_clusters(
ipatext, query={"type": types}, phonlen=3, min_cluster_len=2)
# clusters = {2: 789, 3: 654, 4: 123, ...}  (illustrative values: cluster length -> count)
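A cluster count can be built on the same longest-match idea. Consecutive consonant symbols form a run, and any other character (a vowel, a space, or a length mark) ends the run. The README does not say whether ipasymbols lets clusters cross word boundaries. In the hypothetical cluster_lengths helper below, a space always breaks a run. Like the clusters dict above, the result maps cluster length to frequency. The consonant set is hand-picked for the example sentence.

```python
from collections import Counter

# Sketch of run-length cluster counting; not the ipasymbols implementation.


def cluster_lengths(text, symbols, max_len=3, min_cluster_len=2):
    """Map run length -> number of runs of consecutive symbols from the set."""
    counts = Counter()
    run, i = 0, 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in symbols:
                run += 1  # extend the current cluster
                i += n
                break
        else:
            # any other character (vowel, space, ':' ...) closes the run
            if run >= min_cluster_len:
                counts[run] += 1
            run = 0
            i += 1
    if run >= min_cluster_len:  # a cluster at the very end of the text
        counts[run] += 1
    return dict(counts)


IPATEXT = "de:ɐ̯ kɔʊd ɪst fɔl blø:t abɐ aʊ̯x tɔl"
# hand-picked consonants occurring in this sentence
CONSONANTS = {"b", "d", "f", "k", "l", "s", "t", "x"}
print(cluster_lengths(IPATEXT, CONSONANTS))  # {2: 2}
```

In the example sentence, only st in ɪst and bl in blø:t reach the minimum length of 2.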
Read the whole IPA symbols database
import ipasymbols
mydict = ipasymbols.db
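The database is described as JSON. If ipasymbols.db is a JSON-compatible dict, a copy can be written to disk for inspection. The README does not document the exact layout, so this is an assumption. save_db and load_db are hypothetical helpers. The main point of the sketch is ensure_ascii=False, which keeps IPA characters readable in the file instead of escaping them.

```python
import json
from pathlib import Path

# Hypothetical helpers; assumes the dict only contains JSON-compatible values.


def save_db(db, path):
    """Write a JSON-compatible dict to path as UTF-8 JSON."""
    # ensure_ascii=False keeps symbols such as 'ɪ' readable in the file
    # instead of writing them as \u026a escapes.
    text = json.dumps(db, ensure_ascii=False, indent=2, sort_keys=True)
    Path(path).write_text(text, encoding="utf-8")


def load_db(path):
    """Read the JSON file written by save_db back into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Usage, under the same assumption: save_db(ipasymbols.db, "ipasymbols_db.json").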
Appendix
Installation
The ipasymbols git repo is available as a PyPI package:
pip install ipasymbols
Alternatively, install it directly from the GitHub repository:
pip install git+ssh://git@github.com/ulf1/ipasymbols.git
Install a virtual environment
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
pip install -r requirements-demo.txt --no-cache-dir
(If your git repo is stored in a folder whose path contains whitespace, do not use the subfolder .venv. Use an absolute path without whitespace instead.)
Python commands
- Jupyter for the examples:
jupyter lab
- Check syntax:
flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g'),./ipasymbols/ipasymbols.py
- Run Unit Tests:
PYTHONPATH=. pytest
Publish
python setup.py sdist
twine upload -r pypi dist/*
Clean up
find . -type f -name "*.pyc" | xargs rm
find . -type d -name "__pycache__" | xargs rm -r
rm -r .pytest_cache
rm -r .venv
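Some systems lack find and xargs, such as plain Windows shells. On those systems, the clean-up steps can be approximated in Python with pathlib and shutil. This is a sketch, not a script shipped with the repository. Like rm -r .venv, it deletes the virtual environment. Unlike the shell commands, it does not fail when a directory is already absent or when a file name contains spaces.

```python
import shutil
from pathlib import Path

# Sketch of the shell clean-up steps; not part of the repository.


def clean(root="."):
    """Approximate the clean-up commands for the project in root."""
    root = Path(root)
    # find . -type f -name "*.pyc" | xargs rm
    for pyc in list(root.rglob("*.pyc")):
        if pyc.is_file():
            pyc.unlink()
    # find . -type d -name "__pycache__" | xargs rm -r
    for cache in list(root.rglob("__pycache__")):
        if cache.is_dir():
            shutil.rmtree(cache, ignore_errors=True)
    # rm -r .pytest_cache ; rm -r .venv
    for name in (".pytest_cache", ".venv"):
        shutil.rmtree(root / name, ignore_errors=True)
```

Call clean(".") from the repository root.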
Support
Please open an issue for support.
Contributing
Please contribute using GitHub Flow: create a branch, add commits, and open a pull request.
Acknowledgements
The "Evidence" project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 433249742 (GU 798/27-1; GE 1119/11-1).
Maintenance
- until 31.Aug.2023 (v0.0.1), the code repository was maintained within the DFG project 433249742
- since 01.Sep.2023 (v0.1.0), the code repository has been maintained by Ulf Hamster.
Citation
If you use this repository in your research work, you can cite the following paper.
@inproceedings{hamster-2022-everybody,
title = "Everybody likes short sentences - A Data Analysis for the Text Complexity {DE} Challenge 2022",
author = "Hamster, Ulf A.",
booktitle = "Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text",
month = sep,
year = "2022",
address = "Potsdam, Germany",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.germeval-1.2",
pages = "10--14",
}