Tools for an algorithmic approach to phonology (some useful to computational phonology and morphology more broadly)
algophon
Code for working on computational phonology and morphology in Python.
The project is based on code developed by Caleb Belth during the course of his PhD; the title of his dissertation, Towards an Algorithmic Account of Phonological Rules and Representations, serves as the origin for the repository's name algophon.
This is a work in progress. The PyPI distribution and documentation will be updated as the project progresses. The initial plan for the project is to include:
1. Handy tools for working with strings of phonological segments.
2. Implementations of computational learning models.
Item (1) will be implemented first.
Suggestions are welcome!
Install
pip install algophon
Working With Strings of Segments
The top level of the package provides tools for working with strings of phonological segments.
The following examples assume you have imported the appropriate classes:
>>> from algophon import Seg, SegInv, SegStr, NatClass
Segments: Seg
A class to represent a phonological segment.
You are unlikely to be creating Seg objects yourself very often. They will usually be constructed internally by other parts of the package (in particular, see SegInv and SegStr). However, if you ever need to, creating a Seg object requires the following arguments:
- ipa: a str IPA symbol
- features (optional): a dict of features mapping to their corresponding values
>>> seg = Seg(ipa='i', features={'syl': '+', 'voi': '+', 'stri': '0'})
What is important to know is how Seg objects behave, and why they are handy.
First, in the important respects Seg behaves like the str IPA segment used to create it.
If you print a Seg object, it will print its IPA:
>>> print(seg)
i
If you compare a Seg object to a str, it will behave like it is the IPA symbol:
>>> print(seg == 'i')
True
>>> print(seg == 'e')
False
A Seg object hashes to the same value as its IPA symbol:
>>> print(len({seg, 'i'}))
1
>>> print('i' in {seg}, seg in {'i'})
True True
Second, in the important respects Seg behaves like a feature bundle (see also the other classes, where other benefits will become clear).
>>> print(seg.features['syl'])
+
Third, Seg handles IPA symbols that are longer than one Unicode character.
>>> tsh = Seg(ipa='t͡ʃ')
>>> print(tsh)
t͡ʃ
>>> print(len(tsh))
1
>>> from algophon.symbols import LONG # see description of symbols below
>>> long_i = Seg(ipa=f'i{LONG}')
>>> print(long_i)
iː
>>> print(len(long_i))
1
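The string-like behavior described above (printing, equality, hashing, and a length of 1) can be illustrated with a minimal sketch. The class StrLikeSeg below is a hypothetical stand-in written for this example; it is not algophon's actual Seg implementation, which may differ in its details.

```python
class StrLikeSeg:
    """Hypothetical stand-in for a segment class; NOT algophon's actual Seg."""

    def __init__(self, ipa, features=None):
        self.ipa = ipa
        self.features = features if features is not None else {}

    def __str__(self):
        # Printing shows the IPA symbol.
        return self.ipa

    __repr__ = __str__

    def __eq__(self, other):
        # Compare by IPA symbol, against another segment or a plain str.
        if isinstance(other, StrLikeSeg):
            return self.ipa == other.ipa
        if isinstance(other, str):
            return self.ipa == other
        return NotImplemented

    def __hash__(self):
        # Hash like the IPA str so sets and dicts treat the two as one key.
        return hash(self.ipa)

    def __len__(self):
        # One segment, however many code points its IPA symbol uses.
        return 1


seg = StrLikeSeg('i', {'syl': '+'})
print(seg == 'i')                  # True
print(len({seg, 'i'}))             # 1
print('i' in {seg}, seg in {'i'})  # True True

tsh = StrLikeSeg('t\u0361\u0283')  # t + combining tie bar + esh
print(len(tsh), len(tsh.ipa))      # 1 3
```

Defining __eq__ and __hash__ together is what lets a segment and its IPA str collapse to a single set member.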
Segment Inventory: SegInv
A class to represent an inventory of phonological segments (Seg objects).
A SegInv object is a collection of Seg objects. A SegInv requires no arguments to construct, though it provides two optional arguments:
- ipa_file_path: a str pointing to a file of segment-feature mappings.
- sep: a str specifying the column separator of the ipa_file_path file.
By default, SegInv uses Panphon (Mortensen et al., 2016) features. The optional parameters allow you to use your own features. The file at ipa_file_path must be formatted like this:
- The first row must be a header of feature names, separated by the sep (by default, \t)
- The first column must contain the segment IPAs (the header row can have anything, e.g., SEG)
- The remaining columns (non first row) must contain the feature values.
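The file format described above can be sketched with the standard csv module. The helper names write_feature_file and read_feature_file, the file name features.tsv, and the toy feature values are assumptions made for this example. The reader only illustrates the layout and is not how SegInv itself loads the file.

```python
import csv
import os
import tempfile

# Toy table: header row first, segment IPA in the first column.
rows = [
    ['SEG', 'syl', 'voi'],
    ['i', '+', '+'],
    ['p', '-', '-'],
]


def write_feature_file(path, rows, sep='\t'):
    """Write the rows as a sep-delimited segment-feature file."""
    with open(path, 'w', encoding='utf-8', newline='') as f:
        csv.writer(f, delimiter=sep).writerows(rows)


def read_feature_file(path, sep='\t'):
    """Parse the file into {ipa: {feature_name: value}} (illustration only)."""
    with open(path, encoding='utf-8', newline='') as f:
        reader = csv.reader(f, delimiter=sep)
        feature_names = next(reader)[1:]  # skip the first header cell
        return {row[0]: dict(zip(feature_names, row[1:])) for row in reader if row}


path = os.path.join(tempfile.mkdtemp(), 'features.tsv')
write_feature_file(path, rows)
features = read_feature_file(path)
print(features['i'])  # {'syl': '+', 'voi': '+'}
```

A file written this way has the shape that the ipa_file_path and sep arguments expect.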
When a SegInv object is created, it is empty:
>>> seginv = SegInv()
>>> seginv
SegInv of size 0
You can add segments with the add, add_segs, and add_segs_by_str methods:
>>> seginv.add('i')
>>> print(seginv.segs)
{i}
>>> seginv.add_segs({'p', 'b', 't', 'd'})
>>> print(seginv.segs)
{b, t, d, i, p}
>>> seginv.add_segs_by_str('eː n t j ə') # segments in str must be space-separated
>>> print(seginv.segs)
{b, t, d, i, j, n, p, ə, eː}
add_segs_by_str requires the segments to be space-separated because not all IPA symbols are a single character (e.g., 'eː'). This is also consistent with the SIGMORPHON challenge data format commonly used in morphophonology tasks.
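The need for space separation can be seen with plain Python strings, without algophon. In this short sketch the example word is written with escape sequences so the code points are unambiguous.

```python
# 'eː n t j ə', with the length mark (U+02D0) and schwa (U+0259) escaped
word = 'e\u02d0 n t j \u0259'

# Splitting on whitespace recovers the intended segments.
segments = word.split()
print(len(segments))  # 5

# Iterating over the unspaced string yields code points instead,
# which separates 'e' from its length mark.
chars = list(''.join(segments))
print(len(chars))  # 6
```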
These add* methods automatically create Seg objects and assign them features based on either Panphon (default) or the ipa_file_path file.
>>> print(seginv['eː'].features)
{'syl': '+', 'son': '+', 'cons': '-', 'cont': '+', 'delrel': '-', 'lat': '-', 'nas': '-', 'strid': '0', 'voi': '+', 'sg': '-', 'cg': '-', 'ant': '0', 'cor': '-', 'distr': '0', 'lab': '-', 'hi': '-', 'lo': '-', 'back': '-', 'round': '-', 'velaric': '-', 'tense': '+', 'long': '+', 'hitone': '0', 'hireg': '0'}
This also demonstrates that seginv operates like a dictionary: you can retrieve segments, and check whether they exist, by their IPA.
>>> 'eː' in seginv
True
Strings of Segments: SegStr
A class to represent a sequence of phonological segments (Seg objects).
Natural Class: NatClass
A class to represent a natural class, in the sense of a set of segments represented intensionally as a conjunction of features.
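As a rough illustration of the idea, not of the NatClass API (which is not documented here), a conjunction of feature values can be evaluated against an inventory to obtain the segments it picks out. The function name extension and the toy inventory with hand-picked feature values are assumptions made for this sketch.

```python
# Toy inventory with hand-picked feature values (illustrative only).
inventory = {
    'p': {'voi': '-', 'son': '-', 'nas': '-'},
    'b': {'voi': '+', 'son': '-', 'nas': '-'},
    'm': {'voi': '+', 'son': '+', 'nas': '+'},
    'i': {'voi': '+', 'son': '+', 'nas': '-'},
}


def extension(feature_spec, inventory):
    """Return the segments that match every feature value in feature_spec."""
    return {
        seg
        for seg, feats in inventory.items()
        if all(feats.get(name) == value for name, value in feature_spec.items())
    }


# [+voi, -son]: voiced obstruents in the toy inventory
print(sorted(extension({'voi': '+', 'son': '-'}, inventory)))  # ['b']

# [+voi]: a shorter conjunction picks out a larger set
print(sorted(extension({'voi': '+'}, inventory)))  # ['b', 'i', 'm']
```

The feature specification is the intensional description. The returned set is its extension in one particular inventory, so the same specification can pick out different segments in different inventories.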
Symbols: The symbols module
The symbols module (technically just a file...) contains a number of constants that store some useful symbols:
LWB = '⋊'
RWB = '⋉'
SYL_BOUNDARY = '.'
PRIMARY_STRESS = 'ˈ'
SEC_STRESS = 'ˌ'
LONG = 'ː'
NASALIZED = '\u0303' # ◌̃
UNDERSPECIFIED = '0'
UNK = '?'
NEG = '¬'
These can be accessed like this:
>>> from algophon.symbols import *
>>> NASALIZED
'̃'
>>> f'i{LONG}'
'iː'
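One possible use of these constants is marking word edges in a space-separated segment string. The sketch below redefines three of the constants locally, with the values listed above, so that it runs without the package. It assumes that LWB and RWB stand for the left and right word boundaries, as their names suggest, and the helper with_boundaries is hypothetical.

```python
# Values copied from the listing above; with algophon installed these
# would come from `from algophon.symbols import LWB, RWB, LONG`.
LWB = '\u22ca'   # left word boundary (assumed meaning)
RWB = '\u22c9'   # right word boundary (assumed meaning)
LONG = '\u02d0'  # length mark


def with_boundaries(segments):
    """Join segments with spaces and add a boundary symbol at each edge."""
    return ' '.join([LWB, *segments, RWB])


word = ['t', f'e{LONG}', 'n']
print(with_boundaries(word))  # ⋊ t eː n ⋉
```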
Learning Models
Work in Progress
Citation
If you use this package in your research, you can use the following citation:
@phdthesis{belth2023towards,
title={{Towards an Algorithmic Account of Phonological Rules and Representations}},
author={Belth, Caleb},
year={2023},
school={{University of Michigan}}
}
References
- Mortensen, D. R., Littell, P., Bharadwaj, A., Goyal, K., Dyer, C., & Levin, L. (2016, December). Panphon: A resource for mapping IPA segments to articulatory feature vectors. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3475-3484).