Tools for an algorithmic approach to phonology (some useful to computational phonology and morphology more broadly)

algophon

Code for working on computational phonology and morphology in Python.

This package is based on code developed by Caleb Belth during the course of his PhD; the title of his dissertation, Towards an Algorithmic Account of Phonological Rules and Representations, serves as the origin for the repository's name algophon.

The package is under active development! The PyPI distribution and documentation are updated as the project progresses. The package includes:

  1. Handy tools for working with strings of phonological segments.
  2. Implementations of computational learning models.

Suggestions are welcome!

Install

pip install algophon

Working With Strings of Segments

The code at the top level of the package provides some nice functionality for easily working with strings of phonological segments.

The following examples assume you have imported the appropriate classes:

>>> from algophon import Seg, SegInv, SegStr, NatClass

Segments: Seg

A class to represent a phonological segment.

You are unlikely to be creating Seg objects yourself very often. They will usually be constructed internally by other parts of the package (in particular, see SegInv and SegStr). However, if you ever need to, creating a Seg object requires the following arguments:

  • ipa: a str IPA symbol
  • features (optional): a dict of features mapping to their corresponding values

>>> seg = Seg(ipa='i', features={'syl': '+', 'voi': '+', 'stri': '0'})

What is important to know is how Seg objects behave, and why they are handy.

First, in the important respects Seg behaves like the str IPA segment used to create it.

If you print a Seg object, it will print its IPA:

>>> print(seg)
i

If you compare a Seg object to a str, it will behave like it is the IPA symbol:

>>> print(seg == 'i')
True
>>> print(seg == 'e')
False

A Seg object hashes to the same value as its IPA symbol:

>>> print(len({seg, 'i'}))
1
>>> print('i' in {seg}, seg in {'i'})
True True
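
The str-like behavior shown above comes down to equality and hashing. Here is a minimal sketch of the idea, using a hypothetical MiniSeg class that is not part of algophon and makes no claim about how Seg is actually implemented:

```python
class MiniSeg:
    """Hypothetical stand-in for a segment; not algophon's Seg."""

    def __init__(self, ipa, features=None):
        self.ipa = ipa
        self.features = dict(features or {})

    def __str__(self):
        return self.ipa

    def __repr__(self):
        return self.ipa

    def __eq__(self, other):
        # Compare by IPA symbol, whether `other` is a MiniSeg or a plain str.
        if isinstance(other, MiniSeg):
            return self.ipa == other.ipa
        if isinstance(other, str):
            return self.ipa == other
        return NotImplemented

    def __hash__(self):
        # Hash like the IPA str so both land in the same set/dict slot.
        return hash(self.ipa)


seg = MiniSeg('i', {'syl': '+'})
same_bucket = len({seg, 'i'}) == 1          # the str and the object collapse
membership = ('i' in {seg}, seg in {'i'})   # lookups work in both directions
```

Defining __eq__ and __hash__ together is what lets an object stand in for its IPA str as a set member or dict key.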

Second, in the important respects Seg behaves like a feature bundle (see also the other classes, where other benefits will become clear).

>>> print(seg.features['syl'])
+

Third, Seg handles IPA symbols that are longer than one Unicode char.

>>> tsh = Seg(ipa='t͡ʃ')
>>> print(tsh)
t͡ʃ
>>> print(len(tsh))
1
>>> from algophon.symbols import LONG # see description of symbols below
>>> long_i = Seg(ipa=f'i{LONG}')
>>> print(long_i)
iː
>>> print(len(long_i))
1
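
The length-1 behavior matters because Python's built-in len counts Unicode code points, not segments. A small standard-library illustration (code_points is a helper made up for this example, not an algophon function):

```python
import unicodedata

tsh = 't\u0361\u0283'   # t͡ʃ: t + combining tie bar + ʃ
long_i = 'i\u02d0'      # iː: i + length mark


def code_points(symbol):
    """Return the Unicode code points of an IPA symbol as 'U+XXXX' strings."""
    return [f'U+{ord(ch):04X}' for ch in symbol]


tsh_points = code_points(tsh)        # three code points for one segment
long_i_points = code_points(long_i)  # two code points for one segment

# The tie bar is a combining character, so it never stands alone as a segment.
tie_is_combining = unicodedata.combining('\u0361') != 0
```

A plain str therefore reports a length of 3 for the affricate and 2 for the long vowel, whereas a segment-aware object reports 1 for each.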

Segment Inventory: SegInv

A class to represent an inventory of phonological segments (Seg objects).

A SegInv object is a collection of Seg objects. A SegInv requires no arguments to construct, though it provides two optional arguments:

  • ipa_file_path: a str pointing to a file of segment-feature mappings.
  • sep: a str specifying the column separator of the ipa_file_path file.

By default, SegInv uses Panphon (Mortensen et al., 2016) features. The optional parameters allow you to use your own features. The file at ipa_file_path must be formatted like this:

  • The first row must be a header of feature names, separated by the sep (by default, \t)
  • The first column must contain the segment IPAs (the header row can have anything, e.g., SEG)
  • In every row after the header, the remaining columns must contain the feature values.
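
To make the layout concrete, here is a sketch of a tiny tab-separated feature table and a standard-library parser for it. The segments, the three feature columns, and the read_feature_table helper are chosen only for illustration; this is not how SegInv reads the file internally.

```python
import csv
import io

# A tiny feature table in the layout described above (tab-separated).
FEATURE_TABLE = (
    "SEG\tsyl\tson\tvoi\n"
    "i\t+\t+\t+\n"
    "t\t-\t-\t-\n"
    "n\t-\t+\t+\n"
)


def read_feature_table(text, sep='\t'):
    """Parse a header row plus segment rows into {ipa: {feature: value}}."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header, body = rows[0], rows[1:]
    feature_names = header[1:]  # the first header cell is free-form (e.g., SEG)
    return {row[0]: dict(zip(feature_names, row[1:])) for row in body if row}


table = read_feature_table(FEATURE_TABLE)
```

Saved to disk, a table like this is the kind of file ipa_file_path would point to, with sep left at its default of \t.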

When a SegInv object is created, it is empty:

>>> seginv = SegInv()
>>> seginv
SegInv of size 0

You can add segments with the add, add_segs, and add_segs_by_str methods:

>>> seginv.add('i')
>>> print(seginv.segs)
{i}
>>> seginv.add_segs({'p', 'b', 't', 'd'})
>>> print(seginv.segs)
{b, t, d, i, p}
>>> seginv.add_segs_by_str('eː n t j ə') # segments in str must be space-separated
>>> print(seginv.segs)
{b, t, d, i, j, n, p, ə, eː}

add_segs_by_str requires the segments to be space-separated because not all IPA symbols are a single char (e.g., 'eː'). This is also consistent with the Sigmorphon challenge data format commonly used in morphophonology tasks.
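
The difference is easy to see with plain Python strings. This sketch uses only built-in str operations and shows what goes wrong when a word is split character by character instead of on spaces:

```python
word = 'e\u02d0 n t j \u0259'   # 'eː n t j ə', space-separated

segments = word.split(' ')                 # one token per segment
characters = list(word.replace(' ', ''))   # one item per code point

n_segments = len(segments)      # the long vowel stays a single token
n_characters = len(characters)  # the length mark is split off from the vowel
```

Splitting on spaces keeps eː together as one symbol, while the character-level split yields e and ː as separate items.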

These add* methods automatically create Seg objects and assign them features based on either Panphon (default) or the ipa_file_path file.

>>> print(seginv['eː'].features)
{'syl': '+', 'son': '+', 'cons': '-', 'cont': '+', 'delrel': '-', 'lat': '-', 'nas': '-', 'strid': '0', 'voi': '+', 'sg': '-', 'cg': '-', 'ant': '0', 'cor': '-', 'distr': '0', 'lab': '-', 'hi': '-', 'lo': '-', 'back': '-', 'round': '-', 'velaric': '-', 'tense': '+', 'long': '+', 'hitone': '0', 'hireg': '0'}

This also demonstrates that seginv operates like a dictionary in that you can retrieve and check the existence of segments by their IPA.

>>> 'eː' in seginv
True

Strings of Segments: SegStr

A class to represent a sequence of phonological segments (Seg objects).

The SegStr class handles several tricky aspects of IPA sequences. It is common practice to represent IPA sequences as space-separated strings, so that, for example, [eːntjə] is represented as 'eː n t j ə'.

Creating a SegStr object requires the following arguments:

  • segs: a collection of segments, which can be in any of the following formats:
    • str of IPA symbols, where each symbol is separated by a space ' ' (most common)
    • list of IPA symbols
    • list of Seg objects
  • seginv: a SegInv object

>>> seginv = SegInv() # init SegInv
>>> seq = SegStr('eː n t j ə', seginv)
>>> print(seq)
eːntjə

Creating the SegStr object automatically adds the segments in the object to the SegInv object.

>>> print(seginv.segs)
{ə, t, n, j, eː}

For clean visualization, SegStr displays the sequence of segments without spaces, as print(seq) shows above. But internally a SegStr object knows what the segments are:

>>> print(len(seq))
5
>>> seq[0]
eː
>>> type(seq[0]) # indexing returns a Seg object
<class 'algophon.seg.Seg'>
>>> seq[-2:]
jə
>>> type(seq[-2:]) # slicing returns a new SegStr object
<class 'algophon.segstr.SegStr'>
>>> seq[-2:] == 'j ə' # comparison to str objects works as expected
True
>>> seq[-2:] == 'ə n'
False

SegStr also implements equivalents of useful str methods.

>>> seq.endswith('j ə')
True
>>> dim_sufx = seq[-2:]
>>> seq.endswith(dim_sufx)
True
>>> seq.startswith(seq[:-2])
True

A SegStr object hashes to the value of its (space-separated) string:

>>> len({seq, 'eː n t j ə'})
1
>>> seq in {'eː n t j ə'}
True
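
To illustrate why segment-level sequence operations differ from raw str operations, here is a sketch built around a hypothetical MiniSegStr class. It is not algophon's SegStr and is only meant to mirror the behaviors described above (indexing, slicing, comparison, hashing, endswith):

```python
class MiniSegStr:
    """Hypothetical segment sequence; not algophon's SegStr."""

    def __init__(self, segs):
        # Accept a space-separated str or an iterable of symbols.
        self.segs = tuple(segs.split(' ') if isinstance(segs, str) else segs)

    def _key(self):
        return ' '.join(self.segs)

    def __str__(self):
        return ''.join(self.segs)

    def __len__(self):
        return len(self.segs)

    def __getitem__(self, idx):
        # Slices give a new sequence; integer indices give one symbol.
        if isinstance(idx, slice):
            return MiniSegStr(self.segs[idx])
        return self.segs[idx]

    def __eq__(self, other):
        if isinstance(other, MiniSegStr):
            return self.segs == other.segs
        if isinstance(other, str):
            return self._key() == other
        return NotImplemented

    def __hash__(self):
        # Hash like the space-separated str.
        return hash(self._key())

    def endswith(self, suffix):
        # Compare whole segments rather than characters.
        suffix = suffix if isinstance(suffix, MiniSegStr) else MiniSegStr(suffix)
        n = len(suffix)
        return n == 0 or self.segs[-n:] == suffix.segs


seq = MiniSegStr('e\u02d0 n t j \u0259')   # 'eː n t j ə'
word = MiniSegStr('a t\u0361\u0283')       # 'a t͡ʃ'

char_level = str(word).endswith('\u0283')  # raw str sees a final ʃ
seg_level = word.endswith('\u0283')        # the final segment is t͡ʃ, not ʃ
```

Comparing whole segments avoids false matches such as treating the second half of an affricate as a suffix.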

Natural Class: NatClass

A class to represent a natural class, in the sense of a set of segments represented intensionally as a conjunction of features.

>>> son = NatClass(feats={'+son'}, seginv=seginv)
>>> son
[+son]
>>> 'ə' in son
True
>>> 'n' in son
True
>>> 't' in son
False

The class also allows you to get the natural class's extension and the extension's complement, relative to the SegInv (in our example, only {ə, t, n, j, eː} are in seginv):

>>> son.extension()
{eː, j, ə, n}
>>> son.extension_complement()
{t}

You can also retrieve an extension (or its complement) directly from a SegInv object without creating a NatClass object:

>>> seginv.extension({'+syl'})
{ə, eː}
>>> seginv.extension_complement({'+syl'})
{j, t, n}
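
The relationship between a feature conjunction and its extension can be sketched with plain dicts and sets. The FEATURES table below lists only two features for the five segments of the running example, with values that agree with the outputs shown above. The extension and extension_complement functions here are illustrative stand-ins, not the package's implementation.

```python
# Two feature values per segment, enough for the examples in this section.
FEATURES = {
    'e\u02d0': {'syl': '+', 'son': '+'},   # eː
    '\u0259': {'syl': '+', 'son': '+'},    # ə
    'n': {'syl': '-', 'son': '+'},
    'j': {'syl': '-', 'son': '+'},
    't': {'syl': '-', 'son': '-'},
}


def extension(feats, inventory=FEATURES):
    """Segments whose features satisfy every '+f' / '-f' spec in feats."""
    return {
        seg for seg, vals in inventory.items()
        if all(vals.get(spec[1:]) == spec[0] for spec in feats)
    }


def extension_complement(feats, inventory=FEATURES):
    """Segments of the inventory that fall outside the extension."""
    return set(inventory) - extension(feats, inventory)


sonorants = extension({'+son'})
non_sonorants = extension_complement({'+son'})
glides_and_nasals = extension({'+son', '-syl'})   # conjunction of two features
```

Adding features to the conjunction can only shrink the extension, which is why [+son, -syl] picks out a subset of [+son].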

Symbols: The symbols module

The symbols module (technically just a file...) contains a number of constant variables that store some useful symbols:

LWB = '⋊'
RWB = '⋉'
SYLB = '.'
MORPHB = '-'
BOUNDARIES = [LWB, RWB, SYLB, MORPHB]
PRIMARY_STRESS = 'ˈ'
SEC_STRESS = 'ˌ'
LONG = 'ː'
NASALIZED = '\u0303'  # ◌̃
UNDERSPECIFIED = '0'
UNK = '?'
NEG = '¬'
EMPTY = '_'
FUNCTION_COMPOSITION = '∘'

These can be accessed like this:

>>> from algophon.symbols import *
>>> NASALIZED
'̃'
>>> f'i{LONG}'
'iː'

Learning Models

D2L

An implementation of the model "Distant to Local" from the following paper:

@article{belth2024tiers,
    title={A Learning-Based Account of Phonological Tiers},
    author={Belth, Caleb},
    journal={Linguistic Inquiry},
    year={2024},
    publisher={MIT Press},
    url = {https://doi.org/10.1162/ling\_a\_00530},
}

Please see the models README for details.

PLP

Work in Progress

Mɪᴀꜱᴇɢ

An implementation of the model "Meaning Informed Segmentation of Agglutinative Morphology" (Mɪᴀꜱᴇɢ) from the following paper:

@inproceedings{belth2024miaseg,
  title={Meaning-Informed Low-Resource Segmentation of Agglutinative Morphology},
  author={Belth, Caleb},
  booktitle={Proceedings of the Society for Computation in Linguistics},
  year={2024}
}

Please see the models README for details.

Other Models

Work in Progress

Citation

If you use this package in your research, you can use the following citation:

@phdthesis{belth2023towards,
  title={{Towards an Algorithmic Account of Phonological Rules and Representations}},
  author={Belth, Caleb},
  year={2023},
  school={{University of Michigan}}
}

If you use one of the computational models, please cite the corresponding paper(s).

References

  • Mortensen, D. R., Littell, P., Bharadwaj, A., Goyal, K., Dyer, C., & Levin, L. (2016, December). Panphon: A resource for mapping IPA segments to articulatory feature vectors. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3475-3484).
