Standalone phonological feature systems for historical linguistics

merkmal

merkmal is a standalone Python package for manipulating phonological features. It has no runtime dependencies and requires Python 3.12 or later.

It provides:

  • bundled phonological feature datasets
  • pluggable feature systems (9 built-in)
  • feature geometry and distance functions (Clements & Hume 1995)
  • tonal geometry (Yip/Bao)
  • query and analysis helpers for graphemes and feature sets
  • UPA transcription support

Installation

Install from PyPI:

pip install merkmal

Development install:

git clone https://github.com/tresoldi/merkmal.git
cd merkmal
pip install -e ".[dev]"

Run checks:

ruff check .
mypy src
pytest -q

Quick start

import merkmal

# Built-in systems
print(merkmal.list_systems())
# ['descriptive', 'broad', 'distinctive', 'pbase-hc', 'pbase-jfh',
#  'pbase-spe', 'pbase-uftc', 'phoible', 'classfeat']

# Basic grapheme lookup
print(merkmal.get_features("p"))
# frozenset({'consonant', 'voiceless', 'bilabial', 'stop'})

# Predefined sound classes
print(merkmal.get_class_features("V"))
# frozenset({'vowel'})

# Distance
print(merkmal.distance("a", "e"))
print(merkmal.distance("p", "b", system="classfeat"))

Systems

System                      | Type        | Features                   | Distance
----------------------------+-------------+----------------------------+------------------
descriptive                 | categorical | articulatory               | geometry-weighted
broad                       | categorical | simplified                 | geometry-weighted
distinctive                 | privative   | Clements & Hume            | geometry-weighted
pbase-hc, -jfh, -spe, -uftc | multi-state | 4 theoretical families     | geometry-weighted
phoible                     | binary      | 37 features                | Hamming
classfeat                   | hybrid      | sound classes + continuous | trained weights

All systems implement the same FeatureSystem protocol. Distances, queries, matrices, and natural class derivation work across all of them.
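
To illustrate what a shared protocol makes possible, the sketch below defines a hypothetical protocol and a Hamming distance that works with any object satisfying it. The names ToyFeatureSystem, ToyBinarySystem, and hamming are invented for this example and are not merkmal's API; the feature values are made up as well.

```python
from typing import Protocol


class ToyFeatureSystem(Protocol):
    """Hypothetical stand-in for a shared feature-system interface."""

    def grapheme_to_features(self, grapheme: str) -> frozenset[str]: ...


class ToyBinarySystem:
    """Toy system: each grapheme maps to the set of features marked '+'."""

    def __init__(self, table: dict[str, frozenset[str]]) -> None:
        self._table = table

    def grapheme_to_features(self, grapheme: str) -> frozenset[str]:
        return self._table[grapheme]


def hamming(system: ToyFeatureSystem, a: str, b: str) -> int:
    """Count the features on which two graphemes disagree."""
    fa = system.grapheme_to_features(a)
    fb = system.grapheme_to_features(b)
    return len(fa ^ fb)  # symmetric difference = mismatching features


# Made-up feature values, for illustration only.
toy = ToyBinarySystem(
    {
        "p": frozenset({"consonantal", "labial"}),
        "b": frozenset({"consonantal", "labial", "voice"}),
        "s": frozenset({"consonantal", "coronal", "continuant"}),
    }
)
print(hamming(toy, "p", "b"))  # 1
print(hamming(toy, "p", "s"))  # 3
```

Because hamming depends only on the protocol method, any other class that provides grapheme_to_features could be passed in unchanged.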

Working with systems

You can use the lazy default registry through top-level helpers, or work with a specific system object.

import merkmal

descriptive = merkmal.get_system("descriptive")
distinctive = merkmal.get_system("distinctive")
pbase = merkmal.get_system("pbase-hc")

print(descriptive.grapheme_to_features("a"))
print(distinctive.grapheme_to_features("a"))
print(pbase.grapheme_to_representation("a"))
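
How the default registry is built lazily is not described here. One common way to get that behavior, shown purely as an assumption, is to cache the result of a builder function so that the data is loaded on first use only. The names default_registry, toy_get_features, and build_count are hypothetical.

```python
from functools import lru_cache

build_count = 0  # counts how often the registry is actually built


@lru_cache(maxsize=None)
def default_registry() -> dict[str, dict[str, frozenset[str]]]:
    """Build the toy registry once; later calls return the cached object."""
    global build_count
    build_count += 1
    return {"toy": {"a": frozenset({"vowel"})}}


def toy_get_features(grapheme: str, system: str = "toy") -> frozenset[str]:
    """Top-level helper that goes through the lazily built registry."""
    return default_registry()[system][grapheme]


print(toy_get_features("a"))
print(toy_get_features("a"))
print(build_count)  # 1
```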

Exact reverse lookup is available when a native representation maps directly to a known grapheme.

import merkmal

descriptive = merkmal.get_system("descriptive")

grapheme = descriptive.features_to_grapheme(
    frozenset({"consonant", "voiced", "bilabial", "stop"})
)
print(grapheme)
# 'b'
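
Conceptually, exact reverse lookup amounts to inverting the grapheme-to-features mapping. The sketch below shows the idea on a two-segment toy inventory; it is not merkmal's implementation, and toy_features_to_grapheme is a hypothetical name.

```python
from typing import Optional

# Toy inventory using the feature sets shown above for 'p' and 'b'.
inventory = {
    "p": frozenset({"consonant", "voiceless", "bilabial", "stop"}),
    "b": frozenset({"consonant", "voiced", "bilabial", "stop"}),
}

# Invert the mapping: frozensets are hashable, so they can serve as dict keys.
reverse = {features: grapheme for grapheme, features in inventory.items()}


def toy_features_to_grapheme(features: frozenset[str]) -> Optional[str]:
    """Return the grapheme whose feature set matches exactly, or None."""
    return reverse.get(features)


print(toy_features_to_grapheme(frozenset({"consonant", "voiced", "bilabial", "stop"})))  # b
print(toy_features_to_grapheme(frozenset({"consonant", "voiced"})))  # None
```

An underspecified set such as {"consonant", "voiced"} finds nothing here, which is why a separate partial-matching query is useful.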

Feature queries

Use features_to_graphemes(...) to find all graphemes matching a feature set. Matching is partial by default.

import merkmal

vowels = merkmal.features_to_graphemes(frozenset({"vowel"}))
print(vowels[:10])

# Exact matching
features = merkmal.get_features("a")
print(merkmal.features_to_graphemes(features, exact=True))
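
The difference between the two modes can be pictured with plain set operations. The sketch assumes that partial matching means "the query is a subset of the grapheme's features" and exact matching means set equality; that reading is an assumption, the function name is hypothetical, and the features given for "a" are invented.

```python
# Toy inventory; the entry for "a" uses invented feature names.
inventory = {
    "p": frozenset({"consonant", "voiceless", "bilabial", "stop"}),
    "b": frozenset({"consonant", "voiced", "bilabial", "stop"}),
    "a": frozenset({"vowel", "open", "front", "unrounded"}),
}


def toy_features_to_graphemes(query: frozenset[str], exact: bool = False) -> list[str]:
    """Return graphemes matching the query, partially (subset) or exactly."""
    if exact:
        return sorted(g for g, f in inventory.items() if f == query)
    return sorted(g for g, f in inventory.items() if query <= f)


query = frozenset({"bilabial", "stop"})
print(toy_features_to_graphemes(query))              # ['b', 'p']
print(toy_features_to_graphemes(query, exact=True))  # []
```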

Natural classes and matrices

import merkmal

# Shared features of a segment set
print(merkmal.derive_class_features(["p", "t", "k"]))
# frozenset({'consonant', 'voiceless', 'stop'})

# Minimal distinguishing matrix
matrix = merkmal.minimal_matrix(["t", "d", "s"])
print(merkmal.tabulate_matrix(matrix))

Output:

grapheme | continuant | voiced
---------+------------+-------
t        | False      | False
d        | False      | True
s        | True       | False
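
The shared features of a segment set can be read as the intersection of the individual feature sets. A minimal sketch on toy data follows; the place features for "t" and "k" are assumed, and this is not necessarily how derive_class_features is implemented.

```python
# Toy feature sets; 'alveolar' and 'velar' are assumed values for illustration.
segments = {
    "p": frozenset({"consonant", "voiceless", "bilabial", "stop"}),
    "t": frozenset({"consonant", "voiceless", "alveolar", "stop"}),
    "k": frozenset({"consonant", "voiceless", "velar", "stop"}),
}

# Features present in every segment: the intersection of all sets.
shared = frozenset.intersection(*segments.values())
print(sorted(shared))  # ['consonant', 'stop', 'voiceless']

# Features that differ across the set: everything else that occurs.
varying = frozenset.union(*segments.values()) - shared
print(sorted(varying))  # ['alveolar', 'bilabial', 'velar']
```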

Distance

import merkmal

print(merkmal.distance("a", "e"))
print(merkmal.distance("a", "u"))
print(merkmal.distance("p", "b"))
print(merkmal.distance("t", "d", system="pbase-hc"))

You can also supply a precomputed nested dictionary:

import merkmal

precomputed = {"a": {"e": 1.5, "u": 2.0}, "p": {"b": 0.5}}
print(merkmal.distance("a", "e", precomputed=precomputed))
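
The nested dictionary is keyed first by one grapheme and then by the other. The sketch below shows one plausible way such a table could be consulted, falling back to a computed value when a pair is missing. The fallback behavior, the placeholder metric, and the names toy_distance and toy_fallback are assumptions for illustration, not documented merkmal behavior.

```python
from typing import Callable, Mapping, Optional

Table = Mapping[str, Mapping[str, float]]


def toy_fallback(a: str, b: str) -> float:
    """Placeholder metric: 0.0 for identical graphemes, 1.0 otherwise."""
    return 0.0 if a == b else 1.0


def toy_distance(
    a: str,
    b: str,
    precomputed: Optional[Table] = None,
    fallback: Callable[[str, str], float] = toy_fallback,
) -> float:
    """Prefer a precomputed value for (a, b); otherwise compute one."""
    if precomputed is not None and b in precomputed.get(a, {}):
        return precomputed[a][b]
    return fallback(a, b)


precomputed = {"a": {"e": 1.5, "u": 2.0}, "p": {"b": 0.5}}
print(toy_distance("a", "e", precomputed=precomputed))  # 1.5
print(toy_distance("p", "t", precomputed=precomputed))  # 1.0 (pair not in the table)
```

Note that this sketch does not treat the table as symmetric: a lookup for ("e", "a") would miss and use the fallback.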

Multi-state systems (P-base)

P-base-derived systems expose multi-state values (+, -, n, ., o, x) through FeatureState.

import merkmal

rep = merkmal.get_representation("a", system="pbase-hc")
print(rep.values["syllabic"])
# FeatureState.POSITIVE

The bundled P-base table is derived from the original data rather than copied verbatim. Where duplicate rows disagree on a value, the conflicting cells are downgraded to . (FeatureState.DOT). The P-base data retains its own attribution and license notice in src/merkmal/data/pbase/.
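
The downgrade rule can be sketched as a merge over duplicate rows: cells that agree keep their value, and cells that conflict become the dot state. ToyState and merge_rows are hypothetical names, and the enum covers only three of the six states; the actual derivation script may differ.

```python
from enum import Enum


class ToyState(Enum):
    """Reduced stand-in for a multi-state feature value."""

    POSITIVE = "+"
    NEGATIVE = "-"
    DOT = "."


def merge_rows(rows: list[dict[str, ToyState]]) -> dict[str, ToyState]:
    """Merge duplicate rows, downgrading conflicting cells to DOT."""
    merged: dict[str, ToyState] = {}
    for row in rows:
        for feature, value in row.items():
            if feature not in merged:
                merged[feature] = value
            elif merged[feature] is not value:
                merged[feature] = ToyState.DOT  # conflict: downgrade the cell
    return merged


duplicates = [
    {"syllabic": ToyState.POSITIVE, "voice": ToyState.POSITIVE},
    {"syllabic": ToyState.POSITIVE, "voice": ToyState.NEGATIVE},
]
merged = merge_rows(duplicates)
print(merged["syllabic"])  # ToyState.POSITIVE
print(merged["voice"])     # ToyState.DOT
```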

Custom datasets

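from merkmal import create_registry, load_dataset

# Load a dataset from a local directory
dataset = load_dataset(directory="my_feature_data")

# Build a registry backed by that dataset
registry = create_registry(dataset=dataset)

# Look up a system by name and query it as usual
system = registry.get_system("descriptive")
print(system.grapheme_to_features("k"))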

Expected files in my_feature_data/: sounds.tsv, classes.tsv, features.tsv.
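
Before loading, it can help to confirm that the directory contains the three expected files. The helper below is a small standalone check using only the standard library; missing_dataset_files is a hypothetical name, not part of merkmal, and it checks file presence only, not file contents.

```python
import tempfile
from pathlib import Path

EXPECTED_FILES = ("sounds.tsv", "classes.tsv", "features.tsv")


def missing_dataset_files(directory: str) -> list[str]:
    """Return the expected file names that are absent from the directory."""
    root = Path(directory)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Demonstration on a temporary directory that holds only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sounds.tsv").write_text("", encoding="utf-8")
    print(missing_dataset_files(tmp))  # ['classes.tsv', 'features.tsv']
```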

Documentation

See the tutorials for worked examples covering phonological features, typology, historical linguistics, cognate detection, and UPA transcription.

License

MIT. See LICENSE.
