Standalone phonological feature systems for historical linguistics
merkmal
merkmal is a standalone Python package for manipulating phonological
features. It has zero runtime dependencies and requires Python 3.12+.
It provides:
- bundled phonological feature datasets
- pluggable feature systems (9 built-in)
- feature geometry and distance functions (Clements & Hume 1995)
- tonal geometry (Yip/Bao)
- query and analysis helpers for graphemes and feature sets
- UPA transcription support
Installation
Install from PyPI:
pip install merkmal
Development install:
git clone https://github.com/tresoldi/merkmal.git
cd merkmal
pip install -e ".[dev]"
Run checks:
ruff check .
mypy src
pytest -q
Quick start
import merkmal
# Built-in systems
print(merkmal.list_systems())
# ['descriptive', 'broad', 'distinctive', 'pbase-hc', 'pbase-jfh',
# 'pbase-spe', 'pbase-uftc', 'phoible', 'classfeat']
# Basic grapheme lookup
print(merkmal.get_features("p"))
# frozenset({'consonant', 'voiceless', 'bilabial', 'stop'})
# Predefined sound classes
print(merkmal.get_class_features("V"))
# frozenset({'vowel'})
# Distance
print(merkmal.distance("a", "e"))
print(merkmal.distance("p", "b", system="classfeat"))
Systems
| System | Type | Features | Distance |
|---|---|---|---|
| descriptive | categorical | articulatory | geometry-weighted |
| broad | categorical | simplified | geometry-weighted |
| distinctive | privative | Clements & Hume | geometry-weighted |
| pbase-hc, -jfh, -spe, -uftc | multi-state | 4 theoretical families | geometry-weighted |
| phoible | binary | 37 features | Hamming |
| classfeat | hybrid | sound classes + continuous | trained weights |
All systems implement the same FeatureSystem protocol. Distances, queries,
matrices, and natural class derivation work across all of them.
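Because every system shares one interface, helper code can be written once and reused across systems. The sketch below only illustrates that idea: `FeatureSystemLike`, `ToySystem`, and `shared_features` are hypothetical names, not merkmal's actual definitions. Only the method name `grapheme_to_features` is taken from the examples in this README.

```python
from typing import Protocol


class FeatureSystemLike(Protocol):
    """Hypothetical stand-in for the protocol; not merkmal's real definition."""

    def grapheme_to_features(self, grapheme: str) -> frozenset[str]: ...


class ToySystem:
    """Tiny in-memory system used only for illustration."""

    def __init__(self, table: dict[str, frozenset[str]]) -> None:
        self._table = table

    def grapheme_to_features(self, grapheme: str) -> frozenset[str]:
        return self._table[grapheme]


def shared_features(system: FeatureSystemLike, graphemes: list[str]) -> frozenset[str]:
    """Intersect the feature sets of all graphemes, using only the protocol."""
    sets = [system.grapheme_to_features(g) for g in graphemes]
    return frozenset.intersection(*sets) if sets else frozenset()


toy = ToySystem({
    "p": frozenset({"consonant", "voiceless", "bilabial", "stop"}),
    "b": frozenset({"consonant", "voiced", "bilabial", "stop"}),
})
print(sorted(shared_features(toy, ["p", "b"])))
# ['bilabial', 'consonant', 'stop']
```

Any object with a matching `grapheme_to_features` method satisfies the hypothetical protocol, so `shared_features` never needs to know which system it was given.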
Working with systems
You can use the lazy default registry through top-level helpers, or work with a specific system object.
import merkmal
descriptive = merkmal.get_system("descriptive")
distinctive = merkmal.get_system("distinctive")
pbase = merkmal.get_system("pbase-hc")
print(descriptive.grapheme_to_features("a"))
print(distinctive.grapheme_to_features("a"))
print(pbase.grapheme_to_representation("a"))
Exact reverse lookup is available when a native representation maps directly to a known grapheme.
descriptive = merkmal.get_system("descriptive")
grapheme = descriptive.features_to_grapheme(
frozenset({"consonant", "voiced", "bilabial", "stop"})
)
print(grapheme)
# 'b'
Feature queries
Use features_to_graphemes(...) to find all graphemes matching a feature set.
Matching is partial by default.
import merkmal
vowels = merkmal.features_to_graphemes(frozenset({"vowel"}))
print(vowels[:10])
# Exact matching
features = merkmal.get_features("a")
print(merkmal.features_to_graphemes(features, exact=True))
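The difference between the two modes can be sketched with plain sets. This assumes that "partial" means the query is a subset of a grapheme's features and "exact" means the two sets are equal; the README does not spell this out, so treat it as an assumption. `INVENTORY` and `match` are hypothetical, and the vowel feature values are toy data.

```python
# Toy inventory (assumed feature values, for illustration only).
INVENTORY: dict[str, frozenset[str]] = {
    "a": frozenset({"vowel", "open", "front", "unrounded"}),
    "i": frozenset({"vowel", "close", "front", "unrounded"}),
    "p": frozenset({"consonant", "voiceless", "bilabial", "stop"}),
}


def match(query: frozenset[str], exact: bool = False) -> list[str]:
    """Return graphemes matching `query`, partially (subset) or exactly."""
    if exact:
        return sorted(g for g, feats in INVENTORY.items() if feats == query)
    # Partial match: every queried feature must be present on the grapheme.
    return sorted(g for g, feats in INVENTORY.items() if query <= feats)


print(match(frozenset({"vowel"})))
# ['a', 'i']
print(match(frozenset({"vowel"}), exact=True))
# []
print(match(INVENTORY["a"], exact=True))
# ['a']
```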
Natural classes and matrices
import merkmal
# Shared features of a segment set
print(merkmal.derive_class_features(["p", "t", "k"]))
# frozenset({'consonant', 'voiceless', 'stop'})
# Minimal distinguishing matrix
matrix = merkmal.minimal_matrix(["t", "d", "s"])
print(merkmal.tabulate_matrix(matrix))
# grapheme | continuant | voiced
# ---------+------------+-------
# t        | False      | False
# d        | False      | True
# s        | True       | False
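One way to think about a minimal distinguishing matrix is as the smallest set of features whose values keep every segment apart. The brute-force sketch below illustrates that idea on toy data; it is not merkmal's algorithm, and `SEGMENTS` and `minimal_distinguishing` are hypothetical names with assumed feature values.

```python
from itertools import combinations

# Toy binary feature table (assumed values, for illustration only).
SEGMENTS = {
    "t": {"continuant": False, "voiced": False, "coronal": True},
    "d": {"continuant": False, "voiced": True, "coronal": True},
    "s": {"continuant": True, "voiced": False, "coronal": True},
}


def minimal_distinguishing(segments: dict[str, dict[str, bool]]) -> tuple[str, ...]:
    """Return the smallest feature tuple whose values separate all segments."""
    features = sorted({f for values in segments.values() for f in values})
    # Try subsets in order of increasing size, so the first hit is minimal.
    for size in range(len(features) + 1):
        for subset in combinations(features, size):
            rows = {tuple(values[f] for f in subset) for values in segments.values()}
            if len(rows) == len(segments):
                return subset
    return tuple(features)  # segments cannot be fully separated


print(minimal_distinguishing(SEGMENTS))
# ('continuant', 'voiced')
```

Here `coronal` is dropped because all three segments share its value, which leaves the same two columns as the table above.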
Distance
import merkmal
print(merkmal.distance("a", "e"))
print(merkmal.distance("a", "u"))
print(merkmal.distance("p", "b"))
print(merkmal.distance("t", "d", system="pbase-hc"))
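For binary systems, the Systems table lists Hamming distance. As a reminder of what that metric computes, here is a minimal sketch on toy binary vectors. The feature values are assumed, `hamming` is a hypothetical helper, and whether merkmal normalizes or weights the count is not stated in this README.

```python
def hamming(a: dict[str, bool], b: dict[str, bool]) -> int:
    """Count the features on which two binary feature vectors disagree."""
    if a.keys() != b.keys():
        raise ValueError("feature vectors must cover the same features")
    return sum(a[name] != b[name] for name in a)


# Toy vectors (assumed values, for illustration only).
seg_p = {"voiced": False, "labial": True, "continuant": False}
seg_b = {"voiced": True, "labial": True, "continuant": False}
seg_f = {"voiced": False, "labial": True, "continuant": True}

print(hamming(seg_p, seg_b))
# 1
print(hamming(seg_b, seg_f))
# 2
```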
You can also supply a precomputed nested dictionary:
precomputed = {"a": {"e": 1.5, "u": 2.0}, "p": {"b": 0.5}}
print(merkmal.distance("a", "e", precomputed=precomputed))
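A nested dictionary like this is naturally read as `table[first][second]`. The sketch below shows one plausible way such a table could be consulted, with a fallback for pairs it does not cover. It is an assumption-laden illustration: `lookup_distance` and the fallback behaviour are hypothetical, and this README does not describe how merkmal treats missing or reversed pairs.

```python
from collections.abc import Callable, Mapping


def lookup_distance(
    a: str,
    b: str,
    precomputed: Mapping[str, Mapping[str, float]],
    fallback: Callable[[str, str], float],
) -> float:
    """Return precomputed[a][b] when present, else compute via `fallback`."""
    row = precomputed.get(a, {})
    if b in row:
        return row[b]
    return fallback(a, b)


precomputed = {"a": {"e": 1.5, "u": 2.0}, "p": {"b": 0.5}}

# Hypothetical fallback that marks uncovered pairs with a sentinel value.
print(lookup_distance("a", "e", precomputed, lambda x, y: -1.0))
# 1.5
print(lookup_distance("e", "a", precomputed, lambda x, y: -1.0))
# -1.0
```

Note that this sketch does not treat the table as symmetric: `("e", "a")` falls through to the fallback because only `("a", "e")` is stored.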
Multi-state systems (P-base)
P-base-derived systems expose multi-state values (+, -, n, ., o, x)
through FeatureState.
import merkmal
rep = merkmal.get_representation("a", system="pbase-hc")
print(rep.values["syllabic"])
# FeatureState.POSITIVE
The bundled P-base table is derived from P-base rather than copied verbatim.
Where duplicate rows disagree, the conflicting cells are downgraded to "."
(FeatureState.DOT). The P-base data retains its own attribution and license
notice in src/merkmal/data/pbase/.
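The downgrade rule for duplicate rows can be pictured as a cell-by-cell merge. This is a minimal sketch of that idea, not the script used to build the bundled table: `merge_rows` is a hypothetical helper and the rows are toy data using the state symbols listed above.

```python
def merge_rows(rows: list[dict[str, str]]) -> dict[str, str]:
    """Merge duplicate rows for one segment; disagreeing cells become '.'."""
    merged: dict[str, str] = {}
    for row in rows:
        for feature, value in row.items():
            if feature not in merged:
                merged[feature] = value
            elif merged[feature] != value:
                # Conflict between duplicates: downgrade the cell.
                merged[feature] = "."
    return merged


duplicates = [
    {"voice": "+", "nasal": "-"},
    {"voice": "-", "nasal": "-"},
]
print(merge_rows(duplicates))
# {'voice': '.', 'nasal': '-'}
```

Once a cell has been downgraded it stays at "." even if a later duplicate agrees with one of the earlier values.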
Custom datasets
from merkmal import create_registry, load_dataset
dataset = load_dataset(directory="my_feature_data")
registry = create_registry(dataset=dataset)
system = registry.get_system("descriptive")
print(system.grapheme_to_features("k"))
Expected files in my_feature_data/: sounds.tsv, classes.tsv, features.tsv.
Documentation
See the tutorials for worked examples covering phonological features, typology, historical linguistics, cognate detection, and UPA transcription.
License
MIT. See LICENSE.