Standalone phonological feature systems for historical linguistics

merkmal

merkmal is a standalone Python package for manipulating phonological features. It has zero runtime dependencies and requires Python 3.12 or later.

It provides:

  • bundled phonological feature datasets
  • pluggable feature systems (9 built-in)
  • feature geometry and distance functions (Clements & Hume 1995)
  • tonal geometry (Yip/Bao)
  • query and analysis helpers for graphemes and feature sets
  • UPA transcription support

Installation

Install from PyPI:

pip install merkmal

Development install:

git clone https://github.com/tresoldi/merkmal.git
cd merkmal
pip install -e ".[dev]"

Run checks:

ruff check .
mypy src
pytest -q

Quick start

import merkmal

# Built-in systems
print(merkmal.list_systems())
# ['descriptive', 'broad', 'distinctive', 'pbase-hc', 'pbase-jfh',
#  'pbase-spe', 'pbase-uftc', 'phoible', 'classfeat']

# Basic grapheme lookup
print(merkmal.get_features("p"))
# frozenset({'consonant', 'voiceless', 'bilabial', 'stop'})

# Predefined sound classes
print(merkmal.get_class_features("V"))
# frozenset({'vowel'})

# Distance
print(merkmal.distance("a", "e"))
print(merkmal.distance("p", "b", system="classfeat"))

Systems

System                      | Type        | Features                   | Distance
----------------------------+-------------+----------------------------+------------------
descriptive                 | categorical | articulatory               | geometry-weighted
broad                       | categorical | simplified                 | geometry-weighted
distinctive                 | privative   | Clements & Hume            | geometry-weighted
pbase-hc, -jfh, -spe, -uftc | multi-state | 4 theoretical families     | geometry-weighted
phoible                     | binary      | 37 features                | Hamming
classfeat                   | hybrid      | sound classes + continuous | trained weights

All systems implement the same FeatureSystem protocol. Distances, queries, matrices, and natural class derivation work across all of them.
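
To make the idea of a shared protocol concrete, here is a minimal sketch using typing.Protocol with a toy binary system and a Hamming distance. Every name in it (ToyFeatureSystem, ToyBinarySystem, describe) and the feature values are hypothetical; this is not merkmal's actual FeatureSystem definition, only an illustration of how one interface can serve several systems.

```python
from typing import Protocol


class ToyFeatureSystem(Protocol):
    """Hypothetical interface; not merkmal's actual FeatureSystem protocol."""

    def grapheme_to_features(self, grapheme: str) -> frozenset[str]: ...

    def distance(self, a: str, b: str) -> float: ...


class ToyBinarySystem:
    """A tiny binary system: each feature is either present or absent."""

    def __init__(self, table: dict[str, frozenset[str]]) -> None:
        self._table = table

    def grapheme_to_features(self, grapheme: str) -> frozenset[str]:
        return self._table[grapheme]

    def distance(self, a: str, b: str) -> float:
        # For binary features, the Hamming distance equals the size of the
        # symmetric difference between the two sets of present features.
        return float(len(self._table[a] ^ self._table[b]))


def describe(system: ToyFeatureSystem, a: str, b: str) -> str:
    """Accepts any object that structurally satisfies the protocol."""
    return f"{a}~{b}: {system.distance(a, b)}"


# Illustrative feature values only.
toy = ToyBinarySystem(
    {
        "p": frozenset({"consonantal", "labial"}),
        "b": frozenset({"consonantal", "labial", "voice"}),
        "a": frozenset({"syllabic", "voice", "low"}),
    }
)
print(describe(toy, "p", "b"))  # p~b: 1.0
print(describe(toy, "p", "a"))  # p~a: 5.0
```

Code written against the protocol, like describe above, does not need to know which concrete system it receives.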

Working with systems

You can use the lazy default registry through top-level helpers, or work with a specific system object.

import merkmal

descriptive = merkmal.get_system("descriptive")
distinctive = merkmal.get_system("distinctive")
pbase = merkmal.get_system("pbase-hc")

print(descriptive.grapheme_to_features("a"))
print(distinctive.grapheme_to_features("a"))
print(pbase.grapheme_to_representation("a"))

Exact reverse lookup is available when a native representation maps directly to a known grapheme.

import merkmal

descriptive = merkmal.get_system("descriptive")

grapheme = descriptive.features_to_grapheme(
    frozenset({"consonant", "voiced", "bilabial", "stop"})
)
print(grapheme)
# 'b'

Feature queries

Use features_to_graphemes(...) to find all graphemes matching a feature set. Matching is partial by default.

import merkmal

vowels = merkmal.features_to_graphemes(frozenset({"vowel"}))
print(vowels[:10])

# Exact matching
features = merkmal.get_features("a")
print(merkmal.features_to_graphemes(features, exact=True))
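
The difference between partial and exact matching can be sketched with plain frozensets. This assumes that "partial" means the query is a subset of a grapheme's features, which is one plausible reading and not confirmed here; match_graphemes and the toy INVENTORY are hypothetical and do not use merkmal's data.

```python
# Toy inventory; the feature values are illustrative, not merkmal's data.
INVENTORY: dict[str, frozenset[str]] = {
    "p": frozenset({"consonant", "voiceless", "bilabial", "stop"}),
    "b": frozenset({"consonant", "voiced", "bilabial", "stop"}),
    "t": frozenset({"consonant", "voiceless", "alveolar", "stop"}),
}


def match_graphemes(query: frozenset[str], exact: bool = False) -> list[str]:
    """Return graphemes whose features match the query.

    Partial (assumed): the query is a subset of the grapheme's features.
    Exact: the two feature sets are equal.
    """
    if exact:
        return sorted(g for g, feats in INVENTORY.items() if feats == query)
    return sorted(g for g, feats in INVENTORY.items() if query <= feats)


print(match_graphemes(frozenset({"voiceless", "stop"})))  # ['p', 't']
print(match_graphemes(frozenset({"voiceless", "stop"}), exact=True))  # []
print(match_graphemes(INVENTORY["b"], exact=True))  # ['b']
```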

Natural classes and matrices

import merkmal

# Shared features of a segment set
print(merkmal.derive_class_features(["p", "t", "k"]))
# frozenset({'consonant', 'voiceless', 'stop'})

# Minimal distinguishing matrix
matrix = merkmal.minimal_matrix(["t", "d", "s"])
print(merkmal.tabulate_matrix(matrix))
# grapheme | continuant | voiced
# ---------+------------+-------
# t        | False      | False
# d        | False      | True
# s        | True       | False
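
As a rough illustration of the two ideas in this section, shared features can be computed as a set intersection, and a minimal distinguishing set can be found by brute force over feature subsets. The functions and data below are hypothetical stand-ins, not the algorithm merkmal uses, and the exhaustive search is only practical for small feature inventories.

```python
from itertools import combinations

# Toy data; feature names and values are illustrative only.
SEGMENTS: dict[str, frozenset[str]] = {
    "t": frozenset({"consonant", "alveolar"}),
    "d": frozenset({"consonant", "alveolar", "voiced"}),
    "s": frozenset({"consonant", "alveolar", "continuant"}),
}


def shared_features(graphemes: list[str]) -> frozenset[str]:
    """Features common to every listed segment (set intersection)."""
    return frozenset.intersection(*(SEGMENTS[g] for g in graphemes))


def minimal_distinguishing(graphemes: list[str]) -> tuple[str, ...]:
    """Smallest feature subset that gives each segment a unique True/False row."""
    candidates = sorted(set().union(*(SEGMENTS[g] for g in graphemes)))
    # Try subsets in order of increasing size, so the first hit is minimal.
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            rows = {tuple(f in SEGMENTS[g] for f in subset) for g in graphemes}
            if len(rows) == len(graphemes):
                return subset
    raise ValueError("segments cannot be distinguished by these features")


print(sorted(shared_features(["t", "d", "s"])))  # ['alveolar', 'consonant']
print(minimal_distinguishing(["t", "d", "s"]))  # ('continuant', 'voiced')
```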

Distance

import merkmal

print(merkmal.distance("a", "e"))
print(merkmal.distance("a", "u"))
print(merkmal.distance("p", "b"))
print(merkmal.distance("t", "d", system="pbase-hc"))

You can also supply a precomputed nested dictionary:

import merkmal

precomputed = {"a": {"e": 1.5, "u": 2.0}, "p": {"b": 0.5}}
print(merkmal.distance("a", "e", precomputed=precomputed))
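
The nested-dictionary shape can be illustrated with a small standalone lookup. How merkmal treats reversed pairs or pairs missing from the dictionary is not stated here, so the symmetric lookup and the fallback below are assumptions; lookup_distance and unit_fallback are hypothetical names.

```python
from collections.abc import Callable, Mapping

Nested = Mapping[str, Mapping[str, float]]


def lookup_distance(
    a: str,
    b: str,
    precomputed: Nested,
    fallback: Callable[[str, str], float],
) -> float:
    """Prefer a stored value, try both key orders, otherwise compute."""
    if a in precomputed and b in precomputed[a]:
        return precomputed[a][b]
    # Assumption: distances are symmetric, so the reversed pair is acceptable.
    if b in precomputed and a in precomputed[b]:
        return precomputed[b][a]
    return fallback(a, b)


def unit_fallback(a: str, b: str) -> float:
    """Placeholder distance: 0 for identical symbols, 1 otherwise."""
    return 0.0 if a == b else 1.0


table = {"a": {"e": 1.5, "u": 2.0}, "p": {"b": 0.5}}
print(lookup_distance("a", "e", table, unit_fallback))  # 1.5
print(lookup_distance("b", "p", table, unit_fallback))  # 0.5
print(lookup_distance("t", "d", table, unit_fallback))  # 1.0
```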

Multi-state systems (P-base)

P-base-derived systems expose multi-state values (+, -, n, ., o, x) through FeatureState.

import merkmal

rep = merkmal.get_representation("a", system="pbase-hc")
print(rep.values["syllabic"])
# FeatureState.POSITIVE

The bundled P-base table is derived, not verbatim. Duplicate rows with conflicting values have the conflicting cells downgraded to . (FeatureState.DOT). The P-base data retains its own attribution and license notice in src/merkmal/data/pbase/.
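
The downgrading rule described above can be sketched as a merge over duplicate rows. ToyState and merge_rows are hypothetical stand-ins (merkmal's FeatureState has more members and its derivation script may differ); the sketch only shows how a cell with conflicting values ends up as a dot while agreeing cells are kept.

```python
from enum import Enum


class ToyState(Enum):
    """Stand-in for a multi-state value; only three states are modelled."""

    POSITIVE = "+"
    NEGATIVE = "-"
    DOT = "."


def merge_rows(rows: list[dict[str, ToyState]]) -> dict[str, ToyState]:
    """Merge duplicate rows for one segment, downgrading conflicts to DOT."""
    merged: dict[str, ToyState] = {}
    for row in rows:
        for feature, state in row.items():
            if feature not in merged:
                merged[feature] = state
            elif merged[feature] is not state:
                # Conflicting values for the same cell: keep neither.
                merged[feature] = ToyState.DOT
    return merged


duplicates = [
    {"voice": ToyState.POSITIVE, "nasal": ToyState.NEGATIVE},
    {"voice": ToyState.NEGATIVE, "nasal": ToyState.NEGATIVE},
]
result = merge_rows(duplicates)
print(result["voice"])  # ToyState.DOT
print(result["nasal"])  # ToyState.NEGATIVE
```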

Custom datasets

from merkmal import create_registry, load_dataset

dataset = load_dataset(directory="my_feature_data")
registry = create_registry(dataset=dataset)
system = registry.get_system("descriptive")
print(system.grapheme_to_features("k"))

Expected files in my_feature_data/: sounds.tsv, classes.tsv, features.tsv.
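
Before calling load_dataset, it can help to confirm that the directory has the expected layout. The helper below is a hypothetical convenience, not part of merkmal; it checks only that the three files named above exist and makes no assumption about their columns.

```python
import tempfile
from pathlib import Path

EXPECTED_FILES = ("sounds.tsv", "classes.tsv", "features.tsv")


def missing_dataset_files(directory: Path) -> list[str]:
    """Return the expected file names that are absent from `directory`."""
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


# Demonstration in a temporary directory that contains only sounds.tsv.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sounds.tsv").write_text("", encoding="utf-8")
    demo_missing = missing_dataset_files(Path(tmp))

print(demo_missing)  # ['classes.tsv', 'features.tsv']
```

An empty list means all three files are present.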

Documentation

See the tutorials for worked examples covering phonological features, typology, historical linguistics, cognate detection, and UPA transcription.

License

MIT. See LICENSE.
