LAPA: Language Pattern Analyser

A Digital Tool for the Analysis of Patterns in Spelled Language Sounds in Historical Dutch Theatre Plays.

LAPA allows for converting digitised early modern Dutch theatre plays into (presumed) phonetic script (SAMPA). To achieve this, a ruleset has been created that codifies the transliteration to SAMPA. This codebase contains parsers for the rule sets (xls format), parsers for the digitised texts (naf xml) and logic to perform counts and correlations.

The motivation for this project can be found in the following publication:

Smitskamp, Fieke. (2024). From Ah! to Little Z: Clustering Spelled Language Sounds in Early Modern Dutch Theatre Plays (1570-1800). BMGN - Low Countries Historical Review, 139, 7-31. https://doi.org/10.51769/bmgn-lchr.13868

To try the interactive notebooks without installing anything, launch them on Binder by opening this URL in your browser:

https://mybinder.org/v2/gh/kws/lapa-analysis/HEAD?urlpath=lab/tree/notebooks/index.ipynb

Installation

This project uses Poetry for dependency management. To get started:

  1. Install Poetry (if you haven't already):

    curl -sSL https://install.python-poetry.org | python3 -
    
  2. Clone the repository and install dependencies:

    git clone <repository-url>
    cd lapa-analysis
    poetry install
    
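  3. Activate the virtual environment:

    poetry shell

    On Poetry 2.x the shell command is no longer built in: install the poetry-plugin-shell plugin, or run poetry env activate, which prints the activation command for your shell.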
    

Alternatively, you can run commands directly using poetry run:

poetry run lapa-ng --help

File Structure

.
├── fixtures/             # Sample files for testing and CLI runs
├── lapa_classic/         # Core business logic for parsing and processing
│   ├── counter.py        # Classes to count emotions and sampa characters
│   ├── sampify.py        # Classes to parse and load the sampa transliteration dictionary
│   └── naf.py            # Classes to parse the naf xml file
│
├── tests/                # Test suite
│   └── test_classic.py   # Tests for lapa_classic functionality
│
└── lapa_ng/              # Next generation version of LAPA (under development)

Code Documentation

The code documentation lives in the docs directory and can be browsed online at https://kws.github.io/lapa-analysis/.

Usage

LAPA-NG provides a command-line interface for common operations. The system uses a factory pattern to create different types of matchers based on a specification string.

Matcher Specification

The matcher specification follows the format:

[prefix:][filename[#sheet]][?options]

Where:

  • prefix: Optional prefix indicating the type of matcher ('ng' or 'classic'); when omitted, the next-gen matcher ('ng') is used
  • filename: Path to the rules file (Excel or YAML)
  • sheet: Optional sheet name for Excel files
  • options: Optional query string parameters (e.g., ?sort=numeric)
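
To make the format concrete, here is a minimal sketch of how such a specification string could be split into its parts. It is not LAPA's own parser: the names MatcherSpec and parse_matcher_spec are hypothetical, and the fallback to 'ng' when no prefix is given is inferred from the examples in this README.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import parse_qsl

# Prefixes named in this README; anything else is treated as part of the filename.
KNOWN_PREFIXES = ("ng", "classic")


@dataclass
class MatcherSpec:
    """Hypothetical container for the parts of a matcher specification."""
    prefix: str = "ng"
    filename: str = ""
    sheet: Optional[str] = None
    options: Dict[str, str] = field(default_factory=dict)


def parse_matcher_spec(spec: str) -> MatcherSpec:
    """Split '[prefix:][filename[#sheet]][?options]' into its parts."""
    # Options come last, so split them off first.
    rest, _, query = spec.partition("?")
    options = dict(parse_qsl(query))

    # Only text before the first ':' that is a known prefix counts as a prefix.
    prefix = "ng"
    head, sep, tail = rest.partition(":")
    if sep and head in KNOWN_PREFIXES:
        prefix, rest = head, tail

    # Whatever follows '#' is the sheet name; an empty sheet becomes None.
    filename, _, sheet = rest.partition("#")
    return MatcherSpec(prefix, filename, sheet or None, options)


if __name__ == "__main__":
    print(parse_matcher_spec("ng:rules.xlsx#RULES?sort=alpha"))
    print(parse_matcher_spec("classic:rules.xlsx"))
```

Checking the prefix against a fixed list keeps a colon inside a file path from being mistaken for a matcher type.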

Available options for the 'ng' prefix:

  • sort: Rule sorting method ('numeric' or 'alpha')
    • numeric: Sort rules by numeric priority (default)
    • alpha: Sort rules alphabetically by letter and priority
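
The difference between the two sort options can be illustrated with a small sketch. The Rule fields and the sort_rules function below are assumptions made for illustration only; the actual rule structure and ordering logic in LAPA may differ.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule record; field names are assumptions."""
    letter: str    # letter the rule is filed under
    priority: int  # numeric priority of the rule
    pattern: str   # spelling the rule matches


RULES = [
    Rule("s", 2, "sch"),
    Rule("a", 3, "aa"),
    Rule("s", 1, "s"),
    Rule("a", 1, "a"),
]


def sort_rules(rules: List[Rule], sort: str = "numeric") -> List[Rule]:
    """Order rules by priority alone, or by letter and then priority."""
    if sort == "numeric":
        return sorted(rules, key=lambda rule: rule.priority)
    if sort == "alpha":
        return sorted(rules, key=lambda rule: (rule.letter, rule.priority))
    raise ValueError(f"unknown sort option: {sort!r}")


if __name__ == "__main__":
    for mode in ("numeric", "alpha"):
        print(mode, [rule.pattern for rule in sort_rules(RULES, mode)])
```

Because the order decides which rule is tried first, the same word may be transcribed differently under the two options.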

Examples:

# Next-gen matcher with specific sheet and numeric sorting (default)
lapa-ng translate-words 'ng:rules.xlsx#RULES' word1 word2

# Next-gen matcher with alpha sorting
lapa-ng translate-words 'ng:rules.xlsx#RULES?sort=alpha' word1 word2

# Classic matcher, default sheet
lapa-ng translate-words 'classic:rules.xlsx' word1 word2

# Next-gen matcher (default prefix)
lapa-ng translate-words 'rules.xlsx#RULES' word1 word2

Transcribing Words

Transcribe one or more words using the specified rules:

# Using numeric sorting (default)
lapa-ng translate-words 'rules.xlsx#RULES' word1 word2 word3

# Using alpha sorting
lapa-ng translate-words 'rules.xlsx#RULES?sort=alpha' word1 word2 word3

This will output the phonetic transcription in SAMPA format for each word.

Processing NAF Files

Process text from NAF (NLP Annotation Framework) files:

# Using numeric sorting (default)
lapa-ng translate-naf 'rules.xlsx#RULES' input.naf

# Using alpha sorting
lapa-ng translate-naf 'rules.xlsx#RULES?sort=alpha' input.naf

This will:

  1. Read the NAF file
  2. Apply the specified rules
  3. Output a CSV file with detailed transcription information
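
The first step above can be sketched with the standard library alone. This is not the parser shipped in this codebase: it assumes the usual NAF layout in which the text layer holds wf (word form) elements, and both the sample document and the read_word_forms name are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Made-up NAF fragment: a text layer with two word forms.
SAMPLE_NAF = """<NAF version="v3">
  <text>
    <wf id="w1" sent="1" offset="0" length="3">Ach</wf>
    <wf id="w2" sent="1" offset="4" length="4">lief</wf>
  </text>
</NAF>
"""


def read_word_forms(naf_xml: str) -> List[Tuple[str, str]]:
    """Return (id, token) pairs for every wf element in the document."""
    root = ET.fromstring(naf_xml)
    return [
        (wf.get("id", ""), (wf.text or "").strip())
        for wf in root.iter("wf")
    ]


if __name__ == "__main__":
    for word_id, token in read_word_forms(SAMPLE_NAF):
        print(word_id, token)
```

Each token extracted this way is what the transliteration rules would then be applied to.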

Converting Rules

Convert Excel-based rules to YAML format:

lapa-ng convert-excel rules.xlsx rules.yaml --sheet RULES

Getting Help

You can get help for any command by adding --help:

lapa-ng --help
lapa-ng translate-words --help
lapa-ng translate-naf --help
lapa-ng convert-excel --help

Sample Files

Sample NAF XML and rules files are provided in the fixtures directory for testing and reference.

Download files

Source Distribution

lapa_ng-0.1.0.tar.gz (34.6 kB, Source)

Built Distribution

lapa_ng-0.1.0-py3-none-any.whl (41.6 kB, Python 3)

File details

Details for the file lapa_ng-0.1.0.tar.gz.

File metadata

  • Download URL: lapa_ng-0.1.0.tar.gz
  • Upload date:
  • Size: 34.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.10.6 Darwin/24.4.0

File hashes

Hashes for lapa_ng-0.1.0.tar.gz

  • SHA256: cd387701723f0f016f5e4f94534cb9a5129b2b9c52e172f7341b0ddf2878858f
  • MD5: 9d8a05ac32f1f603bf8c79548ad1ec6a
  • BLAKE2b-256: c1fe82de85446ff2c12a3f4a90a931f4fe6ce24fe1b265433f0d1ce23f2c4021
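
A downloaded file can be checked against the published SHA256 digest with a few lines of standard-library Python. This is a generic sketch rather than part of LAPA; the function names sha256_of and matches_published_hash are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

To use it, pass the path of the downloaded lapa_ng-0.1.0.tar.gz together with the SHA256 digest listed for that file; a False result means the download does not match the published file.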

File details

Details for the file lapa_ng-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: lapa_ng-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 41.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.10.6 Darwin/24.4.0

File hashes

Hashes for lapa_ng-0.1.0-py3-none-any.whl

  • SHA256: 8261148252b2f044062f7d803484d1abc1bf1b2b16801202832238fe2c645ae5
  • MD5: 01fc325842955eba3dc5bf1dc0d5e6e4
  • BLAKE2b-256: 064f07f152ea6c84f41a1b54f91d08cd09bd4c1f86add976753cd4d8ccb0e128
