
Textual Reuse, Alignment, and Collation Engine — pairwise philological alignment with pluggable language packs


TRACE

Textual Reuse, Alignment, and Collation Engine — a Python library for pairwise philological alignment with pluggable language packs.


TRACE is designed for textual criticism, manuscript witness comparison, and the creation of digital synopses and critical editions. The core is language-agnostic; the first shipped language pack covers Biblical and Rabbinic Hebrew (hbo).


Highlights

  • Tokenizer pipeline with editorial-marker awareness ([reconstructed], ⟦deletion⟧, 〈insertion〉, (expanded), lacunae).
  • Tiered scoring returning (score, reason) per token pair — EXACT, NIQQUD_STRIPPED, PLENE_DEFECTIVE, ABBREVIATION, ORTHOGRAPHIC, INSERTION, OMISSION, NO_MATCH.
  • Semi-global Needleman–Wunsch with affine gap penalties (Gotoh) and a multi-token abbreviation lookahead (ר"י → רבי ישמעאל).
  • Hebrew language pack with niqqud stripping, plene/defective skeleton matching, gershayim/maqqef tokenizer hooks, and a seed lexicon of rabbinic abbreviations (extendable via Lexica.merge()).
  • I/O for plain text, JSON (round-trip), eScriptorium exports (with bbox + line metadata), and TEI XML (<tei:w> mode + flow-text fallback).
  • Reproducible — every AlignmentResult carries trace_version and language_pack_version in its params.

Installation

pip install tracealign

Requires Python 3.10+. Pulls in pydantic, numpy, lxml, and rapidfuzz as dependencies.

Quick start

import tracealign

w1 = tracealign.tokenize("שלום עולם רַבִּי דויד ר\"י אמר", lang="hbo", seq_label="W1")
w2 = tracealign.tokenize("שלום עולם רבי דוד רבי ישמעאל אמר", lang="hbo", seq_label="W2")

result = tracealign.align(w1, w2, lang="hbo")

print(f"total score: {result.total_score:.2f}")
print(f"summary: {dict(result.summary)}")
for m in result.matches:
    # A gap (insertion or omission) leaves token_a or token_b empty.
    a = m.token_a.text if m.token_a else "—"
    b = m.token_b.text if m.token_b else "—"
    print(f"  {a:>10} ↔ {b:<10}  {m.reason.value:<18} {m.score:.2f}")

Output (abridged):

total score: 0.91
summary: {EXACT: 3, NIQQUD_STRIPPED: 1, PLENE_DEFECTIVE: 1, ABBREVIATION: 1}
       שלום ↔ שלום        exact              1.00
       עולם ↔ עולם        exact              1.00
      רַבִּי ↔ רבי         niqqud_stripped    0.95
       דויד ↔ דוד          plene_defective    0.85
        ר"י ↔ רבי          abbreviation       0.85   (primary)
        ר"י ↔ ישמעאל       abbreviation       0.00   (continuation)
        אמר ↔ אמר          exact              1.00

See the documentation for installation details, the full API, FAQs, and the design rationale.

Documentation

| Section | What it covers |
|---|---|
| Installation | pip / from source / dev setup |
| Usage | Tokenize, align, work with the result, custom lexica |
| Details | Tokenizer pipeline, scoring tiers, DP algorithm |
| FAQ | Common questions about scope, language packs, performance |
| Contributing | Development workflow, TDD discipline, branch model |

Project status

  • Current release: 0.1.3
  • Roadmap: docs/ROADMAP.md
  • Design spec: docs/superpowers/specs/2026-04-28-trace-v0.1-design.md
  • Future sub-projects: Multi-witness master graph · Geniza anchor detection · Text-reuse · Critical edition / apparatus

License

MIT © 2026 Benjamin Schnabel.

Citation

If you use TRACE in academic work, please cite the repository — a Zenodo DOI will follow with the first non-pre-release tag.



Download files


Source Distribution

tracealign-0.1.3.tar.gz (25.5 kB)

Uploaded Source

Built Distribution


tracealign-0.1.3-py3-none-any.whl (23.9 kB)

Uploaded Python 3

File details

Details for the file tracealign-0.1.3.tar.gz.

File metadata

  • Download URL: tracealign-0.1.3.tar.gz
  • Upload date:
  • Size: 25.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for tracealign-0.1.3.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | e369847c2f52e8bb351269fc8e5bc155dade40885bdac046d29df5e0156b20ac |
| MD5 | 848d80142be495ab6aab35d093525e0b |
| BLAKE2b-256 | 3628cd2b08bcca59369c330c37a0ba24a207cc37cf7a25ab30908b7188e0e80c |
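To check a downloaded file against its published SHA256 digest, a short sketch with Python's standard `hashlib` is enough; the helper names `sha256_of` and `matches_digest` are illustrative. pip can also enforce digests itself in hash-checking mode, for example via a requirements line such as `tracealign==0.1.3 --hash=sha256:<digest>`.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare against a published hex digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example (assumes the sdist has been downloaded to the current directory):
# matches_digest(Path("tracealign-0.1.3.tar.gz"),
#                "e369847c2f52e8bb351269fc8e5bc155dade40885bdac046d29df5e0156b20ac")
```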


File details

Details for the file tracealign-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: tracealign-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 23.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for tracealign-0.1.3-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2902fb6808da31c712f28ee253b2e709ff695c33cdcae905a8c1f4c2d5ddf8a2 |
| MD5 | e46256c3110ee2628e7a922c9f43728a |
| BLAKE2b-256 | 4046650807b42d58f07e8113c899686ab67d24173bb6a3327c20cd0038a2c310 |

