
Textual Reuse, Alignment, and Collation Engine — pairwise philological alignment with pluggable language packs


TRACE

Textual Reuse, Alignment, and Collation Engine — a Python library for pairwise philological alignment with pluggable language packs.


TRACE is designed for textual criticism, manuscript witness comparison, and the creation of digital synopses and critical editions. The core is language-agnostic; the first shipped language pack covers Biblical and Rabbinic Hebrew (hbo).


Highlights

  • Tokenizer pipeline with editorial-marker awareness ([reconstructed], ⟦deletion⟧, 〈insertion〉, (expanded), lacunae).
  • Tiered scoring returning (score, reason) per token pair — EXACT, NIQQUD_STRIPPED, PLENE_DEFECTIVE, ABBREVIATION, ORTHOGRAPHIC, INSERTION, OMISSION, NO_MATCH.
  • Semi-global Needleman–Wunsch with affine gap penalties (Gotoh) and a multi-token abbreviation lookahead (ר"י ↔ רבי ישמעאל).
  • Hebrew language pack with niqqud strip, plene/defective skeleton matching, gershayim/maqqef tokenizer hooks, and a seed lexicon of rabbinic abbreviations (extendable via Lexica.merge()).
  • I/O for plain text, JSON (round-trip), eScriptorium exports (with bbox + line metadata), and TEI XML (<tei:w> mode + flow-text fallback).
  • Reproducible — every AlignmentResult carries trace_version and language_pack_version in its params.
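To make the alignment bullet above more concrete, here is a minimal, score-only sketch of semi-global alignment with affine gap penalties in the Gotoh three-matrix form. It is not TRACE's implementation: the function names, the placeholder scorer, the gap costs, and the choice to make leading and trailing gaps free on both sequences are all assumptions made for illustration.

```python
from typing import Callable, Sequence

NEG_INF = float("-inf")


def exact_scorer(a: str, b: str) -> float:
    """Placeholder scorer (assumption): +1 for identical tokens, -1 otherwise."""
    return 1.0 if a == b else -1.0


def semiglobal_affine_score(
    a: Sequence[str],
    b: Sequence[str],
    score: Callable[[str, str], float] = exact_scorer,
    gap_open: float = 1.0,    # cost of the first position of a gap (assumed value)
    gap_extend: float = 0.5,  # cost of each further position (assumed value)
) -> float:
    """Best alignment score with affine internal gaps and free end gaps."""
    n, m = len(a), len(b)
    # M: a[i-1] paired with b[j-1]
    # X: a[i-1] paired with a gap (token missing in b)
    # Y: b[j-1] paired with a gap (token missing in a)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    # Free leading gaps: every cell on the top row or left column
    # is a zero-cost starting point.
    for i in range(n + 1):
        M[i][0] = 0.0
    for j in range(m + 1):
        M[0][j] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]
            )
            # Opening a gap costs gap_open; staying in the same gap costs gap_extend.
            X[i][j] = max(
                M[i - 1][j] - gap_open,
                X[i - 1][j] - gap_extend,
                Y[i - 1][j] - gap_open,
            )
            Y[i][j] = max(
                M[i][j - 1] - gap_open,
                Y[i][j - 1] - gap_extend,
                X[i][j - 1] - gap_open,
            )

    def best(i: int, j: int) -> float:
        return max(M[i][j], X[i][j], Y[i][j])

    # Free trailing gaps: the alignment may end anywhere on the last row
    # (all of a consumed) or the last column (all of b consumed).
    return max(
        [best(n, j) for j in range(m + 1)] + [best(i, m) for i in range(n + 1)]
    )


# Two matches, unaligned tokens at both ends of b cost nothing.
print(semiglobal_affine_score(["x", "y"], ["q", "x", "y", "r"]))  # 2.0
# Four matches minus one internal gap of length two: 4 - (1.0 + 0.5).
print(semiglobal_affine_score(list("ABCD"), list("ABXYCD")))  # 2.5
```

A full aligner would also keep back-pointers to recover the matched pairs, and would plug a tiered scorer in place of `exact_scorer`; both are left out here to keep the recurrence readable.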

Installation

pip install tracealign

Requires Python 3.10+. Installs pydantic, numpy, lxml, and rapidfuzz as dependencies.

Quick start

import tracealign

w1 = tracealign.tokenize("שלום עולם רַבִּי דויד ר\"י אמר", lang="hbo", seq_label="W1")
w2 = tracealign.tokenize("שלום עולם רבי דוד רבי ישמעאל אמר", lang="hbo", seq_label="W2")

result = tracealign.align(w1, w2, lang="hbo")

print(f"total score: {result.total_score:.2f}")
print(f"summary: {dict(result.summary)}")
for m in result.matches:
    a = m.token_a.text if m.token_a else "—"
    b = m.token_b.text if m.token_b else "—"
    print(f"  {a:>10} ↔ {b:<10}  {m.reason.value:<18} {m.score:.2f}")

Output (abridged):

total score: 0.91
summary: {EXACT: 3, NIQQUD_STRIPPED: 1, PLENE_DEFECTIVE: 1, ABBREVIATION: 1}
       שלום ↔ שלום        exact              1.00
       עולם ↔ עולם        exact              1.00
      רַבִּי ↔ רבי         niqqud_stripped    0.95
       דויד ↔ דוד          plene_defective    0.85
        ר"י ↔ רבי          abbreviation       0.85   (primary)
        ר"י ↔ ישמעאל       abbreviation       0.00   (continuation)
        אמר ↔ אמר          exact              1.00
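The first three reasons in the output above can be approximated with a small tiered comparison: try the strictest test first and fall through to looser ones. The sketch below uses only the standard library and is not TRACE's scorer. The function names are hypothetical, the scores are copied from the sample output, and the skeleton rule (dropping every vav and yod) is a deliberately crude assumption that over-matches compared with a real plene/defective check.

```python
import unicodedata


def strip_niqqud(token: str) -> str:
    """Remove vowel points and cantillation marks (Unicode combining marks)."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def skeleton(token: str) -> str:
    """Crude consonantal skeleton (assumption): drop every vav and yod."""
    return "".join(ch for ch in token if ch not in "וי")


def score_pair(a: str, b: str) -> tuple[float, str]:
    """Return (score, reason), checking the strictest tier first."""
    if a == b:
        return 1.00, "exact"
    a_plain, b_plain = strip_niqqud(a), strip_niqqud(b)
    if a_plain == b_plain:
        return 0.95, "niqqud_stripped"
    if skeleton(a_plain) == skeleton(b_plain):
        return 0.85, "plene_defective"
    return 0.00, "no_match"


print(score_pair("אמר", "אמר"))    # (1.0, 'exact')
print(score_pair("רַבִּי", "רבי"))   # (0.95, 'niqqud_stripped')
print(score_pair("דויד", "דוד"))   # (0.85, 'plene_defective')
print(score_pair("אמר", "שלום"))   # (0.0, 'no_match')
```

Returning the reason together with the score keeps each decision auditable, which matters when an alignment feeds a critical apparatus.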

See the documentation for installation details, the full API, FAQs, and the design rationale.

Documentation

Section       What it covers
Installation  pip / from source / dev setup
Usage         Tokenize, align, work with the result, custom lexica
Details       Tokenizer pipeline, scoring tiers, DP algorithm
FAQ           Common questions about scope, language packs, performance
Contributing  Development workflow, TDD discipline, branch model

Project status

Current release      0.1.2
Roadmap              docs/ROADMAP.md
Design spec          docs/superpowers/specs/2026-04-28-trace-v0.1-design.md
Future sub-projects  Multi-witness master graph · Geniza anchor detection · Text-reuse · Critical edition / apparatus

License

MIT © 2026 Benjamin Schnabel.

Citation

If you use TRACE in academic work, please cite the repository — a Zenodo DOI will follow with the first non-pre-release tag.

Download files


Source Distribution

tracealign-0.1.2.tar.gz (25.4 kB view details)

Uploaded Source

Built Distribution


tracealign-0.1.2-py3-none-any.whl (23.9 kB view details)

Uploaded Python 3

File details

Details for the file tracealign-0.1.2.tar.gz.

File metadata

  • Download URL: tracealign-0.1.2.tar.gz
  • Upload date:
  • Size: 25.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for tracealign-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       8cbb9c96d768cd6d4e77ba119dd1d206b63a27d0d5327f3f8231c8fc1c3aee84
MD5          c0d645d6a8c777885aee78f23ae237fd
BLAKE2b-256  c3947a39d99363c3d4594fa0a83cadc478ba7386fdb01de7a73abebc0f135f71


File details

Details for the file tracealign-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: tracealign-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 23.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for tracealign-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       17f6b5c87465edf5b26800eae6ef98d4111ae43325a194a06c717ab622b5c880
MD5          e394a9b432d7e16a0f83927b824ae8d2
BLAKE2b-256  247b69b6bc72f2500f025c8b89e5f714bbe9a27737bd1fad1db30081341995b9
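A downloaded file can be checked against the SHA256 digests listed above with the standard library alone. The following is a small sketch; the helper names and the local file path in the usage comment are assumptions, and only `hashlib` calls from the standard library are used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the wheel was saved to the current directory:
# ok = matches_published(
#     Path("tracealign-0.1.2-py3-none-any.whl"),
#     "17f6b5c87465edf5b26800eae6ef98d4111ae43325a194a06c717ab622b5c880",
# )
# print("digest matches" if ok else "digest mismatch, do not install")
```

pip can enforce the same check at install time through hash-checking mode in a requirements file, which avoids a manual step.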

