TRACE
Textual Reuse, Alignment, and Collation Engine — a Python library for pairwise philological alignment with pluggable language packs.
TRACE is designed for textual criticism, manuscript witness comparison, and the creation of digital synopses and critical editions. The core is language-agnostic; the first shipped language pack covers Biblical and Rabbinic Hebrew (hbo).
Highlights
- Tokenizer pipeline with editorial-marker awareness (`[reconstructed]`, `⟦deletion⟧`, `〈insertion〉`, `(expanded)`, lacunae).
- Tiered scoring returning `(score, reason)` per token pair: `EXACT`, `NIQQUD_STRIPPED`, `PLENE_DEFECTIVE`, `ABBREVIATION`, `ORTHOGRAPHIC`, `INSERTION`, `OMISSION`, `NO_MATCH`.
- Semi-global Needleman–Wunsch with affine gap penalties (Gotoh) and a multi-token abbreviation lookahead (`ר"י` ↔ `רבי ישמעאל`).
- Hebrew language pack with niqqud strip, plene/defective skeleton matching, gershayim/maqqef tokenizer hooks, and a seed lexicon of rabbinic abbreviations (extendable via `Lexica.merge()`).
- I/O for plain text, JSON (round-trip), eScriptorium exports (with bbox + line metadata), and TEI XML (`<tei:w>` mode + flow-text fallback).
- Reproducible: every `AlignmentResult` carries `trace_version` and `language_pack_version` in its params.
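To make the alignment bullet concrete, here is a minimal, score-only sketch of semi-global alignment with affine gap penalties in the style of Gotoh's three-matrix recurrence. It is not TRACE's implementation: the function names `semi_global_affine` and `exact_match`, the penalty values, and the choice to leave end gaps free on both sequences are assumptions made for illustration, and traceback is omitted for brevity.

```python
from collections.abc import Callable, Sequence
from math import inf


def exact_match(x: str, y: str) -> float:
    """Hypothetical token scorer: +1 for identical tokens, -1 otherwise."""
    return 1.0 if x == y else -1.0


def semi_global_affine(
    a: Sequence[str],
    b: Sequence[str],
    score: Callable[[str, str], float],
    gap_open: float = 2.0,    # cost of the first gap position (assumed value)
    gap_extend: float = 0.5,  # cost of each further gap position (assumed value)
) -> float:
    """Best alignment score with affine gaps and unpenalized end gaps."""
    n, m = len(a), len(b)
    # M: a[i-1] paired with b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[-inf] * (m + 1) for _ in range(n + 1)]
    X = [[-inf] * (m + 1) for _ in range(n + 1)]
    Y = [[-inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = 0.0  # skipping a prefix of `a` is free
    for j in range(1, m + 1):
        Y[0][j] = 0.0  # skipping a prefix of `b` is free

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]
            )
            X[i][j] = max(
                M[i - 1][j] - gap_open,
                Y[i - 1][j] - gap_open,
                X[i - 1][j] - gap_extend,
            )
            Y[i][j] = max(
                M[i][j - 1] - gap_open,
                X[i][j - 1] - gap_open,
                Y[i][j - 1] - gap_extend,
            )

    def best(i: int, j: int) -> float:
        return max(M[i][j], X[i][j], Y[i][j])

    # Trailing gaps are free, so the alignment may end anywhere in the
    # last row or the last column.
    return max(
        [best(i, m) for i in range(n + 1)] + [best(n, j) for j in range(m + 1)]
    )


if __name__ == "__main__":
    # The shorter witness sits inside the longer one, so no gap is charged.
    print(semi_global_affine("b c d".split(), "a b c d e".split(), exact_match))  # 3.0
```

With these assumed penalties an interior gap of length k costs `gap_open + (k - 1) * gap_extend`, so one long omission is cheaper than several scattered ones, which is the usual motivation for affine gaps when comparing witnesses.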
Installation
```
pip install tracealign
```
Requires Python 3.10+. Pulls pydantic, numpy, lxml, and rapidfuzz.
Quick start
```python
import tracealign

# Tokenize two witnesses with the Hebrew language pack.
w1 = tracealign.tokenize("שלום עולם רַבִּי דויד ר\"י אמר", lang="hbo", seq_label="W1")
w2 = tracealign.tokenize("שלום עולם רבי דוד רבי ישמעאל אמר", lang="hbo", seq_label="W2")

# Align the two token sequences pairwise.
result = tracealign.align(w1, w2, lang="hbo")

print(f"total score: {result.total_score:.2f}")
print(f"summary: {dict(result.summary)}")
for m in result.matches:
    # Either side may be absent; print a dash placeholder in that case.
    a = m.token_a.text if m.token_a else "—"
    b = m.token_b.text if m.token_b else "—"
    print(f" {a:>10} ↔ {b:<10} {m.reason.value:<18} {m.score:.2f}")
```
Output (abridged):
```
total score: 0.91
summary: {EXACT: 3, NIQQUD_STRIPPED: 1, PLENE_DEFECTIVE: 1, ABBREVIATION: 1}
שלום ↔ שלום exact 1.00
עולם ↔ עולם exact 1.00
רַבִּי ↔ רבי niqqud_stripped 0.95
דויד ↔ דוד plene_defective 0.85
ר"י ↔ רבי abbreviation 0.85 (primary)
ר"י ↔ ישמעאל abbreviation 0.00 (continuation)
אמר ↔ אמר exact 1.00
```
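The first three tiers in the output above can be approximated with the standard library alone. The sketch below is illustrative and not the scoring code TRACE ships: `strip_niqqud`, `skeleton`, and `score_pair` are hypothetical names, the scores are copied from the sample output, and treating only vav and yod after the first letter as matres lectionis is a simplifying assumption that will over-match on real data.

```python
import unicodedata

# Tier scores copied from the sample output; the library's actual values may differ.
TIER_SCORES = {
    "exact": 1.00,
    "niqqud_stripped": 0.95,
    "plene_defective": 0.85,
    "no_match": 0.00,
}

# Assumption: only vav (U+05D5) and yod (U+05D9) count as matres lectionis.
MATRES = {"\u05d5", "\u05d9"}


def strip_niqqud(token: str) -> str:
    """Drop vowel points and other combining marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def skeleton(token: str) -> str:
    """Consonantal skeleton: unpointed token without non-initial vav/yod."""
    bare = strip_niqqud(token)
    return bare[:1] + "".join(ch for ch in bare[1:] if ch not in MATRES)


def score_pair(a: str, b: str) -> tuple[float, str]:
    """Return (score, reason) for one token pair, trying the strictest tier first."""
    if a == b:
        return TIER_SCORES["exact"], "exact"
    if strip_niqqud(a) == strip_niqqud(b):
        return TIER_SCORES["niqqud_stripped"], "niqqud_stripped"
    if skeleton(a) == skeleton(b):
        return TIER_SCORES["plene_defective"], "plene_defective"
    return TIER_SCORES["no_match"], "no_match"


if __name__ == "__main__":
    for left, right in [("שלום", "שלום"), ("רַבִּי", "רבי"), ("דויד", "דוד"), ("שלום", "עולם")]:
        print(left, right, score_pair(left, right))
```

Ordering the checks from strictest to loosest means each pair is labelled with the most specific reason that applies. The abbreviation tier is left out here because it depends on a lexicon and a multi-token lookahead.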
See the documentation for installation details, the full API, FAQs, and the design rationale.
Documentation
| Section | What it covers |
|---|---|
| Installation | pip / from source / dev setup |
| Usage | Tokenize, align, work with the result, custom lexica |
| Details | Tokenizer pipeline, scoring tiers, DP algorithm |
| FAQ | Common questions about scope, language packs, performance |
| Contributing | Development workflow, TDD discipline, branch model |
Project status
| Current release | 0.1.3 |
| Roadmap | docs/ROADMAP.md |
| Design spec | docs/superpowers/specs/2026-04-28-trace-v0.1-design.md |
| Future sub-projects | Multi-witness master graph · Geniza anchor detection · Text-reuse · Critical edition / apparatus |
License
MIT © 2026 Benjamin Schnabel.
Citation
If you use TRACE in academic work, please cite the repository — a Zenodo DOI will follow with the first non-pre-release tag.