Minimal markup for Latin text collections

txtdown

A minimal, human-readable markup for Latin text collections, with an inferable hierarchical structure that supports scholarly citation.

Installation

pip install git+https://github.com/diyclassics/txtdown.git

Quick Start

from txtdown import parse, write

# Parse a .txtd file
doc = parse("sulpicia.txtd")

# Access metadata
print(doc.metadata.author)  # "Sulpicia"
print(doc.metadata.work)    # "Epistulae"

# Access by citation
line = doc.get("2.3")       # Section 2, line 3
section = doc.get("1")      # Entire section 1

# Iterate sections and lines
for section in doc.sections:
    for line in section.lines:
        print(f"{section.id}.{line.number}: {line.text}")

# Write back to file (round-trip safe)
write(doc, "output.txtd")

Format Specification

A .txtd file consists of a YAML front matter block followed by one or more sections, each introduced by a horizontal rule line (---). The front matter block is required and must include a work field; otherwise parse() raises ValueError. To parse a fragment without metadata (e.g. a single line or section), pass strict=False.

Basic Structure

---
author: Sulpicia
work: Epistulae
source: https://thelatinlibrary.com/sulpicia.html
---

--- 1

Tandem venit amor, qualem texisse pudori
    quam nudasse alicui sit mihi fama magis.
exorata meis illum Cytherea Camenis
    attulit in nostrum deposuitque sinum.
etc.

--- 2

Invisus natalis adest, qui rure molesto
    et sine Cerintho tristis agendus erit.
etc.
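
As a rough illustration of the layout above and of the strict check on the work field, here is a minimal sketch. It is not txtdown's implementation: split_front_matter is a hypothetical helper, and it assumes the front matter holds only flat key: value pairs rather than full YAML.

```python
from __future__ import annotations


def split_front_matter(content: str, strict: bool = True) -> tuple[dict[str, str], str]:
    """Split .txtd-style content into (metadata, body).

    Hypothetical helper: reads flat `key: value` pairs only, not full YAML.
    """
    lines = content.splitlines()
    metadata: dict[str, str] = {}
    body_start = 0
    if lines and lines[0].strip() == "---":
        # Scan for the bare `---` that closes the front matter block.
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                body_start = i + 1
                break
            key, sep, value = line.partition(":")
            if sep:
                metadata[key.strip()] = value.strip()
        else:
            raise ValueError("front matter block is never closed")
    elif strict:
        raise ValueError("missing front matter block")
    if strict and "work" not in metadata:
        raise ValueError("front matter must include a 'work' field")
    return metadata, "\n".join(lines[body_start:])


SAMPLE = """\
---
author: Sulpicia
work: Epistulae
source: https://thelatinlibrary.com/sulpicia.html
---

--- 1

Tandem venit amor, qualem texisse pudori
    quam nudasse alicui sit mihi fama magis.
"""

metadata, body = split_front_matter(SAMPLE)
print(metadata["work"])  # Epistulae
```

With strict=False the same helper returns an empty metadata dict for content that has no front matter, which mirrors the fragment case described above.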

Sections

Sections are separated by --- (three or more hyphens)
Sections auto-number (1, 2, 3...) unless given explicit IDs; explicit IDs are the recommended practice
Explicit section ID: --- prooemium or --- 1a
Section with title: --- prooemium: Introduction
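
The section rules above can be sketched with a small header parser. This is an illustration only: parse_section_header and section_ids are hypothetical names, the input is assumed to be the body with the front matter already removed, and the sketch assumes an auto-assigned number reflects the section's position in the file, which the source does not state.

```python
from __future__ import annotations

import re

# Three or more hyphens, optionally followed by whitespace and "id" or "id: title".
HEADER_RE = re.compile(r"^-{3,}(?:\s+(.*))?$")


def parse_section_header(line: str) -> tuple[str | None, str | None] | None:
    """Return (id, title) for a section rule, or None if the line is not a rule."""
    match = HEADER_RE.match(line.rstrip())
    if match is None:
        return None
    rest = (match.group(1) or "").strip()
    if not rest:
        return None, None  # bare rule: the ID will be auto-assigned
    section_id, sep, title = rest.partition(":")
    return section_id.strip(), (title.strip() or None) if sep else None


def section_ids(body: str) -> list[tuple[str, str | None]]:
    """List (id, title) pairs, numbering sections that have no explicit ID."""
    found: list[tuple[str, str | None]] = []
    for line in body.splitlines():
        header = parse_section_header(line)
        if header is None:
            continue
        section_id, title = header
        # Assumption: auto-numbers follow the section's position in the file.
        found.append((section_id or str(len(found) + 1), title))
    return found


body = "--- prooemium: Introduction\nline\n---\nline\n--- 1a\nline\n"
print(section_ids(body))
```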

Lines (for verse)

Lines auto-number within each section (1, 2, 3...)
Blank lines don't count toward line numbering
Access via citation: doc.get("2.3") returns section 2, line 3
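
To make the numbering and citation rules concrete, the sketch below numbers the non-blank lines of a section and resolves a "section.line" citation against plain dictionaries. number_lines and resolve are hypothetical helpers, not part of txtdown, and the sketch assumes section IDs contain no dots.

```python
from __future__ import annotations


def number_lines(section_text: str) -> dict[int, str]:
    """Number the lines of one section from 1, skipping blank lines."""
    numbered: dict[int, str] = {}
    for raw in section_text.splitlines():
        if not raw.strip():
            continue  # blank lines do not consume a line number
        numbered[len(numbered) + 1] = raw
    return numbered


def resolve(sections: dict[str, dict[int, str]], citation: str):
    """Resolve "2.3" to one line, or "2" to every line of that section."""
    section_id, _, line_number = citation.partition(".")
    lines = sections[section_id]
    return lines[int(line_number)] if line_number else lines


sections = {
    "1": number_lines(
        "Tandem venit amor, qualem texisse pudori\n"
        "    quam nudasse alicui sit mihi fama magis.\n"
        "\n"
        "exorata meis illum Cytherea Camenis\n"
        "    attulit in nostrum deposuitque sinum.\n"
    )
}
print(resolve(sections, "1.3"))  # exorata meis illum Cytherea Camenis
```

The blank line in the sample does not shift the numbering, so the third line of verse is still cited as 1.3.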

Line indentation (mode: verse): Leading whitespace indicates poetic structure (e.g., pentameter lines in elegiac couplets):

Tandem venit amor, qualem texisse pudori
    quam nudasse alicui sit mihi fama magis.

The parser preserves indentation. For NLP, TxtdownReader strips leading whitespace when joining lines for sentence segmentation.
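
The whitespace handling described above might look roughly like the following. join_for_segmentation is a hypothetical stand-in written for this example; it does not reproduce TxtdownReader, whose interface is not documented here.

```python
from __future__ import annotations


def join_for_segmentation(lines: list[str]) -> str:
    """Join verse lines into one string, dropping indentation and blank lines."""
    return " ".join(line.strip() for line in lines if line.strip())


couplet = [
    "Tandem venit amor, qualem texisse pudori",
    "    quam nudasse alicui sit mihi fama magis.",
]
print(join_for_segmentation(couplet))
```

The stored lines keep their indentation; only the joined string handed to a sentence segmenter is normalized.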

Speaker Markup (dramatic texts)

For dramatic texts, use @Speaker: at the start of a line to mark speaker attribution:

@Diocletianus: Quid sibi vult ista, quae vos agitat, fatuitas?
@Agapes: quod signum fatuitatis nobis inesse deprehendis?
@Diocletianus: Evidens magnumque.

The parser extracts the speaker name into line.speaker and keeps line.text as pure speech text, which suits NLP pipelines that need clean text without markup.

doc = parse("dulcitius.txtd")
for line in doc.sections[0].lines:
    print(f"{line.speaker}: {line.text}")
# Diocletianus: Quid sibi vult ista...

Non-speaker lines (stage directions, prose) have line.speaker = None. Speaker markup round-trips through write().
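
One way to picture the speaker extraction and its round trip is the regex-based sketch below. split_speaker and render are hypothetical helpers, and the pattern assumes a speaker name never contains a colon; the library's own parsing rules may differ.

```python
from __future__ import annotations

import re

# "@Name:" at the very start of the line, then the speech text.
SPEAKER_RE = re.compile(r"^@(?P<speaker>[^:]+):\s*(?P<text>.*)$")


def split_speaker(raw: str) -> tuple[str | None, str]:
    """Return (speaker, text); speaker is None when the line has no @Speaker: prefix."""
    match = SPEAKER_RE.match(raw)
    if match is None:
        return None, raw
    return match["speaker"].strip(), match["text"]


def render(speaker: str | None, text: str) -> str:
    """Rebuild the marked-up line from its parts."""
    return text if speaker is None else f"@{speaker}: {text}"


raw = "@Diocletianus: Evidens magnumque."
speaker, text = split_speaker(raw)
print(speaker, "|", text)
```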

Cross-source Quotation

Use > at the start of a line to mark text quoted verbatim from another literary source, for example when an author embeds a poet's verse in their own prose. This repurposes the familiar blockquote convention for the citational habits of classical texts:

Quamquam Ennius recte:

> Amicus certus in re incerta cernitur,

tamen haec duo levitatis et infirmitatis plerosque convincunt.

The parser strips the > marker and flags the line with line.is_quote = True, keeping line.text as clean quoted text. Consecutive > lines form a multi-line quotation:

> Negat quis, nego; ait, aio; postremo imperavi egomet mihi
> Omnia adsentari,

doc = parse("cicero-de-amicitia.txtd")
quotes = [line.text for s in doc.sections for line in s.lines if line.is_quote]
# ['Amicus certus in re incerta cernitur,', ...]

Non-quote lines have line.is_quote = False. Quotation markup round-trips through write(). See examples/cicero-de-amicitia.txtd (Cicero quoting Ennius and Terence) and examples/augustine-civ-dei-1.2.txtd (Augustine quoting Virgil).
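
The quotation handling can be sketched as follows: strip the marker, flag the line, and group consecutive flagged lines into one quotation. split_quote and quotations are hypothetical helpers for illustration, and the sketch assumes whitespace directly after the > marker is discarded.

```python
from __future__ import annotations

from itertools import groupby


def split_quote(raw: str) -> tuple[bool, str]:
    """Return (is_quote, text) with the leading > marker removed."""
    if raw.startswith(">"):
        return True, raw[1:].lstrip()
    return False, raw


def quotations(raw_lines: list[str]) -> list[list[str]]:
    """Group runs of consecutive quoted lines into multi-line quotations."""
    parsed = [split_quote(raw) for raw in raw_lines]
    return [
        [text for _, text in run]
        for is_quote, run in groupby(parsed, key=lambda item: item[0])
        if is_quote
    ]


lines = [
    "Quamquam Ennius recte:",
    "> Amicus certus in re incerta cernitur,",
    "tamen haec duo levitatis et infirmitatis plerosque convincunt.",
    "> Negat quis, nego; ait, aio; postremo imperavi egomet mihi",
    "> Omnia adsentari,",
]
print(quotations(lines))
```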

Metadata

Field	Description
`work`	Work title (required)
`author`	Author name
`source`	Source URL or reference
`scope`	Portion of work in file (e.g., `1-6` for books 1-6)

Additional fields are preserved in metadata.extras.
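
The split between the four named fields and extras can be pictured with a small dataclass. MetadataSketch and build_metadata are hypothetical names chosen for this example; they mirror the table above but are not the library's Metadata class.

```python
from __future__ import annotations

from dataclasses import dataclass, field

KNOWN_FIELDS = ("work", "author", "source", "scope")


@dataclass
class MetadataSketch:
    """Illustrative stand-in for the metadata container described above."""

    work: str
    author: str | None = None
    source: str | None = None
    scope: str | None = None
    extras: dict[str, str] = field(default_factory=dict)


def build_metadata(front_matter: dict[str, str]) -> MetadataSketch:
    """Route known keys to attributes and everything else to extras."""
    known = {k: v for k, v in front_matter.items() if k in KNOWN_FIELDS}
    extras = {k: v for k, v in front_matter.items() if k not in KNOWN_FIELDS}
    return MetadataSketch(**known, extras=extras)


meta = build_metadata({"work": "Epistulae", "author": "Sulpicia", "editor": "unknown"})
print(meta.extras)  # {'editor': 'unknown'}
```

In this sketch a front matter dict without work fails with TypeError at construction time; the library itself reports that case as ValueError from parse().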

API Reference

Functions

parse(path_or_content: str, *, strict: bool = True) -> Document — Parse a .txtd file or string. Strict by default: raises ValueError if the front matter block or work field is missing; pass strict=False for fragments.
write(doc: Document, path: str | None) -> str — Write to file if path given; always returns serialized string

Classes

Document — Container with metadata: Metadata and sections: list[Section]
Section — Container with id: str, lines: list[Line], optional title and metadata
Line — Container with text: str, number: int, optional speaker: str | None and label: str | None, and is_quote: bool (cross-source quotation)
Metadata — Container with author, work, source, scope, and extras dict

Development

# Clone and install dev dependencies
git clone https://github.com/diyclassics/txtdown.git
cd txtdown
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=txtdown --cov-report=term-missing

Project History

The idea for txtdown originated in January 2018, inspired by the need for a document format for Latin text collections that balanced the simplicity of plaintext with the more involved markup of XML-based formats like TEI. The goal was to create a format that is both human-readable and computer-tractable, supporting hierarchical structures, fundamental annotations, and embedded metadata. Txtdown has since been influenced by ongoing work on annotation projects such as the Representing Women Authorship in the Latin Treebanks (RWALT) project.

License

MIT

Release history

0.3.0

Jun 22, 2026

This version

0.2.0

Jun 20, 2026
