A tool that processes Old English texts and provides a toolkit for working with the text.

Project description

wyrdcraeft

Process Old English texts into structured JSON and generate morphology.

Why wyrdcraeft?

If you work with Old English (Anglo-Saxon) texts - editions, corpora, translation tooling, or digital humanities projects - you often need a single pipeline that turns raw or marked-up sources into a consistent, machine-readable form. wyrdcraeft provides that.

  • Ingests plain text and TEI XML and converts them into a standard JSON schema that is aware of prose, verse, and dialogue.
  • Restores diacritics in Old English texts that have no diacritic marks.
  • Includes an Old English morphology generator based on established lexical and grammatical resources.
  • Provides other minor utilities for working with Old English text.

Use it from the command line or from Python, and avoid ad-hoc scripts and format fragmentation.
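
To make the diacritic restoration problem concrete, here is a minimal sketch of lexicon-based macron restoration. It is not wyrdcraeft's implementation: the helper names (`strip_macrons`, `build_index`, `restore`) and the toy lexicon are assumptions made up for illustration. It shows why restoration needs a disambiguation step: an unmarked form such as "god" can map to more than one marked form.

```python
import unicodedata

MACRON = "\u0304"  # Unicode COMBINING MACRON


def strip_macrons(text: str) -> str:
    """Remove macrons only; letters such as æ, þ and ð are left untouched."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch != MACRON)
    return unicodedata.normalize("NFC", kept)


# Hypothetical toy lexicon of normalized forms (not wyrdcraeft data).
LEXICON = ["stān", "gōd", "god", "īs", "is", "cyning"]


def build_index(forms):
    """Map each macron-less spelling to every lexicon form it could stand for."""
    index = {}
    for form in forms:
        index.setdefault(strip_macrons(form), []).append(form)
    return index


def restore(word, index):
    """Return candidate restorations; more than one candidate means ambiguity."""
    return index.get(word, [word])


INDEX = build_index(LEXICON)
print(restore("stan", INDEX))  # one candidate: ['stān']
print(restore("god", INDEX))   # ambiguous: ['gōd', 'god']
```

A real workflow would need context (part of speech, metre, or manual review) to choose between candidates such as gōd "good" and god "god".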

Features

  • Ingest Old English texts from text files and TEI XML.
  • Convert to a standard JSON format via deterministic heuristics, TEI parsing, or LLM-based extraction.
  • Handle both prose and verse (paragraphs, verse lines, dialogue, sections).
  • Generate Old English morphological forms using a Python port of Ondřej Tichý's Perl-based generator (which draws on Bosworth & Toller, An Anglo-Saxon Dictionary, 1898, and Wright & Wright, Old English Grammar, 1908).
  • Diacritic workflows: macron restoration and disambiguation tooling for normalized forms.
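
As a rough illustration of what generating morphological forms means, the sketch below declines a regular strong masculine a-stem noun by attaching case endings to a stem. It is a toy under stated assumptions, not the wyrdcraeft generator or its API: the names `A_STEM_MASC_ENDINGS` and `decline` are hypothetical, and only this one declension is covered.

```python
# Endings of the strong masculine a-stem declension (e.g. stān "stone").
A_STEM_MASC_ENDINGS = {
    ("nom", "sg"): "",
    ("acc", "sg"): "",
    ("gen", "sg"): "es",
    ("dat", "sg"): "e",
    ("nom", "pl"): "as",
    ("acc", "pl"): "as",
    ("gen", "pl"): "a",
    ("dat", "pl"): "um",
}


def decline(stem: str) -> dict:
    """Return a {(case, number): form} paradigm for a regular a-stem noun."""
    return {slot: stem + ending for slot, ending in A_STEM_MASC_ENDINGS.items()}


paradigm = decline("stān")
for (case, number), form in paradigm.items():
    print(f"{case} {number}: {form}")
```

Plain concatenation only works for regular stems. Nouns with stem alternations (for example dæg, plural dagas) or syncope (engel, genitive engles) need additional rules, which is the kind of detail a full generator has to encode.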

Installation

Prerequisites: Python 3.11–3.13.

From PyPI with pip:

pip install wyrdcraeft
wyrdcraeft --help

With uv:

sh -c "$(curl -fsSL https://astral.sh/uv/install)"
uv tool install wyrdcraeft
wyrdcraeft --help

With pipx:

pipx install wyrdcraeft
wyrdcraeft --help

From source (development):

git clone https://github.com/cmalek/wyrdcraeft.git
cd wyrdcraeft
uv sync --dev

Quick start

Command line: convert a text file to JSON:

wyrdcraeft convert --title="My Title" input.txt output.json
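
To convert many files, the same command can be driven from a short script. The sketch below assumes only the `wyrdcraeft convert --title=... input output` invocation shown above; the helper names and the choice of the file stem as title are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def build_convert_command(title: str, input_path: Path, output_path: Path) -> list:
    """Build the argument list for one `wyrdcraeft convert` call."""
    return [
        "wyrdcraeft",
        "convert",
        f"--title={title}",
        str(input_path),
        str(output_path),
    ]


def convert_directory(input_dir: Path, output_dir: Path) -> None:
    """Convert every .txt file in input_dir to a .json file in output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for txt in sorted(input_dir.glob("*.txt")):
        out = output_dir / (txt.stem + ".json")
        # Assumption: the file stem is good enough as a title.
        command = build_convert_command(txt.stem, txt, out)
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(command, check=True)
```

Calling `convert_directory(Path("texts"), Path("json"))` would run one conversion per text file. Passing the arguments as a list avoids shell quoting problems with titles that contain spaces.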

Python: use DocumentIngestor to get an OldEnglishText model:

from wyrdcraeft import DocumentIngestor, TextMetadata

metadata = TextMetadata(
    title="The Anglo-Saxon Chronicle",
    source="https://example.org/source.txt",
)
oe_json = DocumentIngestor().ingest(
    source_path="path/to/source.txt",
    metadata=metadata,
)

For TEI XML, pass a .xml path; DocumentIngestor will use the TEI ingestor. For LLM-based extraction, use ingest(..., use_llm=True, llm_config=...). See the full documentation for configuration and options.

Documentation

Full documentation (installation, quickstart, CLI, Python client, configuration, FAQ): https://oe_json_extractor.readthedocs.io

Contributing and license

Contribution guidelines and coding standards are described in the documentation (runbook). This project is licensed under the MIT License; see LICENSE.txt.


Licensing and Provenance

Bosworth-Toller Old English Dictionary

The OCR-extracted text of the Bosworth-Toller Old English Dictionary used in this project comes from the Germanic Lexicon Project. The scanning was done by Jason Burton, B. Dan Fairchild, Margaret Hoyt, Grace Mrowicki, Michael O'Keefe, Sarah Hartman, Finlay Logan, Sean Crist, Thomas McFadden, and David Harrison; that data is in the public domain.

Morphological Analyser of Old English

  • The Old English morphology generator in wyrdcraeft is based on Ondřej Tichý's thesis, Morphological Analyser of Old English (2017).
  • The upstream Perl code and data for the morphological generator are (c) Ondřej Tichý and are released under the CC BY 4.0 license. The modified Perl code, with Madeleine Thompson's changes, can be found at github:madeleineth/tichy_oe_generator.
  • Changes made to the morphology generator in this repository by the maintainers of wyrdcraeft are released under the MIT license.

All other code

  • All other code was implemented directly by Christopher Malek and is also released under the MIT license.

Download files

Source Distribution

wyrdcraeft-1.0.1.tar.gz (779.4 kB)

Uploaded Source

Built Distribution

wyrdcraeft-1.0.1-py3-none-any.whl (755.0 kB)

Uploaded Python 3

File details

Details for the file wyrdcraeft-1.0.1.tar.gz.

File metadata

  • Download URL: wyrdcraeft-1.0.1.tar.gz
  • Size: 779.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.10

File hashes

Hashes for wyrdcraeft-1.0.1.tar.gz
Algorithm Hash digest
SHA256 686416349db97b0b20994124f2aca494f3ec4c60b8a28167aba92bbc04f85c43
MD5 7335384bfdb0df4fe6a460a39335dbb4
BLAKE2b-256 81ef0c7409e324b80ce4303cc910994cc6ce78d7c2fec13cce40cee4b3f60780

File details

Details for the file wyrdcraeft-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: wyrdcraeft-1.0.1-py3-none-any.whl
  • Size: 755.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.10

File hashes

Hashes for wyrdcraeft-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 db1a5ed16c1b0c77d23c76e8abeffa27f0712e8b017f927a622271d62d1b35d2
MD5 47ac64991841d32f83771183d0e2fdd7
BLAKE2b-256 5f0bbc454e962101e5877d16c3677994c2664575a959405ef91ca2a59e3041ef
