Invariant-preserving document wrappers with validate-repair fixing loops and atomic checkpoints, for agents that modify persistent structured content.

wellformed

Invariant-preserving documents for agents that modify persistent state.

A chat reply is ephemeral; a config file your agent just rewrote is not. wellformed gives LLM agents three composable primitives for editing structured documents reliably:

ValidatedDocument — a document that is always well-formed and schema-valid by construction.
FixingLoop — validate → repair → re-validate, format-agnostic and LLM-agnostic.
Checkpoint — atomic rollback at file, multi-file, or directory granularity.

The core is pure Python with zero runtime dependencies. Formats are added via plugins. XML ships today; JSON / YAML / AST are on the roadmap.

Installation

pip install wellformed              # core only
pip install "wellformed[xml]"       # with XML plugin (installs lxml); quotes keep zsh from globbing

Requires Python 3.11 or newer.

The 30-second quickstart (XML)

import asyncio
from pathlib import Path

from lxml import etree
from wellformed import DocumentMutation, MutationFailedError
from wellformed.xml import XMLValidatedDocument, make_xml_schema_validator

XSD = make_xml_schema_validator(Path("schemas/note.xsd"))


class Note(XMLValidatedDocument):
    @classmethod
    def _validate_schema(cls, content):
        return XSD(content)

    @classmethod
    def _get_document_type(cls):
        return "note"

    @classmethod
    async def _repair(cls, content, errors, document_type):
        # Call your LLM of choice here; see "LLM-agnostic" below.
        # Returning the content unchanged makes apply() give up and
        # raise MutationFailedError once its attempts are exhausted.
        return content


class AppendLine(DocumentMutation):
    async def execute(self, content, parsed):
        child = etree.SubElement(parsed, "line")
        child.text = "added by the agent"
        return etree.tostring(parsed, encoding="unicode")


async def main():
    doc = await Note.load(Path("note.xml"))
    checkpoint = doc.checkpoint()
    try:
        new_doc = await doc.apply(AppendLine(name="append"))
        new_doc.save()
        checkpoint.discard()
    except MutationFailedError:
        checkpoint.restore()
        raise


asyncio.run(main())

The fixing loop only runs when the mutation produces invalid content. If the mutation's output is already valid, no LLM call is made. If it is invalid and a repair attempt succeeds, apply() returns a new document holding the repaired content, which save() then writes. If every repair attempt fails, apply() raises MutationFailedError and your code can roll back via the checkpoint.

Core concepts

ValidatedDocument — an always-valid wrapper. Once you hold an instance, the content has passed parsing and schema validation. There is no .is_valid() to forget to call.
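To make "valid by construction" concrete, here is a stdlib-only sketch of the pattern using JSON. The class ValidatedJSON, its toy REQUIRED_KEYS rule and InvalidDocumentError are hypothetical illustrations, not wellformed's API: the constructor itself parses and validates, so any instance that exists has already passed, and there is nothing left to check later.

```python
import copy
import json


class InvalidDocumentError(ValueError):
    """Hypothetical error raised when content fails parsing or validation."""


class ValidatedJSON:
    """Hypothetical always-valid wrapper: __init__ refuses invalid content."""

    # Toy stand-in for a real schema (an assumption for this sketch).
    REQUIRED_KEYS = ("title", "lines")

    __slots__ = ("_content", "_parsed")

    def __init__(self, content: str) -> None:
        try:
            parsed = json.loads(content)
        except json.JSONDecodeError as exc:
            raise InvalidDocumentError(f"parse error: {exc}") from exc
        if not isinstance(parsed, dict):
            raise InvalidDocumentError("top level must be a JSON object")
        missing = [k for k in self.REQUIRED_KEYS if k not in parsed]
        if missing:
            raise InvalidDocumentError(f"missing required keys: {missing}")
        self._content = content
        self._parsed = parsed

    @property
    def content(self) -> str:
        return self._content

    @property
    def parsed(self) -> dict:
        # Hand out a copy so callers cannot mutate the validated state in place.
        return copy.deepcopy(self._parsed)
```

Changes are meant to go through a new instance rather than in-place edits, which is why parsed returns a copy.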

FixingLoop — runs your validate_fn and repair_fn in a retry loop with structured reporting (SUCCESS / ALREADY_VALID / FAILED plus attempt count and remaining errors). The repair function is a protocol — you bring your own implementation.
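The loop itself fits in a few lines of asyncio. This is a hedged sketch of the validate → repair → re-validate contract described above, not wellformed's implementation: FixStatus, FixResult, run_fixing_loop and its max_attempts parameter are hypothetical names, and validate is assumed to return a list of error messages where an empty list means valid.

```python
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field
from enum import Enum

ValidateFn = Callable[[str], list[str]]  # [] means the content is valid
RepairFn = Callable[[str, list[str], str], Awaitable[str]]


class FixStatus(Enum):
    SUCCESS = "success"
    ALREADY_VALID = "already_valid"
    FAILED = "failed"


@dataclass
class FixResult:
    status: FixStatus
    content: str
    attempts: int
    remaining_errors: list[str] = field(default_factory=list)


async def run_fixing_loop(
    content: str,
    validate: ValidateFn,
    repair: RepairFn,
    document_type: str,
    max_attempts: int = 3,
) -> FixResult:
    errors = validate(content)
    if not errors:
        # Valid on arrival: repair (and any LLM call) is never invoked.
        return FixResult(FixStatus.ALREADY_VALID, content, 0)
    for attempt in range(1, max_attempts + 1):
        # Each round sees the errors from the most recent validation pass.
        content = await repair(content, errors, document_type)
        errors = validate(content)
        if not errors:
            return FixResult(FixStatus.SUCCESS, content, attempt)
    # Out of attempts: report the last content and what is still wrong.
    return FixResult(FixStatus.FAILED, content, max_attempts, errors)
```

A repair function that returns its input unchanged exhausts every attempt and yields FAILED with the last errors attached.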

Checkpoint — captures the pre-mutation content so you can roll back on failure. Three granularities:

Checkpoint — single file.
MultiFileCheckpoint — several files, restored in LIFO order.
DirectoryCheckpoint — whole directory copied via shutil.
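The three granularities can be sketched with the standard library alone, assuming a checkpoint is simply a snapshot taken before the mutation. FileSnapshot, MultiFileSnapshot and DirectorySnapshot are hypothetical names, not wellformed classes. Single-file restores write a sibling temp file and swap it in with os.replace, which is atomic on POSIX when both paths are on the same filesystem; the directory version is a plain shutil copy and is not atomic.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


class FileSnapshot:
    """Hypothetical single-file checkpoint: remembers bytes, or absence."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        exists = self.path.exists()
        # None records that the file did not exist when the snapshot was taken.
        self.data = self.path.read_bytes() if exists else None
        self.mode = stat.S_IMODE(self.path.stat().st_mode) if exists else None

    def restore(self) -> None:
        if self.data is None:
            self.path.unlink(missing_ok=True)
            return
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, prefix=".ckpt-")
        try:
            with os.fdopen(fd, "wb") as fh:
                fh.write(self.data)
            os.chmod(tmp, self.mode)  # mkstemp creates 0600; keep original mode
            os.replace(tmp, self.path)  # atomic swap into place
        except BaseException:
            Path(tmp).unlink(missing_ok=True)
            raise


class MultiFileSnapshot:
    """Hypothetical multi-file checkpoint restored in LIFO order."""

    def __init__(self) -> None:
        self._stack: list[FileSnapshot] = []

    def add(self, path: Path) -> None:
        self._stack.append(FileSnapshot(path))

    def restore(self) -> None:
        # Undo the most recently captured file first.
        for snap in reversed(self._stack):
            snap.restore()


class DirectorySnapshot:
    """Hypothetical whole-directory checkpoint via shutil (not atomic)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self._backup = Path(tempfile.mkdtemp(prefix="ckpt-dir-")) / "snapshot"
        shutil.copytree(self.root, self._backup)

    def restore(self) -> None:
        shutil.rmtree(self.root)
        shutil.copytree(self._backup, self.root)

    def discard(self) -> None:
        shutil.rmtree(self._backup.parent)
```

Recording "did not exist" as a state matters for multi-file rollbacks: a mutation that creates a new file is undone by deleting it.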

An extended example: iterative repair and rollback

The quickstart is deliberately minimal. Here's a fuller example that shows two behaviours in action:

The fixing loop iterates. A single repair attempt often fixes one error but leaves another — or introduces a fresh defect. The loop retries up to max_fix_attempts times, passing the latest errors to _repair on each round.
Checkpoints roll back to the last known-good state when a mutation can't be repaired. Successful edits are preserved; only the failing one is reverted.

The document is a small XML todo list. Each <task> has a required priority attribute from {low, medium, high} and a <title> child. We run two mutations against it: one produces an out-of-enum priority (recoverable; demonstrates iterative repair) and one strips every title (unrecoverable; demonstrates rollback).

Assume two files on disk alongside the script.

todos.xsd — the schema:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="todos">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="task" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="priority" use="required">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="low"/>
                  <xs:enumeration value="medium"/>
                  <xs:enumeration value="high"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

todos.xml — the initial document:

<todos>
  <task priority="high"><title>Ship wellformed v0.1</title></task>
  <task priority="medium"><title>Write the tutorial</title></task>
</todos>

And the Python:

from pathlib import Path
from lxml import etree
from wellformed import DocumentMutation, MutationFailedError
from wellformed.xml import XMLValidatedDocument, make_xml_schema_validator

VALID_PRIORITIES = {"low", "medium", "high"}
XSD = make_xml_schema_validator(Path("todos.xsd"))


class TodoList(XMLValidatedDocument):
    """A validated XML todo-list document.

    By inheriting from XMLValidatedDocument, we get XML parsing for
    free (via the `_parse` method provided by the XML plugin). The
    base class's invariant propagates: once you hold a TodoList
    instance, its content is guaranteed to be well-formed XML that
    passes our XSD schema. Invalid content simply cannot be
    represented by this class — any operation that would produce it
    either repairs it or raises.

    The three hooks below are everything the library needs from us
    to extend the XML plugin into a concrete document type.
    """

    @classmethod
    def _validate_schema(cls, content):
        return XSD(content)

    @classmethod
    def _get_document_type(cls):
        return "todo-list"

    @classmethod
    async def _repair(cls, content, errors, document_type):
        """Naive two-pass repair strategy.

        A production implementation would typically delegate repair
        to an LLM. To keep this example runnable without network
        calls, we inspect content directly and apply deterministic
        fixes instead.

        Note two unused arguments:

        - `errors`: the list of validation messages produced by the
          last validation pass. An LLM-based repair would weave
          these into its prompt so the model knows what's wrong. A
          deterministic repair that inspects content directly — as
          this one does — can usually infer the needed fix without
          reading the error list.

        - `document_type`: the string returned by
          `_get_document_type` above. Useful as a prompt label
          ("Fix this todo-list: ...") or for dispatching when a
          single `_repair` function serves several document classes.
          We only have one document type here, so we don't need it.
        """
        try:
            root = etree.fromstring(content.encode("utf-8"))
        except etree.XMLSyntaxError:
            # If the content isn't parseable at all, we can't help
            # at this layer. Return unchanged; the fixing loop will
            # report a parse error on the next validation pass.
            return content

        # --- Strategy A: quarantine invalid priority values. ---
        # If a `priority` attribute holds a value outside the enum
        # (e.g. "High" with a capital H), move that value to a
        # non-schema `prio` attribute and drop `priority`.
        #
        # This CLEARS the enum-violation error but INTRODUCES new
        # ones: the task is now missing its required `priority`
        # attribute, and `prio` is not declared by the schema. That's
        # deliberate: the fixing loop will re-run us with the new
        # errors, and Strategy B will then recover.
        bad_priority_tasks = [
            t for t in root.iter("task")
            if t.get("priority") is not None
            and t.get("priority") not in VALID_PRIORITIES
        ]
        if bad_priority_tasks:
            for t in bad_priority_tasks:
                t.set("prio", t.get("priority"))
                del t.attrib["priority"]
            return etree.tostring(root, encoding="unicode")

        # --- Strategy B: reinstate priority from the stash. ---
        # If a task lacks `priority`, look for a stashed value in
        # `prio`. Lowercase it; if the result is a valid enum
        # member, use it. Otherwise fall back to "medium". Either
        # way, drop the temporary `prio` attribute so the document
        # conforms to the schema again.
        missing_priority_tasks = [
            t for t in root.iter("task") if t.get("priority") is None
        ]
        if missing_priority_tasks:
            for t in missing_priority_tasks:
                stashed = t.get("prio", "")
                candidate = stashed.lower() if stashed else "medium"
                if candidate not in VALID_PRIORITIES:
                    candidate = "medium"
                t.set("priority", candidate)
                if "prio" in t.attrib:
                    del t.attrib["prio"]
            return etree.tostring(root, encoding="unicode")

        # Neither strategy applies. Returning content unchanged
        # makes the next validation pass report the same errors, so
        # the fixing loop will exhaust its attempts and raise.
        return content


# DocumentMutation subclasses describe *what* to change, not *how*
# to validate or repair. Each mutation's `execute` method is an
# async callable that receives the current content plus its parsed
# form (an lxml tree, here) and returns new content.
#
# Users write mutations naively — "just do the thing". `apply()`
# wraps every mutation in the fixing loop: if the produced content
# fails schema validation, `_repair` runs up to `max_fix_attempts`
# times before `apply()` raises `MutationFailedError`. This
# separation keeps each mutation focused on intent; the document
# class handles "make it valid again" on its behalf.

class BulkReprioritise(DocumentMutation):
    """Set every task's priority to `value`.

    When `value` is outside the enum (e.g. "High" capitalised), the
    resulting content fails validation and triggers the fixing loop.
    """

    def __init__(self, value: str):
        super().__init__(name=f"bulk-reprioritise-{value}")
        self.value = value

    async def execute(self, content, parsed):
        for task in parsed.iter("task"):
            task.set("priority", self.value)
        return etree.tostring(parsed, encoding="unicode")


class PurgeTitles(DocumentMutation):
    """Remove every <title>. Unrecoverable: the repair function has
    no way to invent task titles, so the fixing loop will exhaust
    its attempts."""

    def __init__(self):
        super().__init__(name="purge-titles")

    async def execute(self, content, parsed):
        for title in list(parsed.iter("title")):
            title.getparent().remove(title)
        return etree.tostring(parsed, encoding="unicode")


async def main():
    doc = await TodoList.load(Path("todos.xml"))

    # Mutation 1: produces priority="High" (capitalised, not in the
    # enum). The fixing loop runs two passes:
    #   Pass 1 — Strategy A stashes each bad value under `prio` and
    #            removes `priority`. The enum errors go away, but each
    #            task now reports "priority attribute required" and
    #            "prio attribute not allowed" errors.
    #   Pass 2 — Strategy B reads the stashed values, lowercases
    #            them, reinstates `priority="high"`, drops `prio`.
    #            The document is now valid.
    # `apply()` returns a new, valid TodoList. We save it and create
    # a checkpoint of this known-good state.
    doc = await doc.apply(BulkReprioritise("High"))
    doc.save()
    checkpoint = doc.checkpoint()

    # Mutation 2: removes every <title>. `_repair` has no strategy
    # for inventing titles, so all three attempts return unchanged
    # content. `apply()` raises `MutationFailedError`. We restore
    # the checkpoint; the file on disk is left exactly as it was
    # after the successful mutation 1.
    try:
        await doc.apply(PurgeTitles())
    except MutationFailedError:
        checkpoint.restore()


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

Running the example produces log output that traces both behaviours: two fixing-loop iterations for mutation 1 (attempt 1 raises the error count from 2 to 4 as Strategy A introduces its temporary defect; attempt 2 succeeds), and three failed attempts for mutation 2 before MutationFailedError fires and the checkpoint restores.

LLM-agnostic: the repair function is a protocol

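_repair is just an async callable, so you can plug in any LLM SDK, or no LLM at all. Each function below takes the same (content, errors, document_type) arguments as _repair, so the Note subclass above can be backed by any of them by returning await repair(content, errors, document_type) from its _repair hook: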

# --- Anthropic SDK ---
from anthropic import AsyncAnthropic
client = AsyncAnthropic()

async def repair(content, errors, document_type):
    msg = await client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        messages=[{"role": "user", "content": f"Fix this {document_type}. Reply with only the corrected document.\n\n{content}\n\nErrors:\n{errors}"}],
    )
    return msg.content[0].text

# --- OpenAI SDK ---
from openai import AsyncOpenAI
client = AsyncOpenAI()

async def repair(content, errors, document_type):
    resp = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Fix this {document_type}. Reply with only the corrected document.\n\n{content}\n\nErrors:\n{errors}"}],
    )
    return resp.choices[0].message.content

# --- Purely deterministic ---
async def repair(content, errors, document_type):
    # If the document is merely missing its closing </note> tag, we
    # don't need an LLM: a plain string check fixes it.
    if not content.rstrip().endswith("</note>"):
        return content + "</note>"
    return content
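Because the contract is only "an async callable taking (content, errors, document_type) and returning a string", it can be written down as a typing Protocol and wrapped like any other function. The sketch below is illustrative: RepairFn and strip_fences are hypothetical names, and treating errors as a sequence of strings follows the _repair docstring in the extended example. The wrapper removes one surrounding Markdown code fence, which chat models often add around a document and which would otherwise fail parsing and burn a repair attempt.

```python
import re
from collections.abc import Sequence
from typing import Protocol


class RepairFn(Protocol):
    async def __call__(
        self, content: str, errors: Sequence[str], document_type: str
    ) -> str: ...


FENCE = "`" * 3  # spelled out so this snippet can sit inside Markdown
# A whole reply of the form: fence, optional language tag, body, fence.
_FENCED = re.compile(rf"^\s*{FENCE}[\w-]*\n(.*?)\n?{FENCE}\s*$", re.DOTALL)


def strip_fences(repair: RepairFn) -> RepairFn:
    """Wrap any repair function so one surrounding code fence is removed."""

    async def wrapped(
        content: str, errors: Sequence[str], document_type: str
    ) -> str:
        fixed = await repair(content, errors, document_type)
        match = _FENCED.match(fixed)
        return match.group(1) if match else fixed

    return wrapped
```

Any of the three repair functions above can be wrapped this way; output without a fence passes through untouched.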

Interesting consequence: you can cascade. Try the cheap deterministic fix first and fall back to an LLM call only if it doesn't resolve the errors. FixingLoop gives you the attempt count, so you can key on it.
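A cascade can be expressed as one repair function built from two others. This sketch assumes a synchronous validate callable that returns a list of error strings, and treats cheap and expensive as any two functions with the _repair signature (for example the deterministic fix and one of the SDK-backed functions above). cascade is a hypothetical helper, not part of wellformed.

```python
from collections.abc import Awaitable, Callable, Sequence

Repair = Callable[[str, Sequence[str], str], Awaitable[str]]
Validate = Callable[[str], list[str]]


def cascade(cheap: Repair, expensive: Repair, validate: Validate) -> Repair:
    """Try `cheap` first; call `expensive` only if errors remain."""

    async def repair(
        content: str, errors: Sequence[str], document_type: str
    ) -> str:
        attempt = await cheap(content, errors, document_type)
        remaining = validate(attempt)
        if not remaining:
            return attempt  # the cheap fix was enough: no LLM call
        # Give the expensive repair the partial fix plus its fresh errors.
        return await expensive(attempt, remaining, document_type)

    return repair
```

Passing the partially fixed text onward is a design choice: the LLM then only has to deal with what the deterministic pass could not.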

Plugins

Plugin	Status	Install
XML (lxml)	shipped	`pip install "wellformed[xml]"`
JSON / JSON Schema	planned	—
YAML	planned	—
Python AST	planned	—

Writing a plugin means implementing four methods: _parse, _validate_schema, _get_document_type, _repair. See src/wellformed/xml/document.py for the reference implementation.
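To show the shape of the four hooks for a roadmap format, here is a stdlib-only JSON sketch written as a plain class so it runs without wellformed installed. In a real plugin these would be classmethods on a document base class; the contracts used here (_parse returns the parsed form or raises, _validate_schema returns a list of error strings with an empty list meaning valid) are assumptions inferred from the XML examples, and JSONDocumentHooks with its required-keys rule is hypothetical.

```python
import json
from collections.abc import Sequence
from typing import Any


class JSONDocumentHooks:
    """Hypothetical JSON plugin hooks mirroring the XML plugin's shape."""

    REQUIRED_KEYS = ("name", "version")  # toy stand-in for a JSON Schema

    @classmethod
    def _parse(cls, content: str) -> Any:
        # Raises json.JSONDecodeError (a ValueError) on malformed input.
        return json.loads(content)

    @classmethod
    def _validate_schema(cls, content: str) -> list[str]:
        try:
            data = cls._parse(content)
        except json.JSONDecodeError as exc:
            return [f"parse error: {exc.msg} at line {exc.lineno}"]
        if not isinstance(data, dict):
            return ["top-level value must be an object"]
        return [
            f"missing required key: {key!r}"
            for key in cls.REQUIRED_KEYS
            if key not in data
        ]

    @classmethod
    def _get_document_type(cls) -> str:
        return "package-manifest"

    @classmethod
    async def _repair(
        cls, content: str, errors: Sequence[str], document_type: str
    ) -> str:
        # Deterministic placeholder; a real document class might call an LLM.
        try:
            data = json.loads(content)
        except json.JSONDecodeError:
            return content  # unparseable: leave it for the next pass
        if isinstance(data, dict):
            data.setdefault("version", "0.0.0")
        return json.dumps(data, indent=2)
```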

Releasing

Versioning is managed with bump-my-version. It updates the version in pyproject.toml, commits the change, and creates a Git tag in one step.

# Install dev dependencies (includes bump-my-version)
uv sync --group dev

# Bump the patch version: 0.1.0 -> 0.1.1
uv run bump-my-version bump patch

# Bump the minor version: 0.1.1 -> 0.2.0
uv run bump-my-version bump minor

# Bump the major version: 0.2.0 -> 1.0.0
uv run bump-my-version bump major
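The three commands follow the usual semantic-versioning reset rule: bumping a part zeroes every part to its right. The function below only illustrates that rule for plain MAJOR.MINOR.PATCH strings; bump is a hypothetical helper, not how bump-my-version is implemented, and it ignores pre-release and build suffixes.

```python
def bump(version: str, part: str) -> str:
    """Return `version` with `part` ("major", "minor" or "patch") incremented."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # resets minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # resets patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

With the versions used above, bump("0.1.1", "minor") gives "0.2.0" and bump("0.2.0", "major") gives "1.0.0".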

To preview what a bump would do without changing anything:

uv run bump-my-version bump patch --dry-run --verbose

After bumping, push the commit and tag together:

git push && git push --tags

To build and publish to PyPI:

uv build
uv publish

License

MIT.

Release history

0.2.2 (Apr 22, 2026)
0.2.1 (Apr 21, 2026)
0.1.1 (Apr 21, 2026), this version

Download files

Source Distribution

wellformed-0.1.1.tar.gz (20.6 kB)

Uploaded Apr 21, 2026 Source

Built Distribution

wellformed-0.1.1-py3-none-any.whl (19.9 kB)

Uploaded Apr 21, 2026 Python 3

File details

Details for the file wellformed-0.1.1.tar.gz.

File metadata

Download URL: wellformed-0.1.1.tar.gz
Upload date: Apr 21, 2026
Size: 20.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.7 (publish from CI on Ubuntu 24.04)

File hashes

Hashes for wellformed-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`4569e8d69f1a50ae38d669eb7e28591976f91f9c1e32de51a7221ddd060248ee`
MD5	`7fce8d5e265bb7ffa854f0b7d1ea6472`
BLAKE2b-256	`5cc7ac7d0231d4875627431509cd216f940dfafe6c56c4008699c9a20584a2a0`

File details

Details for the file wellformed-0.1.1-py3-none-any.whl.

File metadata

Download URL: wellformed-0.1.1-py3-none-any.whl
Upload date: Apr 21, 2026
Size: 19.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.7 (publish from CI on Ubuntu 24.04)

File hashes

Hashes for wellformed-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bffa183a7233898c7f846f9de61b36e239883c3d46bcf392d5e5e78d3d0df5ed`
MD5	`25957e30a335d80f58d00ffa9c7b2c42`
BLAKE2b-256	`44ed5403b80923e26d6de80f64b15437fcdeebaea4fc85158ed2cb545a57d2e9`
