kindaxml, a close-enough, XML-ish markup for LLM output

KindaXML is an XML-inspired annotation DSL designed for LLM-generated text. It keeps the familiar <tag attr=...> shape, but the parser is tolerant: it recovers from missing end tags, missing quotes, and other common “almost XML” mistakes.

KindaXML is not XML (and not meant to be parsed by strict XML parsers). Think: well-formed-ish.

Why KindaXML?

LLMs are good at emitting XML-like text, but strict XML breaks easily. KindaXML aims to be:

  • LLM-friendly: angle brackets and attributes feel natural in prompts.
  • Deterministic recovery: malformed input still produces predictable output.
  • Annotation-first: tags annotate spans of text rather than building a complex DOM.
  • Configurable: recognized tags are whitelisted; unknown tags can be stripped or preserved.

Design: Annotation DSL (Option A) + a pinch of “blocks”

KindaXML’s primary output is a stream of text segments, each optionally annotated:

[
  {"text": "We shipped last week", "ann": [{"tag":"cite","attrs":{"id":"1"}}]},
  {"text": ". ", "ann": []},
  {"text": "Details", "ann": [{"tag":"note","attrs":{}}]}
]
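As an illustration of how a consumer might work with this segment stream, here is a minimal Python sketch that operates on the JSON shape shown above. The helper names plain_text and spans_for are hypothetical and are not part of the kindaxml API.

```python
import json

# Segment stream in the shape shown above.
SEGMENTS_JSON = """
[
  {"text": "We shipped last week", "ann": [{"tag": "cite", "attrs": {"id": "1"}}]},
  {"text": ". ", "ann": []},
  {"text": "Details", "ann": [{"tag": "note", "attrs": {}}]}
]
"""


def plain_text(segments):
    """Concatenate segment texts to recover the tag-free document."""
    return "".join(seg["text"] for seg in segments)


def spans_for(segments, tag):
    """Return (text, attrs) for every segment annotated with `tag`."""
    return [
        (seg["text"], ann["attrs"])
        for seg in segments
        for ann in seg["ann"]
        if ann["tag"] == tag
    ]


segments = json.loads(SEGMENTS_JSON)
print(plain_text(segments))
print(spans_for(segments, "cite"))
```

Because the stream is flat, both helpers are single passes over a list; no tree walking is needed.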

KindaXML intentionally avoids deep nesting. In fact, it auto-closes open tags when the next tag begins, which keeps structures shallow and robust.

Syntax overview

Tags

  • Start tag: <tag ...>
  • End tag: </tag>
  • Self-closing tag: <tag .../>

Tag names match:

[A-Za-z][A-Za-z0-9_\-:.]*
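The pattern above can be tried directly with Python's re module. This is only a quick check of the pattern itself; the helper name is_valid_tag_name is hypothetical.

```python
import re

# The tag-name pattern from the section above, anchored with fullmatch.
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_\-:.]*")


def is_valid_tag_name(name: str) -> bool:
    """True if the whole string is a valid tag name under the pattern."""
    return TAG_NAME.fullmatch(name) is not None


for candidate in ["cite", "ns:tag.v2", "my-tag", "1cite", "a b", ""]:
    print(repr(candidate), is_valid_tag_name(candidate))
```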

Attributes

Supported forms:

  • a="x"
  • a='x'
  • a=x (unquoted)
  • a (boolean attribute; implies true)
  • Whitespace around = is allowed.
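The four attribute forms can be sketched with one regular expression. This is an illustrative approximation, not the crate's actual parser; in particular, the attribute-name pattern used here is an assumption, since the text above only specifies tag names.

```python
import re

# Assumption: attribute names follow roughly the same shape as tag names.
ATTR = re.compile(
    r"""([A-Za-z_][\w\-:.]*)      # attribute name
        (?:\s*=\s*                # optional '=' with surrounding whitespace
           (?:"([^"]*)"           # a="x"
             |'([^']*)'           # a='x'
             |([^\s"'>]+)         # a=x (unquoted)
           )
        )?""",
    re.VERBOSE,
)


def parse_attrs(src):
    """Parse the attribute portion of a tag into a dict."""
    attrs = {}
    for m in ATTR.finditer(src):
        name, double_quoted, single_quoted, bare = m.groups()
        if double_quoted is not None:
            attrs[name] = double_quoted
        elif single_quoted is not None:
            attrs[name] = single_quoted
        elif bare is not None:
            attrs[name] = bare
        else:
            attrs[name] = True  # boolean attribute
    return attrs


print(parse_attrs("id=\"1\" kind = 'x y' level=high draft"))
```

This sketch does not attempt quote recovery; it only covers the well-formed forms listed above.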

Parsing rules (the “close enough” part)

1) Tag boundary detection

A tag begins at < and ends at the first >.

If a quote starts inside the tag but never closes, it is implicitly closed at >.

Example:

<cite id='1,2>text</cite>

Parses as:

  • tag = cite
  • id = "1,2" (quote recovered)
  • inner text = text
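A minimal Python sketch of this rule, assuming a tag is simply everything between < and the first >. The function names read_start_tag and recover_quoted_value are hypothetical and do not reflect how the crate is implemented.

```python
def read_start_tag(text, start=0):
    """Read one start tag beginning at text[start] == '<'.

    Returns (name, raw_attr_source, index_after_tag).
    """
    end = text.find(">", start)  # the tag ends at the FIRST '>'
    if end == -1:
        raise ValueError("no '>' found after '<'")
    parts = text[start + 1:end].split(None, 1)
    name = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    return name, rest, end + 1


def recover_quoted_value(raw):
    """Return a quoted value, implicitly closing the quote if it is missing."""
    quote = raw[0]
    closing = raw.find(quote, 1)
    return raw[1:] if closing == -1 else raw[1:closing]


source = "<cite id='1,2>text</cite>"
name, rest, after = read_start_tag(source)
key, _, raw_value = rest.partition("=")
print(name, key, recover_quoted_value(raw_value), source[after:])
```

Because the search for > never looks past the first match, a broken quote cannot swallow text outside the tag.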

2) Auto-close on encountering another tag

If a start tag is open and the parser encounters the next <something...>, the current tag is implicitly closed immediately before that next <.

This is the core rule that prevents runaway structures.

Example:

<A>hello <B>world</B>

<A> auto-closes before <B>.
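The auto-close rule can be sketched as a scanner that tracks at most one open tag. This is a simplified illustration, not the crate's implementation: it ignores attributes, self-closing tags, and span strategies, and it lets a trailing unclosed tag annotate the remaining text. The function name flatten is hypothetical.

```python
import re

# Start or end tag; the name pattern follows the tag-name rule above.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9_\-:.]*)[^>]*>")


def flatten(text):
    """Return a list of (text, tag_or_None) segments with no nesting."""
    segments = []
    open_tag = None
    pos = 0
    for m in TOKEN.finditer(text):
        chunk = text[pos:m.start()]
        if chunk:
            segments.append((chunk, open_tag))
        is_end, name = m.group(1) == "/", m.group(2)
        if is_end:
            # A matching end tag closes the open tag; stray end tags are dropped.
            if open_tag == name:
                open_tag = None
        else:
            # Auto-close: a new start tag replaces whatever was open.
            open_tag = name
        pos = m.end()
    tail = text[pos:]
    if tail:
        segments.append((tail, open_tag))
    return segments


print(flatten("<A>hello <B>world</B>"))
print(flatten("<A>outer <B>inner</B> outer</A>"))
```

Since only one tag is ever open, the output depth is bounded regardless of how the input is nested.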

3) Missing end tags are tolerated

If a tag never closes, it’s recovered according to its configured span strategy (below).

4) Self-closing tags

<tag .../> is treated as a marker annotation at that position (or optionally “annotate next token”, configurable).

Span strategies (how KindaXML decides what a tag annotates)

KindaXML is annotation-first. Each recognized tag can be configured with a span strategy:

inline (normal XML-ish)

If <tag> ... </tag> is present, annotate the inner range.

retro_line (great for citations)

If <cite ...> is unclosed, annotate the text on the current line before the tag (from last emitted newline to the tag start), optionally trimming punctuation/whitespace.

Example:

We shipped last week <cite id=1>.

The cite attaches to We shipped last week (not the punctuation).
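A rough Python sketch of how a retro_line span could be computed, assuming the span runs from the previous newline to the tag start and that trimming removes ASCII punctuation and whitespace from both ends. The function name retro_line_span and its parameters are hypothetical.

```python
import string


def retro_line_span(text, tag_start, trim_punctuation=True):
    """Return the text on the current line that precedes the tag."""
    # rfind returns -1 when there is no earlier newline, so +1 yields 0.
    line_start = text.rfind("\n", 0, tag_start) + 1
    span = text[line_start:tag_start]
    if trim_punctuation:
        span = span.strip(string.punctuation + string.whitespace)
    return span


doc = "Intro line.\nWe shipped last week <cite id=1>."
print(retro_line_span(doc, doc.index("<cite")))
```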

Other useful strategies (optional)

  • forward_until_tag: annotate from the end of <tag ...> to the next tag start.
  • forward_until_newline: annotate until newline.
  • forward_next_token: annotate the next token/word.
  • noop: ignore tag if unclosed (marker-only tags).
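The forward strategies differ only in where the span stops. A minimal sketch, assuming a token is a run of non-whitespace characters; the function name forward_span is hypothetical and the crate may define these boundaries differently.

```python
import re


def forward_span(text, tag_end, strategy):
    """Return the text an unclosed tag would annotate, starting at tag_end."""
    rest = text[tag_end:]
    if strategy == "noop":
        return ""
    if strategy == "forward_next_token":
        m = re.search(r"\S+", rest)
        return m.group(0) if m else ""
    if strategy == "forward_until_tag":
        cut = rest.find("<")
    elif strategy == "forward_until_newline":
        cut = rest.find("\n")
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # If the stop character never appears, the span runs to the end.
    return rest if cut == -1 else rest[:cut]


sample = "Risks: <risk level=high> perf issues <note>later\nnext"
end_of_tag = sample.index(">") + 1
for name in ("forward_until_tag", "forward_until_newline", "forward_next_token", "noop"):
    print(name, repr(forward_span(sample, end_of_tag, name)))
```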

Unknown tags

You instruct the LLM to use a whitelist of recognized tags, but the parser can handle unknown tags in one of three modes:

  • strip (the default): drop unknown tag markup, keep inner text
  • passthrough: keep unknown tags as literal text
  • treat_as_text: don’t parse unknown tags at all; treat <...> as text
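To make strip concrete, here is a small Python sketch that removes markup for tags outside a whitelist while keeping their inner text. It is an approximation for illustration only; the function name strip_unknown_tags is hypothetical. The other two modes both leave the markup in the output text and are not modeled here.

```python
import re

TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9_\-:.]*)[^>]*>")


def strip_unknown_tags(text, recognized):
    """Drop start/end markup of tags not in `recognized`; keep inner text."""
    def replace(match):
        if match.group(1) in recognized:
            return match.group(0)  # leave recognized tags untouched
        return ""                  # unknown tag: remove the markup only
    return TAG.sub(replace, text)


print(strip_unknown_tags("a <b>bold</b> <cite id=1>x</cite>", {"cite"}))
```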

Escaping / literal text (CDATA support)

KindaXML can support XML’s CDATA form:

  • Start: <![CDATA[
  • End: ]]>

Inside CDATA, nothing is parsed as tags.

Example:

<note><![CDATA[
Use < and > freely here. Even <fake tags>.
]]></note>

If ]]> is missing, CDATA runs to end-of-document (recovered).
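The CDATA rule, including the end-of-document recovery, can be sketched as a pre-pass that splits the input into literal and parseable chunks. This is an illustration under that assumption, not the crate's code; split_cdata is a hypothetical name.

```python
CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def split_cdata(text):
    """Split text into (chunk, is_literal) pairs around CDATA sections."""
    parts = []
    pos = 0
    while True:
        start = text.find(CDATA_START, pos)
        if start == -1:
            if pos < len(text):
                parts.append((text[pos:], False))
            return parts
        if start > pos:
            parts.append((text[pos:start], False))
        body_start = start + len(CDATA_START)
        end = text.find(CDATA_END, body_start)
        if end == -1:
            # Recovery: an unterminated CDATA section runs to end-of-document.
            parts.append((text[body_start:], True))
            return parts
        parts.append((text[body_start:end], True))
        pos = end + len(CDATA_END)


print(split_cdata("<note><![CDATA[a <b> c]]></note>"))
print(split_cdata("x<![CDATA[never closed <t>"))
```

Only the chunks flagged False would then be scanned for tags.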

(If you prefer simpler escaping, you can also support \< and \> as literals.)

Using the Rust crate

use kindaxml::{parse, ParserConfig, UnknownMode};

fn main() {
    let mut cfg = ParserConfig::default();
    cfg.recognized_tags = ["cite", "note"].into_iter().map(String::from).collect();
    cfg.case_sensitive_tags = false;
    cfg.unknown_mode = UnknownMode::Strip;

    let input = "We shipped <cite id=1>last week</cite>.";
    let parsed = parse(input, &cfg);

    for segment in parsed.segments {
        println!("{:?} -> {:?}", segment.text, segment.annotations);
    }
}

Python bindings

The Python module is built with maturin (--features python). Basic usage:

from kindaxml import parse

result = parse("We shipped <cite id=1>last week</cite>.")
print(result.text)

To customize parsing, pass a ParserConfig:

from kindaxml import parse, ParserConfig

cfg = ParserConfig()
cfg.set_recognized_tags(["cite", "note", "todo"])
cfg.set_unknown_mode("strip")  # or passthrough / treat_as_text
cfg.set_recovery_strategy("cite", "retro_line")
cfg.set_autoclose_on_any_tag(True)

result = parse("We shipped <cite id=1>last week</cite>.", cfg)

ParserConfig setters roughly mirror the Rust config: per-tag recovery strategies (retro_line, forward_until_tag, forward_until_newline, forward_next_token, noop), punctuation trimming, auto-close toggles, and case sensitivity.

Full Python configuration example

from kindaxml import parse, ParserConfig

cfg = ParserConfig()
# Only these tags are recognized
cfg.set_recognized_tags(["cite", "note", "risk", "todo"])

# Unknown tags: remove markup but keep inner text
cfg.set_unknown_mode("strip")

# Recovery strategies per tag
cfg.set_recovery_strategy("cite", "retro_line")          # attach backward on the line
cfg.set_recovery_strategy("note", "forward_until_newline")
cfg.set_recovery_strategy("risk", "forward_next_token")

# Auto-close behaviour
cfg.set_autoclose_on_any_tag(True)    # close open tag when any new tag starts
cfg.set_autoclose_on_same_tag(True)   # close when the same tag reappears

# Misc toggles
cfg.set_trim_punctuation(True)        # trim punctuation for retro spans
cfg.set_case_sensitive_tags(False)    # treat tags case-insensitively

text = "We shipped last week <cite id=1>. Risks: <risk level=high> perf"
parsed = parse(text, cfg)

print(parsed.text)  # tag-stripped text
for seg in parsed.segments:
    print(seg, seg.annotations)
for marker in parsed.markers:
    print(marker)

ParserConfig exposes toggles for unknown tags, per-tag recovery strategies, case sensitivity, punctuation trimming, and auto-close behavior. The default config is conservative and strips unknown tags.

Examples

Run the demo with cargo run --example basic to see the original snippets alongside their parsed segments and markers.

Closed tag (inline span)

Input:

We shipped <cite id="1">last week</cite>.

Output (conceptual):

  • We shipped (no annotations)
  • last week (annotated: cite{id=1})
  • . (no annotations)

Unclosed cite (retro_line)

Input:

We shipped last week <cite id=1>.

Output:

  • We shipped last week (annotated: cite{id=1})
  • . (no annotations)
  • The <cite> tag itself is removed from the text.

Broken quote recovery

Input:

<cite id='1,2>Evidence</cite>

Recovered as id="1,2".

Auto-close on next tag

Input:

alpha <note>bravo <cite id=9> charlie

  • <note> auto-closes before <cite ...>
  • <cite> is unclosed and recovered by its strategy

Failure cases / limitations (by design)

Nesting will not behave like XML

KindaXML is not a DOM language. If you try to nest, the “auto-close on next tag” rule will flatten it.

Bad idea:

<A>outer <B>inner</B> outer</A>

KindaXML outcome: <A> likely ends before <B>, and </A> may become stray.

Guidance: don’t nest; prefer sibling tags.

Attribute ambiguity in severely malformed tags

Example:

<tag a="x y z b=2>

KindaXML recovers by closing the quote at > and treating everything after the opening quote (x y z b=2) as the value of a. This is intentional: recovery is bounded to the tag.

Guidance: keep attributes simple; use CDATA for messy text.

Stray end tags

Because auto-close flattens structure, you may get stray </tag>. By default, recognized stray end tags are dropped; unknown ones can be passed through (configurable).

Recommended prompting style for LLMs

Tell the model:

  • Use only these tags: <cite> <note> <todo> <risk> ... (whitelist)
  • Do not nest tags
  • Prefer postfix citations: ... statement <cite id=1>.
  • Use CDATA for code or text with </>: <![CDATA[ ... ]]>
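These instructions can be generated from the same whitelist the parser is configured with, so prompt and parser stay in sync. A small sketch; the function name kindaxml_instructions and the exact wording are illustrative, not something the package provides.

```python
def kindaxml_instructions(tags):
    """Build prompt instructions from the list of recognized tag names."""
    tag_list = " ".join(f"<{t}>" for t in tags)
    return "\n".join([
        f"Use only these tags: {tag_list}",
        "Do not nest tags.",
        "Prefer postfix citations, e.g. a statement followed by <cite id=1>.",
        "Wrap code or text containing < or > in <![CDATA[ ... ]]>.",
    ])


print(kindaxml_instructions(["cite", "note", "todo", "risk"]))
```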
