kindaxml, a close-enough, XML-ish markup for LLM output

KindaXML is an XML-inspired annotation DSL designed for LLM-generated text. It keeps the familiar <tag attr=...> shape, but the parser is tolerant: it recovers from missing end tags, missing quotes, and other common “almost XML” mistakes.

KindaXML is not XML (and not meant to be parsed by strict XML parsers). Think: well-formed-ish.

Why KindaXML?

LLMs are good at emitting XML-like text, but strict XML parsing fails on the first malformed tag. KindaXML aims to be:

  • LLM-friendly: angle brackets and attributes feel natural in prompts.
  • Deterministic recovery: malformed input still produces predictable output.
  • Annotation-first: tags annotate spans of text rather than building a complex DOM.
  • Configurable: recognized tags are whitelisted; unknown tags can be stripped or preserved.

Design: Annotation DSL (Option A) + a pinch of “blocks”

KindaXML’s primary output is a stream of text segments, each optionally annotated:

[
  {"text": "We shipped last week", "ann": [{"tag":"cite","attrs":{"id":"1"}}]},
  {"text": ". ", "ann": []},
  {"text": "Details", "ann": [{"tag":"note","attrs":{}}]}
]
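A minimal sketch of consuming this segment stream, using only the JSON shape shown above. The helper names plain_text and spans_with_tag are hypothetical and are not part of the kindaxml API.

```python
import json

# Segment list in the shape shown above (copied from the example).
segments = json.loads("""
[
  {"text": "We shipped last week", "ann": [{"tag": "cite", "attrs": {"id": "1"}}]},
  {"text": ". ", "ann": []},
  {"text": "Details", "ann": [{"tag": "note", "attrs": {}}]}
]
""")

def plain_text(segs):
    """Concatenate segment text, ignoring annotations."""
    return "".join(s["text"] for s in segs)

def spans_with_tag(segs, tag):
    """Return (text, attrs) for every segment annotated with `tag`."""
    return [
        (s["text"], a["attrs"])
        for s in segs
        for a in s["ann"]
        if a["tag"] == tag
    ]

print(plain_text(segments))              # We shipped last week. Details
print(spans_with_tag(segments, "cite"))  # [('We shipped last week', {'id': '1'})]
```

Because the output is a flat list rather than a tree, both helpers are single passes with no recursion.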

KindaXML intentionally avoids deep nesting. In fact, it auto-closes open tags when the next tag begins, which keeps structures shallow and robust.

Syntax overview

Tags

  • Start tag: <tag ...>
  • End tag: </tag>
  • Self-closing tag: <tag .../>

Tag names match:

[A-Za-z][A-Za-z0-9_\-:.]*
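To see which names this pattern accepts, here is a small check using Python's re module. The function name is_valid_tag_name is hypothetical; only the pattern itself comes from the text above.

```python
import re

# The tag-name pattern from above; fullmatch requires the whole string to match.
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_\-:.]*")

def is_valid_tag_name(name: str) -> bool:
    return TAG_NAME.fullmatch(name) is not None

for candidate in ["cite", "ns:note", "risk-level.v2", "1st", "_hidden", ""]:
    print(repr(candidate), is_valid_tag_name(candidate))
```

Names must start with an ASCII letter, so 1st, _hidden, and the empty string are rejected, while the other three are accepted.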

Attributes

Supported forms:

  • a="x"
  • a='x'
  • a=x (unquoted)
  • a (boolean attribute; implies true)
  • Whitespace around = is allowed.
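The attribute forms above can be sketched with one regular expression. This is an illustration, not the crate's parser: the attribute-name pattern and the parse_attrs helper are assumptions, and unterminated quotes are not handled here.

```python
import re

ATTR = re.compile(
    r"""
    ([A-Za-z_][\w\-:.]*)      # attribute name (assumed pattern)
    (?:\s*=\s*                # optional "= value" part, whitespace allowed
        (?: "([^"]*)"         # a="x"
          | '([^']*)'         # a='x'
          | ([^\s"'>]+)       # a=x (unquoted)
        )
    )?
    """,
    re.VERBOSE,
)

def parse_attrs(src):
    """Parse the attribute part of a tag into a dict; bare names map to True."""
    attrs = {}
    for m in ATTR.finditer(src):
        name, dq, sq, bare = m.groups()
        # Exactly one value group is set when "= value" is present; none for a boolean.
        attrs[name] = next((v for v in (dq, sq, bare) if v is not None), True)
    return attrs

print(parse_attrs("id=\"1\" lang='en' n=3 draft"))
# {'id': '1', 'lang': 'en', 'n': '3', 'draft': True}
```

Note that unquoted values stay strings ("3", not 3); only a bare attribute name produces a boolean.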

Parsing rules (the “close enough” part)

1) Tag boundary detection

A tag begins at < and ends at the first >.

If a quote starts inside the tag but never closes, it is implicitly closed at >.

Example:

<cite id='1,2>text</cite>

Parses as:

  • tag = cite
  • id = "1,2" (quote recovered)
  • inner text = text
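A rough sketch of this rule: find the first >, then close any quote still open inside the tag. The helpers read_tag and recover_quotes are hypothetical names, not the crate's API.

```python
def read_tag(text, start):
    """Return (tag_source, end_index) for the tag that begins at text[start] == '<'.

    The tag ends at the first '>'; if there is none, it runs to end of input.
    """
    end = text.find(">", start)
    if end == -1:
        return text[start + 1:], len(text)
    return text[start + 1:end], end + 1

def recover_quotes(tag_source):
    """If a quote opens inside the tag but never closes, close it at the end."""
    open_quote = None
    for ch in tag_source:
        if open_quote is None and ch in "\"'":
            open_quote = ch
        elif ch == open_quote:
            open_quote = None
    return tag_source + open_quote if open_quote else tag_source

doc = "<cite id='1,2>text</cite>"
tag_source, end = read_tag(doc, 0)
print(recover_quotes(tag_source))  # cite id='1,2'
print(doc[end:])                   # text</cite>
```

Recovery never looks past the >, so a broken quote cannot swallow the text that follows the tag.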

2) Auto-close on encountering another tag

If a start tag is open and the parser encounters the next <something...>, the current tag is implicitly closed immediately before that next <.

This is the core rule that prevents runaway structures.

Example:

<A>hello <B>world</B>

<A> auto-closes before <B>.
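The auto-close rule can be sketched as a scanner that keeps at most one tag open. This is a simplified illustration, not the crate's implementation: flatten is a hypothetical helper, attributes are ignored, self-closing tags are not special-cased, and a tag left open at end of input simply annotates the remaining text.

```python
import re

TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9_\-:.]*)[^>]*>")

def flatten(text):
    """Split text into (chunk, tag_or_None) pieces with at most one open tag."""
    pieces, open_tag, pos = [], None, 0
    for m in TAG.finditer(text):
        chunk = text[pos:m.start()]
        if chunk:
            pieces.append((chunk, open_tag))
        is_end, name = m.group(1) == "/", m.group(2)
        if is_end:
            # Close only a matching open tag; a stray end tag is dropped.
            if name == open_tag:
                open_tag = None
        else:
            # A new start tag implicitly closes whatever was open.
            open_tag = name
        pos = m.end()
    tail = text[pos:]
    if tail:
        pieces.append((tail, open_tag))
    return pieces

print(flatten("<A>hello <B>world</B>"))
# [('hello ', 'A'), ('world', 'B')]
```

With a single open_tag variable instead of a stack, nesting depth can never exceed one.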

3) Missing end tags are tolerated

If a tag never closes, it’s recovered according to its configured span strategy (below).

4) Self-closing tags

<tag .../> is treated as a marker annotation at that position (or optionally “annotate next token”, configurable).

Span strategies (how KindaXML decides what a tag annotates)

KindaXML is annotation-first. Each recognized tag can be configured with a span strategy:

inline (normal XML-ish)

If <tag> ... </tag> is present, annotate the inner range.

retro_line (great for citations)

If <cite ...> is unclosed, annotate the text on the current line before the tag (from the last newline to the tag start), optionally trimming punctuation/whitespace.

Example:

We shipped last week <cite id=1>.

The cite attaches to We shipped last week (not the punctuation).
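A minimal sketch of how a retro_line span could be computed from the tag's start offset. The function retro_line_span and its default trim set are assumptions made for illustration, not the crate's actual behavior.

```python
def retro_line_span(text, tag_start, trim=" \t.,;:!?"):
    """Return (start, end) of the text on the current line before tag_start."""
    # rfind returns -1 when there is no earlier newline, so line_start becomes 0.
    line_start = text.rfind("\n", 0, tag_start) + 1
    end = tag_start
    # Trim trailing whitespace and punctuation before the tag.
    while end > line_start and text[end - 1] in trim:
        end -= 1
    # Trim leading whitespace on the line.
    start = line_start
    while start < end and text[start] in " \t":
        start += 1
    return start, end

text = "Intro line.\nWe shipped last week <cite id=1>."
start, end = retro_line_span(text, text.index("<cite"))
print(repr(text[start:end]))  # 'We shipped last week'
```

The span stops at the previous newline, so a citation never reaches back into an earlier line.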

Other useful strategies (optional)

  • forward_until_tag: annotate from the end of <tag ...> to the next tag start.
  • forward_until_newline: annotate until newline.
  • forward_next_token: annotate the next token/word.
  • noop: ignore tag if unclosed (marker-only tags).
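The forward strategies differ only in where the span ends. A rough sketch, assuming the caller already knows the offset just past the tag's >; forward_span is a hypothetical helper and the whitespace-based token definition is an assumption.

```python
import re

TOKEN = re.compile(r"\S+")  # assumed definition of a "token": a run of non-whitespace

def forward_span(text, tag_end, strategy):
    """Return (start, end) annotated by an unclosed tag whose markup ends at tag_end."""
    if strategy == "forward_until_tag":
        nxt = text.find("<", tag_end)
        return tag_end, (len(text) if nxt == -1 else nxt)
    if strategy == "forward_until_newline":
        nxt = text.find("\n", tag_end)
        return tag_end, (len(text) if nxt == -1 else nxt)
    if strategy == "forward_next_token":
        m = TOKEN.search(text, tag_end)
        return (m.start(), m.end()) if m else (tag_end, tag_end)
    if strategy == "noop":
        return tag_end, tag_end  # empty span: the tag acts as a marker only
    raise ValueError(f"unknown strategy: {strategy}")

text = "Risks: <risk level=high> perf regressions\nnext <note>x"
tag_end = text.index(">") + 1
for name in ["forward_next_token", "forward_until_newline", "forward_until_tag", "noop"]:
    s, e = forward_span(text, tag_end, name)
    print(name, repr(text[s:e]))
```

On this input the four strategies yield 'perf', ' perf regressions', ' perf regressions\nnext ', and the empty string respectively.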

Unknown tags

You instruct the LLM to use only a whitelist of recognized tags, but it may still emit others. The parser handles unknown tags in one of three modes:

  • strip (the default): drop unknown tag markup, keep inner text
  • passthrough: keep unknown tags as literal text
  • treat_as_text: don’t parse unknown tags at all; treat <...> as text
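A simplified sketch of the difference between stripping and keeping unknown markup. handle_unknown is a hypothetical helper, not the crate's API. It models only strip and passthrough; treat_as_text would also leave the markup in place, and this sketch does not capture how it differs from passthrough inside the real parser.

```python
import re

TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9_\-:.]*)[^>]*>")

def handle_unknown(text, recognized, mode="strip"):
    """Apply an unknown-tag mode; recognized tags are left in place for later parsing."""
    def repl(m):
        if m.group(1).lower() in recognized:
            return m.group(0)   # recognized: untouched
        if mode == "strip":
            return ""           # drop the markup, keep the surrounding text
        if mode == "passthrough":
            return m.group(0)   # keep the markup as literal text
        raise ValueError(f"unsupported mode: {mode}")
    return TAG.sub(repl, text)

src = "a <b>bold</b> <cite id=1>x</cite>"
print(handle_unknown(src, {"cite"}))                 # a bold <cite id=1>x</cite>
print(handle_unknown(src, {"cite"}, "passthrough"))  # a <b>bold</b> <cite id=1>x</cite>
```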

Escaping / literal text (CDATA support)

KindaXML can support XML’s CDATA form:

  • Start: <![CDATA[
  • End: ]]>

Inside CDATA, nothing is parsed as tags.

Example:

<note><![CDATA[
Use < and > freely here. Even <fake tags>.
]]></note>

If ]]> is missing, CDATA runs to end-of-document (recovered).
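One way to picture CDATA handling is a pre-pass that splits the input into literal and parseable chunks, including the run-to-end recovery described above. split_cdata is a hypothetical helper for illustration, not the crate's implementation.

```python
def split_cdata(text):
    """Return (chunk, is_literal) pairs; literal chunks must not be scanned for tags."""
    open_mark, close_mark = "<![CDATA[", "]]>"
    out, pos = [], 0
    while True:
        start = text.find(open_mark, pos)
        if start == -1:
            if pos < len(text):
                out.append((text[pos:], False))
            return out
        if start > pos:
            out.append((text[pos:start], False))
        body = start + len(open_mark)
        end = text.find(close_mark, body)
        if end == -1:
            # Missing ]]>: the CDATA section runs to end of document.
            out.append((text[body:], True))
            return out
        out.append((text[body:end], True))
        pos = end + len(close_mark)

print(split_cdata("<note><![CDATA[a <fake> b]]></note>"))
# [('<note>', False), ('a <fake> b', True), ('</note>', False)]
```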

(If you prefer simpler escaping, you can also support \< and \> as literals.)

Using the Rust crate

use kindaxml::{parse, ParserConfig, UnknownMode};

fn main() {
    let mut cfg = ParserConfig::default();
    cfg.recognized_tags = ["cite", "note"].into_iter().map(String::from).collect();
    cfg.case_sensitive_tags = false;
    cfg.unknown_mode = UnknownMode::Strip;

    let input = "We shipped <cite id=1>last week</cite>.";
    let parsed = parse(input, &cfg);

    for segment in parsed.segments {
        println!("{:?} -> {:?}", segment.text, segment.annotations);
    }
}

Python bindings

The Python module is built with maturin (--features python). Basic usage:

from kindaxml import parse

result = parse("We shipped <cite id=1>last week</cite>.")
print(result.text)

To customize parsing, pass a ParserConfig:

from kindaxml import parse, ParserConfig

cfg = ParserConfig()
cfg.set_recognized_tags(["cite", "note", "todo"])
cfg.set_unknown_mode("strip")  # or passthrough / treat_as_text
cfg.set_recovery_strategy("cite", "retro_line")
cfg.set_autoclose_on_any_tag(True)

result = parse("We shipped <cite id=1>last week</cite>.", cfg)

ParserConfig setters roughly mirror the Rust config: per-tag recovery strategies (retro_line, forward_until_tag, forward_until_newline, forward_next_token, noop), punctuation trimming, auto-close toggles, and case sensitivity.

Full Python configuration example

from kindaxml import parse, ParserConfig

cfg = ParserConfig()
# Only these tags are recognized
cfg.set_recognized_tags(["cite", "note", "risk", "todo"])

# Unknown tags: remove markup but keep inner text
cfg.set_unknown_mode("strip")

# Recovery strategies per tag
cfg.set_recovery_strategy("cite", "retro_line")          # attach backward on the line
cfg.set_recovery_strategy("note", "forward_until_newline")
cfg.set_recovery_strategy("risk", "forward_next_token")

# Auto-close behaviour
cfg.set_autoclose_on_any_tag(True)    # close open tag when any new tag starts
cfg.set_autoclose_on_same_tag(True)   # close when the same tag reappears

# Misc toggles
cfg.set_trim_punctuation(True)        # trim punctuation for retro spans
cfg.set_case_sensitive_tags(False)    # treat tags case-insensitively

text = "We shipped last week <cite id=1>. Risks: <risk level=high> perf"
parsed = parse(text, cfg)

print(parsed.text)  # tag-stripped text
for seg in parsed.segments:
    print(seg, seg.annotations)
for marker in parsed.markers:
    print(marker)

ParserConfig exposes toggles for unknown tags, per-tag recovery strategies, case sensitivity, punctuation trimming, and auto-close behavior. The default config is conservative and strips unknown tags.

Examples

Run the demo with cargo run --example basic to see the original snippets alongside their parsed segments and markers.

Closed tag (inline span)

Input:

We shipped <cite id="1">last week</cite>.

Output (conceptual):

  • We shipped (no annotations)
  • last week (annotated: cite{id=1})
  • . (no annotations)

Unclosed cite (retro_line)

Input:

We shipped last week <cite id=1>.

Output:

  • We shipped last week (annotated: cite{id=1})
  • .
  • (tag removed)

Broken quote recovery

Input:

<cite id='1, 2>Evidence</cite>

Recovered as id="1, 2" (the unterminated quote is closed at >).

Auto-close on next tag

Input:

alpha <note>bravo <cite id=9> charlie

Output:

  • <note> auto-closes before <cite ...>
  • <cite> is unclosed and recovered by its strategy

Failure cases / limitations (by design)

Nesting will not behave like XML

KindaXML is not a DOM language. If you try to nest, the “auto-close on next tag” rule will flatten it.

Bad idea:

<A>outer <B>inner</B> outer</A>

KindaXML outcome: <A> likely ends before <B>, and </A> may become stray.

Guidance: don’t nest; prefer sibling tags.

Attribute ambiguity in severely malformed tags

Example:

<tag a="x y z b=2>

KindaXML recovers by closing the quote at >, so everything after the opening quote becomes the value of a (here a = "x y z b=2"). This is intentional: recovery is bounded to the tag.

Guidance: keep attributes simple; use CDATA for messy text.

Stray end tags

Because auto-close flattens structure, you may get stray </tag>. By default, recognized stray end tags are dropped; unknown ones can be passed through (configurable).

Recommended prompting style for LLMs

Tell the model:

  • Use only these tags: <cite> <note> <todo> <risk> ... (whitelist)
  • Do not nest tags
  • Prefer postfix citations: ... statement <cite id=1>.
  • Use CDATA for code or text with </>: <![CDATA[ ... ]]>
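A small sketch of turning these rules into prompt text from a tag whitelist. build_tag_instructions is a hypothetical helper and the wording is only one possible phrasing of the guidance above.

```python
def build_tag_instructions(tags):
    """Build prompt instructions for the given whitelist of tag names."""
    tag_list = " ".join(f"<{t}>" for t in tags)
    return "\n".join([
        f"Use only these tags: {tag_list}",
        "Do not nest tags.",
        "Prefer postfix citations: ... statement <cite id=1>.",
        "Use CDATA for code or text containing < or >: <![CDATA[ ... ]]>",
    ])

print(build_tag_instructions(["cite", "note", "todo", "risk"]))
```

Generating the prompt from the same list passed to the parser config keeps the whitelist in one place.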

Download files

Source Distribution

  • kindaxml-0.1.0.tar.gz (30.6 kB): Source

Built Distributions

  • kindaxml-0.1.0-cp312-cp312-win_amd64.whl (184.7 kB): CPython 3.12, Windows x86-64
  • kindaxml-0.1.0-cp312-cp312-win32.whl (172.8 kB): CPython 3.12, Windows x86
  • kindaxml-0.1.0-cp312-cp312-musllinux_1_2_x86_64.whl (502.0 kB): CPython 3.12, musllinux: musl 1.2+ x86-64
  • kindaxml-0.1.0-cp312-cp312-musllinux_1_2_i686.whl (532.3 kB): CPython 3.12, musllinux: musl 1.2+ i686
  • kindaxml-0.1.0-cp312-cp312-musllinux_1_2_armv7l.whl (600.4 kB): CPython 3.12, musllinux: musl 1.2+ ARMv7l
  • kindaxml-0.1.0-cp312-cp312-musllinux_1_2_aarch64.whl (508.3 kB): CPython 3.12, musllinux: musl 1.2+ ARM64
  • kindaxml-0.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (334.1 kB): CPython 3.12, manylinux: glibc 2.17+ x86-64
  • kindaxml-0.1.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl (355.7 kB): CPython 3.12, manylinux: glibc 2.17+ s390x
  • kindaxml-0.1.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (459.9 kB): CPython 3.12, manylinux: glibc 2.17+ ppc64le
  • kindaxml-0.1.0-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (333.6 kB): CPython 3.12, manylinux: glibc 2.17+ ARMv7l
  • kindaxml-0.1.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (326.7 kB): CPython 3.12, manylinux: glibc 2.17+ ARM64
  • kindaxml-0.1.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl (354.3 kB): CPython 3.12, manylinux: glibc 2.5+ i686
  • kindaxml-0.1.0-cp312-cp312-macosx_11_0_arm64.whl (288.7 kB): CPython 3.12, macOS 11.0+ ARM64

