kindaxml, a close-enough, XML-ish markup for LLM output
KindaXML is an XML-inspired annotation DSL designed for LLM-generated text. It keeps the familiar <tag attr=...> shape, but the parser is tolerant: it recovers from missing end tags, missing quotes, and other common “almost XML” mistakes.
KindaXML is not XML (and not meant to be parsed by strict XML parsers). Think: well-formed-ish.
Why KindaXML?
LLMs are good at emitting XML-like text, but strict XML breaks easily. KindaXML aims to be:
- LLM-friendly: angle brackets and attributes feel natural in prompts.
- Deterministic recovery: malformed input still produces predictable output.
- Annotation-first: tags annotate spans of text rather than building a complex DOM.
- Configurable: recognized tags are whitelisted; unknown tags can be stripped or preserved.
Design: Annotation DSL (Option A) + a pinch of “blocks”
KindaXML’s primary output is a stream of text segments, each optionally annotated:
[
{"text": "We shipped last week", "ann": [{"tag":"cite","attrs":{"id":"1"}}]},
{"text": ". ", "ann": []},
{"text": "Details", "ann": [{"tag":"note","attrs":{}}]}
]
KindaXML intentionally avoids deep nesting. In fact, it auto-closes open tags when the next tag begins, which keeps structures shallow and robust.
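As a rough illustration of how a consumer might work with this flat segment list, the sketch below loads the JSON shown above and derives the plain text and the spans carrying a given tag. The helper names `plain_text` and `spans_with_tag` are hypothetical and not part of the kindaxml API.

```python
import json

# The segment stream from the example above, as plain JSON.
segments = json.loads("""
[
  {"text": "We shipped last week", "ann": [{"tag": "cite", "attrs": {"id": "1"}}]},
  {"text": ". ", "ann": []},
  {"text": "Details", "ann": [{"tag": "note", "attrs": {}}]}
]
""")


def plain_text(segs):
    """Concatenate segment texts to recover the tag-free document."""
    return "".join(s["text"] for s in segs)


def spans_with_tag(segs, tag):
    """Hypothetical helper: list (text, attrs) for every segment annotated with `tag`."""
    return [
        (s["text"], a["attrs"])
        for s in segs
        for a in s["ann"]
        if a["tag"] == tag
    ]
```

Because the structure is flat, both helpers are single passes over the list; no tree traversal is needed.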
Syntax overview
Tags
- Start tag: <tag ...>
- End tag: </tag>
- Self-closing tag: <tag .../>
Tag names match:
[A-Za-z][A-Za-z0-9_\-:.]*
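A minimal sketch of checking a candidate name against the pattern above with Python's `re` module. The function name `is_tag_name` is an illustrative assumption, not a kindaxml function.

```python
import re

# The tag-name pattern quoted above; fullmatch() anchors it to the whole string.
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_\-:.]*")


def is_tag_name(name: str) -> bool:
    """Hypothetical helper: True if `name` is a valid tag name under the pattern."""
    return TAG_NAME.fullmatch(name) is not None
```

Names must start with a letter, so `cite` and `ns:tag-1.x` pass while `1cite` and the empty string do not.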
Attributes
Supported forms:
- a="x"
- a='x'
- a=x (unquoted)
- a (boolean attribute; implies true)
- Whitespace around = is allowed.
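The four attribute forms can be sketched with a single regular expression. This is an illustration of the syntax only, not the crate's parser: `parse_attrs` is a hypothetical name, the attribute-name character set is assumed, and representing a boolean attribute as Python `True` is also an assumption.

```python
import re

# Hypothetical attribute scanner covering the supported forms listed above.
ATTR = re.compile(
    r"""([A-Za-z_][\w\-:.]*)      # attribute name (character set is an assumption)
        (?:\s*=\s*                # optional "=", whitespace allowed around it
           (?:"([^"]*)"           # a="x"
             |'([^']*)'           # a='x'
             |([^\s"'>/]+)        # a=x (unquoted)
           )
        )?""",
    re.VERBOSE,
)


def parse_attrs(src: str) -> dict:
    """Parse the attribute part of a tag (everything after the tag name)."""
    attrs = {}
    for m in ATTR.finditer(src):
        name, dq, sq, bare = m.groups()
        if dq is not None:
            attrs[name] = dq
        elif sq is not None:
            attrs[name] = sq
        elif bare is not None:
            attrs[name] = bare
        else:
            attrs[name] = True  # bare attribute name implies true
    return attrs
```

This sketch handles only well-formed attributes; the unclosed-quote recovery described under the parsing rules is a separate step.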
Parsing rules (the “close enough” part)
1) Tag boundary detection
A tag begins at < and ends at the first >.
If a quote starts inside the tag but never closes, it is implicitly closed at >.
Example:
<cite id='1,2>text</cite>
Parses as:
- tag = cite
- id = "1,2" (quote recovered)
- inner text = text
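The boundary rule and the quote recovery can be sketched in a few lines. Both function names are hypothetical, and returning `None` when no `>` exists is an assumption (the text above does not say what happens in that case).

```python
def split_tag(text: str, start: int):
    """Given text[start] == '<', return (tag_source, index_after_tag).

    The tag ends at the first '>', even if a quote is still open.
    Returns None when there is no '>' at all (assumed behaviour).
    """
    end = text.find(">", start)
    if end == -1:
        return None
    return text[start + 1:end], end + 1


def recover_quoted(value_src: str) -> str:
    """value_src starts with a quote character; return the attribute value.

    If the closing quote is missing, the value runs to the end of the tag,
    which is equivalent to closing the quote implicitly at '>'.
    """
    quote = value_src[0]
    close = value_src.find(quote, 1)
    return value_src[1:] if close == -1 else value_src[1:close]
```

For the example above, `split_tag` isolates `cite id='1,2` as the tag source, and `recover_quoted("'1,2")` yields `1,2`.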
2) Auto-close on encountering another tag
If a start tag is open and the parser encounters the next <something...>, the current tag is implicitly closed immediately before that next <.
This is the core rule that prevents runaway structures.
Example:
<A>hello <B>world</B>
<A> auto-closes before <B>.
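The auto-close rule amounts to tracking at most one open tag at a time. The sketch below is a simplified illustration, not the crate's algorithm: `flatten` is a hypothetical name, self-closing tags are not treated specially, stray end tags are dropped, and an unclosed tag simply annotates forward instead of using a configured span strategy.

```python
import re

# Matches a start or end tag; group 1 is "/" for end tags, group 2 is the name.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9_\-:.]*)[^>]*>")


def flatten(text: str):
    """Return a list of (text, tag_or_None) pairs with no nesting."""
    segments = []
    open_tag = None
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            # Text before this tag belongs to whatever tag is currently open.
            segments.append((text[pos:m.start()], open_tag))
        closing, name = m.group(1), m.group(2)
        if closing:
            if name == open_tag:
                open_tag = None  # matching end tag closes the span
            # otherwise: a stray end tag, dropped in this sketch
        else:
            open_tag = name  # a new start tag implicitly closes the old one
        pos = m.end()
    if pos < len(text):
        segments.append((text[pos:], open_tag))
    return segments
```

With this sketch, `<A>hello <B>world</B>` yields one segment for `hello ` under `A` and one for `world` under `B`; `A` never contains `B`.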
3) Missing end tags are tolerated
If a tag never closes, it’s recovered according to its configured span strategy (below).
4) Self-closing tags
<tag .../> is treated as a marker annotation at that position (or optionally “annotate next token”, configurable).
Span strategies (how KindaXML decides what a tag annotates)
KindaXML is annotation-first. Each recognized tag can be configured with a span strategy:
inline (normal XML-ish)
If <tag> ... </tag> is present, annotate the inner range.
retro_line (great for citations)
If <cite ...> is unclosed, annotate the text on the current line before the tag (from last emitted newline to the tag start), optionally trimming punctuation/whitespace.
Example:
We shipped last week <cite id=1>.
The cite attaches to We shipped last week (not the punctuation).
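A minimal sketch of how a retro_line span could be computed from the tag's start offset. The function name and the exact set of trimmed characters are assumptions; the crate's trimming rules may differ.

```python
def retro_line_span(text: str, tag_start: int, trim: str = " \t.,;:!?"):
    """Return (begin, end) offsets of the text a retro_line tag annotates.

    The span runs from the start of the current line to the tag, with
    trailing punctuation/whitespace and leading whitespace trimmed.
    """
    # rfind returns -1 when there is no earlier newline, so +1 gives offset 0.
    line_start = text.rfind("\n", 0, tag_start) + 1
    end = tag_start
    while end > line_start and text[end - 1] in trim:
        end -= 1
    begin = line_start
    while begin < end and text[begin] in " \t":
        begin += 1
    return begin, end
```

Applied to the example above with `tag_start = text.index("<")`, the span covers `We shipped last week`; the period after the tag is outside the span because it follows the tag.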
Other useful strategies (optional)
- forward_until_tag: annotate from the end of <tag ...> to the next tag start.
- forward_until_newline: annotate until newline.
- forward_next_token: annotate the next token/word.
- noop: ignore tag if unclosed (marker-only tags).
Unknown tags
You instruct the LLM to use a whitelist of recognized tags, but the parser can handle unknown tags in one of three modes:
- strip (default-friendly): drop unknown tag markup, keep inner text
- passthrough: keep unknown tags as literal text
- treat_as_text: don’t parse unknown tags at all; treat <...> as text
Escaping / literal text (CDATA support)
KindaXML can support XML’s CDATA form:
- Start: <![CDATA[
- End: ]]>
Inside CDATA, nothing is parsed as tags.
Example:
<note><![CDATA[
Use < and > freely here. Even <fake tags>.
]]></note>
If ]]> is missing, CDATA runs to end-of-document (recovered).
(If you prefer simpler escaping, you can also support \< and \> as literals.)
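The CDATA rules, including the run-to-end recovery, can be sketched as a pre-pass that splits the input into literal and parseable chunks. `split_cdata` is a hypothetical helper for illustration and does not cover the optional backslash escapes.

```python
def split_cdata(text: str):
    """Split text into (chunk, is_literal) pairs.

    Literal chunks are CDATA bodies and must not be scanned for tags.
    A CDATA section with no closing marker runs to the end of the text.
    """
    open_mark, close_mark = "<![CDATA[", "]]>"
    parts = []
    pos = 0
    while True:
        start = text.find(open_mark, pos)
        if start == -1:
            if pos < len(text):
                parts.append((text[pos:], False))
            return parts
        if start > pos:
            parts.append((text[pos:start], False))
        body = start + len(open_mark)
        end = text.find(close_mark, body)
        if end == -1:
            # Recovery: unterminated CDATA swallows the rest of the document.
            parts.append((text[body:], True))
            return parts
        parts.append((text[body:end], True))
        pos = end + len(close_mark)
```

Only the chunks flagged `False` would then be handed to the tag scanner.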
Using the Rust crate
use kindaxml::{parse, ParserConfig, UnknownMode};

fn main() {
    let mut cfg = ParserConfig::default();
    cfg.recognized_tags = ["cite", "note"].into_iter().map(String::from).collect();
    cfg.case_sensitive_tags = false;
    cfg.unknown_mode = UnknownMode::Strip;

    let input = "We shipped <cite id=1>last week</cite>.";
    let parsed = parse(input, &cfg);
    for segment in parsed.segments {
        println!("{:?} -> {:?}", segment.text, segment.annotations);
    }
}
Python bindings
The Python module is built with maturin (--features python). Basic usage:
from kindaxml import parse
result = parse("We shipped <cite id=1>last week</cite>.")
print(result.text)
To customize parsing, pass a ParserConfig:
from kindaxml import parse, ParserConfig
cfg = ParserConfig()
cfg.set_recognized_tags(["cite", "note", "todo"])
cfg.set_unknown_mode("strip") # or passthrough / treat_as_text
cfg.set_recovery_strategy("cite", "retro_line")
cfg.set_autoclose_on_any_tag(True)
result = parse("We shipped <cite id=1>last week</cite>.", cfg)
ParserConfig setters roughly mirror the Rust config: per-tag recovery strategies (retro_line, forward_until_tag, forward_until_newline, forward_next_token, noop), punctuation trimming, auto-close toggles, and case sensitivity.
Full Python configuration example
from kindaxml import parse, ParserConfig
cfg = ParserConfig()
# Only these tags are recognized
cfg.set_recognized_tags(["cite", "note", "risk", "todo"])
# Unknown tags: remove markup but keep inner text
cfg.set_unknown_mode("strip")
# Recovery strategies per tag
cfg.set_recovery_strategy("cite", "retro_line") # attach backward on the line
cfg.set_recovery_strategy("note", "forward_until_newline")
cfg.set_recovery_strategy("risk", "forward_next_token")
# Auto-close behaviour
cfg.set_autoclose_on_any_tag(True) # close open tag when any new tag starts
cfg.set_autoclose_on_same_tag(True) # close when the same tag reappears
# Misc toggles
cfg.set_trim_punctuation(True) # trim punctuation for retro spans
cfg.set_case_sensitive_tags(False) # treat tags case-insensitively
text = "We shipped last week <cite id=1>. Risks: <risk level=high> perf"
parsed = parse(text, cfg)
print(parsed.text) # tag-stripped text
for seg in parsed.segments:
    print(seg, seg.annotations)
for marker in parsed.markers:
    print(marker)
ParserConfig exposes toggles for unknown tags, per-tag recovery strategies, case sensitivity, punctuation trimming, and auto-close behavior. The default config is conservative and strips unknown tags.
Examples
Run the demo with cargo run --example basic to see the original snippets alongside their parsed segments and markers.
Closed tag (inline span)
Input:
We shipped <cite id="1">last week</cite>.
Output (conceptual):
- We shipped (no annotations)
- last week (annotated: cite{id=1})
- . (no annotations)
Unclosed cite (retro_line)
Input:
We shipped last week <cite id=1>.
Output:
- We shipped last week (annotated: cite{id=1})
- .
- (tag removed)
Broken quote recovery
Input:
<cite id='1, 2>Evidence</cite>
Recovered as id="1, 2".
Auto-close on next tag
Input:
alpha <note>bravo <cite id=9> charlie
- <note> auto-closes before <cite ...>
- <cite> is unclosed and recovered by its strategy
Failure cases / limitations (by design)
Nesting will not behave like XML
KindaXML is not a DOM language. If you try to nest, the “auto-close on next tag” rule will flatten it.
Bad idea:
<A>outer <B>inner</B> outer</A>
KindaXML outcome: <A> likely ends before <B>, and </A> may become stray.
Guidance: don’t nest; prefer sibling tags.
Attribute ambiguity in severely malformed tags
Example:
<tag a="x y z b=2>
KindaXML will recover by closing the quote at > and treat the entire remaining text as part of a. This is intentional: recovery is bounded to the tag.
Guidance: keep attributes simple; use CDATA for messy text.
Stray end tags
Because auto-close flattens structure, you may get stray </tag>. By default, recognized stray end tags are dropped; unknown ones can be passed through (configurable).
Recommended prompting style for LLMs
Tell the model:
- Use only these tags: <cite> <note> <todo> <risk> ... (whitelist)
- Do not nest tags
- Prefer postfix citations: ... statement <cite id=1>.
- Use CDATA for code or text with </>: <![CDATA[ ... ]]>
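If the whitelist lives in code, the instructions above can be generated from it so the prompt and the parser configuration stay in sync. The sketch below is one possible way to do that; `build_tag_instructions` is a hypothetical helper, not part of kindaxml.

```python
def build_tag_instructions(tags):
    """Render the prompting guidance for a given tag whitelist."""
    allowed = " ".join(f"<{t}>" for t in tags)
    return "\n".join([
        f"- Use only these tags: {allowed}",
        "- Do not nest tags",
        "- Prefer postfix citations: ... statement <cite id=1>.",
        "- Use CDATA for code or text with < or >: <![CDATA[ ... ]]>",
    ])


# The same list could also be passed to the parser's recognized-tags setting.
TAGS = ["cite", "note", "todo", "risk"]
PROMPT_RULES = build_tag_instructions(TAGS)
```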