Secure-by-default XML repair library for LLM-generated XML with trust levels
Xenon
Xenon is a robust, zero-dependency Python library designed to clean up, repair, and secure malformed XML generated by Large Language Models (LLMs).
In the era of RAG and AI agents, applications increasingly rely on structured data outputs. However, LLMs often generate messy XML: missing closing tags, conversational fluff, hallucinated tags, or markup that opens XSS vulnerabilities. Xenon bridges the gap between raw LLM output and reliable application code.
🚀 Key Features
- LLM-Focused Repair: specifically handles truncation, hallucinations, and conversational text ("Here is your XML: ...").
- Secure by Default: Explicit TrustLevel system to prevent XSS and injection attacks from untrusted sources.
- Real-Time Streaming: Repair XML token-by-token as it arrives from the LLM (compatible with OpenAI/Anthropic streams).
- Zero Runtime Dependencies: Core functionality uses only the Python Standard Library. (Optional lxml support for schema validation.)
- Smart Matching: Uses Levenshtein distance to fix typoed tags (e.g., </usre> → </user>).
- Formatting & Diffs: Built-in pretty-printing and diff generation to visualize repairs.
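To make the Smart Matching bullet concrete, here is a minimal sketch of matching a misspelled closing tag to the nearest open tag by edit distance. It is not Xenon's implementation: the helper names and the `max_distance=2` threshold are assumptions chosen for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def closest_open_tag(closing_name, open_tags, max_distance=2):
    """Pick the open tag nearest to a possibly misspelled closing tag name.

    max_distance is a hypothetical cut-off, not a documented Xenon setting.
    """
    best = min(open_tags, key=lambda name: levenshtein(closing_name, name), default=None)
    if best is not None and levenshtein(closing_name, best) <= max_distance:
        return best
    return None


print(closest_open_tag("usre", ["root", "user"]))  # user
```

A distance cut-off matters here: without it, any stray closing tag would be forced onto some open element, however unrelated.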
🤝 Complementary to Standard Parsers
Xenon is designed to work alongside established XML parsers like lxml or BeautifulSoup, not replace them.
Standard parsers expect well-formed input and will rightfully raise errors on the chaotic output often generated by LLMs (hallucinations, cut-off strings, missing tags). Xenon acts as a stabilization layer: it accepts raw, potentially malformed LLM output and produces a valid, secure XML string that standard parsers can then consume safely and reliably.
Think of Xenon as the pre-processor that ensures your data pipeline remains robust, even when the LLM output isn't.
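The hand-off can be illustrated with the standard library alone. In this sketch the repaired string is copied from the Quick Start output rather than produced by calling Xenon, and `try_parse` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

raw_llm_output = "Sure! <root><user name=john>Hello"
# Assumed result of a repair step (taken from the Quick Start example)
repaired = '<root><user name="john">Hello</user></root>'


def try_parse(xml_text):
    """Return the root element, or None if the parser rejects the input."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None


print(try_parse(raw_llm_output))  # None: a strict parser refuses the raw text
root = try_parse(repaired)
print(root.find("user").get("name"), root.find("user").text)  # john Hello
```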
Installation
pip install elemental-xenon
For development:
git clone https://github.com/MarsZDF/xenon.git
cd xenon
pip install -e ".[dev]"
⚡ Quick Start
Xenon requires you to specify a Trust Level for your input. This ensures you don't accidentally expose your application to security threats from untrusted LLM outputs.
from xenon import repair_xml_safe, parse_xml_safe, TrustLevel
# 1. Repair malformed LLM output
llm_output = 'Sure! <root><user name=john>Hello'
repaired = repair_xml_safe(llm_output, trust=TrustLevel.UNTRUSTED)
print(repaired)
# Output: <root><user name="john">Hello</user></root>
# 2. Parse directly to a dictionary
data = parse_xml_safe(llm_output, trust=TrustLevel.UNTRUSTED)
print(data)
# Output: {'root': {'user': {'@attributes': {'name': 'john'}, '#text': 'Hello'}}}
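The dictionary layout above (`@attributes` for attributes, `#text` for character data) can be reproduced from well-formed XML with the standard library. This is a sketch of the output convention only, not of how `parse_xml_safe` works internally; `element_to_dict` is a hypothetical helper, and collecting repeated sibling tags into a list is an assumption.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element to a nested dict using '@attributes' and '#text' keys."""
    node = {}
    if elem.attrib:
        node["@attributes"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # Assumption: repeated sibling tags are collected into a list
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


root = ET.fromstring('<root><user name="john">Hello</user></root>')
print({root.tag: element_to_dict(root)})
# {'root': {'user': {'@attributes': {'name': 'john'}, '#text': 'Hello'}}}
```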
📚 API Reference
| Function | Description | Key Arguments |
|---|---|---|
| repair_xml_safe | Core function. Returns repaired XML string. | xml_input, trust, format_output |
| parse_xml_safe | Repairs and converts to Python dict. | xml_input, trust |
| StreamingXMLRepair | Context manager for streaming processing. | trust |
| format_xml | Utility to pretty-print or minify XML. | xml_string, style |
Trust Levels
| Level | Use Case | Security Features |
|---|---|---|
| UNTRUSTED | LLM output, user uploads | 🔒 Strict escaping, strips dangerous tags (script/iframe), prevents XXE & DoS. |
| INTERNAL | Internal microservices | 🔐 Balanced protection, higher depth limits. |
| TRUSTED | Hardcoded strings, tests | ⚡ No overhead, fastest performance. |
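One way to picture the table is as a lookup from trust level to a policy object. The sketch below is purely illustrative: `SketchTrust`, `Policy`, the field names, and every numeric limit are assumptions, since this README does not state Xenon's real limits or internal structure.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SketchTrust(Enum):  # hypothetical stand-in for the library's TrustLevel
    UNTRUSTED = "untrusted"
    INTERNAL = "internal"
    TRUSTED = "trusted"


@dataclass(frozen=True)
class Policy:
    strip_dangerous_tags: bool
    max_depth: Optional[int]  # None means "no limit"


# Illustrative values only; not taken from Xenon
POLICIES = {
    SketchTrust.UNTRUSTED: Policy(strip_dangerous_tags=True, max_depth=100),
    SketchTrust.INTERNAL: Policy(strip_dangerous_tags=True, max_depth=1000),
    SketchTrust.TRUSTED: Policy(strip_dangerous_tags=False, max_depth=None),
}

TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")


def nesting_depth(xml_text):
    """Return the deepest element nesting level found in the text."""
    depth = deepest = 0
    for match in TAG.finditer(xml_text):
        closing, _name, self_closing = match.groups()
        if closing:
            depth = max(depth - 1, 0)
        elif not self_closing:
            depth += 1
            deepest = max(deepest, depth)
    return deepest


def within_limits(xml_text, trust):
    """Check the nesting depth against the policy for the given trust level."""
    limit = POLICIES[trust].max_depth
    return limit is None or nesting_depth(xml_text) <= limit


print(within_limits("<a><b><c/></b></a>", SketchTrust.UNTRUSTED))  # True
```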
🌊 Real-Time Streaming
For RAG pipelines and chat interfaces, you can repair XML as it is being generated, so downstream code can use each fragment as soon as it arrives instead of waiting for the full response. Xenon handles the chunking logic for you.
from xenon.streaming import StreamingXMLRepair
from xenon import TrustLevel
# Works with any iterator (sync or async)
def llm_stream():
    yield "Here is the data:\n<use"
    yield "r id=1>Al"
    yield "ice</user>"
with StreamingXMLRepair(trust=TrustLevel.UNTRUSTED) as repairer:
    for chunk in llm_stream():
        # Yields safe, valid XML fragments immediately
        for safe_fragment in repairer.feed(chunk):
            print(safe_fragment, end="")
# Output: <user id="1">Alice</user>
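To show why chunk handling is needed at all, here is a minimal sketch of one concern: a chunk boundary that falls inside a tag, as with `<use` / `r id=1>` above. `TagSafeBuffer` is a hypothetical class and not part of Xenon. It only holds back unfinished tags; it does not strip the leading chatter or quote attributes.

```python
class TagSafeBuffer:
    """Release streamed text only up to the last complete tag boundary."""

    def __init__(self):
        self._pending = ""

    def feed(self, chunk):
        """Return the part of the stream that does not end inside a tag."""
        data = self._pending + chunk
        last_open = data.rfind("<")
        last_close = data.rfind(">")
        if last_open > last_close:
            # An unfinished tag such as "<use" stays buffered for the next chunk
            ready, self._pending = data[:last_open], data[last_open:]
        else:
            ready, self._pending = data, ""
        return ready

    def flush(self):
        """Return whatever is still buffered once the stream has ended."""
        leftover, self._pending = self._pending, ""
        return leftover


buffer = TagSafeBuffer()
for chunk in ["Here is the data:\n<use", "r id=1>Al", "ice</user>"]:
    print(repr(buffer.feed(chunk)))
# 'Here is the data:\n'
# '<user id=1>Al'
# 'ice</user>'
```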
🔗 Integrations
LangChain
Xenon provides a drop-in XenonXMLOutputParser for LangChain pipelines.
from xenon.integrations.langchain import XenonXMLOutputParser
from xenon import TrustLevel
# Create the parser
parser = XenonXMLOutputParser(
    trust=TrustLevel.UNTRUSTED,
    return_dict=True  # Returns dict, set False for string
)
# Use in your chain
chain = prompt | llm | parser
result = chain.invoke({"query": "Generate user XML"})
🛠️ Common Repair Scenarios
Xenon automatically handles the most common LLM failure modes:
1. Truncation / Cut-off
# Input: <root><list><item>Item 1
# Output: <root><list><item>Item 1</item></list></root>
2. Conversational Fluff
# Input: Here is the XML: <data>value</data> Hope that helps!
# Output: <data>value</data>
3. Malformed Attributes
# Input: <user name=john age=25>
# Output: <user name="john" age="25"></user>
4. Unescaped Entities
# Input: <text>Barnes & Noble</text>
# Output: <text>Barnes &amp; Noble</text>
5. Hallucinated/Invalid Tags
# Input: <123tag>data</123tag>
# Config: sanitize_invalid_tags=True
# Output: <tag_123tag>data</tag_123tag>
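Scenarios 1 to 4 can be approximated with a few regular expressions and a tag stack. The sketch below is a happy-path illustration under stated assumptions, not Xenon's repair engine: `sketch_repair` and the patterns are hypothetical, and edge cases such as `=` or `>` inside quoted attribute values, comments, and CDATA are ignored.

```python
import re

TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)([^<>]*?)(/?)>")
BARE_ATTR = re.compile(r"""([A-Za-z_][\w.-]*)=([^\s"'<>]+)""")
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z]+|#\d+|#x[0-9A-Fa-f]+);)")


def quote_attributes(tag_match):
    """Rewrite one tag so that unquoted attribute values get double quotes."""
    closing, name, attrs, slash = tag_match.groups()
    attrs = BARE_ATTR.sub(r'\1="\2"', attrs)
    return f"<{closing}{name}{attrs}{slash}>"


def sketch_repair(text):
    start = text.find("<")
    if start == -1:
        return ""
    # Drop leading chatter, then escape ampersands that do not start an entity
    text = BARE_AMP.sub("&amp;", text[start:])
    text = TAG.sub(quote_attributes, text)

    stack = []
    for match in TAG.finditer(text):
        closing, name, _attrs, slash = match.groups()
        if closing:
            if stack and stack[-1] == name:
                stack.pop()
        elif not slash:
            stack.append(name)
        if not stack:
            # Root element closed: anything after it is trailing chatter
            return text[:match.end()]
    # Input ended early: close whatever is still open, innermost first
    return text + "".join(f"</{name}>" for name in reversed(stack))


print(sketch_repair("<root><list><item>Item 1"))
# <root><list><item>Item 1</item></list></root>
print(sketch_repair("Here is the XML: <data>value</data> Hope that helps!"))
# <data>value</data>
print(sketch_repair("<user name=john age=25>"))
# <user name="john" age="25"></user>
print(sketch_repair("<text>Barnes & Noble</text>"))
# <text>Barnes &amp; Noble</text>
```

Tracking the open-tag stack serves two scenarios at once: an empty stack marks where trailing chatter begins, and a non-empty stack at the end of input lists the closing tags still owed.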
📊 Analysis & Formatting
Diff Reporting
See exactly what the repair engine changed.
from xenon import repair_xml_with_report, TrustLevel
xml = "<root><item>test"
repaired, report = repair_xml_with_report(xml, trust=TrustLevel.UNTRUSTED)
print(report.summary())
# Performed 1 repair(s):
# - [truncation] Added closing tags for: item, root
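If you only have the before and after strings, a comparable low-level view can be computed with the standard library's `difflib`. This is a sketch independent of Xenon's report object; `describe_changes` is a hypothetical helper.

```python
import difflib


def describe_changes(original, repaired):
    """List (operation, old_text, new_text) for every non-equal span."""
    matcher = difflib.SequenceMatcher(a=original, b=repaired, autojunk=False)
    return [
        (op, original[a_start:a_end], repaired[b_start:b_end])
        for op, a_start, a_end, b_start, b_end in matcher.get_opcodes()
        if op != "equal"
    ]


print(describe_changes("<root><item>test", "<root><item>test</item></root>"))
# [('insert', '', '</item></root>')]
```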
Formatting
Pretty-print or minify your XML.
from xenon import format_xml
xml = "<root><item>val</item></root>"
print(format_xml(xml, style="pretty"))
# <root>
# <item>val</item>
# </root>
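For comparison, the same two styles can be sketched with the standard library on already well-formed XML. The helpers `pretty` and `minify` are hypothetical and unrelated to how `format_xml` is implemented; the two-space indent is an assumption.

```python
import re
from xml.dom import minidom


def pretty(xml_text, indent="  "):
    """Indent well-formed XML, one element per line."""
    document = minidom.parseString(xml_text)
    # Serializing the root element (not the document) omits the XML declaration
    return document.documentElement.toprettyxml(indent=indent).strip()


def minify(xml_text):
    """Remove whitespace that sits between adjacent tags."""
    return re.sub(r">\s+<", "><", xml_text.strip())


formatted = pretty("<root><item>val</item></root>")
print(formatted)
print(minify(formatted))
```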
Advanced Configuration
For specific needs, you can override security or repair settings individually:
from xenon import repair_xml_safe, TrustLevel
repaired = repair_xml_safe(
    "<root><script>alert(1)</script></root>",
    trust=TrustLevel.UNTRUSTED,
    # Overrides:
    strip_dangerous_tags=False,   # Keep <script> tags (use with caution!)
    format_output="pretty",       # Auto-format result
    schema_content=my_xsd_schema  # Validate against an XSD schema (my_xsd_schema is your own schema, defined elsewhere)
)
Audit Logging
For enterprise use cases requiring traceability (SOC 2, ISO 27001), Xenon can log security events:
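from xenon import repair_xml_safe, TrustLevel
from xenon.audit import AuditLogger
# Configure logger
logger = AuditLogger()
# Usage (untrusted_input is the raw XML string you want to repair)
repair_xml_safe(
    untrusted_input,
    trust=TrustLevel.UNTRUSTED,
    audit_logger=logger
)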
# Export logs
logs = logger.to_json()
# [
# {
# "timestamp": "2023-10-27T...",
# "threats_detected": ["dangerous_pi"],
# "actions_taken": ["DANGEROUS_PI_STRIPPED: Removed..."],
# ...
# }
# ]
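The record shape above can be mimicked with a few lines of standard-library code, which may help when planning how to store or forward such logs. `SimpleAuditLog` is a hypothetical stand-in and says nothing about how `AuditLogger` is built; only the three field names come from the example output.

```python
import json
from datetime import datetime, timezone


class SimpleAuditLog:
    """Collect security events and export them as a JSON array."""

    def __init__(self):
        self._events = []

    def record(self, threats_detected, actions_taken):
        self._events.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "threats_detected": list(threats_detected),
            "actions_taken": list(actions_taken),
        })

    def to_json(self):
        return json.dumps(self._events, indent=2)


log = SimpleAuditLog()
log.record(["dangerous_pi"], ["DANGEROUS_PI_STRIPPED"])
print(log.to_json())
```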
Interactive Demo
Try Xenon in your browser with our Google Colab notebook.
License
MIT License. See LICENSE for details.