Secure-by-default XML repair library for LLM-generated XML with trust levels

Xenon

Xenon is a robust, zero-dependency Python library designed to clean up, repair, and secure malformed XML generated by Large Language Models (LLMs).

In the era of RAG and AI agents, applications increasingly rely on structured data outputs. However, LLMs often generate messy XML—missing closing tags, conversational fluff, hallucinations, or XSS vulnerabilities. Xenon bridges the gap between raw LLM output and reliable application code.

🚀 Key Features

LLM-Focused Repair: Specifically handles truncation, hallucinations, and conversational text ("Here is your XML: ...").
Secure by Default: Explicit TrustLevel system to prevent XSS and injection attacks from untrusted sources.
Real-Time Streaming: Repair XML token-by-token as it arrives from the LLM (compatible with OpenAI/Anthropic streams).
Zero Runtime Dependencies: Core functionality uses only the Python Standard Library; lxml is an optional extra for schema validation.
Smart Matching: Uses Levenshtein distance to fix misspelled tags (e.g., </usre> → </user>).
Formatting & Diffs: Built-in pretty-printing and diff generation to visualize repairs.
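
To make the smart-matching idea concrete, here is a minimal standalone sketch. It is not Xenon's implementation: `levenshtein`, `closest_open_tag`, and the `max_distance` cutoff are hypothetical names chosen for illustration. It computes the edit distance between a misspelled closing tag and the currently open tags, then picks the nearest one.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def closest_open_tag(name, open_tags, max_distance=2):
    """Return the open tag nearest to `name`, or None if nothing is close enough.

    `max_distance` is an illustrative cutoff, not a documented Xenon setting.
    """
    if not open_tags:
        return None
    best = min(open_tags, key=lambda tag: levenshtein(name, tag))
    return best if levenshtein(name, best) <= max_distance else None


print(closest_open_tag("usre", ["root", "user"]))  # user
```

A cutoff like this keeps an unrelated closing tag from being silently rewritten into whatever happens to be open.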

🤝 Complementary to Standard Parsers

Xenon is designed to work alongside established XML parsers like lxml or BeautifulSoup, not replace them.

Standard parsers expect well-formed input and will rightfully raise errors on the chaotic output often generated by LLMs (hallucinations, cut-off strings, missing tags). Xenon acts as a stabilization layer: it accepts raw, potentially malformed LLM output and produces a valid, secure XML string that standard parsers can then consume safely and reliably.

Think of Xenon as the pre-processor that ensures your data pipeline remains robust, even when the LLM output isn't.
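
The division of labor can be seen with the standard library alone. The sketch below does not call Xenon; the `repaired` string is hard-coded from the Quick Start example in this README, and `try_parse` is a hypothetical helper. It shows that `xml.etree.ElementTree` rejects the raw LLM output but accepts the repaired string.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return the root element, or None if the standard parser rejects the input."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None


raw_llm_output = "Sure! <root><user name=john>Hello"
# Hard-coded here; in practice this string would come from the repair step.
repaired = '<root><user name="john">Hello</user></root>'

print(try_parse(raw_llm_output))       # None
root = try_parse(repaired)
print(root.find("user").get("name"))   # john
```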

Installation

pip install elemental-xenon

For development:

git clone https://github.com/MarsZDF/xenon.git
cd xenon
pip install -e ".[dev]"

⚡ Quick Start

Xenon requires you to specify a Trust Level for your input. This ensures you don't accidentally expose your application to security threats from untrusted LLM outputs.

from xenon import repair_xml_safe, parse_xml_safe, TrustLevel

# 1. Repair malformed LLM output
llm_output = 'Sure! <root><user name=john>Hello'
repaired = repair_xml_safe(llm_output, trust=TrustLevel.UNTRUSTED)
print(repaired)
# Output: <root><user name="john">Hello</user></root>

# 2. Parse directly to a dictionary
data = parse_xml_safe(llm_output, trust=TrustLevel.UNTRUSTED)
print(data)
# Output: {'root': {'user': {'@attributes': {'name': 'john'}, '#text': 'Hello'}}}
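
The dictionary shape shown above (`@attributes` for attributes, `#text` for text content) can be reproduced with a short standard-library sketch. This is an assumption about the convention based only on the example output, not Xenon's actual conversion code; `element_to_dict` and `xml_to_dict` are hypothetical names, and the handling of repeated child tags as lists is a guess.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert one element into a dict using '@attributes' and '#text' keys."""
    node = {}
    if elem.attrib:
        node["@attributes"] = dict(elem.attrib)
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated child tags are collected into a list (assumed behavior).
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node


def xml_to_dict(xml_text):
    """Parse well-formed XML and wrap the result under the root tag name."""
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_dict(root)}


print(xml_to_dict('<root><user name="john">Hello</user></root>'))
# {'root': {'user': {'@attributes': {'name': 'john'}, '#text': 'Hello'}}}
```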

📚 API Reference

Function	Description	Key Arguments
`repair_xml_safe`	Core function. Returns repaired XML string.	`xml_input`, `trust`, `format_output`
`parse_xml_safe`	Repairs and converts to Python dict.	`xml_input`, `trust`
`StreamingXMLRepair`	Context manager for streaming processing.	`trust`
`format_xml`	Utility to pretty-print or minify XML.	`xml_string`, `style`

Trust Levels

Level	Use Case	Security Features
`UNTRUSTED`	LLM output, user uploads	🔒 Strict escaping, strips dangerous tags (script/iframe), prevents XXE & DoS.
`INTERNAL`	Internal microservices	🔐 Balanced protection, higher depth limits.
`TRUSTED`	Hardcoded strings, tests	⚡ No overhead, fastest performance.
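
As an illustration of what "strips dangerous tags" means for `UNTRUSTED` input, here is a deliberately simple regex-based sketch. It is not how Xenon does it: `strip_dangerous_elements` and the `DANGEROUS` tuple are hypothetical, and regex filtering of this kind is brittle compared with a token-based sanitizer. It only shows the intended effect on the script/iframe case named in the table.

```python
import re

# Tag names taken from the table above; the real list may be longer.
DANGEROUS = ("script", "iframe")
_names = "|".join(DANGEROUS)

# A complete element, e.g. <script ...> ... </script>
ELEMENT_RE = re.compile(rf"<({_names})\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Leftover opening, closing, or self-closing tags with no partner
STRAY_RE = re.compile(rf"</?(?:{_names})\b[^>]*>", re.IGNORECASE)


def strip_dangerous_elements(xml_text):
    """Remove script/iframe elements (with their content) and any stray tags."""
    without_elements = ELEMENT_RE.sub("", xml_text)
    return STRAY_RE.sub("", without_elements)


print(strip_dangerous_elements("<root><script>alert(1)</script><p>ok</p></root>"))
# <root><p>ok</p></root>
```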

🌊 Real-Time Streaming

For RAG pipelines and chat interfaces, you can repair XML while it is still being generated, so downstream code can consume safe fragments without waiting for the full response. Xenon handles the chunking logic for you.

from xenon.streaming import StreamingXMLRepair
from xenon import TrustLevel

# Works with any iterator (sync or async)
def llm_stream():
    yield "Here is the data:\n<use"
    yield "r id=1>Al"
    yield "ice</user>"

with StreamingXMLRepair(trust=TrustLevel.UNTRUSTED) as repairer:
    for chunk in llm_stream():
        # Yields safe, valid XML fragments immediately
        for safe_fragment in repairer.feed(chunk):
            print(safe_fragment, end="")

# Output: <user id="1">Alice</user>
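
The chunking problem in the example above is that a tag can be split across two chunks (`<use` + `r id=1>`). A minimal sketch of one way to handle that boundary follows. `TagBoundaryBuffer` is a hypothetical class, not part of Xenon; it only holds back incomplete tags and does not strip prose, quote attributes, or apply any security checks.

```python
class TagBoundaryBuffer:
    """Emit only complete pieces (whole tags or plain text) from a chunked stream."""

    def __init__(self):
        self._pending = ""

    def feed(self, chunk):
        """Add a chunk and return the pieces that are now safe to emit."""
        self._pending += chunk
        pieces = []
        while self._pending:
            lt = self._pending.find("<")
            if lt == -1:
                # No tag start at all: everything pending is plain text.
                pieces.append(self._pending)
                self._pending = ""
            elif lt > 0:
                # Emit the text that precedes the next tag.
                pieces.append(self._pending[:lt])
                self._pending = self._pending[lt:]
            else:
                gt = self._pending.find(">")
                if gt == -1:
                    break  # Tag is incomplete; wait for the next chunk.
                pieces.append(self._pending[: gt + 1])
                self._pending = self._pending[gt + 1:]
        return pieces


buffer = TagBoundaryBuffer()
for chunk in ["Here is the data:\n<use", "r id=1>Al", "ice</user>"]:
    print(buffer.feed(chunk))
# ['Here is the data:\n']
# ['<user id=1>', 'Al']
# ['ice', '</user>']
```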

🔗 Integrations

LangChain

Xenon provides a drop-in XenonXMLOutputParser for LangChain pipelines.

from xenon.integrations.langchain import XenonXMLOutputParser
from xenon import TrustLevel

# Create the parser
parser = XenonXMLOutputParser(
    trust=TrustLevel.UNTRUSTED,
    return_dict=True  # Returns dict, set False for string
)

# Use in your chain (`prompt` and `llm` are your existing LangChain prompt and model objects)
chain = prompt | llm | parser
result = chain.invoke({"query": "Generate user XML"})

🛠️ Common Repair Scenarios

Xenon automatically handles the most common LLM failure modes:

1. Truncation / Cut-off

# Input:  <root><list><item>Item 1
# Output: <root><list><item>Item 1</item></list></root>
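
One common way to repair truncation is to track open tags on a stack and append the missing closers in reverse order. The sketch below illustrates that idea only; `close_open_tags` is a hypothetical helper, not Xenon's code, and it assumes the input was cut off in text content rather than in the middle of a tag.

```python
import re

# Matches opening, closing, and self-closing tags and captures the tag name.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^<>]*?(/?)>")


def close_open_tags(xml_text):
    """Append closing tags for every element still open at the end of the input."""
    stack = []
    for match in TAG_RE.finditer(xml_text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # <br/> opens nothing
        if closing:
            if stack and stack[-1] == name:
                stack.pop()
        else:
            stack.append(name)
    return xml_text + "".join(f"</{name}>" for name in reversed(stack))


print(close_open_tags("<root><list><item>Item 1"))
# <root><list><item>Item 1</item></list></root>
```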

2. Conversational Fluff

# Input:  Here is the XML: <data>value</data> Hope that helps!
# Output: <data>value</data>
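
For complete XML wrapped in prose, a naive version of this cleanup keeps only the span from the first `<` to the last `>`. The helper below is a hypothetical sketch, not Xenon's logic; it would misbehave if the surrounding prose itself contained angle brackets, or if the XML were truncated after its last tag.

```python
def strip_conversational_text(text):
    """Keep only the span from the first '<' to the last '>' (naive heuristic)."""
    start = text.find("<")
    end = text.rfind(">")
    if start == -1 or end < start:
        return ""  # No tag-like content found
    return text[start:end + 1]


print(strip_conversational_text("Here is the XML: <data>value</data> Hope that helps!"))
# <data>value</data>
```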

3. Malformed Attributes

# Input:  <user name=john age=25>
# Output: <user name="john" age="25"></user>
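
The attribute-quoting part of this repair can be approximated with a regular expression applied inside each tag. This is a sketch under assumptions, not Xenon's implementation: `quote_bare_attributes` is a hypothetical helper, it does not add the missing `</user>` shown above, and it can be fooled by `key=value` text inside an already-quoted attribute value.

```python
import re

TAG_RE = re.compile(r"<[^<>]+>")
# name=value where the value is unquoted and ends at whitespace, '>' or '/>'
BARE_ATTR_RE = re.compile(r"""(\s[A-Za-z_][\w.:\-]*)=([^\s"'<>=]+?)(?=\s|/?>)""")


def quote_bare_attributes(xml_text):
    """Wrap unquoted attribute values in double quotes, tag by tag."""
    def fix_tag(match):
        return BARE_ATTR_RE.sub(r'\1="\2"', match.group(0))
    return TAG_RE.sub(fix_tag, xml_text)


print(quote_bare_attributes("<user name=john age=25>"))
# <user name="john" age="25">
```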

4. Unescaped Entities

# Input:  <text>Barnes & Noble</text>
# Output: <text>Barnes &amp; Noble</text>
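
The key detail in this repair is to escape only ampersands that do not already start an entity or character reference, so existing `&amp;` or `&#38;` sequences are not double-escaped. A minimal sketch, with `escape_bare_ampersands` as a hypothetical helper rather than a Xenon function:

```python
import re

# '&' not followed by a named entity (&name;), decimal (&#38;) or hex (&#x26;) reference
BARE_AMP_RE = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Replace stray '&' characters with '&amp;' without touching existing references."""
    return BARE_AMP_RE.sub("&amp;", text)


print(escape_bare_ampersands("<text>Barnes & Noble</text>"))
# <text>Barnes &amp; Noble</text>
```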

5. Hallucinated/Invalid Tags

# Input:  <123tag>data</123tag>
# Config: sanitize_invalid_tags=True
# Output: <tag_123tag>data</tag_123tag>
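
XML names cannot start with a digit, which is why `<123tag>` is invalid. The sketch below shows one way to get the output above by prefixing such names. Only the `tag_` prefix comes from the example; `sanitize_tag_name`, `sanitize_tags`, and the rule of replacing other illegal characters with underscores are assumptions, not documented Xenon behavior.

```python
import re

# A tag start: '<' or '</' followed by a name (comments and '<?...' are skipped)
TAG_NAME_RE = re.compile(r"<(/?)([^\s<>/!?][^\s<>/]*)")


def sanitize_tag_name(name, prefix="tag_"):
    """Make a tag name XML-safe: replace odd characters, prefix a bad first character."""
    cleaned = re.sub(r"[^\w.\-:]", "_", name)
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = prefix + cleaned
    return cleaned


def sanitize_tags(xml_text):
    """Apply sanitize_tag_name to every opening and closing tag in the text."""
    return TAG_NAME_RE.sub(
        lambda m: f"<{m.group(1)}{sanitize_tag_name(m.group(2))}", xml_text
    )


print(sanitize_tags("<123tag>data</123tag>"))
# <tag_123tag>data</tag_123tag>
```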

📊 Analysis & Formatting

Diff Reporting

See exactly what the repair engine changed.

from xenon import repair_xml_with_report, TrustLevel

xml = "<root><item>test"
repaired, report = repair_xml_with_report(xml, trust=TrustLevel.UNTRUSTED)

print(report.summary())
# Performed 1 repair(s):
#   - [truncation] Added closing tags for: item, root
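
If you only have the original and repaired strings, the standard library's `difflib` gives a rough view of what was added. This is a swapped-in stdlib technique for illustration, not the mechanism behind `repair_xml_with_report`; `inserted_spans` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def inserted_spans(original, repaired):
    """List the substrings that exist in `repaired` but were not in `original`."""
    matcher = SequenceMatcher(None, original, repaired, autojunk=False)
    return [
        repaired[j1:j2]
        for op, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if op == "insert"
    ]


print(inserted_spans("<root><item>test", "<root><item>test</item></root>"))
# ['</item></root>']
```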

Formatting

Pretty-print or minify your XML.

from xenon import format_xml

xml = "<root><item>val</item></root>"
print(format_xml(xml, style="pretty"))
# <root>
#   <item>val</item>
# </root>
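
For comparison, the same pretty-printed layout can be produced from already well-formed XML with the standard library. The sketch uses `xml.etree.ElementTree.indent`, which requires Python 3.9 or later; `pretty_print` is a hypothetical helper and says nothing about how `format_xml` is implemented.

```python
import xml.etree.ElementTree as ET


def pretty_print(xml_text, indent="  "):
    """Re-indent well-formed XML using the standard library (Python 3.9+)."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space=indent)
    return ET.tostring(root, encoding="unicode")


print(pretty_print("<root><item>val</item></root>"))
# <root>
#   <item>val</item>
# </root>
```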

Advanced Configuration

For specific needs, you can override security or repair settings individually:

from xenon import repair_xml_safe, TrustLevel

repaired = repair_xml_safe(
    "<root><script>alert(1)</script></root>",
    trust=TrustLevel.UNTRUSTED,
    # Overrides:
    strip_dangerous_tags=False,  # Keep <script> tags (Use with caution!)
    format_output="pretty",      # Auto-format result
    schema_content=my_xsd_schema,  # Validate against an XSD schema (my_xsd_schema is defined elsewhere; uses the optional lxml support)
)

Audit Logging

For enterprise use cases requiring traceability (SOC 2, ISO 27001), Xenon can log security events:

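from xenon import repair_xml_safe, TrustLevel
from xenon.audit import AuditLogger

# Configure logger
logger = AuditLogger()

# Usage (`untrusted_input` is the raw string received from the LLM)
repair_xml_safe(
    untrusted_input,
    trust=TrustLevel.UNTRUSTED,
    audit_logger=logger
)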

# Export logs
logs = logger.to_json()
# [
#   {
#     "timestamp": "2023-10-27T...",
#     "threats_detected": ["dangerous_pi"],
#     "actions_taken": ["DANGEROUS_PI_STRIPPED: Removed..."],
#     ...
#   }
# ]

Interactive Demo

Try Xenon in your browser with our Google Colab notebook:

License

MIT License. See LICENSE for details.


This version

1.1.0

Dec 8, 2025

Download files


Source Distribution

elemental_xenon-1.1.0.tar.gz (90.6 kB)

Uploaded Dec 8, 2025 Source

Built Distribution


elemental_xenon-1.1.0-py3-none-any.whl (60.9 kB)

Uploaded Dec 8, 2025 Python 3

File details

Details for the file elemental_xenon-1.1.0.tar.gz.

File metadata

Download URL: elemental_xenon-1.1.0.tar.gz
Upload date: Dec 8, 2025
Size: 90.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.1

File hashes

Hashes for elemental_xenon-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`51f4dccad90cd5b19c7839a66efaa04ca1b8c2ecdb7dc454a431e419b3c07e13`
MD5	`9e82baed7bdd5f2a96591b5cf67e4075`
BLAKE2b-256	`7fb1c54802f5286749ea0dc47fec97fcd1015e95fc6df6575d0bf40a154df7db`


File details

Details for the file elemental_xenon-1.1.0-py3-none-any.whl.

File metadata

Download URL: elemental_xenon-1.1.0-py3-none-any.whl
Upload date: Dec 8, 2025
Size: 60.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.1

File hashes

Hashes for elemental_xenon-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`aad0ba37659186eee5b8d6613ace0ef9c7e46ffafc56cc483605182c6b84956c`
MD5	`5618f2bf1592cfa9debbca3a5ba637b6`
BLAKE2b-256	`41cbb9ea5cfb3d073b444da2d6bfcb6ffae6e470a953a2003997ac7c8bcdce50`

