Secure-by-default XML repair library for LLM-generated XML with trust levels

Xenon

Xenon is a robust, zero-dependency Python library designed to clean up, repair, and secure malformed XML generated by Large Language Models (LLMs).

In the era of RAG and AI agents, applications increasingly rely on structured data outputs. However, LLMs often generate messy XML: missing closing tags, conversational filler, hallucinated tags, or unescaped content that can carry XSS payloads. Xenon bridges the gap between raw LLM output and reliable application code.

🚀 Key Features

  • LLM-Focused Repair: Specifically handles truncation, hallucinations, and conversational text ("Here is your XML: ...").
  • Secure by Default: Explicit TrustLevel system to prevent XSS and injection attacks from untrusted sources.
  • Real-Time Streaming: Repair XML token-by-token as it arrives from the LLM (compatible with OpenAI/Anthropic streams).
  • Zero Runtime Dependencies: Core functionality uses only the Python Standard Library. (Optional lxml support for schema validation).
  • Smart Matching: Uses Levenshtein distance to fix typoed tags (e.g., </usre> → </user>).
  • Formatting & Diffs: Built-in pretty-printing and diff generation to visualize repairs.
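
To make the Smart Matching bullet concrete, here is a minimal sketch of Levenshtein-based tag matching. It is not Xenon's implementation: the helper names `levenshtein` and `closest_open_tag`, and the `max_distance=2` cutoff, are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def closest_open_tag(closing_name, open_tags, max_distance=2):
    """Return the open tag nearest to a typoed closing tag, or None.

    max_distance is a hypothetical cutoff so unrelated names never match.
    """
    best = min(open_tags, key=lambda name: levenshtein(closing_name, name),
               default=None)
    if best is None or levenshtein(closing_name, best) > max_distance:
        return None
    return best


match = closest_open_tag("usre", ["root", "user"])  # "user"
```

Plain Levenshtein counts a transposition such as "usre" for "user" as two edits, so a cutoff below 2 would reject that typo.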

🤝 Complementary to Standard Parsers

Xenon is designed to work alongside established XML parsers like lxml or BeautifulSoup, not replace them.

Standard parsers expect well-formed input and will rightfully raise errors on the chaotic output often generated by LLMs (hallucinations, cut-off strings, missing tags). Xenon acts as a stabilization layer: it accepts raw, potentially malformed LLM output and produces a valid, secure XML string that standard parsers can then consume safely and reliably.

Think of Xenon as the pre-processor that ensures your data pipeline remains robust, even when the LLM output isn't.
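
The sketch below illustrates the division of labor using only the standard library. A strict parser rejects a truncated string, while the repaired form parses cleanly. The repaired string is hardcoded here (copied from the truncation scenario in this README) so the example runs without Xenon installed.

```python
import xml.etree.ElementTree as ET

raw = "<root><list><item>Item 1"  # truncated LLM output

# A strict parser refuses malformed input.
try:
    ET.fromstring(raw)
    strict_parse_ok = True
except ET.ParseError:
    strict_parse_ok = False

# The repaired string (hardcoded stand-in for a repair step) parses normally.
repaired = "<root><list><item>Item 1</item></list></root>"
root = ET.fromstring(repaired)
item_text = root.find("./list/item").text
```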

Installation

pip install elemental-xenon

For development:

git clone https://github.com/MarsZDF/xenon.git
cd xenon
pip install -e ".[dev]"

⚡ Quick Start

Xenon requires you to specify a Trust Level for your input. This ensures you don't accidentally expose your application to security threats from untrusted LLM outputs.

from xenon import repair_xml_safe, parse_xml_safe, TrustLevel

# 1. Repair malformed LLM output
llm_output = 'Sure! <root><user name=john>Hello'
repaired = repair_xml_safe(llm_output, trust=TrustLevel.UNTRUSTED)
print(repaired)
# Output: <root><user name="john">Hello</user></root>

# 2. Parse directly to a dictionary
data = parse_xml_safe(llm_output, trust=TrustLevel.UNTRUSTED)
print(data)
# Output: {'root': {'user': {'@attributes': {'name': 'john'}, '#text': 'Hello'}}}
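
The dictionary above uses `@attributes` and `#text` keys. As a rough illustration of that convention, the following standard-library sketch builds the same shape from an already well-formed string. The helper `element_to_dict` is hypothetical and is not part of Xenon. How Xenon represents repeated sibling tags or mixed content is not covered by this README, so the sketch simply lets later siblings overwrite earlier ones.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an Element to {tag: {...}} using '@attributes' and '#text'."""
    node = {}
    if element.attrib:
        node["@attributes"] = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        node["#text"] = text
    for child in element:
        # Simplification: a repeated child tag replaces the previous one.
        node.update(element_to_dict(child))
    return {element.tag: node}


tree = ET.fromstring('<root><user name="john">Hello</user></root>')
data = element_to_dict(tree)
```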

📚 API Reference

Function           | Description                                 | Key Arguments
repair_xml_safe    | Core function. Returns repaired XML string. | xml_input, trust, format_output
parse_xml_safe     | Repairs and converts to Python dict.        | xml_input, trust
StreamingXMLRepair | Context manager for streaming processing.   | trust
format_xml         | Utility to pretty-print or minify XML.      | xml_string, style

Trust Levels

Level     | Use Case                 | Security Features
UNTRUSTED | LLM output, user uploads | 🔒 Strict escaping, strips dangerous tags (script/iframe), prevents XXE & DoS.
INTERNAL  | Internal microservices   | 🔐 Balanced protection, higher depth limits.
TRUSTED   | Hardcoded strings, tests | ⚡ No overhead, fastest performance.
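
One way to picture how a trust level could translate into a concrete limit is a per-level nesting-depth cap, sketched below. Everything here is an assumption for illustration: `SketchTrust`, `MAX_DEPTH`, and the numbers 50 and 500 are invented and do not reflect Xenon's actual thresholds. The README only states that INTERNAL has higher depth limits than UNTRUSTED.

```python
import io
import xml.etree.ElementTree as ET
from enum import Enum


class SketchTrust(Enum):  # hypothetical stand-in, not xenon.TrustLevel
    UNTRUSTED = "untrusted"
    INTERNAL = "internal"
    TRUSTED = "trusted"


# Invented limits for illustration only; None means "skip the check".
MAX_DEPTH = {
    SketchTrust.UNTRUSTED: 50,
    SketchTrust.INTERNAL: 500,
    SketchTrust.TRUSTED: None,
}


def nesting_depth(xml_text: str) -> int:
    """Deepest element nesting in a well-formed XML string."""
    depth = deepest = 0
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for event, _element in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            depth += 1
            deepest = max(deepest, depth)
        else:
            depth -= 1
    return deepest


def within_depth_limit(xml_text: str, trust: SketchTrust) -> bool:
    limit = MAX_DEPTH[trust]
    return limit is None or nesting_depth(xml_text) <= limit


deep = "<a>" * 60 + "</a>" * 60  # 60 levels of nesting
```

With these made-up limits, `deep` would be rejected at the untrusted level and accepted at the other two.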

🌊 Real-Time Streaming

For RAG pipelines and chat interfaces, you can repair XML as it is being generated, so downstream code can consume safe fragments without waiting for the full response. Xenon handles the chunking logic for you.

from xenon.streaming import StreamingXMLRepair
from xenon import TrustLevel

# Works with any iterator (sync or async)
def llm_stream():
    yield "Here is the data:\n<use"
    yield "r id=1>Al"
    yield "ice</user>"

with StreamingXMLRepair(trust=TrustLevel.UNTRUSTED) as repairer:
    for chunk in llm_stream():
        # Yields safe, valid XML fragments immediately
        for safe_fragment in repairer.feed(chunk):
            print(safe_fragment, end="")

# Output: <user id="1">Alice</user>
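
To give a feel for the chunking problem a streaming repairer has to solve, the sketch below holds back a tag that has been split across chunks until its closing `>` arrives. It is a simplified illustration, not Xenon's algorithm: the class `TagBoundaryBuffer` is hypothetical, it does not strip preamble text or quote attributes, and it would be confused by a `>` inside an attribute value.

```python
class TagBoundaryBuffer:
    """Release text only up to the last complete tag boundary."""

    def __init__(self):
        self._pending = ""

    def feed(self, chunk: str) -> str:
        self._pending += chunk
        cut = self._pending.rfind("<")
        if cut != -1 and ">" not in self._pending[cut:]:
            # The last tag is still open: emit what precedes it, keep the rest.
            ready, self._pending = self._pending[:cut], self._pending[cut:]
        else:
            ready, self._pending = self._pending, ""
        return ready

    def flush(self) -> str:
        """Return whatever is left when the stream ends."""
        ready, self._pending = self._pending, ""
        return ready


buffer = TagBoundaryBuffer()
fragments = [buffer.feed(chunk) for chunk in ["<use", "r id=1>Al", "ice</user>"]]
joined = "".join(fragments) + buffer.flush()
```

The first chunk produces an empty fragment because `<use` is an incomplete tag. It is released together with the second chunk once the tag closes.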

🔗 Integrations

LangChain

Xenon provides a drop-in XenonXMLOutputParser for LangChain pipelines.

from xenon.integrations.langchain import XenonXMLOutputParser
from xenon import TrustLevel

# Create the parser
parser = XenonXMLOutputParser(
    trust=TrustLevel.UNTRUSTED,
    return_dict=True  # Returns dict, set False for string
)

# Use in your chain
chain = prompt | llm | parser
result = chain.invoke({"query": "Generate user XML"})

🛠️ Common Repair Scenarios

Xenon automatically handles the most common LLM failure modes:

1. Truncation / Cut-off

# Input:  <root><list><item>Item 1
# Output: <root><list><item>Item 1</item></list></root>
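
A common way to repair this kind of truncation is to track open tags on a stack and append the missing closers in reverse order. The sketch below shows that idea under simplifying assumptions (no comments, CDATA, or input cut off in the middle of a tag). The function `close_open_tags` is hypothetical and is not Xenon's code.

```python
import re

# Matches opening, closing, and self-closing tags with simple names.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^<>]*?(/?)>")


def close_open_tags(fragment: str) -> str:
    """Append closing tags for every element still open at the end."""
    stack = []
    for match in TAG_RE.finditer(fragment):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # <br/> opens nothing
        if closing:
            if stack and stack[-1] == name:
                stack.pop()
        else:
            stack.append(name)
    return fragment + "".join(f"</{name}>" for name in reversed(stack))


repaired = close_open_tags("<root><list><item>Item 1")
```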

2. Conversational Fluff

# Input:  Here is the XML: <data>value</data> Hope that helps!
# Output: <data>value</data>
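
The simplest approximation of this step is to keep only the span from the first `<` to the last `>`. The helper below is a hypothetical sketch of that heuristic. It would misfire on prose that itself contains angle brackets, which is one reason a real repair engine needs more than this.

```python
def strip_surrounding_text(text: str) -> str:
    """Keep the span from the first '<' to the last '>' (naive heuristic)."""
    start = text.find("<")
    end = text.rfind(">")
    if start == -1 or end < start:
        return ""  # nothing that looks like markup
    return text[start:end + 1]


cleaned = strip_surrounding_text(
    "Here is the XML: <data>value</data> Hope that helps!"
)
```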

3. Malformed Attributes

# Input:  <user name=john age=25>
# Output: <user name="john" age="25"></user>
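
The attribute-quoting half of this repair can be approximated with a regular expression applied inside each tag, as in the hypothetical sketch below. It only adds quotes and does not append the missing `</user>`. It can also be fooled by quoted values that themselves contain `name=value` text, so treat it as an illustration rather than a robust fix.

```python
import re

# A whitespace-led attribute name followed by '=' and an unquoted value.
UNQUOTED_ATTR_RE = re.compile(r"""(\s[A-Za-z_:][\w:.\-]*)=([^\s"'<>=]+)""")
# Anything that looks like an opening tag.
OPEN_TAG_RE = re.compile(r"<[A-Za-z_][^<>]*>")


def quote_attributes(xml_text: str) -> str:
    """Wrap unquoted attribute values in double quotes, tag by tag."""
    def fix_tag(match):
        return UNQUOTED_ATTR_RE.sub(r'\1="\2"', match.group(0))
    return OPEN_TAG_RE.sub(fix_tag, xml_text)


quoted = quote_attributes("<user name=john age=25>")
```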

4. Unescaped Entities

# Input:  <text>Barnes & Noble</text>
# Output: <text>Barnes &amp; Noble</text>
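
Escaping a bare ampersand without double-escaping existing entities can be sketched with a negative lookahead. The function name `escape_bare_ampersands` is hypothetical. The pattern recognizes only the five predefined XML entities and numeric character references.

```python
import re

# '&' that does not start a predefined entity or a numeric reference.
BARE_AMP_RE = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text: str) -> str:
    return BARE_AMP_RE.sub("&amp;", text)


escaped = escape_bare_ampersands("<text>Barnes & Noble</text>")
```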

5. Hallucinated/Invalid Tags

# Input:  <123tag>data</123tag>
# Config: sanitize_invalid_tags=True
# Output: <tag_123tag>data</tag_123tag>
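
XML names may not begin with a digit, which is why `<123tag>` is invalid. The hypothetical sketch below reproduces the renaming shown above by prefixing such names with `tag_`. It is a naive pattern that would also rewrite text such as `x <5`, so it is meant only to show the idea.

```python
import re

# A tag (opening or closing) whose name starts with a digit.
DIGIT_NAME_RE = re.compile(r"<(/?)(\d[\w.\-]*)")


def prefix_invalid_tag_names(xml_text: str, prefix: str = "tag_") -> str:
    """Prefix tag names that start with a digit so they become valid."""
    return DIGIT_NAME_RE.sub(
        lambda m: f"<{m.group(1)}{prefix}{m.group(2)}", xml_text
    )


sanitized = prefix_invalid_tag_names("<123tag>data</123tag>")
```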

📊 Analysis & Formatting

Diff Reporting

See exactly what the repair engine changed.

from xenon import repair_xml_with_report, TrustLevel

xml = "<root><item>test"
repaired, report = repair_xml_with_report(xml, trust=TrustLevel.UNTRUSTED)

print(report.summary())
# Performed 1 repair(s):
#   - [truncation] Added closing tags for: item, root
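
If you only have the original and repaired strings, the standard library's `difflib` can show what changed between them. This is independent of Xenon's report object. The helper `changed_spans` is a hypothetical name.

```python
import difflib


def changed_spans(original: str, repaired: str):
    """List (operation, old_text, new_text) for every non-equal span."""
    matcher = difflib.SequenceMatcher(a=original, b=repaired)
    return [
        (op, original[i1:i2], repaired[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


changes = changed_spans("<root><item>test", "<root><item>test</item></root>")
```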

Formatting

Pretty-print or minify your XML.

from xenon import format_xml

xml = "<root><item>val</item></root>"
print(format_xml(xml, style="pretty"))
# <root>
#   <item>val</item>
# </root>
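
For comparison, the standard library can produce the same two-space layout for XML that is already well-formed. This sketch assumes Python 3.9 or later, where `xml.etree.ElementTree.indent` is available. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def pretty(xml_text: str) -> str:
    """Indent a well-formed XML string with two spaces per level."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


def minify(xml_text: str) -> str:
    """Drop whitespace-only text between elements."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None
    return ET.tostring(root, encoding="unicode")


formatted = pretty("<root><item>val</item></root>")
```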

Advanced Configuration

For specific needs, you can override security or repair settings individually:

from xenon import repair_xml_safe, TrustLevel

repaired = repair_xml_safe(
    "<root><script>alert(1)</script></root>",
    trust=TrustLevel.UNTRUSTED,
    # Overrides:
    strip_dangerous_tags=False,  # Keep <script> tags (Use with caution!)
    format_output="pretty",      # Auto-format result
    schema_content=my_xsd_schema # Validate against XSD schema (my_xsd_schema is your own schema, defined elsewhere; needs the optional lxml support)
)

Audit Logging

For enterprise use cases requiring traceability (SOC 2, ISO 27001), Xenon can log security events:

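from xenon import repair_xml_safe, TrustLevel
from xenon.audit import AuditLogger

# Configure logger
logger = AuditLogger()

# Usage (untrusted_input is the raw string you received, e.g. from an LLM)
repair_xml_safe(
    untrusted_input,
    trust=TrustLevel.UNTRUSTED,
    audit_logger=logger
)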

# Export logs
logs = logger.to_json()
# [
#   {
#     "timestamp": "2023-10-27T...",
#     "threats_detected": ["dangerous_pi"],
#     "actions_taken": ["DANGEROUS_PI_STRIPPED: Removed..."],
#     ...
#   }
# ]

Interactive Demo

Try Xenon in your browser with our Google Colab notebook:

Open In Colab

License

MIT License. See LICENSE for details.
