
Reliable way to parse LLM outputs


SafeXMLParser - README

Overview

SafeXMLParser is a Python class that parses XML strings in a fault-tolerant way. If the initial parse fails, it can ask a Large Language Model (LLM) to correct the malformed XML and then retry. The class supports multiple parsing attempts and logs each one, including successful parses, errors, and any LLM-based corrections.

Features

  • Multiple Attempts: Provides the option to specify multiple parsing attempts to handle malformed XML.
  • LLM-based Correction: Uses a specified LLM model to attempt XML correction if parsing fails.
  • Logging: Records all attempts, including input, output, errors, and LLM correction details.
  • Flexible Configuration: Customizable LLM model and number of attempts for robust XML parsing.
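
The parse, correct, and retry idea behind these features can be sketched with the standard library alone. This is a minimal illustration, not SafeXMLParser's actual implementation: parse_with_retries and close_child are hypothetical names, and xml.etree.ElementTree stands in for whatever parser the class uses internally.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def parse_with_retries(
    text: str,
    nb_attempts: int = 1,
    corrector: Optional[Callable[[str], str]] = None,
) -> ET.Element:
    """Parse `text`; on failure, ask `corrector` for a fixed version and retry."""
    last_error: Optional[Exception] = None
    attempts_made = 0
    for attempt in range(nb_attempts):
        attempts_made = attempt + 1
        try:
            return ET.fromstring(text)
        except ET.ParseError as exc:
            last_error = exc
            # Stop if there is nothing to correct with or no attempt left.
            if corrector is None or attempt == nb_attempts - 1:
                break
            text = corrector(text)
    raise ValueError(
        f"Parsing failed after {attempts_made} attempt(s): {last_error}"
    )


def close_child(text: str) -> str:
    # Hypothetical stand-in for an LLM call: adds the missing closing tag.
    return text.replace("data</root>", "data</child></root>")


root = parse_with_retries(
    "<root><child>data</root>", nb_attempts=2, corrector=close_child
)
print(root.find("child").text)  # data
```

The corrector is only called when another attempt remains, so a single-attempt parse never triggers a correction call.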

Installation

  1. Clone this repository or download the code.
  2. Install the required dependencies (e.g., beautifulsoup4 or any LLM model dependencies):
    pip install beautifulsoup4
    

Example Usage

1. Importing the Class

First, import the SafeXMLParser class and any other necessary components:

from safe_xml_parser import SafeXMLParser  # Example import path

2. Basic Usage (Single Parsing Attempt)

Here is an example of how to use SafeXMLParser for a basic XML parsing operation without a fallback model:

# Example XML string
xml_data = "<root><child>data</child></root>"

# Initialize the parser
parser = SafeXMLParser()

# Attempt to parse the XML data (one attempt, no LLM correction)
try:
    parsed_data = parser.safe_parse(xml_data)
    print("Parsed Data:", parsed_data)
except Exception as e:
    print(f"Parsing failed: {e}")

# Expected output: Parsed Data: {'root': {'child': 'data'}}

3. Multiple Attempts with LLM Correction

If you are dealing with malformed XML, you can provide a custom LLM model to correct the data between attempts:

def fallback_correction(text):
    # Stand-in for a real LLM call: supplies the missing closing tag
    return text.replace("data</root>", "data</child></root>")

# Malformed XML string (the <child> element is never closed)
malformed_xml = "<root><child>data</root>"

# Initialize the parser with the fallback correction model
parser = SafeXMLParser(default_llm_model=fallback_correction, default_nb_attempts=2)

# Attempt to parse the malformed XML data
try:
    parsed_data = parser.safe_parse(malformed_xml)
    print("Parsed Data:", parsed_data)
except Exception as e:
    print(f"Parsing failed: {e}")

# Expected output once the correction is applied: Parsed Data: {'root': {'child': 'data'}}

4. Accessing Parsing Logs

Logs are available for every parsing attempt, showing the input, output, error messages, and any LLM-based corrections applied:

# Access logs after parsing
logs = parser.logs()
for log in logs:
    print(log)

Logs provide insights into each step of the parsing process, including what the LLM model was prompted with and what corrections it made.

5. Dynamic Configuration of LLM and Attempts

You can also configure the LLM model and number of attempts dynamically during parsing:

# Initialize parser without setting defaults
parser = SafeXMLParser()

# Dynamically pass custom LLM model and attempts
try:
    parsed_data = parser.safe_parse(
        malformed_xml,
        nb_attempts=3,
        llm_model=fallback_correction
    )
    print("Parsed Data:", parsed_data)
except Exception as e:
    print(f"Parsing failed: {e}")

Method Summary

1. safe_parse(text_to_parse: str, nb_attempts: Optional[int] = None, llm_model: Optional[Callable] = None, correctness_prompt_maker: Callable = create_fix_xml_prompt) -> Dict[str, Union[Dict, str]]

  • Description: Attempts to parse the XML string multiple times, with the option of using an LLM model to correct any errors between attempts.
  • Args:
    • text_to_parse: The XML string to be parsed.
    • nb_attempts: The number of attempts allowed for parsing (default: None, in which case the parser's default_nb_attempts is used; a single attempt if that was not set).
    • llm_model: The function used to correct XML between attempts (default: None).
    • correctness_prompt_maker: A function that creates prompts for LLM correction (default: create_fix_xml_prompt).
  • Returns: A dictionary representation of the parsed XML.
  • Raises: An Exception if all attempts fail.
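
To make the "dictionary representation" concrete, here is a minimal sketch of one way to turn XML into nested dictionaries using only the standard library. The helper names xml_to_dict and element_to_dict are hypothetical, and the handling of attributes and repeated tags is an assumption of this sketch; SafeXMLParser's own conversion may differ.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

Node = Union[str, Dict[str, object], List[object]]


def element_to_dict(element: ET.Element) -> Node:
    """Convert an element to a string (leaf) or a dict of its children."""
    children = list(element)
    if not children:
        # Leaf element: keep its text, or an empty string if it has none.
        return element.text or ""
    result: Dict[str, object] = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tags are collected into a list (a choice of this sketch).
            existing = result[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                result[child.tag] = [existing, value]
        else:
            result[child.tag] = value
    return result


def xml_to_dict(text: str) -> Dict[str, Node]:
    """Parse an XML string and wrap the result under the root tag."""
    root = ET.fromstring(text)
    return {root.tag: element_to_dict(root)}


print(xml_to_dict("<root><child>data</child></root>"))
# {'root': {'child': 'data'}}
```

This sketch ignores XML attributes and any text mixed in between child elements.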

2. logs(timestamp: bool = True) -> List[Dict[str, str]]

  • Description: Returns logs of each parsing attempt, with an option to include/exclude timestamps.
  • Args:
    • timestamp: Whether to include timestamps in the logs (default: True).
  • Returns: A list of dictionaries containing details of each parsing attempt.

Logging Structure

Logs include the following information:

  • Input: The XML string that was parsed.
  • Output: The resulting parsed output (if successful) or "N/A".
  • Error: Any error encountered during parsing, or "N/A" if successful.
  • Correctness Prompt: The prompt sent to the LLM for correction (if applicable).
  • Correctness Output: The corrected output from the LLM model (if applicable).
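
As an illustration of how such entries might be inspected, the sketch below filters a list of log dictionaries down to the failed attempts. The key names simply mirror the field names listed above and are assumptions, as are the sample entries and the failed_attempts helper; check the actual output of parser.logs() before relying on them.

```python
from typing import Dict, List

# Hypothetical log entries. The keys and values are placeholders made up
# for this sketch, not output captured from SafeXMLParser.
example_logs: List[Dict[str, str]] = [
    {
        "Input": "<root><child>data</root>",
        "Output": "N/A",
        "Error": "example parse error",
        "Correctness Prompt": "example prompt",
        "Correctness Output": "<root><child>data</child></root>",
    },
    {
        "Input": "<root><child>data</child></root>",
        "Output": "{'root': {'child': 'data'}}",
        "Error": "N/A",
    },
]


def failed_attempts(logs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the entries whose Error field is something other than 'N/A'."""
    return [entry for entry in logs if entry.get("Error", "N/A") != "N/A"]


for entry in failed_attempts(example_logs):
    print(entry["Input"], "->", entry["Error"])
```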

Handling Edge Cases

  • If parsing fails after all attempts, the parser raises an exception.
  • The LLM model can be customized to handle different error types or malformed XML structures.
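
One customization that is often useful in practice: LLMs frequently wrap their answer in a Markdown code fence or add a sentence of commentary, which would break a subsequent parse. The sketch below wraps any text-generation function so that only the fenced content is returned. It assumes the llm_model callable takes a string and returns a string, as in the examples above; make_llm_model, strip_code_fence, and fake_generate are hypothetical names, and whether SafeXMLParser already performs this kind of cleanup is not stated here.

```python
import re
from typing import Callable

FENCE = "`" * 3  # three backticks, built indirectly to keep this example readable
_FENCED_BLOCK = re.compile(FENCE + r"\w*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def strip_code_fence(response: str) -> str:
    """Return the first fenced block's contents, or the whole response if none."""
    match = _FENCED_BLOCK.search(response)
    return (match.group(1) if match else response).strip()


def make_llm_model(generate: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a text-generation function so that it returns bare XML."""
    def llm_model(prompt: str) -> str:
        return strip_code_fence(generate(prompt))
    return llm_model


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM client call.
    return (
        "Here is the fixed XML:\n"
        + FENCE + "xml\n<root><child>data</child></root>\n" + FENCE
    )


llm_model = make_llm_model(fake_generate)
print(llm_model("fix this"))  # <root><child>data</child></root>
```

The resulting callable could then be passed as llm_model or default_llm_model, with fake_generate replaced by a real client call.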

Conclusion

The SafeXMLParser class offers a robust and flexible solution for parsing XML data, with built-in fault tolerance through LLM-based correction and detailed logging for easier debugging. This class is ideal for scenarios where XML data may be incomplete or malformed and multiple attempts are needed to ensure successful parsing.

License

[Include any licensing details here.]

