Reliable way to parse LLM outputs

These details have not been verified by PyPI

Project description

SafeXMLParser - README

Overview

SafeXMLParser is a Python class that provides a safer, more fault-tolerant way to parse XML strings. If the initial parse fails, it can use a Large Language Model (LLM) to correct the malformed XML and try again. The class supports multiple parsing attempts and logs each one, including successful parses, errors, and any LLM-based corrections.

Features

Multiple Attempts: Provides the option to specify multiple parsing attempts to handle malformed XML.
LLM-based Correction: Uses a specified LLM model to attempt XML correction if parsing fails.
Logging: Records all attempts, including input, output, errors, and LLM correction details.
Flexible Configuration: Customizable LLM model and number of attempts for robust XML parsing.

Installation

Clone this repository or download the code.
Install the required dependencies, such as beautifulsoup4 and any packages your LLM model needs:
```
pip install beautifulsoup4
```

Example Usage

1. Importing the Class

First, import the SafeXMLParser class and any other necessary components:

```python
from safe_xml_parser import SafeXMLParser  # Example import path
```

2. Basic Usage (Single Parsing Attempt)

Here is an example of how to use SafeXMLParser for a basic XML parsing operation without a fallback model:

```python
# Example XML string
xml_data = "<root><child>data</child></root>"

# Initialize the parser
parser = SafeXMLParser()

# Attempt to parse the XML data (one attempt, no LLM correction)
try:
    parsed_data = parser.safe_parse(xml_data)
    print("Parsed Data:", parsed_data)
except Exception as e:
    print(f"Parsing failed: {e}")

# Expected output: Parsed Data: {'root': {'child': 'data'}}
```
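
To make the shape of that result concrete, here is a minimal sketch of turning an XML string into a nested dictionary with the standard library. The helper names `xml_to_dict` and `element_to_value` are hypothetical, and this is not SafeXMLParser's implementation (the README lists beautifulsoup4 as a dependency, while this sketch uses `xml.etree.ElementTree`). It only illustrates the kind of mapping shown in the output above, and it ignores attributes and repeated sibling tags.

```python
import xml.etree.ElementTree as ET


def element_to_value(element):
    """Return the element's text if it has no children, else a dict of its children."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    # Note: repeated sibling tags overwrite each other in this simplified sketch.
    return {child.tag: element_to_value(child) for child in children}


def xml_to_dict(xml_text):
    """Hypothetical helper: parse well-formed XML into a nested dict keyed by tag name."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    return {root.tag: element_to_value(root)}


print(xml_to_dict("<root><child>data</child></root>"))
# {'root': {'child': 'data'}}
```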

3. Multiple Attempts with LLM Correction

If you are dealing with malformed XML, you can provide a custom LLM model to correct the data between attempts:

```python
def fallback_correction(text):
    # Stand-in for an LLM call: renames the bad tag and adds the missing closing tag
    return text.replace("<broken>data", "<child>data</child>")

# Malformed XML string (the <broken> tag is never closed)
malformed_xml = "<root><broken>data</root>"

# Initialize the parser with the fallback correction model
parser = SafeXMLParser(default_llm_model=fallback_correction, default_nb_attempts=2)

# Attempt to parse the malformed XML data
try:
    parsed_data = parser.safe_parse(malformed_xml)
    print("Parsed Data:", parsed_data)
except Exception as e:
    print(f"Parsing failed: {e}")

# Expected output: Parsed Data: {'root': {'child': 'data'}}
```
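
The retry-and-correct flow used in this example can be sketched in a few lines. This is an illustration under stated assumptions, not the package's actual code: `parse_with_correction` and `add_missing_close` are hypothetical names, the standard library's strict `xml.etree.ElementTree` stands in for the real parser, and the corrector is assumed to receive the raw XML text (the real class apparently sends a prompt built by `correctness_prompt_maker`).

```python
import xml.etree.ElementTree as ET


def parse_with_correction(text, corrector=None, nb_attempts=1):
    """Hypothetical sketch: try to parse, and ask `corrector` for a fix between failures."""
    last_error = None
    for attempt in range(1, nb_attempts + 1):
        try:
            return ET.fromstring(text)
        except ET.ParseError as error:
            last_error = error
            # Only call the corrector if another attempt remains.
            if corrector is not None and attempt < nb_attempts:
                text = corrector(text)
    raise ValueError(f"Parsing failed after {nb_attempts} attempt(s): {last_error}")


def add_missing_close(text):
    # Stand-in for an LLM call: closes the <child> tag that the input forgot to close.
    return text.replace("data</root>", "data</child></root>")


root = parse_with_correction("<root><child>data</root>", add_missing_close, nb_attempts=2)
print(root.find("child").text)  # data
```

With `nb_attempts=1` the same call raises, because no attempt remains after the first failure in which a correction could be tried.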

4. Accessing Parsing Logs

Logs are available for every parsing attempt, showing the input, output, error messages, and any LLM-based corrections applied:

```python
# Access logs after parsing
logs = parser.logs()
for log in logs:
    print(log)
```

Logs provide insights into each step of the parsing process, including what the LLM model was prompted with and what corrections it made.

5. Dynamic Configuration of LLM and Attempts

You can also configure the LLM model and number of attempts dynamically during parsing:

```python
# Reuses malformed_xml and fallback_correction from the previous example

# Initialize parser without setting defaults
parser = SafeXMLParser()

# Dynamically pass custom LLM model and attempts
try:
    parsed_data = parser.safe_parse(
        malformed_xml,
        nb_attempts=3,
        llm_model=fallback_correction
    )
    print("Parsed Data:", parsed_data)
except Exception as e:
    print(f"Parsing failed: {e}")
```

Method Summary

1. `safe_parse(text_to_parse: str, nb_attempts: Optional[int] = None, llm_model: Optional[Callable] = None, correctness_prompt_maker: Callable = create_fix_xml_prompt) -> Dict[str, Union[Dict, str]]`

Description: Attempts to parse the XML string multiple times, with the option of using an LLM model to correct any errors between attempts.
Args:
- text_to_parse: The XML string to be parsed.
- nb_attempts: The number of attempts allowed for parsing (default: None; the effective default is 1 attempt).
- llm_model: The function used to correct XML between attempts (default: None).
- correctness_prompt_maker: A function that creates prompts for LLM correction (default: create_fix_xml_prompt).
Returns: A dictionary representation of the parsed XML.
Raises: Raises an Exception if all attempts fail.
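
As an illustration of the `correctness_prompt_maker` argument, here is a sketch of a custom prompt builder. The function name `make_strict_fix_prompt` and its `(text, error)` parameters are assumptions: this README does not document the signature of `create_fix_xml_prompt`, so check the package source before passing a replacement.

```python
def make_strict_fix_prompt(text, error=None):
    """Hypothetical prompt builder: the (text, error) signature is an assumption."""
    lines = [
        "The following XML could not be parsed.",
        "Return only the corrected XML, with no explanation and no code fences.",
    ]
    if error is not None:
        lines.append(f"Parser error: {error}")
    lines.append("XML to fix:")
    lines.append(text)
    return "\n".join(lines)


print(make_strict_fix_prompt("<root><child>data</root>", "mismatched tag"))
```

Asking the model to return only the XML keeps its reply closer to something the next parsing attempt can consume directly.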

2. `logs(timestamp: bool = True) -> List[Dict[str, str]]`

Description: Returns logs of each parsing attempt, with an option to include/exclude timestamps.
Args:
- timestamp: Whether to include timestamps in the logs (default: True).
Returns: A list of dictionaries containing details of each parsing attempt.

Logging Structure

Logs include the following information:

Input: The XML string that was parsed.
Output: The resulting parsed output (if successful) or "N/A".
Error: Any error encountered during parsing, or "N/A" if successful.
Correctness Prompt: The prompt sent to the LLM for correction (if applicable).
Correctness Output: The corrected output from the LLM model (if applicable).
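
One way to picture a single log entry is as a small dictionary with those five fields. The key names, the `make_log_entry` helper, and the ISO timestamp format below are assumptions for illustration; the actual keys returned by `logs()` may differ.

```python
from datetime import datetime, timezone


def make_log_entry(input_text, output=None, error=None,
                   correctness_prompt=None, correctness_output=None,
                   timestamp=True):
    """Hypothetical log record; key names are assumed, not taken from the package."""
    def as_text(value):
        # Missing values become "N/A". The README states this for output and error;
        # applying it to the two correction fields is an assumption.
        return "N/A" if value is None else str(value)

    entry = {
        "input": input_text,
        "output": as_text(output),
        "error": as_text(error),
        "correctness_prompt": as_text(correctness_prompt),
        "correctness_output": as_text(correctness_output),
    }
    if timestamp:
        entry["timestamp"] = datetime.now(timezone.utc).isoformat()
    return entry


failed = make_log_entry("<root><child>data</root>", error="mismatched tag", timestamp=False)
print(failed["output"], "|", failed["error"])  # N/A | mismatched tag
```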

Handling Edge Cases

If parsing fails after all attempts, the parser raises an exception.
The LLM model can be customized to handle different error types or malformed XML structures.
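
As one example of customizing the correction model, LLM responses often wrap the answer in a Markdown code fence or surrounding commentary, which could make the next parsing attempt fail as well. The sketch below wraps any text-in, text-out callable and keeps only the span from the first `<` to the last `>` of its response. `extract_xml`, `with_xml_extraction`, and `fake_llm` are hypothetical names, not part of the package.

```python
import re


def extract_xml(response):
    """Return the span from the first '<' to the last '>' in the response, if any."""
    match = re.search(r"<.*>", response, flags=re.DOTALL)
    return match.group(0) if match else response.strip()


def with_xml_extraction(llm_callable):
    """Wrap a text-in/text-out callable so its reply is reduced to the XML it contains."""
    def wrapped(prompt):
        return extract_xml(llm_callable(prompt))
    return wrapped


def fake_llm(prompt):
    # Stand-in for a real model call; returns a fenced answer with commentary.
    fence = "`" * 3
    return f"Here is the fixed XML:\n{fence}xml\n<root><child>data</child></root>\n{fence}"


llm_model = with_xml_extraction(fake_llm)
print(llm_model("fix this"))  # <root><child>data</child></root>
```

A callable built this way could then be passed as `llm_model` or `default_llm_model`, assuming the parser only requires a function that takes text and returns text.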

Conclusion

The SafeXMLParser class offers a robust and flexible solution for parsing XML data, with built-in fault tolerance through LLM-based correction and detailed logging for easier debugging. This class is ideal for scenarios where XML data may be incomplete or malformed and multiple attempts are needed to ensure successful parsing.

License

[Include any licensing details here.]

Project details

This version

0.1.0

Sep 11, 2024

Download files

Source Distribution

safe_llm_parser-0.1.0.tar.gz (7.0 kB)

Uploaded Sep 11, 2024 Source

Built Distribution

safe_llm_parser-0.1.0-py3-none-any.whl (8.1 kB)

Uploaded Sep 11, 2024 Python 3

File details

Details for the file safe_llm_parser-0.1.0.tar.gz.

File metadata

Download URL: safe_llm_parser-0.1.0.tar.gz
Upload date: Sep 11, 2024
Size: 7.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.8.1 Windows/10

File hashes

Hashes for safe_llm_parser-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ce87129a1f70e3923abc59a0b1fdd46481de0693ba600f4796a25c54c5cfb1fa`
MD5	`7aca2b18b7fddc6a42f12d75fbc840c5`
BLAKE2b-256	`27ecf97bb820e2562171a14e675575dc92e2ffd67ec9f3396b8e9f2023ded704`

File details

Details for the file safe_llm_parser-0.1.0-py3-none-any.whl.

File metadata

Download URL: safe_llm_parser-0.1.0-py3-none-any.whl
Upload date: Sep 11, 2024
Size: 8.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.8.1 Windows/10

File hashes

Hashes for safe_llm_parser-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f8b5b6905ecf7a4e532674a49fd757e1a0fdac360982af3a5b2e9edbaaa7c252`
MD5	`9f5025ddf92d19a61347b8d8e97f7b1e`
BLAKE2b-256	`e474db9632fc3d6e7df82a87b9892bf9ccc57ae42b7d3ecf883da174445c1307`