Reliable way to parse LLM outputs
SafeXMLParser - README
Overview
SafeXMLParser is a Python class that provides a safer, fault-tolerant way to parse XML strings. When the initial parse fails, it can use a Large Language Model (LLM) to correct the malformed XML. The class supports multiple parsing attempts and logs each one, including successful parses, errors, and any LLM-based corrections.
Features
- Multiple Attempts: Provides the option to specify multiple parsing attempts to handle malformed XML.
- LLM-based Correction: Uses a specified LLM model to attempt XML correction if parsing fails.
- Logging: Records all attempts, including input, output, errors, and LLM correction details.
- Flexible Configuration: Customizable LLM model and number of attempts for robust XML parsing.
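The features above amount to a parse, correct, retry loop. The following is a minimal sketch of that idea, not the library's actual implementation: it uses the standard-library xml.etree.ElementTree parser, and the function name parse_with_retries and the log keys are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional


def parse_with_retries(
    text: str,
    nb_attempts: int = 1,
    llm_model: Optional[Callable[[str], str]] = None,
) -> ET.Element:
    """Hypothetical retry loop: parse, and on failure ask a corrector for a fix."""
    if nb_attempts < 1:
        raise ValueError("nb_attempts must be at least 1")

    attempt_log: List[Dict[str, str]] = []  # one entry per parsing attempt
    current = text
    for attempt in range(nb_attempts):
        try:
            root = ET.fromstring(current)
            attempt_log.append({"input": current, "error": "N/A"})
            return root
        except ET.ParseError as exc:
            attempt_log.append({"input": current, "error": str(exc)})
            # Ask for a correction only if a corrector exists and an attempt remains.
            if llm_model is None or attempt == nb_attempts - 1:
                break
            current = llm_model(current)

    last_error = attempt_log[-1]["error"]
    raise ValueError(f"Parsing failed after {len(attempt_log)} attempt(s): {last_error}")


def close_child_tag(text: str) -> str:
    # Stand-in for an LLM call: adds the missing closing tag.
    return text.replace("data</root>", "data</child></root>")


root = parse_with_retries("<root><child>data</root>", nb_attempts=2, llm_model=close_child_tag)
print(root.find("child").text)  # prints: data
```

With nb_attempts=1 the same input raises, because no attempt remains in which to apply a correction.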
Installation
- Clone this repository or download the code.
- Install the required dependencies (e.g., beautifulsoup4 or any LLM model dependencies):
  pip install beautifulsoup4
Example Usage
1. Importing the Class
First, import the SafeXMLParser class and any other necessary components:
from safe_xml_parser import SafeXMLParser # Example import path
2. Basic Usage (Single Parsing Attempt)
Here is an example of how to use SafeXMLParser for a basic XML parsing operation without a fallback model:
# Example XML string
xml_data = "<root><child>data</child></root>"
# Initialize the parser
parser = SafeXMLParser()
# Attempt to parse the XML data (one attempt, no LLM correction)
try:
    parsed_data = parser.safe_parse(xml_data)
    print("Parsed Data:", parsed_data)
except Exception as e:
    print(f"Parsing failed: {e}")
# Output: {'root': {'child': 'data'}}
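To illustrate how a nested dictionary of this shape can be derived from XML, here is a small sketch built on the standard-library ElementTree parser. It is an assumption for illustration only and not the conversion SafeXMLParser itself performs; the helper names element_to_dict and xml_to_dict are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def element_to_dict(element: ET.Element) -> Any:
    """Return the text of a leaf element, or a dict of its children."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    # Simplification: repeated sibling tags overwrite each other,
    # and attributes are ignored.
    return {child.tag: element_to_dict(child) for child in children}


def xml_to_dict(xml_text: str) -> Dict[str, Any]:
    """Parse an XML string and wrap the result under the root tag."""
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_dict(root)}


print(xml_to_dict("<root><child>data</child></root>"))
# prints: {'root': {'child': 'data'}}
```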
3. Multiple Attempts with LLM Correction
If you are dealing with malformed XML, you can provide a custom LLM model to correct the data between attempts:
def fallback_correction(text):
    # Simple function to simulate fixing the broken XML
    return text.replace("<broken>", "<child>").replace("</broken>", "</child>")
# Malformed XML string
malformed_xml = "<root><broken>data</root>"
# Initialize the parser with the fallback correction model
parser = SafeXMLParser(default_llm_model=fallback_correction, default_nb_attempts=2)
# Attempt to parse the malformed XML data
try:
    parsed_data = parser.safe_parse(malformed_xml)
    print("Parsed Data:", parsed_data)
except Exception as e:
    print(f"Parsing failed: {e}")
# Output: {'root': {'child': 'data'}}
4. Accessing Parsing Logs
Logs are available for every parsing attempt, showing the input, output, error messages, and any LLM-based corrections applied:
# Access logs after parsing
logs = parser.logs()
for log in logs:
    print(log)
Logs provide insights into each step of the parsing process, including what the LLM model was prompted with and what corrections it made.
5. Dynamic Configuration of LLM and Attempts
You can also configure the LLM model and number of attempts dynamically during parsing:
# Initialize parser without setting defaults
parser = SafeXMLParser()
# Dynamically pass custom LLM model and attempts
try:
    parsed_data = parser.safe_parse(
        malformed_xml,
        nb_attempts=3,
        llm_model=fallback_correction,
    )
    print("Parsed Data:", parsed_data)
except Exception as e:
    print(f"Parsing failed: {e}")
Method Summary
1. safe_parse(text_to_parse: str, nb_attempts: Optional[int] = None, llm_model: Optional[Callable] = None, correctness_prompt_maker: Callable = create_fix_xml_prompt) -> Dict[str, Union[Dict, str]]
- Description: Attempts to parse the XML string multiple times, with the option of using an LLM model to correct any errors between attempts.
- Args:
  - text_to_parse: The XML string to be parsed.
  - nb_attempts: The number of attempts allowed for parsing (default: 1).
  - llm_model: The function used to correct XML between attempts (default: None).
  - correctness_prompt_maker: A function that creates prompts for LLM correction (default: create_fix_xml_prompt).
- Returns: A dictionary representation of the parsed XML.
- Raises: An Exception if all attempts fail.
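The correctness_prompt_maker argument lets you replace the default prompt. Its exact signature is not documented here, so the sketch below assumes it receives the malformed text (and, optionally, an error message) and returns a prompt string. The function name make_strict_xml_prompt and the error_message parameter are hypothetical.

```python
def make_strict_xml_prompt(malformed_xml: str, error_message: str = "") -> str:
    """Build a correction prompt that asks for XML only (assumed signature)."""
    lines = [
        "The following XML failed to parse.",
        "Return only the corrected XML, with no commentary and no code fences.",
    ]
    if error_message:
        lines.append(f"Parser error: {error_message}")
    lines.append("XML:")
    lines.append(malformed_xml)
    return "\n".join(lines)


print(make_strict_xml_prompt("<root><child>data</root>", "mismatched tag"))
```

Asking for XML only keeps the model's reply parseable on the next attempt without further cleanup.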
2. logs(timestamp: bool = True) -> List[Dict[str, str]]
- Description: Returns logs of each parsing attempt, with an option to include/exclude timestamps.
- Args:
  - timestamp: Whether to include timestamps in the logs (default: True).
- Returns: A list of dictionaries containing details of each parsing attempt.
Logging Structure
Logs include the following information:
- Input: The XML string that was parsed.
- Output: The resulting parsed output (if successful) or "N/A".
- Error: Any error encountered during parsing, or "N/A" if successful.
- Correctness Prompt: The prompt sent to the LLM for correction (if applicable).
- Correctness Output: The corrected output from the LLM model (if applicable).
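As an illustration of working with entries of this shape, the sketch below summarizes a list of log dictionaries. The key names ("input", "output", "error", "correctness_prompt", "correctness_output") are assumptions made for this example; check the actual keys returned by logs() before relying on them.

```python
from typing import Dict, List


def summarize_attempts(logs: List[Dict[str, str]]) -> Dict[str, int]:
    """Count attempts, failed attempts, and attempts followed by a correction."""
    failed = sum(1 for entry in logs if entry.get("error", "N/A") != "N/A")
    corrected = sum(
        1 for entry in logs if entry.get("correctness_output", "N/A") != "N/A"
    )
    return {"attempts": len(logs), "failed": failed, "corrected": corrected}


# Hand-written entries with assumed key names, not real parser output.
example_logs = [
    {
        "input": "<root><child>data</root>",
        "output": "N/A",
        "error": "mismatched tag",
        "correctness_prompt": "Fix this XML: <root><child>data</root>",
        "correctness_output": "<root><child>data</child></root>",
    },
    {
        "input": "<root><child>data</child></root>",
        "output": "{'root': {'child': 'data'}}",
        "error": "N/A",
    },
]

print(summarize_attempts(example_logs))
# prints: {'attempts': 2, 'failed': 1, 'corrected': 1}
```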
Handling Edge Cases
- If parsing fails after all attempts, the parser raises an exception.
- The LLM model can be customized to handle different error types or malformed XML structures.
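One practical customization is cleaning up the model's reply before it is parsed again, since LLMs often wrap XML in a code fence or add commentary. The sketch below assumes llm_model is any callable that takes a string and returns a string. The names strip_code_fences, make_llm_model, and the complete callable are hypothetical stand-ins for whichever LLM client you use.

```python
import re
from typing import Callable

FENCE = "`" * 3  # a Markdown code fence marker
_FENCED_BLOCK = re.compile(FENCE + r"[a-zA-Z]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def strip_code_fences(reply: str) -> str:
    """Return the body of the first fenced block, or the whole reply if none."""
    match = _FENCED_BLOCK.search(reply)
    return (match.group(1) if match else reply).strip()


def make_llm_model(complete: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a text-completion callable so its reply is cleaned before reuse."""

    def llm_model(prompt: str) -> str:
        return strip_code_fences(complete(prompt))

    return llm_model


def fake_complete(prompt: str) -> str:
    # Stand-in for a real LLM call: replies with fenced XML plus commentary.
    return (
        "Here is the fix:\n"
        + FENCE + "xml\n<root><child>data</child></root>\n" + FENCE
    )


llm_model = make_llm_model(fake_complete)
print(llm_model("Fix this XML: <root><child>data</root>"))
# prints: <root><child>data</child></root>
```

The wrapped callable can then be passed wherever an llm_model is expected.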
Conclusion
The SafeXMLParser class offers a flexible way to parse XML data, with built-in fault tolerance through LLM-based correction and detailed logging for easier debugging. It is best suited to scenarios where XML data may be incomplete or malformed and several attempts may be needed to parse it successfully.
License
[Include any licensing details here.]