Extract and parse JSON from unstructured text outputs from LLMs

LLM Output Parser

A robust utility for extracting and parsing structured data (JSON and XML) from unstructured text outputs generated by Large Language Models (LLMs).

Features

Extracts JSON and XML from plain text, code blocks, and mixed content
Handles various JSON formats: objects, arrays, and nested structures
Converts XML to JSON-compatible dictionary format
Advanced extraction strategies for multiple JSON/XML objects in text
Provides robust error handling and recovery strategies
Works with Markdown code blocks (fenced blocks tagged json or xml)
Intelligently selects the most comprehensive structure when multiple are found

Installation

Install from PyPI:

pip install llm-output-parser

Or install from source:

git clone https://github.com/KameniAlexNea/llm-output-parser.git
cd llm-output-parser
pip install -e .

Usage

JSON Parsing

from llm_output_parser import parse_json

# Parse JSON from an LLM response
llm_response = """
Here's the data you requested:


{
    "name": "John Doe",
    "age": 30,
    "skills": ["Python", "Machine Learning", "NLP"]
}


Let me know if you need anything else!
"""

data = parse_json(llm_response)
print(data["name"])  # John Doe
print(data["skills"])  # ['Python', 'Machine Learning', 'NLP']
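To illustrate what pulling JSON out of a Markdown code block involves, here is a minimal standard-library sketch. It is not the library's implementation; `extract_fenced_json`, `FENCE`, and `FENCE_RE` are hypothetical names introduced for this example, and the regular expression only recognizes fences that are untagged or tagged `json`.

```python
import json
import re

# Three backticks, built programmatically so this example nests cleanly in Markdown.
FENCE = "`" * 3

# Opening fence (optionally tagged "json"), captured body, closing fence.
FENCE_RE = re.compile(FENCE + r"(?:json)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_fenced_json(text):
    """Return the first fenced block that parses as JSON, or raise ValueError."""
    for match in FENCE_RE.finditer(text):
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # Not valid JSON; try the next fenced block.
    raise ValueError("No fenced JSON block found.")


response = (
    "Here you go:\n"
    + FENCE + 'json\n{"name": "John Doe", "age": 30}\n' + FENCE
    + "\nAnything else?"
)
print(extract_fenced_json(response))  # {'name': 'John Doe', 'age': 30}
```

In practice `parse_json` covers this case directly; the sketch only shows the kind of work it saves.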

XML Parsing

from llm_output_parser import parse_xml

# Parse XML from an LLM response and convert to JSON
llm_response = """
Here's the user data in XML format:

```xml
<user id="123">
    <name>Jane Smith</name>
    <email>jane@example.com</email>
    <roles>
        <role>admin</role>
        <role>editor</role>
    </roles>
</user>
```

Let me know if you need any other information.
"""

data = parse_xml(llm_response)
print(data["@id"])  # 123
print(data["name"])  # Jane Smith
print(data["roles"]["role"])  # ['admin', 'editor']

Handling Complex Cases

The library can handle various complex scenarios:

JSON Within Text

text = 'The user profile is: {"name": "John", "email": "john@example.com"}'
data = parse_json(text)  # -> {"name": "John", "email": "john@example.com"}

XML Within Text

text = 'The configuration is: <config><server>localhost</server><port>8080</port></config>'
data = parse_xml(text)  # -> {"server": "localhost", "port": "8080"}
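For readers curious how XML embedded in prose can be located at all, the following is a rough standard-library sketch, not the library's actual strategy. `find_xml_elements` and `ELEMENT_RE` are hypothetical names, and the regular expression is a deliberate simplification: it ignores self-closing tags and can cut a match short when an element contains a nested element of the same name.

```python
import re
import xml.etree.ElementTree as ET

# Candidate span: an opening tag through the first closing tag of the same name.
ELEMENT_RE = re.compile(r"<([A-Za-z_][\w.-]*)\b[^>]*>.*?</\1>", re.DOTALL)


def find_xml_elements(text):
    """Return every regex candidate in text that parses as well-formed XML."""
    elements = []
    for match in ELEMENT_RE.finditer(text):
        try:
            elements.append(ET.fromstring(match.group(0)))
        except ET.ParseError:
            continue  # The candidate was not well-formed; skip it.
    return elements


text = "The configuration is: <config><server>localhost</server><port>8080</port></config>"
found = find_xml_elements(text)
print(found[0].tag)               # config
print(found[0].findtext("port"))  # 8080
```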

Multiple JSON/XML Objects

When multiple valid objects are present, the parser returns the most comprehensive one:

# For JSON
text = '''
Small object: {"id": 123}

Larger object:
{
    "user": {
        "id": 123,
        "name": "John",
        "email": "john@example.com",
        "preferences": {
            "theme": "dark",
            "notifications": true
        }
    }
}
'''
data = parse_json(text)  # Returns the larger, more complex object

# For XML
text = '''
Simple: <item>value</item>

Complex:
<product category="electronics">
    <name>Smartphone</name>
    <price currency="USD">999.99</price>
    <features>
        <feature>5G</feature>
        <feature>High-res camera</feature>
    </features>
</product>
'''
data = parse_xml(text)  # Returns the more complex XML converted to JSON
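The README does not say how "most comprehensive" is measured. As one plausible reading, the sketch below scans the text for every decodable JSON value and keeps the one with the longest compact serialization. This heuristic and the names `find_json_candidates` and `most_comprehensive` are assumptions made for illustration, not the library's documented behavior.

```python
import json


def find_json_candidates(text):
    """Yield every JSON object or array that decodes starting at a '{' or '['."""
    decoder = json.JSONDecoder()
    index = 0
    while index < len(text):
        if text[index] in "{[":
            try:
                value, end = decoder.raw_decode(text, index)
            except json.JSONDecodeError:
                index += 1
                continue
            yield value
            index = end  # Skip the decoded span so nested values are not re-reported.
        else:
            index += 1


def most_comprehensive(text):
    """Pick the candidate with the longest serialization (an assumed heuristic)."""
    candidates = list(find_json_candidates(text))
    if not candidates:
        raise ValueError("No JSON found.")
    return max(candidates, key=lambda value: len(json.dumps(value)))


text = (
    'Small object: {"id": 123}\n'
    'Larger object: {"user": {"id": 123, "name": "John", '
    '"preferences": {"theme": "dark", "notifications": true}}}'
)
print(most_comprehensive(text))
```

Other size measures, such as nesting depth or key count, would be equally reasonable choices for the `key` function.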

XML to JSON Conversion Details

When parsing XML, the library converts it to a JSON-compatible dictionary with the following conventions:

XML attributes are prefixed with @ (e.g., <item id="123"> becomes {"@id": "123"})
Text content of elements that also have attributes or children is stored under the #text key
Simple elements with only text become key-value pairs
Repeated elements are automatically converted to arrays

Example:

xml_str = '''
<library>
    <book category="fiction">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
    </book>
    <book category="non-fiction">
        <title>Sapiens</title>
        <author>Yuval Noah Harari</author>
    </book>
</library>
'''
data = parse_xml(xml_str)
# Results in:
# {
#     "book": [
#         {
#             "@category": "fiction",
#             "title": "The Great Gatsby",
#             "author": "F. Scott Fitzgerald"
#         },
#         {
#             "@category": "non-fiction",
#             "title": "Sapiens",
#             "author": "Yuval Noah Harari"
#         }
#     ]
# }
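The conventions above can be sketched with `xml.etree.ElementTree` from the standard library. This is an illustrative reimplementation under the stated rules, not the library's source; `element_to_value` is a hypothetical helper, and details such as whitespace stripping are assumptions.

```python
import xml.etree.ElementTree as ET


def element_to_value(element):
    """Convert an Element to a str (text-only leaf) or a dict, per the conventions."""
    text = (element.text or "").strip()
    if not element.attrib and len(element) == 0:
        return text  # Simple element: just its text.

    # Attributes are stored under keys prefixed with "@".
    result = {f"@{name}": value for name, value in element.attrib.items()}
    for child in element:
        value = element_to_value(child)
        if child.tag in result:
            existing = result[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                result[child.tag] = [existing, value]  # Repeated tag becomes a list.
        else:
            result[child.tag] = value
    if text:
        result["#text"] = text  # Text alongside attributes or children.
    return result


root = ET.fromstring('<price currency="USD">999.99</price>')
print(element_to_value(root))  # {'@currency': 'USD', '#text': '999.99'}
```

As in the example output above, the root element's own tag name does not appear in the result; only its attributes and children do.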

Error Handling

If no valid structure can be found, a ValueError is raised:

try:
    data = parse_json("No JSON here!")
except ValueError as e:
    print(f"Error: {e}")  # "Error: Failed to parse JSON from the input string."

try:
    data = parse_xml("No XML here!")
except ValueError as e:
    print(f"Error: {e}")  # "Error: Failed to parse XML from the input string."
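Because both parsers signal failure with ValueError, callers that prefer a fallback value over an exception can wrap them in a small helper. The sketch below is one way to do that; `parse_or_default` is a hypothetical name, and `json.loads` stands in for `parse_json` only so the example runs without the library installed.

```python
import json


def parse_or_default(parse_fn, text, default=None):
    """Call parse_fn(text); return default if it raises ValueError."""
    try:
        return parse_fn(text)
    except ValueError:
        return default


# json.loads is a stand-in here; json.JSONDecodeError is a subclass of ValueError.
print(parse_or_default(json.loads, '{"ok": true}'))       # {'ok': True}
print(parse_or_default(json.loads, "No JSON here!", {}))  # {}
```

With the library installed, passing `parse_json` or `parse_xml` as `parse_fn` should work the same way, given the ValueError behavior described above.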

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

Maintainers

alexneakameni

Release history

0.3.0 (this version): Jun 19, 2025
0.2.0: Mar 26, 2025
0.1.0: Mar 25, 2025

Download files

Download the file for your platform.

Source Distribution

llm_output_parser-0.3.0.tar.gz (14.5 kB)

Uploaded Jun 19, 2025 Source

Built Distribution

llm_output_parser-0.3.0-py3-none-any.whl (14.4 kB)

Uploaded Jun 19, 2025 Python 3

File details

Details for the file llm_output_parser-0.3.0.tar.gz.

File metadata

Download URL: llm_output_parser-0.3.0.tar.gz
Upload date: Jun 19, 2025
Size: 14.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for llm_output_parser-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`2bf5c20b0da6460d4c7860c8a48d47d527bafd07f9af4f4fb9b1ff47d3c70c1f`
MD5	`1d6d2923c033fd1a640fe750d807ada8`
BLAKE2b-256	`9efd3517db603e1fc124ce4a59cab568a6433cfba3733cafd9070bdf3334b32e`
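A downloaded file can be checked against the SHA256 digest listed above with Python's `hashlib`. The following is a minimal sketch; `sha256_of_file` is a hypothetical helper, and the path passed to it is whatever location the file was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the returned string with the SHA256 value in the table confirms that the download is intact.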

Provenance

The following attestation bundles were made for llm_output_parser-0.3.0.tar.gz:

Publisher: python-package.yml on KameniAlexNea/llm-output-parser

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_output_parser-0.3.0.tar.gz
- Subject digest: 2bf5c20b0da6460d4c7860c8a48d47d527bafd07f9af4f4fb9b1ff47d3c70c1f
- Sigstore transparency entry: 244262019
- Sigstore integration time: Jun 19, 2025
Source repository:
- Permalink: KameniAlexNea/llm-output-parser@440a1461ce381ada22ab01070b9ddaf38b422aa8
- Branch / Tag: refs/tags/llm-output-parser-v0.3.0
- Owner: https://github.com/KameniAlexNea
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-package.yml@440a1461ce381ada22ab01070b9ddaf38b422aa8
- Trigger Event: release

File details

Details for the file llm_output_parser-0.3.0-py3-none-any.whl.

File metadata

Download URL: llm_output_parser-0.3.0-py3-none-any.whl
Upload date: Jun 19, 2025
Size: 14.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for llm_output_parser-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b9d7c9cd469205aa698efb60fffad591f943e806858f9f77c60139295ad96ff9`
MD5	`682e9a2e74339eafc26da6c27af12cb6`
BLAKE2b-256	`c26c888e9db804503ef31567b7b20bbe19263ec85c67c941a7beda2ac2797ae4`

Provenance

The following attestation bundles were made for llm_output_parser-0.3.0-py3-none-any.whl:

Publisher: python-package.yml on KameniAlexNea/llm-output-parser

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llm_output_parser-0.3.0-py3-none-any.whl
- Subject digest: b9d7c9cd469205aa698efb60fffad591f943e806858f9f77c60139295ad96ff9
- Sigstore transparency entry: 244262025
- Sigstore integration time: Jun 19, 2025
Source repository:
- Permalink: KameniAlexNea/llm-output-parser@440a1461ce381ada22ab01070b9ddaf38b422aa8
- Branch / Tag: refs/tags/llm-output-parser-v0.3.0
- Owner: https://github.com/KameniAlexNea
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-package.yml@440a1461ce381ada22ab01070b9ddaf38b422aa8
- Trigger Event: release
