
llm-salvage


Salvage structured data from LLM responses that didn't follow instructions.

pip install llm-salvage

What this is for

You ask a local model for structured output. It mostly does what you said, but:

  • It wrapped the JSON in markdown code fences when you said not to.
  • It used a synonym for one of your field names - sentiment instead of verdict.
  • It returned Bullish when your schema expected BULLISH.
  • It misspelled a tag name - [VERDCT] instead of [VERDICT].
  • It returned trailing commas, smart quotes, or nested objects where you wanted strings.
  • It wrote a thoughtful paragraph before the structured output you asked for.

You can prompt around these problems, retry with stricter instructions, or switch to a model with better tool-calling support. Or you can accept that local models do this sometimes and parse what you got.

llm-salvage is the third option. It applies deterministic corrections, extracts data in tagged, JSON, or assignment formats, validates against a schema, and returns a result you can inspect - with a record of every fix applied along the way.

What this is not

It does not call any LLM. It does not retry. It does not depend on Pydantic, PyYAML, or any other library by default. It does not know what model produced the text it's parsing. It is not a replacement for Instructor or PydanticAI - if you have a frontier model with reliable tool-calling, those libraries are simpler and more powerful. This library is for when tool-calling isn't available or isn't reliable, and you need to make sense of raw text output.

Quick start

from llm_salvage import ResponseParser, Schema, Field

schema = Schema(fields={
    "sentiment":  Field(choices=["positive", "negative", "neutral"]),
    "confidence": Field(choices=["high", "medium", "low"]),
    "summary":    Field(min_length=20),
})

response = '''
```json
{
  "sentiment": "Positive",
  "confidence": "HIGH",
  "summary": "The product launch exceeded expectations across all key metrics.",
}
```
'''

result = ResponseParser(schema).parse(response)

if result.ok:
    print(result.data["sentiment"])    # "positive"
    print(result.corrections)          # ['stripped_code_fences', 'removed_trailing_commas', ...]
else:
    for error in result.errors:
        print(error)

The parser stripped the code fences, repaired the trailing comma, normalized "Positive" to match the schema's choices, and recorded each fix as a correction code. The response text never raised an exception - the parser returns a ParseResult you inspect.

How it works

Four passes, in order:

1. Structural corrections. Code fence removal, BOM stripping, line ending normalization, tag-name typo correction (when a typo map is configured), auto-closing of unclosed tags whose names match schema fields.

2. Extraction. The parser detects whether the response uses tagged, JSON, or assignment format and tries them in order. JSON keys are matched against schema field names directly, with optional aliases for legacy or domain-specific naming.

3. Validation. Field types are checked, choices are normalized case-insensitively, probability dicts are summed, week-range strings are parsed into structured form. Validation never modifies data destructively: if a value can't be normalized, it's reported as an error.

4. Telemetry (optional). Each parse can write a JSONL event recording which corrections were applied, what errors remained, and which model the response came from. Over time this builds a corpus you can query to see which models need which corrections.

Schema definition

Schemas can be defined in code:

from llm_salvage import Schema, Field, FieldType, Formats

schema = Schema(
    fields={
        "topic":     Field(choices=["billing", "technical", "general"]),
        "priority":  Field(choices=["urgent", "normal", "low"]),
        "summary":   Field(min_length=10, max_length=500),
        "needs_human_review": Field(type=FieldType.STRING, required=False, default="no"),
    },
    formats=[Formats.TAGGED, Formats.JSON],
)

Or loaded from a file:

from llm_salvage import Schema

schema = Schema.from_file("schemas/support_ticket.yaml")

Where support_ticket.yaml looks like:

fields:
  topic:
    choices: [billing, technical, general]
  priority:
    choices: [urgent, normal, low]
  summary:
    min_length: 10
    max_length: 500
  needs_human_review:
    type: string
    required: false
    default: "no"

formats: [tagged, json]

YAML, JSON, and TOML are all supported. YAML requires pip install 'llm-salvage[yaml]'.

Field types

Type         Use for
STRING       Free-form text with optional min_length/max_length
CHOICE       Enum of allowed values, case-insensitive
INTEGER      Whole numbers
FLOAT        Decimal numbers
PROBABILITY  Dict of label→int that should sum to ~100
WEEK_RANGE   Strings like "2-4 weeks" parsed to {min, max}

A field's type is inferred from its arguments - Field(choices=[...]) is a CHOICE field, Field(min_length=20) is a STRING field. Specify type=FieldType.X explicitly when the inference would be wrong.
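The two domain-specific types can be pictured with a small sketch. The regex and the ±5 tolerance here are illustrative assumptions, not the library's actual rules:

```python
import re

def parse_week_range(value: str) -> dict:
    """Parse strings like '2-4 weeks' into {'min': 2, 'max': 4}."""
    match = re.match(r"\s*(\d+)\s*-\s*(\d+)\s*weeks?\s*$", value)
    if not match:
        raise ValueError(f"not a week range: {value!r}")
    return {"min": int(match.group(1)), "max": int(match.group(2))}

def probability_ok(dist: dict, tolerance: int = 5) -> bool:
    """True when the label->int values sum to roughly 100."""
    return abs(sum(dist.values()) - 100) <= tolerance

parse_week_range("2-4 weeks")                          # {'min': 2, 'max': 4}
probability_ok({"bull": 60, "bear": 35, "flat": 5})    # True
```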

Adapters

Optional integrations that activate when their dependency is installed.

Pydantic - convert between Schema and Pydantic models:

# pip install 'llm-salvage[pydantic]'
from llm_salvage.adapters.pydantic import schema_from_pydantic, to_pydantic
from pydantic import BaseModel

class Ticket(BaseModel):
    topic: str
    priority: str
    summary: str

schema = schema_from_pydantic(Ticket)
result = ResponseParser(schema).parse(response)
ticket = to_pydantic(result, Ticket)

json-repair - use the json-repair library for more robust JSON repair:

# pip install 'llm-salvage[repair]'
# No code change needed - the parser uses json-repair automatically when installed.

Telemetry

When you pass a log_path, the parser writes one JSONL event per parse attempt, recording corrections applied, errors encountered, and the model name. This is opt-in:

parser = ResponseParser(
    schema,
    log_path="parses.jsonl",
    model="llama3.2:3b",
)

for response in responses:
    parser.parse(response, task_id=response.task_id)

After a few hundred parses, you can ask the corpus what each model needs:

from llm_salvage import model_profile

profile = model_profile("parses.jsonl", "llama3.2:3b")
# {
#   "model": "llama3.2:3b",
#   "events": 847,
#   "valid_pct": 89.4,
#   "corrections": {
#     "stripped_code_fences": 612,
#     "case_normalized_BULLISH": 243,
#     ...
#   },
#   "top_correction": "stripped_code_fences"
# }

This is the most useful piece of the library for ongoing operations. It turns the parser into a feedback loop: you see which corrections each model consistently needs, and that tells you which prompt changes would have the biggest effect.

Set log_corrections_only=True if you only want to record events where corrections were actually applied - useful when you're parsing high volume and don't need a record of every clean parse.
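The aggregation that model_profile performs can be approximated in a few lines of plain Python. The event field names used here (model, errors, corrections) are assumptions inferred from the profile shown above, not the library's documented event schema:

```python
import json
from collections import Counter

def profile_from_events(path: str, model: str) -> dict:
    """Aggregate JSONL parse events for one model into a summary dict."""
    events, valid, corrections = 0, 0, Counter()
    with open(path) as fh:
        for line in fh:
            event = json.loads(line)
            if event.get("model") != model:
                continue
            events += 1
            if not event.get("errors"):
                valid += 1
            corrections.update(event.get("corrections", []))
    return {
        "model": model,
        "events": events,
        "valid_pct": round(100 * valid / events, 1) if events else 0.0,
        "corrections": dict(corrections),
        "top_correction": corrections.most_common(1)[0][0] if corrections else None,
    }
```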

Comparison with other libraries

Library             Use when
Instructor          You're using a model with reliable tool-calling.
PydanticAI          You're building agents and want a full framework.
json-repair         You only need JSON repair, no schema or tagged formats.
llm-salvage (this)  Local models, mixed formats, post-hoc parsing.

These compose. You can use Instructor for your frontier-model path and llm-salvage for your local-model fallback in the same codebase.

Examples

The examples/ directory has end-to-end examples covering several common domains.

Documentation

Status

v0.1.0 is alpha. The API may change before 1.0. If you find a parsing case that should work but doesn't, opening an issue with the response text is the most useful contribution - telemetry corpora from real workloads beat invented test cases.

License

MIT - see LICENSE.
