Utilities for cleaning and normalizing raw LLM output

llmclean

A zero-dependency Python library for cleaning and normalizing raw LLM output.

LLMs are inconsistent: they wrap JSON in markdown fences, add prose around code, repeat themselves, and produce subtly broken JSON. llmclean handles all of that with three focused utilities.


Install

pip install llmclean

Quick start

from llmclean import strip_fences, enforce_json, trim_repetition

# Remove ```json ... ``` wrappers
strip_fences('```json\n{"name": "Alice"}\n```')
# → '{"name": "Alice"}'

# Extract valid JSON from messy output
enforce_json('Here you go: {"ok": True, "items": [1,2,3,]}')
# → '{\n  "ok": true,\n  "items": [1, 2, 3]\n}'

# Remove repeated sentences/paragraphs at the end
trim_repetition("The answer is 42. This is final. This is final.")
# → 'The answer is 42. This is final.'

For full examples and edge cases see USAGE.md.
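
To make the enforce_json example above concrete, here is a minimal sketch of the kind of repair it describes. It is not llmclean's implementation: sketch_enforce_json is a hypothetical name, it handles only surrounding prose, trailing commas, and Python literals, and it emits compact JSON, so its whitespace may differ from the library's output.

```python
import json
import re


def sketch_enforce_json(text: str) -> str:
    """Hypothetical stand-in for enforce_json; covers only a few repairs."""
    # Find where the first object or array starts, skipping leading prose.
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    if not starts:
        return text
    start = min(starts)
    closer = "}" if text[start] == "{" else "]"
    end = text.rfind(closer)
    if end <= start:
        return text
    candidate = text[start:end + 1]

    # Drop trailing commas that sit right before a closing bracket or brace.
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)

    # Map Python literals to JSON ones.
    # Naive: this would also rewrite these words inside string values.
    for py, js in (("True", "true"), ("False", "false"), ("None", "null")):
        candidate = re.sub(rf"\b{py}\b", js, candidate)

    try:
        return json.dumps(json.loads(candidate))
    except json.JSONDecodeError:
        # Mirror the "return the original input on failure" principle.
        return text


repaired = sketch_enforce_json('Here you go: {"ok": True, "items": [1,2,3,]}')
print(json.loads(repaired)["items"])  # [1, 2, 3]
```

Returning a string rather than a parsed object keeps the sketch chainable with other string cleaners; call json.loads on the result when you need the data itself.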


Functions

Function                What it fixes
strip_fences(text)      Removes ```lang / ``` / ~~~ code fences
enforce_json(text)      Extracts valid JSON from fences and prose; repairs trailing commas, Python literals, unquoted keys, and unclosed brackets
trim_repetition(text)   Removes repeated sentences, near-duplicates, and repeated paragraphs from the tail
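
The tail-trimming behavior described for trim_repetition can be approximated with a short sketch. This is an assumption-laden illustration, not the library's code: sketch_trim_repetition is a hypothetical name, and it only catches exact sentence repeats, not near-duplicates or repeated paragraphs.

```python
import re


def sketch_trim_repetition(text: str) -> str:
    """Hypothetical stand-in for trim_repetition; exact sentence matches only."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    removed = False
    # Pop trailing sentences while the last one repeats the one before it.
    while len(sentences) >= 2 and sentences[-1] == sentences[-2]:
        sentences.pop()
        removed = True

    # Non-destructive: return the original string when nothing was removed.
    # When something was removed, sentences are rejoined with single spaces,
    # so original line breaks are not preserved in this sketch.
    return " ".join(sentences) if removed else text


print(sketch_trim_repetition("The answer is 42. This is final. This is final."))
# The answer is 42. This is final.
```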

Design principles

  • Zero dependencies: pure Python standard library
  • Never raises: every function returns the original input if cleaning fails
  • Non-destructive: input is returned unchanged when nothing needs cleaning
  • Composable: the functions chain freely

# Full pipeline
data = enforce_json(trim_repetition(strip_fences(raw_output)))
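
One way to picture how the "return the original input if cleaning fails" and "chain freely" principles fit together is a small decorator. The sketch below is an assumption about how such a contract could be enforced, not how llmclean does it; never_raises, drop_fence_lines, and always_fails are hypothetical names used only for illustration.

```python
from functools import wraps
from typing import Callable

# Fence markers are built programmatically to keep this example easy to embed.
FENCE_MARKERS = ("`" * 3, "~" * 3)


def never_raises(cleaner: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a cleaner so that any exception returns the input unchanged."""
    @wraps(cleaner)
    def safe(text: str) -> str:
        try:
            return cleaner(text)
        except Exception:
            return text
    return safe


@never_raises
def drop_fence_lines(text: str) -> str:
    # Toy cleaner: drop any line that opens or closes a code fence.
    kept = [ln for ln in text.splitlines()
            if not ln.lstrip().startswith(FENCE_MARKERS)]
    return "\n".join(kept)


@never_raises
def always_fails(text: str) -> str:
    # Simulates a cleaner that hits an unexpected bug.
    raise ValueError("unexpected input")


def pipeline(text: str) -> str:
    # Because each stage is str -> str and never raises, stages nest safely.
    return always_fails(drop_fence_lines(text))


fenced = FENCE_MARKERS[0] + 'json\n{"a": 1}\n' + FENCE_MARKERS[0]
print(pipeline(fenced))  # {"a": 1}
```

The failing stage passes its input through untouched, so the work done by earlier stages survives even when a later one breaks.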

Running tests

# With pytest
pip install "llmclean[dev]"
pytest -v

# Without pytest
python run_tests.py

License

MIT
