PurifyLLM

CLI and pre-commit hook to normalize “smart” punctuation and invisible Unicode produced by LLMs.

What it does

Replaces common non-ASCII punctuation and whitespace with plain ASCII equivalents:

  • smart quotes “ ” ‘ ’ -> " and '
  • dashes – — − -> -
  • ellipsis … -> ...
  • non-breaking spaces and thin spaces -> regular space
  • zero-width and BOM characters -> removed

You can add your own mappings in the hook config, or with --map on the command line.
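The replacement rules above can be sketched in a few lines of Python. This is an illustration only, not PurifyLLM's actual code: the name purify, the DEFAULT_MAP table and the exact set of code points are assumptions transcribed from the list above, and the sketch assumes every key is a single code point.

```python
# Hypothetical table based on the README list; the real tool may cover
# more (or different) code points.
DEFAULT_MAP = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2212": "-",    # minus sign
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
    "\u2009": " ",    # thin space
    "\u200b": "",     # zero-width space (removed)
    "\u200c": "",     # zero-width non-joiner (removed)
    "\u200d": "",     # zero-width joiner (removed)
    "\ufeff": "",     # byte order mark (removed)
}


def purify(text, extra=None):
    """Return text with mapped characters replaced; extra entries override defaults."""
    table = dict(DEFAULT_MAP)
    if extra:
        table.update(extra)
    # str.maketrans accepts single-character keys mapped to replacement strings.
    return text.translate(str.maketrans(table))


print(purify("\u201cHi\u201d \u2014 ok\u2026"))  # prints "Hi" - ok...
```

A custom mapping such as « to " corresponds to passing extra={"\u00ab": '"'} in this sketch.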

Install (as a hook consumer)

Add this repo to your .pre-commit-config.yaml:

repos:
- repo: https://github.com/wdroz/PurifyLLM
  rev: v0.1.1
  hooks:
  - id: purify-llm
    # optional: skip files whose path matches this regex
    exclude: '(^|/)(LICENSES|licenses)/'
    # optional: add custom mappings
    args:
    - --map
    - "\u00AB=\""   # « to "
    - --map
    - "\u00BB=\""   # » to "

Then install hooks:

pre-commit install

Run on all files at any time:

pre-commit run --all-files

CLI usage

purifyllm [--no-defaults] [--map KEY=VALUE ...] [--ignore-files GLOB ...] [FILES ...]

Examples:

purifyllm README.md
purifyllm --map "\u00B7=-" file.txt
purifyllm --no-defaults --map "…=..." --map "—=-" src/
purifyllm --ignore-files '**/LICENSES/**' --ignore-files 'docs/vendor/**' $(git ls-files)
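The --map examples pass KEY=VALUE pairs, and a key such as \u00B7 reaches the program as a literal backslash sequence because the shell does not expand it. A minimal sketch of how such an argument could be split and decoded follows. The function name parse_mapping and the decoding rule (only four-digit \uXXXX escapes) are assumptions for illustration, not the tool's documented behavior.

```python
import re

# Matches a literal backslash-u followed by exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def _decode(value: str) -> str:
    """Replace each \\uXXXX escape with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)


def parse_mapping(spec: str) -> tuple[str, str]:
    """Split 'KEY=VALUE' at the first '=' and decode escapes on both sides."""
    key, sep, value = spec.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {spec!r}")
    return _decode(key), _decode(value)


print(parse_mapping(r"\u00B7=-") == ("\u00b7", "-"))  # prints True
```

Splitting at the first = keeps values that themselves contain = intact; a key that is itself = would need a different syntax in this sketch.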

Exit codes:

  • 0: no changes needed
  • 1: files were modified or an error occurred
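In a script or CI step, these exit codes can be checked from Python with subprocess. The sketch below uses stand-in commands so it runs without purifyllm installed; the helper name files_are_clean is hypothetical.

```python
import subprocess
import sys


def files_are_clean(cmd: list[str]) -> bool:
    """Run cmd and report whether it exited with status 0."""
    # check=False: a non-zero status is an expected outcome here, not an exception.
    result = subprocess.run(cmd, check=False)
    return result.returncode == 0


# Stand-in commands that exit with 0 and 1 respectively.
clean = files_are_clean([sys.executable, "-c", "raise SystemExit(0)"])
dirty = files_are_clean([sys.executable, "-c", "raise SystemExit(1)"])
print(clean, dirty)  # prints True False
```

With the real tool the call would look like files_are_clean(["purifyllm", "README.md"]). Because exit code 1 covers both modified files and errors, the code alone does not tell the two cases apart.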

Ignoring files and folders (glob)

Use one or more --ignore-files flags to skip files by glob pattern. Each pattern is matched against the file's full path, written with forward slashes.

Examples:

  • --ignore-files '**/LICENSES/**': ignore any file under a LICENSES directory, at any depth.
  • --ignore-files 'docs/vendor/**': ignore files under docs/vendor.
  • --ignore-files '*.md': ignore Markdown files.
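One way to get matching consistent with the three examples above is Python's fnmatch, where * also matches across /. The sketch below is an assumption about the semantics, not the tool's implementation: is_ignored is a hypothetical name, and the special case that lets a leading **/ also match at the top level is a design choice of this sketch.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if path matches any glob pattern (full-path, forward slashes)."""
    posix = path.replace("\\", "/")  # normalize Windows separators
    for pattern in patterns:
        candidates = [pattern]
        if pattern.startswith("**/"):
            # Let '**/X' also match 'X' at the top level (sketch-specific choice).
            candidates.append(pattern[3:])
        if any(fnmatchcase(posix, candidate) for candidate in candidates):
            return True
    return False


print(is_ignored("third_party/LICENSES/MIT.txt", ["**/LICENSES/**"]))  # prints True
print(is_ignored("src/main.py", ["*.md", "docs/vendor/**"]))           # prints False
```

fnmatchcase is used instead of fnmatch so that matching stays case-sensitive and behaves the same on every platform.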
