PurifyLLM

CLI and pre-commit hook to normalize “smart” punctuation and invisible Unicode produced by LLMs.

What it does

Replaces common non-ASCII characters with plain ASCII equivalents:

  • smart quotes “ ” ‘ ’ -> " and '
  • dashes – — − -> -
  • ellipsis … -> ...
  • non-breaking spaces and thin spaces -> regular space
  • zero-width and BOM characters -> removed

You can add your own mappings in the hook config.
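
The substitutions listed above amount to a character translation table. A minimal sketch in Python, assuming a hypothetical `DEFAULT_MAP` and `purify` helper; these names and the exact set of code points are illustrative and are not taken from PurifyLLM's source:

```python
# Hypothetical sketch: DEFAULT_MAP and purify() are illustrative names,
# not PurifyLLM's real internals. The code points mirror the list above.
DEFAULT_MAP = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2212": "-",    # minus sign
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
    "\u2009": " ",    # thin space
    "\u200b": "",     # zero-width space (removed)
    "\ufeff": "",     # byte order mark (removed)
}


def purify(text, extra=None):
    """Apply the default table, with optional extra mappings layered on top."""
    mapping = {**DEFAULT_MAP, **(extra or {})}
    return text.translate(str.maketrans(mapping))
```

Passing `extra={"\u00ab": '"', "\u00bb": '"'}` would mimic adding custom mappings for guillemets on top of the defaults.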

Install (as a hook consumer)

Add this repo to your .pre-commit-config.yaml:

repos:
- repo: https://github.com/wdroz/PurifyLLM
  rev: v0.0.1
  hooks:
  - id: purify-llm
    # optional: ignore folders/files via glob and add extra replacements
    args:
    # ignore everything under any LICENSES/ dir
    - --ignore-files
    - '**/LICENSES/**'
    # ignore a specific vendor subtree
    - --ignore-files
    - 'docs/vendor/**'
    # add custom mappings
    - --map
    - "\u00AB=\""   # « to "
    - --map
    - "\u00BB=\""   # » to "

Then install hooks:

pre-commit install

Run on all files at any time:

pre-commit run --all-files

CLI usage

purifyllm [--no-defaults] [--map KEY=VALUE ...] [--ignore-files GLOB ...] [FILES ...]

Examples:

purifyllm README.md
purifyllm --map "\u00B7=-" file.txt
purifyllm --no-defaults --map "…=..." --map "—=-" src/
purifyllm --ignore-files '**/LICENSES/**' --ignore-files 'docs/vendor/**' $(git ls-files)
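
The `--map` examples above pass either a literal character or a `\uXXXX` escape as the key. One way such a `KEY=VALUE` argument could be parsed is sketched below; `parse_map_spec` is a hypothetical helper, and the assumption that escapes are decoded this way is not confirmed by the project:

```python
import re

# Matches a literal backslash-u followed by four hex digits, e.g. \u00B7.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def _decode(s):
    # Replace each \uXXXX escape with the character it names;
    # characters that are already literal pass through unchanged.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


def parse_map_spec(spec):
    """Hypothetical parser: split on the first '=' and decode both halves."""
    key, sep, value = spec.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {spec!r}")
    return _decode(key), _decode(value)
```

Splitting on the first `=` only means a value may itself contain `=`; a key that is `=` cannot be expressed in this sketch.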

Exit codes:

  • 0: no changes needed
  • 1: files were modified or an error occurred
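
A fixer-style tool typically derives these exit codes from whether it had to rewrite anything. The sketch below illustrates that convention only; `fix_files` is a hypothetical function and is not PurifyLLM's actual implementation:

```python
import sys


def fix_files(paths, mapping):
    """Rewrite files in place; return 1 if any file changed or failed, else 0."""
    table = str.maketrans(mapping)
    status = 0
    for path in paths:
        try:
            # newline="" keeps the file's original line endings intact.
            with open(path, encoding="utf-8", newline="") as fh:
                original = fh.read()
            cleaned = original.translate(table)
            if cleaned != original:
                with open(path, "w", encoding="utf-8", newline="") as fh:
                    fh.write(cleaned)
                status = 1  # a file was modified
        except (OSError, UnicodeDecodeError) as exc:
            print(f"{path}: {exc}", file=sys.stderr)
            status = 1  # an error occurred
    return status
```

Under pre-commit, a non-zero exit marks the hook as failed, so a commit that needed fixes stops once; after the rewritten files are staged again, the next run returns 0.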

Ignoring files and folders (glob)

Use one or more --ignore-files flags to skip files by glob pattern. Each pattern is matched against the full file path, written with forward slashes.

Examples:

  • --ignore-files '**/LICENSES/**': ignore any files under a LICENSES directory anywhere.
  • --ignore-files 'docs/vendor/**': ignore files under docs/vendor.
  • --ignore-files '*.md': ignore markdown files.

Tip: pre-commit passes the candidate filenames to the hook, so use these patterns to filter out directories you do not want touched.
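
To make the matching rule concrete, here is a minimal sketch using Python's standard `fnmatch` module. `is_ignored` is a hypothetical helper, and the choice of `fnmatch` is an assumption; PurifyLLM's real matcher may treat `*` and `**` differently:

```python
from fnmatch import fnmatchcase


def is_ignored(path, patterns):
    """Return True if the path, written with forward slashes, matches any glob."""
    # Normalize Windows separators so patterns can always use '/'.
    posix = path.replace("\\", "/")
    # Note: with fnmatch, '*' also matches across '/', which is an
    # assumption of this sketch and may not hold for the real tool.
    return any(fnmatchcase(posix, pattern) for pattern in patterns)


patterns = ["**/LICENSES/**", "docs/vendor/**", "*.md"]
kept = [
    p
    for p in ["src/main.py", "docs/vendor/lib/a.js", "README.md"]
    if not is_ignored(p, patterns)
]
```

In this sketch only `src/main.py` remains in `kept`, since the other two paths match an ignore pattern.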
