PurifyLLM

CLI and pre-commit hook to normalize “smart” punctuation and invisible Unicode produced by LLMs.

What it does

Replaces common non-ASCII Unicode characters with plain ASCII equivalents:

  • smart quotes “ ” ‘ ’ -> " and '
  • dashes – — − -> -
  • ellipsis … -> ...
  • non-breaking spaces and thin spaces -> regular space
  • zero-width and BOM characters -> removed

You can add your own mappings in the hook config.
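
The replacements listed above amount to a character-level translation table. The following is a minimal sketch of that idea, not PurifyLLM's actual code: the `DEFAULTS` table and the `purify` helper are hypothetical names chosen for illustration, and the real default set may cover more characters.

```python
# Illustrative only: DEFAULTS and purify() are hypothetical names,
# not part of PurifyLLM's API.
DEFAULTS = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2212": "-",    # minus sign
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
    "\u2009": " ",    # thin space
    "\u200b": "",     # zero-width space (removed)
    "\ufeff": "",     # byte order mark (removed)
}


def purify(text, extra=None):
    """Return text with every mapped character replaced."""
    table = dict(DEFAULTS)
    table.update(extra or {})  # user mappings override the defaults
    return text.translate(str.maketrans(table))
```

Passing `extra={"\u00ab": '"', "\u00bb": '"'}` mirrors the custom guillemet mappings in the hook configuration shown in the install section.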

Install (as a hook consumer)

Add this repo to your .pre-commit-config.yaml:

repos:
- repo: https://github.com/wdroz/PurifyLLM
  rev: v0.1.1
  hooks:
  - id: purify-llm
    # optional: skip files or folders via a regex and add extra replacements
    exclude: '(^|/)(LICENSES|licenses)/'
    args:
    # add custom mappings
    - --map
    - "\u00AB=\""   # « to "
    - --map
    - "\u00BB=\""   # » to "

Then install hooks:

pre-commit install

Run on all files at any time:

pre-commit run --all-files

CLI usage

purifyllm [--no-defaults] [--map KEY=VALUE ...] [--ignore-files GLOB ...] [FILES ...]

Examples:

purifyllm README.md
purifyllm --map "\u00B7=-" file.txt
purifyllm --no-defaults --map "…=..." --map "—=-" src/
purifyllm --ignore-files '**/LICENSES/**' --ignore-files 'docs/vendor/**' $(git ls-files)
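
Each `--map` value has the form KEY=VALUE. Inside shell double quotes a sequence such as `\u00B7` reaches the program as literal text, so the sketch below assumes the tool decodes `\uXXXX` escapes itself. That behaviour is an assumption drawn from the examples, and `parse_mapping` is a hypothetical helper, not PurifyLLM's real parser.

```python
import re


def parse_mapping(spec):
    """Split 'KEY=VALUE' on the first '=' and decode \\uXXXX escapes.

    Hypothetical helper for illustration; not PurifyLLM's actual parser.
    """
    key, sep, value = spec.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {spec!r}")

    def decode(s):
        # Turn a literal backslash-u escape into the character it names.
        return re.sub(
            r"\\u([0-9a-fA-F]{4})",
            lambda m: chr(int(m.group(1), 16)),
            s,
        )

    return decode(key), decode(value)
```

Splitting on the first `=` only means the replacement value may itself contain `=`.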

Exit codes:

  • 0: no changes needed
  • 1: files were modified or an error occurred
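
These exit codes match the usual convention for pre-commit hooks that rewrite files: a non-zero status stops the commit so the changes can be reviewed and staged. A minimal sketch of that pattern follows, assuming a tiny hypothetical replacement table; `TABLE`, `fix_file`, and `main` are illustrative names, not PurifyLLM's internals.

```python
import sys

# Hypothetical two-entry table, just enough for the sketch.
TABLE = str.maketrans({"\u2019": "'", "\u2014": "-"})


def fix_file(path):
    """Rewrite path in place; return True if its content changed."""
    # newline="" keeps the file's original line endings untouched.
    with open(path, encoding="utf-8", newline="") as f:
        original = f.read()
    cleaned = original.translate(TABLE)
    if cleaned == original:
        return False
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(cleaned)
    return True


def main(paths):
    """Return 1 if any file was modified or could not be read, else 0."""
    status = 0
    for path in paths:
        try:
            if fix_file(path):
                print(f"fixed {path}")
                status = 1
        except (OSError, UnicodeDecodeError) as exc:
            print(f"error: {path}: {exc}", file=sys.stderr)
            status = 1
    return status
```

A command-line entry point would pass the result to `sys.exit`, for example `sys.exit(main(sys.argv[1:]))`. Running the fixer a second time on the same files returns 0, because nothing is left to change.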

Ignoring files and folders (glob)

Use one or more --ignore-files flags to skip files by glob pattern. Matching is against the full path using forward slashes.

Examples:

  • --ignore-files '**/LICENSES/**': ignore any files under a LICENSES directory anywhere.
  • --ignore-files 'docs/vendor/**': ignore files under docs/vendor.
  • --ignore-files '*.md': ignore markdown files.
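
Matching a glob against the full forward-slash path can be sketched with Python's standard `fnmatch` module. This is an approximation under stated assumptions: `is_ignored` is a hypothetical helper, and in `fnmatch` a `*` also matches `/`, so `*` and `**` behave the same here. The tool's real matcher may treat edge cases differently, for example a LICENSES directory at the repository root.

```python
from fnmatch import fnmatchcase
from pathlib import PurePath


def is_ignored(path, patterns):
    """Return True if the forward-slash form of path matches any glob.

    Hypothetical helper for illustration; not PurifyLLM's actual matcher.
    """
    # On Windows, as_posix() turns backslashes into forward slashes.
    posix = PurePath(path).as_posix()
    return any(fnmatchcase(posix, pattern) for pattern in patterns)
```

With the three patterns above, `third_party/LICENSES/MIT.txt`, `docs/vendor/lib/a.js`, and `README.md` are skipped, while `src/main.py` is still processed.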
