purifyllm
PurifyLLM
CLI and pre-commit hook to normalize “smart” punctuation and invisible Unicode produced by LLMs.
What it does
Replaces common typographic Unicode characters with plain ASCII equivalents and strips invisible ones:
- smart quotes “ ” ‘ ’ -> " and '
- dashes – — − -> -
- ellipsis … -> ...
- non-breaking spaces and thin spaces -> regular space
- zero-width and BOM characters -> removed
You can add your own mappings with `--map KEY=VALUE`, either in the hook's `args` or on the command line.
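As a rough illustration of the default behavior, the replacements listed above fit in a single lookup table applied with `str.translate`. This is a minimal sketch, not PurifyLLM's source: `DEFAULT_MAP` and `purify_text` are hypothetical names, and the exact code points (for example, which space and zero-width characters are covered) are assumptions.

```python
# Hypothetical default table; PurifyLLM's real mapping may cover more or fewer code points.
DEFAULT_MAP: dict[str, str] = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2212": "-",    # minus sign
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
    "\u2009": " ",    # thin space
    "\u202f": " ",    # narrow no-break space
    "\u200b": "",     # zero-width space
    "\u200c": "",     # zero-width non-joiner
    "\u200d": "",     # zero-width joiner
    "\ufeff": "",     # byte order mark (zero-width no-break space)
}


def purify_text(text: str, mapping: dict[str, str] = DEFAULT_MAP) -> str:
    """Replace each single-character key in `mapping` with its value."""
    # str.maketrans takes 1-char keys mapped to strings of any length; "" deletes the key.
    table = str.maketrans(mapping)
    return text.translate(table)


if __name__ == "__main__":
    print(purify_text("\u201cHello\u201d \u2014 it\u2019s fine\u2026\u200b"))
```

`str.translate` only accepts single-character keys, so if custom mappings may use longer keys, a loop of `str.replace` calls would be needed instead.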
Install (as a hook consumer)
Add this repo to your .pre-commit-config.yaml:
```yaml
repos:
  - repo: https://github.com/wdroz/PurifyLLM
    rev: v0.1.1
    hooks:
      - id: purify-llm
        # optional: ignore folders/files via glob and add extra replacements
        exclude: '(^|/)(LICENSES|licenses)/'
        args:
          # add custom mappings
          - --map
          - "\u00AB=\"" # « to "
          - --map
          - "\u00BB=\"" # » to "
```
Then install the hooks:
```shell
pre-commit install
```
Run the hooks on all files at any time:
```shell
pre-commit run --all-files
```
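pre-commit interprets the `exclude` key as a Python regular expression and applies it with `re.search` to each file path relative to the repository root; matching files are never passed to the hook. The check below reuses the `exclude` pattern from the hook config above with the standard `re` module; the sample paths are made up for illustration.

```python
import re

# Same pattern as the `exclude` key in the hook config.
EXCLUDE = re.compile(r"(^|/)(LICENSES|licenses)/")

paths = [
    "LICENSES/MIT.txt",             # top-level directory, matched via ^
    "vendor/lib/licenses/BSD.txt",  # nested directory, matched via /
    "src/MYLICENSES/notes.md",      # not a whole directory name, so kept
    "README.md",                    # kept
]

for path in paths:
    # A match anywhere in the path means pre-commit skips the file.
    skipped = EXCLUDE.search(path) is not None
    print(f"{path}: {'excluded' if skipped else 'checked'}")
```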
CLI usage
```
purifyllm [--no-defaults] [--map KEY=VALUE ...] [--ignore-files GLOB ...] [FILES ...]
```
Examples:
```shell
purifyllm README.md
purifyllm --map "\u00B7=-" file.txt
purifyllm --no-defaults --map "…=..." --map "—=-" src/
purifyllm --ignore-files '**/LICENSES/**' --ignore-files 'docs/vendor/**' $(git ls-files)
```
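Each `--map` argument is a `KEY=VALUE` pair. How PurifyLLM parses it is not documented here, so the following is only one plausible approach, assuming the split happens at the first `=` and that the tool itself decodes `\uXXXX` escapes: a POSIX shell passes `"\u00B7=-"` through literally inside double quotes, whereas YAML double-quoted strings in the hook config are already decoded before the tool sees them. `parse_mapping` is a hypothetical name.

```python
import re

# Literal backslash-u followed by exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def _decode_escapes(text: str) -> str:
    # Turn literal "\u00B7" sequences into the character they name; leave everything else alone.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def parse_mapping(arg: str) -> tuple[str, str]:
    """Parse one hypothetical --map argument of the form KEY=VALUE."""
    key, sep, value = arg.partition("=")  # split at the first "=" only
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {arg!r}")
    return _decode_escapes(key), _decode_escapes(value)


if __name__ == "__main__":
    print(parse_mapping(r"\u00B7=-"))   # escaped form, as in the CLI example
    print(parse_mapping("\u2026=..."))  # literal character, as in the --no-defaults example
```

A targeted regex is used instead of `codecs.decode(..., "unicode_escape")`, which would mangle literal non-ASCII characters such as `…` in the same argument.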
Exit codes:
- 0: no changes needed
- 1: files were modified or an error occurred
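This exit-code contract is what makes the tool usable as a hook: pre-commit treats a non-zero status as a failed hook, so the commit stops and the rewritten files can be reviewed and staged. Below is a minimal sketch of such a loop, assuming files are rewritten in place as UTF-8; `purify_files` and `DEFAULTS` are hypothetical names, the table is abbreviated, and the real implementation may differ.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Hypothetical, abbreviated default table (see the list under "What it does").
DEFAULTS: dict[str, str] = {
    "\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'",
    "\u2013": "-", "\u2014": "-", "\u2026": "...",
    "\u00a0": " ", "\u200b": "", "\ufeff": "",
}


def purify_files(paths: list[str], extra: dict[str, str] | None = None,
                 no_defaults: bool = False) -> int:
    """Rewrite files in place; return 1 if any file changed or failed, else 0."""
    mapping = {} if no_defaults else dict(DEFAULTS)
    mapping.update(extra or {})  # custom mappings override the defaults
    table = str.maketrans(mapping)
    exit_code = 0
    for name in paths:
        path = Path(name)
        try:
            # Raw bytes in and out, so existing line endings are preserved.
            original = path.read_bytes().decode("utf-8")
        except (OSError, UnicodeDecodeError) as exc:
            print(f"{name}: {exc}", file=sys.stderr)
            exit_code = 1  # an error also yields exit code 1
            continue
        cleaned = original.translate(table)
        if cleaned != original:
            path.write_bytes(cleaned.encode("utf-8"))
            print(f"Fixed {name}")
            exit_code = 1  # non-zero tells pre-commit the hook changed files
    return exit_code
```

Running the sketch a second time on the same files returns 0, which mirrors how a re-run of the hook passes once the fixes are staged.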
Ignoring files and folders (glob)
Use one or more --ignore-files flags to skip files by glob pattern. Matching is against the full path using forward slashes.
Examples:
- `--ignore-files '**/LICENSES/**'`: ignore any files under a `LICENSES` directory anywhere.
- `--ignore-files 'docs/vendor/**'`: ignore files under `docs/vendor`.
- `--ignore-files '*.md'`: ignore Markdown files.
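The README states only that matching is against the full path with forward slashes; the exact glob dialect is not specified. As an assumption, the sketch below translates patterns to regular expressions with gitignore-like rules: `**/` matches zero or more directories, `**` matches anything, `*` and `?` stay within one path segment, and a pattern without `/` is compared to the file name alone so that `*.md` catches Markdown files at any depth. `glob_to_regex` and `is_ignored` are hypothetical helpers, and bracket classes such as `[abc]` are not supported.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Compile a glob to a regex, treating ** specially (hypothetical dialect)."""
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, slashes included
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # stays within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # exactly one non-slash character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_ignored(path: str, patterns: list[str]) -> bool:
    posix = path.replace("\\", "/")  # match on forward slashes, as the README states
    name = posix.rsplit("/", 1)[-1]
    for pattern in patterns:
        # Assumption: slash-free patterns apply to the file name at any depth.
        target = posix if "/" in pattern else name
        if glob_to_regex(pattern).fullmatch(target):
            return True
    return False


if __name__ == "__main__":
    ignore = ["**/LICENSES/**", "docs/vendor/**", "*.md"]
    for p in ["LICENSES/MIT.txt", "pkg/LICENSES/BSD.txt", "docs/vendor/x/y.js",
              "docs/guide.md", "src/main.py"]:
        print(p, is_ignored(p, ignore))
```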
Download files
File details
Details for the file purifyllm-0.1.2.tar.gz.
File metadata
- Download URL: purifyllm-0.1.2.tar.gz
- Upload date:
- Size: 7.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e577ec978bf65bc4b994b2d00d8c7a5c652956e1b776fd595bd4c7298124013f |
| MD5 | 46a1cb50008d52de6fa6727c5281c552 |
| BLAKE2b-256 | a95a7c246b159a1402c44734fced689cfe497e0b8b2e9c6f03db1c651240864c |
Provenance
The following attestation bundles were made for purifyllm-0.1.2.tar.gz:
- Publisher: publish.yml on wdroz/PurifyLLM
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: purifyllm-0.1.2.tar.gz
  - Subject digest: e577ec978bf65bc4b994b2d00d8c7a5c652956e1b776fd595bd4c7298124013f
- Sigstore transparency entry: 952475198
- Sigstore integration time:
- Permalink: wdroz/PurifyLLM@b52d94387c2a61a9403c183a2cc06d97cab783c6
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/wdroz
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b52d94387c2a61a9403c183a2cc06d97cab783c6
- Trigger Event: push
File details
Details for the file purifyllm-0.1.2-py3-none-any.whl.
File metadata
- Download URL: purifyllm-0.1.2-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 25b3fa826e2b258373df124146e43111995ddd5882da208813439b7ab899913d |
| MD5 | 6a6ab5223172cf243b3edacdfce49079 |
| BLAKE2b-256 | f48794b6889bbe927f599936d8ea05c767cf7e7ca299eb912409cb06decebb71 |
Provenance
The following attestation bundles were made for purifyllm-0.1.2-py3-none-any.whl:
- Publisher: publish.yml on wdroz/PurifyLLM
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: purifyllm-0.1.2-py3-none-any.whl
  - Subject digest: 25b3fa826e2b258373df124146e43111995ddd5882da208813439b7ab899913d
- Sigstore transparency entry: 952475200
- Sigstore integration time:
- Permalink: wdroz/PurifyLLM@b52d94387c2a61a9403c183a2cc06d97cab783c6
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/wdroz
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b52d94387c2a61a9403c183a2cc06d97cab783c6
- Trigger Event: push