pre-commit fixers and linters for handling text files
texthooks
A collection of pre-commit hooks for handling text files.
In particular, hooks for handling Unicode characters which may be undesirable in a repository.
Usage with pre-commit
To use with pre-commit, include this repo and the desired hooks in .pre-commit-config.yaml:
- repo: https://github.com/sirosen/texthooks
  rev: 0.1.0
  hooks:
    - id: fix-smartquotes
    - id: fix-ligatures
Standalone Usage
Each hook is usable as a CLI script. Simply run pip install texthooks and then invoke the hook, e.g.
fix-smartquotes FILENAME
Supported Hooks
fix-smartquotes
This fixes copy-paste from some applications which replace double-quotes with curly quotes. It does not convert corner brackets, braille quotation marks, or angle quotation marks. Those characters are not typically the result of copy-paste errors, so they are allowed.
Low quotation marks vary in usage and meaning by language, and some languages use quotation marks which face "outwards" (the opposite of English usage). For the most part, these and more exotic characters (such as double-prime quotes) are ignored.
In files with the offending marks, they are replaced and the run is marked as failed.
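The replace-and-fail behaviour described above can be sketched in a few lines of Python. This is only an illustration: the character sets below are a small assumed subset, and the function name fix_smartquotes_text is hypothetical, not part of the texthooks API.

```python
from typing import Tuple

# Assumed subset of "smart" quote characters, for illustration only.
# The hook's real default sets may be larger or different.
DOUBLE_QUOTES = {
    "\u201c",  # left double quotation mark
    "\u201d",  # right double quotation mark
}
SINGLE_QUOTES = {
    "\u2018",  # left single quotation mark
    "\u2019",  # right single quotation mark
}


def fix_smartquotes_text(text: str) -> Tuple[str, bool]:
    """Return the text with smart quotes replaced, plus a 'changed' flag."""
    table = {ord(ch): '"' for ch in DOUBLE_QUOTES}
    table.update({ord(ch): "'" for ch in SINGLE_QUOTES})
    fixed = text.translate(table)
    return fixed, fixed != text
```

A fixer built this way would write the fixed text back to the file and exit non-zero whenever the flag is True, which is how a run ends up marked as failed.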
Overriding Quotation Characters
Two options are available for specifying exactly which characters will be replaced. For ease of use, they are specified as hex-encoded Unicode codepoints.
Suppose you wanted to avoid replacing the "Heavy single comma quotation mark ornament" (275C) and the "Heavy single turned comma quotation mark ornament" (275B) characters. You could override the single quote codepoints as follows:
- repo: https://github.com/sirosen/texthooks
  rev: 0.1.0
  hooks:
    - id: fix-smartquotes
      # replace default single quote chars with this set:
      # apostrophe, fullwidth apostrophe, left single quote, single high
      # reversed-9 quote, right single quote
      args: ["--single-quote-codepoints", "0027,FF07,2018,201B,2019"]
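To make the hex-encoded format concrete, here is a minimal sketch of how a comma-separated codepoint string could be turned into the characters it names. The helper parse_codepoints is hypothetical and is not taken from the texthooks source.

```python
def parse_codepoints(value: str) -> set:
    """Turn a string like "0027,FF07,2018" into a set of characters."""
    return {
        chr(int(part.strip(), 16))  # each part is a hexadecimal codepoint
        for part in value.split(",")
        if part.strip()
    }


# The value from the config above; 275B and 275C are deliberately absent.
chars = parse_codepoints("0027,FF07,2018,201B,2019")
```

Because 275B and 275C are not in the list, a replacement pass driven by this set would leave those two ornament characters untouched.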
fix-ligatures
Automatically find and replace ligature characters with their ASCII equivalents.
This replaces ligatures which may be created by programs like LaTeX for presentation with their strictly-equivalent ASCII counterparts. For example, fi and ff may be ligature-ized.
This hook converts these back into ASCII so that tools like grep will behave as expected.
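As a rough illustration of the idea, the sketch below maps the Latin ligature characters U+FB00 through U+FB04 back to plain letters. The table and the function name fix_ligatures_text are assumptions made for this example; the hook's actual table may cover other characters.

```python
# Assumed ligature table, for illustration only.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
}


def fix_ligatures_text(text: str) -> str:
    """Replace each ligature character with its multi-letter ASCII form."""
    return text.translate(str.maketrans(LIGATURES))
```

After this pass, a search such as grep file matches text that previously contained the single ligature character in place of the letters f and i.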
forbid-bidi-controls
This is a checker which forbids the use of Unicode bidirectional text control characters.
These are directional formatting characters which can be used to construct text with unexpected or unclear semantics. For example, in programming languages which allow bidirectional text in statements, True = ייִדיש can be written with right-to-left reversal to mean that the variable ייִדיש is assigned a value of True.
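A check of this kind can be sketched as a scan for the explicit embedding, override, and isolate characters. The set below (U+202A to U+202E and U+2066 to U+2069) is an assumption chosen for illustration, and find_bidi_controls is a hypothetical helper; the hook may forbid a different set.

```python
import unicodedata

# Assumed set of bidirectional control characters: embeddings, overrides,
# pop-directional-formatting, and isolates.
BIDI_CONTROLS = {chr(cp) for cp in range(0x202A, 0x202F)} | {
    chr(cp) for cp in range(0x2066, 0x206A)
}


def find_bidi_controls(text: str) -> list:
    """Return (line, column, character name) for each control character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits
```

A checker would report each hit and exit non-zero if the list is non-empty, leaving the file itself unmodified.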
CHANGELOG
0.2.1
- Fix a typo in forbid-bidi-controls entrypoint
0.2.0
- Add the forbid-bidi-controls hook
- Adjust the handling of file encodings. Files will be read with UTF-8 encoding by default in most cases.
0.1.0
- Initial release with fix-ligatures and fix-smartquotes hooks