DirtyText
Searches for the [ab]use of Unicode glyphs.
Installation
The DirtyText package can be installed through pip :snake: :
$ pip install dirtytext
or downloaded from GitHub.
Quick tour:
Common options:
- Read from file: -f <filename>
- Save modified text: -s <file>
- Text filter: --filter
- Pipeline mode: -p
:mag_right: Look for ZERO-WIDTH characters:
$> echo "This text contains zero-width chars" | dirtytext --zero -v
will produce the following output:
Contains zero-width characters: True
JSON:
[{"idx": 0, "char": "\ufeff", "cval": "FEFF", "infos": null},
{"idx": 10, "char": "\u200c", "cval": "200C", "infos": null},
{"idx": 11, "char": "\u200c", "cval": "200C", "infos": null}, ...]
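To make the record format above concrete, here is a minimal Python sketch of a zero-width scan. It is not DirtyText's own implementation: the `ZERO_WIDTH` set and the `find_zero_width` helper are assumptions made for illustration, and the sample string embeds zero-width characters explicitly at the three positions reported above.

```python
import json

# Assumption: a small hand-picked set of zero-width code points.
# DirtyText's real list may differ.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def find_zero_width(text):
    """Return one record per zero-width character, mirroring the JSON shape above."""
    return [
        {"idx": i, "char": ch, "cval": f"{ord(ch):04X}", "infos": None}
        for i, ch in enumerate(text)
        if ch in ZERO_WIDTH
    ]


# The invisible characters are written as escapes so they stay visible in source.
sample = "\ufeffThis text\u200c\u200c contains zero-width chars"
hits = find_zero_width(sample)
print("Contains zero-width characters:", bool(hits))
print(json.dumps(hits))
```

Writing the characters as `\u` escapes keeps them visible in source code, which is usually safer than pasting the invisible glyphs directly.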
:mag_right: Look for CONFUSABLE characters:
$> echo "hello" | dirtytext --confusables greek -v
will produce the following output:
Contains confusables characters: True
JSON:
[{"idx": 2, "char": "l", "cval": "006C", "infos": [{"target": "0399", "description": "GREEK CAPITAL LETTER IOTA"}]},
{"idx": 3, "char": "l", "cval": "006C", "infos": [{"target": "0399", "description": "GREEK CAPITAL LETTER IOTA"}]},
{"idx": 4, "char": "o", "cval": "006F", "infos": [{"target": "03BF", "description": "GREEK SMALL LETTER OMICRON"},
{"target": "03C3", "description": "GREEK SMALL LETTER SIGMA"}]}]
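The lookup behind this kind of output can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DirtyText's code: the `CONFUSABLES` table is a hypothetical two-entry excerpt copied from the output above, and `find_confusables` is a made-up helper. Descriptions come from the standard `unicodedata` module.

```python
import unicodedata

# Hypothetical excerpt taken from the sample output above; the real table
# is built from the Unicode confusables data.
CONFUSABLES = {"l": ["0399"], "o": ["03BF", "03C3"]}


def find_confusables(text, script="GREEK"):
    """List characters of `text` that have look-alikes in the given script."""
    results = []
    for i, ch in enumerate(text):
        infos = []
        for target in CONFUSABLES.get(ch, []):
            name = unicodedata.name(chr(int(target, 16)), "")
            # Keep only look-alikes whose Unicode name starts with the script name.
            if name.startswith(script):
                infos.append({"target": target, "description": name})
        if infos:
            results.append(
                {"idx": i, "char": ch, "cval": f"{ord(ch):04X}", "infos": infos}
            )
    return results


for record in find_confusables("hello"):
    print(record)
```

Filtering by the prefix of the Unicode character name is a simplification chosen for brevity; a fuller implementation would more likely rely on script properties.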
:mag_right: Look for and filter anomalies in LATIN text:
example.txt:
It ⅽan be argueⅾ that the ⅽomputer ⅰs humanⅰty’s attempt to repⅼⅰⅽate the human brain.
This ⅰs perhaps an unattainable goal.
However, unattainable goals often lead to outstanding accomplishment.
$> dirtytext -f example.txt --lsubs --filter -s out.txt
out.txt:
It can be argued that the computer is humanity’s attempt to replicate the human brain.
This is perhaps an unattainable goal.
However, unattainable goals often lead to outstanding accomplishment.
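The substitutions in example.txt use small Roman numeral characters (such as U+217D) in place of Latin letters. One way to undo that kind of substitution, sketched below, is Unicode NFKC compatibility normalization applied character by character. This is an assumption about a possible approach and may not match what `--lsubs --filter` actually does; `filter_latin_lookalikes` is a hypothetical helper.

```python
import unicodedata


def filter_latin_lookalikes(text):
    """Replace compatibility look-alikes with the ASCII letters they fold to."""
    out = []
    for ch in text:
        folded = unicodedata.normalize("NFKC", ch)
        # Replace only when folding changes the character into plain ASCII letters;
        # everything else (punctuation, accents, other scripts) is left untouched.
        if folded != ch and folded.isascii() and folded.isalpha():
            out.append(folded)
        else:
            out.append(ch)
    return "".join(out)


# Escapes stand for the Roman numeral look-alikes used in example.txt.
dirty = "It \u217dan be argue\u217e that the \u217domputer \u2170s human\u2170ty\u2019s attempt"
print(filter_latin_lookalikes(dirty))
```

The check on `folded` is deliberately narrow so that legitimate non-ASCII characters, such as the typographic apostrophe in "humanity’s", pass through unchanged.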
UnicodeDB
The Unicode data that make up the dirtytext database are extracted from the Unicode Consortium. In particular, there are two database files in the dirtytext/data directory:
- categories.json: built from data extracted from here
- confusables.json: built from data extracted from here
If dirtytext/data doesn't exist, DirtyText downloads and builds the database before performing the requested operation. Afterwards, you can force a database update by adding the --update option.
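The build-if-missing behaviour described above can be pictured with a short Python sketch. Everything here is hypothetical: `ensure_database` and `fake_build` are illustrative names, and the placeholder builder returns empty tables instead of downloading anything, since the real download and parsing steps are not described in this document.

```python
import json
import tempfile
from pathlib import Path


def ensure_database(data_dir, build, update=False):
    """Write the JSON files when the directory is missing or an update is forced."""
    data_dir = Path(data_dir)
    if data_dir.exists() and not update:
        return False  # database already present, nothing to do
    data_dir.mkdir(parents=True, exist_ok=True)
    for filename, content in build().items():
        (data_dir / filename).write_text(json.dumps(content), encoding="utf-8")
    return True


def fake_build():
    # Placeholder: stands in for downloading and parsing the Unicode data files.
    return {"categories.json": {}, "confusables.json": {}}


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data"
    print(ensure_database(target, fake_build))               # first run builds
    print(ensure_database(target, fake_build))               # second run skips
    print(ensure_database(target, fake_build, update=True))  # forced rebuild
```

Passing the builder as a callable keeps the existence check separate from the network and parsing work, which also makes the logic easy to exercise without internet access.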
License
Released under GPL-3.0