
Detect confusable usage of Unicode homoglyphs and prevent homograph attacks.


confusable_homoglyphs [doc]

This project has been adopted from the original confusable_homoglyphs by Victor Felder.


"A homoglyph is one of two or more graphemes, characters, or glyphs with shapes that appear identical or very similar." (Wikipedia: Homoglyph)

Unicode homoglyphs can be a nuisance on the web. Your most popular client, AlaskaJazz, might be upset to be impersonated by a trickster who deliberately chose the username ΑlaskaJazz.

  • AlaskaJazz is single script: only Latin characters.

  • ΑlaskaJazz is mixed-script: the first character is a Greek letter.
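The distinction between the two usernames can be illustrated with the standard library alone. The sketch below is not this library's implementation: it approximates a character's script by the first word of its Unicode name (for example LATIN or GREEK), and the helper names rough_scripts and is_mixed are made up for illustration.

```python
import unicodedata


def rough_scripts(text):
    """Return the set of rough script labels used by the letters in text.

    Assumption: the first word of a character's Unicode name is used as a
    stand-in for its script. This is only an approximation.
    """
    labels = set()
    for ch in text:
        if not ch.isalpha():
            continue  # ignore digits, punctuation and whitespace
        name = unicodedata.name(ch, "")
        if name:
            labels.add(name.split(" ", 1)[0])
    return labels


def is_mixed(text):
    """True when the letters of text come from more than one rough script."""
    return len(rough_scripts(text)) > 1


latin_name = "AlaskaJazz"
spoofed_name = "\u0391laskaJazz"  # starts with GREEK CAPITAL LETTER ALPHA

print(sorted(rough_scripts(latin_name)))    # ['LATIN']
print(sorted(rough_scripts(spoofed_name)))  # ['GREEK', 'LATIN']
print(is_mixed(latin_name), is_mixed(spoofed_name))  # False True
```

The two strings render almost identically, yet only the second one mixes scripts.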

You might also want to avoid people being tricked into entering their password on www.microsоft.com or www.faϲebook.com instead of www.microsoft.com or www.facebook.com. Here is a utility to play with these confusable homoglyphs.

Not all mixed-script strings have to be ruled out, though. You could exclude only those mixed-script strings that contain characters which might be confused with a character from Unicode blocks of your choosing.

  • Allo and ρττ are fine: single script.

  • AlloΓ is fine when our preferred script alias is ‘latin’: mixed script, but Γ is not confusable.

  • Alloρ is dangerous: mixed script and ρ could be confused with p.
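The rule behind these three cases can be sketched as "mixed script and at least one character that resembles a letter of the preferred script". The code below is a minimal illustration under stated assumptions, not the library's code: LOOKS_LATIN is a tiny hand-picked table standing in for the real confusables data, the script of a character is approximated by the first word of its Unicode name, and looks_dangerous is a hypothetical helper.

```python
import unicodedata

# Hypothetical, hand-picked table: non-Latin characters that resemble a
# Latin letter. The real data set is far larger.
LOOKS_LATIN = {
    "\u03c1": "p",  # GREEK SMALL LETTER RHO
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u03f2": "c",  # GREEK LUNATE SIGMA SYMBOL
}


def script_of(ch):
    """Approximate a character's script by the first word of its name."""
    return unicodedata.name(ch, "").split(" ", 1)[0]


def looks_dangerous(text):
    """True when text mixes scripts AND contains a Latin lookalike."""
    letters = [ch for ch in text if ch.isalpha()]
    mixed = len({script_of(ch) for ch in letters}) > 1
    has_lookalike = any(ch in LOOKS_LATIN for ch in letters)
    return mixed and has_lookalike


print(looks_dangerous("Allo"))                # False: single script
print(looks_dangerous("\u03c1\u03c4\u03c4"))  # False: single script (Greek)
print(looks_dangerous("Allo\u0393"))          # False: mixed, no lookalike
print(looks_dangerous("Allo\u03c1"))          # True: mixed, rho resembles p
```

Requiring both conditions is what keeps an all-Greek string acceptable even though it contains a character that resembles a Latin letter.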

This library is compatible with Python 3.

API documentation

Is the data up to date?

Yep.

The Unicode block aliases and names for each character are extracted from this file provided by the Unicode Consortium.

The matrix of which characters can be confused with which other characters is built using this file provided by the Unicode Consortium.

This data is stored in two JSON files: categories.json and confusables.json. If you delete them, they will both be recreated by downloading and parsing the two above-mentioned files, and the result is stored as JSON again.
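As a rough idea of what "parsing and storing as JSON" can involve, here is a minimal sketch. It assumes input lines in the semicolon-separated style of the Unicode confusables data (source code points ; target code points ; type # comment) and works on a few inline sample lines instead of downloading anything. The dictionary layout it produces is an assumption and is not necessarily the layout of confusables.json; parse_confusables is a hypothetical helper.

```python
import json

# Sample lines written in the style of the Unicode confusables data file.
SAMPLE_LINES = [
    "# comment lines and blank lines are skipped",
    "",
    "0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C",
    "043E ;\t006F ;\tMA\t# CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O",
]


def decode(field):
    """Turn a field such as '0441' or '0072 006E' into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_confusables(lines):
    """Map each source string to the list of strings it resembles."""
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        fields = [part.strip() for part in data.split(";")]
        source, target = decode(fields[0]), decode(fields[1])
        table.setdefault(source, []).append(target)
    return table


table = parse_confusables(SAMPLE_LINES)
serialized = json.dumps(table, ensure_ascii=True, sort_keys=True)
print(serialized)
```

Keeping the parsed result as JSON means the original text files only need to be read again when the data is regenerated.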

History

1.0.0

Initial release.

2.0.0

  • allowed_categories renamed to allowed_aliases

2.0.1

3.0.0

Courtesy of Ryan P Kilby, via https://github.com/vhf/confusable_homoglyphs/pull/6 :

  • Changed file paths to be relative to the confusable_homoglyphs package directory instead of the user’s current working directory.

  • Data files are now distributed with the packaging.

  • Fixes tests so that they use the installed distribution instead of the local files. (Originally, the data files were erroneously showing up during testing, despite not being included in the distribution).

  • Moves the data file generation into a simple CLI. This way, users have a method for controlling when the data files are updated.

  • Since the data files are now included in the distribution, the CLI is made optional. Its dependencies can be installed with the cli bundle, e.g. pip install confusable_homoglyphs[cli].

3.1.0

  • Update unicode data

3.1.1

  • Update unicode data (via ftp)

3.2.0

  • Drop support for Python 3.3

  • Fix #11: work as expected when char not found in datafiles

3.3.0

  • Drop support for Python 2

  • Drop support for Python < 3.7, add support for Python up to 3.12

  • Allow using data files from a custom location set with the CONFUSABLE_DATA environment variable.

  • Fix the return value of confusables.is_dangerous() to match the documented API, which is a boolean value. It used to return either False or the list output of confusables.is_confusable().

  • Added a check command for command line use.

3.3.1

  • Update unicode data
