Homoglyphs

Homoglyphs lives! This is a fork of the original project maintained by orsinium, an important and widely used library for handling homoglyphs in Python.

Homoglyphs is a Python library for getting homoglyphs and converting them to ASCII.

Features

It's a smarter version of confusable_homoglyphs:

  • Automatic detection or manual choice of category (aliases from ISO 15924).
  • Automatic or manual loading of only the needed alphabets into memory.
  • Converting to ASCII.
  • More configurable.
  • More stable.

Installation

pip install homoglyphs_fork

Usage

The best way to explain something is to show how it works, so let's look at some real usage.

Importing:

import homoglyphs_fork as hg

Languages

# detect
hg.Languages.detect('w')
# {'pl', 'da', 'nl', 'fi', 'cz', 'sr', 'pt', 'it', 'en', 'es', 'sk', 'de', 'fr', 'ro'}
hg.Languages.detect('т')
# {'mk', 'ru', 'be', 'bg', 'sr'}
hg.Languages.detect('.')
# set()

# get alphabet for languages
hg.Languages.get_alphabet(['ru'])
# {'в', 'Ё', 'К', 'Т', ..., 'Р', 'З', 'Э'}

# get all languages
hg.Languages.get_all()
# {'nl', 'lt', ..., 'de', 'mk'}
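Conceptually, detecting a language from a character is a membership test against per-language alphabets, and getting an alphabet is a union of those sets. The sketch below is a hypothetical, stdlib-only illustration with tiny hand-written alphabets; ALPHABETS, detect_languages and alphabet_for are made-up names, not part of homoglyphs_fork, and the library's own data and implementation may differ.

```python
# Hypothetical, heavily trimmed alphabets; the library ships its own full data.
ALPHABETS = {
    "en": set("abcdefghijklmnopqrstuvwxyz"),
    "de": set("abcdefghijklmnopqrstuvwxyzäöü"),
    "ru": set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
    "bg": set("абвгдежзийклмнопрстуфхцчшщъьюя"),
}


def detect_languages(char: str) -> set:
    """Return every language whose alphabet contains `char` (case-insensitive)."""
    lowered = char.lower()
    return {lang for lang, letters in ALPHABETS.items() if lowered in letters}


def alphabet_for(languages) -> set:
    """Union of the alphabets for `languages`, including upper-case forms."""
    letters = set()
    for lang in languages:
        letters |= ALPHABETS[lang]
    return letters | {c.upper() for c in letters}


print(detect_languages("w"))   # languages in this toy table that use "w"
print(detect_languages("."))   # punctuation belongs to no alphabet here
```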

Categories

Categories are Unicode script names (aliases from ISO 15924).

# detect
hg.Categories.detect('w')
# 'LATIN'
hg.Categories.detect('т')
# 'CYRILLIC'
hg.Categories.detect('.')
# 'COMMON'

# get alphabet for categories
hg.Categories.get_alphabet(['CYRILLIC'])
# {'ӗ', 'Ԍ', 'Ґ', 'Я', ..., 'Э', 'ԕ', 'ӻ'}

# get all categories
hg.Categories.get_all()
# {'RUNIC', 'DESERET', ..., 'SOGDIAN', 'TAI_LE'}
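To get a feel for what a category is, a rough approximation can be built from the standard library alone: many Unicode character names start with their script name. This is only a heuristic sketch under that assumption (KNOWN_SCRIPTS and guess_category are hypothetical names); proper script detection uses the Unicode Script property, and the heuristic misclassifies some characters, for example Latin ordinal indicators.

```python
import unicodedata

# Hypothetical, tiny set of script names to recognise; real data covers all scripts.
KNOWN_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK", "ARMENIAN", "HEBREW", "ARABIC"}


def guess_category(char: str) -> str:
    """Guess a script category from the first word of the Unicode character name."""
    name = unicodedata.name(char, "")       # e.g. "CYRILLIC SMALL LETTER TE"
    first_word = name.split(" ", 1)[0]
    # Anything whose name does not start with a known script is treated as COMMON.
    return first_word if first_word in KNOWN_SCRIPTS else "COMMON"


for ch in ("w", "т", "."):
    print(ch, guess_category(ch))
```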

Homoglyphs

Get homoglyphs:

# get homoglyphs (latin alphabet initialized by default)
hg.Homoglyphs().get_combinations('q')
# ['q', '𝐪', '𝑞', '𝒒', '𝓆', '𝓺', '𝔮', '𝕢', '𝖖', '𝗊', '𝗾', '𝘲', '𝙦', '𝚚']
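The look-alikes of 'q' above are styled letters from Unicode's Mathematical Alphanumeric Symbols block, and each of them NFKC-normalizes back to a plain 'q'. The sketch below uses that observation to find them with the standard library; nfkc_lookalikes is a hypothetical helper, not the library's method. NFKC only catches compatibility variants: it will never relate Cyrillic 'г' to Latin 'r', which is why a dedicated confusables table is needed in general.

```python
import unicodedata


def nfkc_lookalikes(char: str, start: int = 0x1D400, stop: int = 0x1D800) -> list:
    """Return `char` plus every code point in [start, stop) whose NFKC form is `char`.

    The default range is the Mathematical Alphanumeric Symbols block.
    """
    found = [
        chr(code)
        for code in range(start, stop)
        if chr(code) != char and unicodedata.normalize("NFKC", chr(code)) == char
    ]
    return [char] + found


print(nfkc_lookalikes("q"))
```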

Alphabet loading:

# load alphabet on init by categories
homoglyphs = hg.Homoglyphs(categories=('LATIN', 'COMMON', 'CYRILLIC'))  # alphabet loaded here
homoglyphs.get_combinations('гы')
# ['rы', 'гы', 'ꭇы', 'ꭈы', '𝐫ы', '𝑟ы', '𝒓ы', '𝓇ы', '𝓻ы', '𝔯ы', '𝕣ы', '𝖗ы', '𝗋ы', '𝗿ы', '𝘳ы', '𝙧ы', '𝚛ы']

# load alphabet on init by languages
homoglyphs = hg.Homoglyphs(languages={'ru', 'en'})  # alphabet will be loaded here
homoglyphs.get_combinations('гы')
# ['rы', 'гы']

# set alphabet manually on init ('abc' is English, 'абс' is Russian)
homoglyphs = hg.Homoglyphs(alphabet='abc абс')
homoglyphs.get_combinations('с')
# ['c', 'с']

# load alphabet on demand
homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)
# ^ alphabet will be loaded here for "en" language
homoglyphs.get_combinations('гы')
# ^ alphabet will be loaded here for "ru" language
# ['rы', 'гы']
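The combinations above behave like a Cartesian product: each character is replaced by its look-alikes that are present in the loaded alphabet, and every pairing is emitted. The sketch below illustrates that idea under stated assumptions: CONFUSABLE_GROUPS is a hypothetical two-entry table and combinations_sketch is a made-up name, so it is not the library's implementation, only a model that happens to reproduce the 'гы' and 'с' examples.

```python
from itertools import product

# Hypothetical groups of mutually confusable characters (tiny excerpt).
CONFUSABLE_GROUPS = [
    {"r", "\u0433"},  # Latin r, Cyrillic ghe
    {"c", "\u0441"},  # Latin c, Cyrillic es
]


def combinations_sketch(text: str, alphabet: set) -> list:
    """All spellings of `text` that swap characters for look-alikes in `alphabet`."""
    options = []
    for ch in text:
        group = next((g for g in CONFUSABLE_GROUPS if ch in g), {ch})
        # Keep look-alikes from the loaded alphabet, and always keep `ch` itself.
        options.append(sorted({c for c in group if c in alphabet} | {ch}))
    return ["".join(combo) for combo in product(*options)]


en_ru = set("abcdefghijklmnopqrstuvwxyz") | set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя")
print(combinations_sketch("гы", en_ru))
print(combinations_sketch("с", set("abc абс")))
```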

You can combine categories, languages, an alphabet, and strategies however you like. The strategy specifies how to handle characters that are not in the already loaded alphabet:

  • STRATEGY_LOAD: load the category for this character
  • STRATEGY_IGNORE: add the character to the result
  • STRATEGY_REMOVE: remove the character from the result
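The three strategies can be modelled as a branch taken for each character that is missing from the loaded alphabet. This is a minimal sketch, assuming plain string constants and a caller-supplied loader callback; the real constants are hg.STRATEGY_LOAD, hg.STRATEGY_IGNORE and hg.STRATEGY_REMOVE, whose values and internals may differ, and filter_unknown is a hypothetical name.

```python
from typing import Callable, Set

# Hypothetical stand-ins for hg.STRATEGY_LOAD / hg.STRATEGY_IGNORE / hg.STRATEGY_REMOVE.
STRATEGY_LOAD = "load"
STRATEGY_IGNORE = "ignore"
STRATEGY_REMOVE = "remove"


def filter_unknown(text: str, alphabet: Set[str], strategy: str,
                   load_category: Callable[[str], Set[str]]) -> str:
    """Apply `strategy` to every character of `text` that is not in `alphabet`."""
    kept = []
    for ch in text:
        if ch not in alphabet:
            if strategy == STRATEGY_LOAD:
                alphabet.update(load_category(ch))  # grow the loaded alphabet in place
            elif strategy == STRATEGY_REMOVE:
                continue                            # drop the character
            elif strategy != STRATEGY_IGNORE:
                raise ValueError(f"unknown strategy: {strategy!r}")
            # STRATEGY_IGNORE falls through and keeps the character as-is.
        kept.append(ch)
    return "".join(kept)


def fake_cyrillic_loader(ch: str) -> Set[str]:
    # Hypothetical loader returning a tiny slice of the Cyrillic category.
    return set("абвгдж")


alphabet = set("abc")
print(filter_unknown("aжb", set("abc"), STRATEGY_REMOVE, fake_cyrillic_loader))
print(filter_unknown("aжb", alphabet, STRATEGY_LOAD, fake_cyrillic_loader), sorted(alphabet))
```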

Converting glyphs to ASCII chars

homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)

# convert
homoglyphs.to_ascii('ТЕСТ')
# ['TECT']
homoglyphs.to_ascii('ХР123.')  # this is cyrillic "х" and "р"
# ['XP123.', 'XPI23.', 'XPl23.']

# by default, strings containing chars that can't be converted are ignored
homoglyphs.to_ascii('лол')
# []

# you can set a strategy that removes unconvertible non-ASCII chars from the result
homoglyphs = hg.Homoglyphs(
    languages={'en'},
    strategy=hg.STRATEGY_LOAD,
    ascii_strategy=hg.STRATEGY_REMOVE,
)
homoglyphs.to_ascii('лол')
# ['o']

# you can also set the range of allowed char codes for ASCII (range(128), i.e. codes 0-127, by default):
homoglyphs = hg.Homoglyphs(
    languages={'en'},
    strategy=hg.STRATEGY_LOAD,
    ascii_strategy=hg.STRATEGY_REMOVE,
    ascii_range=range(ord('a'), ord('z')),  # codes for a-y: range() excludes its stop value
)
homoglyphs.to_ascii('ХР123.')
# ['l']
homoglyphs.to_ascii('хр123.')
# ['xpl']
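The to_ascii behaviour above can be modelled as: keep only the look-alikes whose code point falls in the allowed range, then take the Cartesian product. Under the default ASCII strategy a character with no allowed spelling discards the whole string; under the remove strategy that character is dropped instead. This is a minimal sketch, assuming a hypothetical hand-written HOMOGLYPHS table and a made-up to_ascii_sketch function; with that table it reproduces the example outputs shown above, but it is not the library's implementation.

```python
from itertools import product

# Hypothetical excerpt of a homoglyph table: each key maps to its look-alikes.
HOMOGLYPHS = {
    "\u0425": ["X"],        # Cyrillic capital ha
    "\u0420": ["P"],        # Cyrillic capital er
    "\u0445": ["x"],        # Cyrillic small ha
    "\u0440": ["p"],        # Cyrillic small er
    "\u043e": ["o"],        # Cyrillic small o
    "1": ["1", "I", "l"],   # digit one, capital I, small L
}


def to_ascii_sketch(text: str, remove: bool = False, ascii_range=range(128)) -> list:
    """Return every spelling of `text` built only from code points in `ascii_range`.

    remove=False: return [] as soon as a character has no allowed spelling.
    remove=True: drop such characters and keep converting the rest.
    """
    options = []
    for ch in text:
        allowed = [c for c in HOMOGLYPHS.get(ch, [ch]) if ord(c) in ascii_range]
        if allowed:
            options.append(allowed)
        elif not remove:
            return []
    # A set removes duplicate spellings; sorting gives a stable order.
    return sorted({"".join(combo) for combo in product(*options)})


print(to_ascii_sketch("ХР123."))
print(to_ascii_sketch("лол", remove=True))
```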

The Fork

To help with the transition I have:

  • Moved the main branch
  • Enabled Issues

I am looking to:

Contributors

With thanks to:

  • @wesinator
  • @clydejallorina

Download files

Source Distribution

homoglyphs_fork-2.1.0.tar.gz (87.3 kB)

Uploaded Source

Built Distribution

homoglyphs_fork-2.1.0-py2.py3-none-any.whl (87.4 kB)

Uploaded Python 2, Python 3

File details

Details for the file homoglyphs_fork-2.1.0.tar.gz.

File metadata

  • Download URL: homoglyphs_fork-2.1.0.tar.gz
  • Size: 87.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for homoglyphs_fork-2.1.0.tar.gz
Algorithm Hash digest
SHA256 26558fce85a72d42006df43316a0b973c7f62acb28cf3bb4d60272f696f00bee
MD5 942f8505cc2b2f39366e392f5ae43cc3
BLAKE2b-256 7297765057be20bddbeba76d8a6f6f288fb70be44d644fc18c95b7d7cf35c113

File details

Details for the file homoglyphs_fork-2.1.0-py2.py3-none-any.whl.

File hashes

Hashes for homoglyphs_fork-2.1.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 47645cc0cdfd6912e2c032500960e05d29971f452a76dbddf5b76afaaf22134d
MD5 7ee71fe4b541db2a3228e560e4bd40cf
BLAKE2b-256 7c5b9f0d118d97a514611d42b4cd6bfe29502447756c2d10e622e68c1fe1cb8d
