Homoglyphs
Homoglyphs lives! This is a fork of the widely used Python library for handling homoglyphs, which was originally maintained by orsinium.
Homoglyphs is a Python library for getting homoglyphs and converting them to ASCII.
Features
It's a smarter version of confusable_homoglyphs:
- Autodetect or manually choose the category (aliases from ISO 15924).
- Automatically or manually load only the needed alphabets into memory.
- Converting to ASCII.
- More configurable.
- More stable.
Installation
sudo pip install homoglyphs
Usage
The best way to explain something is to show how it works, so let's look at some real usage.
Importing:
import homoglyphs as hg
Languages
# detect
hg.Languages.detect('w')
# {'pl', 'da', 'nl', 'fi', 'cz', 'sr', 'pt', 'it', 'en', 'es', 'sk', 'de', 'fr', 'ro'}
hg.Languages.detect('т')
# {'mk', 'ru', 'be', 'bg', 'sr'}
hg.Languages.detect('.')
# set()
# get alphabet for languages
hg.Languages.get_alphabet(['ru'])
# {'в', 'Ё', 'К', 'Т', ..., 'Р', 'З', 'Э'}
# get all languages
hg.Languages.get_all()
# {'nl', 'lt', ..., 'de', 'mk'}
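Conceptually, language detection is a membership test of the character against each language's alphabet. The following is a minimal sketch of that idea and not the library's implementation; the `TOY_ALPHABETS` table and the `detect_languages` helper are hypothetical, and the alphabets are deliberately truncated.

```python
# Hypothetical, truncated alphabets for illustration only;
# the real library ships complete alphabets for many more languages.
TOY_ALPHABETS = {
    'en': set('abcdefghijklmnopqrstuvwxyz'),
    'ru': set('абвгдежзийклмнопрстуфхцчшщъыьэюя'),
    'sr': set('абвгдежзиклмнопрстуфхцчш') | set('abcdefghijklmnoprstuvz'),
}


def detect_languages(char):
    """Return the set of toy languages whose alphabet contains `char`."""
    return {lang for lang, alphabet in TOY_ALPHABETS.items() if char in alphabet}


print(detect_languages('т'))  # a set containing 'ru' and 'sr'
print(detect_languages('.'))  # set()
```

A character shared by several alphabets yields several languages, which is why the result is a set rather than a single value.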
Categories
Categories are script names (aliases from ISO 15924).
# detect
hg.Categories.detect('w')
# 'LATIN'
hg.Categories.detect('т')
# 'CYRILLIC'
hg.Categories.detect('.')
# 'COMMON'
# get alphabet for categories
hg.Categories.get_alphabet(['CYRILLIC'])
# {'ӗ', 'Ԍ', 'Ґ', 'Я', ..., 'Э', 'ԕ', 'ӻ'}
# get all categories
hg.Categories.get_all()
# {'RUNIC', 'DESERET', ..., 'SOGDIAN', 'TAI_LE'}
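To get a feel for what a category is, the standard library can approximate the detection above. This is only a rough heuristic, assuming the first word of a character's Unicode name matches its script; the library ships its own category data and may disagree with this sketch. `KNOWN_SCRIPTS` and `guess_category` are hypothetical names.

```python
import unicodedata

# Hypothetical short list of script names; the library knows many more.
KNOWN_SCRIPTS = {'LATIN', 'CYRILLIC', 'GREEK', 'ARMENIAN', 'HEBREW', 'ARABIC'}


def guess_category(char):
    """Guess a script from the first word of the character's Unicode name."""
    name = unicodedata.name(char, '')
    first_word = name.split(' ', 1)[0]
    # Anything not recognised (punctuation, digits, unnamed characters)
    # is reported as COMMON in this sketch.
    return first_word if first_word in KNOWN_SCRIPTS else 'COMMON'


print(guess_category('w'))  # LATIN
print(guess_category('т'))  # CYRILLIC
print(guess_category('.'))  # COMMON
```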
Homoglyphs
Get homoglyphs:
# get homoglyphs (latin alphabet initialized by default)
hg.Homoglyphs().get_combinations('q')
# ['q', '𝐪', '𝑞', '𝒒', '𝓆', '𝓺', '𝔮', '𝕢', '𝖖', '𝗊', '𝗾', '𝘲', '𝙦', '𝚚']
Alphabet loading:
# load alphabet on init by categories
homoglyphs = hg.Homoglyphs(categories=('LATIN', 'COMMON', 'CYRILLIC')) # alphabet loaded here
homoglyphs.get_combinations('гы')
# ['rы', 'гы', 'ꭇы', 'ꭈы', '𝐫ы', '𝑟ы', '𝒓ы', '𝓇ы', '𝓻ы', '𝔯ы', '𝕣ы', '𝖗ы', '𝗋ы', '𝗿ы', '𝘳ы', '𝙧ы', '𝚛ы']
# load alphabet on init by languages
homoglyphs = hg.Homoglyphs(languages={'ru', 'en'}) # alphabet will be loaded here
homoglyphs.get_combinations('гы')
# ['rы', 'гы']
# manually set the alphabet on init ('abc' is Latin, 'абс' is Cyrillic)
homoglyphs = hg.Homoglyphs(alphabet='abc абс')
homoglyphs.get_combinations('с')
# ['c', 'с']
# load alphabet on demand
homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)
# ^ alphabet will be loaded here for "en" language
homoglyphs.get_combinations('гы')
# ^ alphabet will be loaded here for "ru" language
# ['rы', 'гы']
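The results above can be read as a Cartesian product: each character contributes its lookalikes from the loaded alphabet, and every way of picking one variant per position gives one combination. The sketch below illustrates that reading with a hypothetical `TOY_VARIANTS` table and `toy_combinations` helper; it is an assumption about the behaviour, not the library's code.

```python
from itertools import product

# Hypothetical lookalike table; the real data is far larger.
TOY_VARIANTS = {
    'г': ['r', 'г'],
    'ы': ['ы'],
}


def toy_combinations(text):
    """Return every string formed by picking one variant per character."""
    # Characters missing from the table map only to themselves.
    choices = [TOY_VARIANTS.get(char, [char]) for char in text]
    return [''.join(combo) for combo in product(*choices)]


print(toy_combinations('гы'))  # ['rы', 'гы']
```

The number of combinations is the product of the variant counts per character, so long strings with a large loaded alphabet can produce very long lists.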
You can combine categories, languages, alphabet and any strategies as you want. The strategies specify how to handle any characters not already loaded:
- STRATEGY_LOAD: load category for this character
- STRATEGY_IGNORE: add character to result
- STRATEGY_REMOVE: remove character from result
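The three strategies can be modelled with a small toy function. This is a sketch under stated assumptions: the constants below are hypothetical stand-ins for the library's own, and `TOY_CATEGORIES` and `handle_text` exist only for this illustration.

```python
# Hypothetical stand-ins for the library's strategy constants.
STRATEGY_LOAD, STRATEGY_IGNORE, STRATEGY_REMOVE = 'load', 'ignore', 'remove'

# Hypothetical category data used only by this sketch.
TOY_CATEGORIES = {
    'LATIN': set('abcxyz'),
    'CYRILLIC': set('абвгы'),
}


def handle_text(text, loaded, strategy):
    """Apply a strategy to every character outside the loaded alphabet.

    `loaded` is a set of characters and may grow under STRATEGY_LOAD.
    """
    kept = []
    for char in text:
        if char in loaded:
            kept.append(char)
        elif strategy == STRATEGY_LOAD:
            # Load every toy category containing the character, then keep it.
            for alphabet in TOY_CATEGORIES.values():
                if char in alphabet:
                    loaded |= alphabet
            kept.append(char)
        elif strategy == STRATEGY_IGNORE:
            kept.append(char)  # pass the character through unchanged
        elif strategy == STRATEGY_REMOVE:
            continue  # drop the character from the result
    return ''.join(kept)


loaded = set('abc')
print(handle_text('aг', loaded, STRATEGY_REMOVE))  # a
print(handle_text('aг', loaded, STRATEGY_IGNORE))  # aг
print(handle_text('aг', loaded, STRATEGY_LOAD))    # aг, and Cyrillic is now loaded
```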
Converting glyphs to ASCII chars
homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)
# convert
homoglyphs.to_ascii('ТЕСТ')
# ['TECT']
homoglyphs.to_ascii('ХР123.') # this is cyrillic "х" and "р"
# ['XP123.', 'XPI23.', 'XPl23.']
# by default, a string containing chars that can't be converted yields no results
homoglyphs.to_ascii('лол')
# []
# you can set a strategy that removes unconverted non-ASCII chars from the result
homoglyphs = hg.Homoglyphs(
languages={'en'},
strategy=hg.STRATEGY_LOAD,
ascii_strategy=hg.STRATEGY_REMOVE,
)
homoglyphs.to_ascii('лол')
# ['o']
# you can also set the range of allowed char codes for ASCII (0-127, i.e. range(128), by default):
homoglyphs = hg.Homoglyphs(
languages={'en'},
strategy=hg.STRATEGY_LOAD,
ascii_strategy=hg.STRATEGY_REMOVE,
ascii_range=range(ord('a'), ord('z') + 1),  # + 1 so that 'z' itself is allowed
)
homoglyphs.to_ascii('ХР123.')
# ['l']
homoglyphs.to_ascii('хр123.')
# ['xpl']
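The effect of `ascii_range` can be pictured as a filter over candidate strings: a candidate survives only if every character's code point lies inside the range. The sketch below shows just that end result with a hypothetical `toy_to_ascii` helper; the library itself may apply the check character by character together with `ascii_strategy`.

```python
def toy_to_ascii(candidates, ascii_range=range(128)):
    """Keep only candidates whose characters all fall inside `ascii_range`."""
    return [
        text for text in candidates
        if all(ord(char) in ascii_range for char in text)
    ]


candidates = ['XP123.', 'ХР123.', 'XPl23.']  # the second entry is Cyrillic
print(toy_to_ascii(candidates))  # ['XP123.', 'XPl23.']

# range() excludes its end point, so add 1 to include 'z' itself.
lowercase = range(ord('a'), ord('z') + 1)
print(toy_to_ascii(['xyz', 'xyZ'], lowercase))  # ['xyz']
```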
The Fork
To help with the transition I have:
- Moved the main branch
- Enabled Issues
I am looking to:
- Switch to using GitHub Actions
- Add this fork to PyPI
- Update orsinium's page to say it's maintained