A package for random string detection
Random String Detector
Installation
pip install random-string-detector
Example
from random_string_detector.random_string_detector import is_random_string
print(is_random_string("Home", 0.1)) # False
print(is_random_string("Jdjfjfk", 0.1)) # True
Background
There are 26 × 26 = 676 possible two-letter combinations (bigrams) of English letters, counting both pairs of identical letters and pairs of distinct letters. Because some of these bigrams are far rarer than others in real English text, low-frequency bigrams can be used to detect random strings of English letters.
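The figure of 676 can be checked by enumerating every ordered pair of letters. The snippet below is only an illustration of that count; the variable names are made up for this example and are not part of the package.

```python
from itertools import product
from string import ascii_lowercase

# Every ordered pair of the 26 lowercase English letters.
all_bigrams = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]

# Split into pairs of identical letters ("aa") and pairs of distinct letters ("ab").
identical = [b for b in all_bigrams if b[0] == b[1]]
distinct = [b for b in all_bigrams if b[0] != b[1]]

print(len(all_bigrams))  # 676
print(len(identical))    # 26
print(len(distinct))     # 650
```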
According to Peter Norvig's analysis, the most frequent bigram in English is "th", while "zx" is rare. By comparing the bigrams in your text data against their frequencies in an English corpus, you can identify strings of characters that do not fit typical language patterns.
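The idea of comparing bigrams against a corpus can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the helper names (bigrams, bigram_counts, rare_bigram_share), the min_count parameter, and the tiny TOY_CORPUS are all assumptions made for this example, and a real detector would rely on counts from a large corpus such as the one Norvig analysed.

```python
from collections import Counter


def bigrams(word):
    """Return the overlapping two-letter pairs of a word, lowercased."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]


def bigram_counts(corpus_words):
    """Count how often each bigram occurs across a list of words."""
    counts = Counter()
    for word in corpus_words:
        counts.update(bigrams(word))
    return counts


def rare_bigram_share(word, counts, min_count=1):
    """Fraction of the word's bigrams seen fewer than min_count times."""
    pairs = bigrams(word)
    if not pairs:
        return 0.0
    rare = sum(1 for p in pairs if counts[p] < min_count)
    return rare / len(pairs)


# Toy stand-in for an English corpus (hypothetical, far too small for real use).
TOY_CORPUS = (
    "the quick brown fox jumps over the lazy dog "
    "then they went home with their mother"
).split()

counts = bigram_counts(TOY_CORPUS)
print(rare_bigram_share("other", counts))  # 0.0, every bigram appears in the corpus
print(rare_bigram_share("zxqvk", counts))  # 1.0, no bigram appears in the corpus
```

A string whose bigrams are mostly unseen or rare in the reference corpus is a likely candidate for being random; where to draw that line is a tuning decision.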
The package exposes a single function, is_random_string(word, threshold). The first argument is the string (word) to test, and the second is a threshold value between 0 and 100. Higher values correspond to more frequent bigrams (like "th"), and lower values correspond to less frequent bigrams (like "zx").
Only strings longer than 4 characters that consist solely of English letters are considered.
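If you want to apply the same restriction to your own data before calling the detector, a pre-filter along these lines may help. It is a sketch under assumptions: the function name is_candidate is hypothetical, and how the package itself handles strings outside this rule is not stated here.

```python
import string

# The 52 upper- and lowercase English letters.
ENGLISH_LETTERS = set(string.ascii_letters)


def is_candidate(word: str) -> bool:
    """True if the word is longer than 4 characters and uses only English letters."""
    return len(word) > 4 and all(ch in ENGLISH_LETTERS for ch in word)


print(is_candidate("Jdjfjfk"))  # True
print(is_candidate("Home"))     # False, only 4 characters
print(is_candidate("abc12"))    # False, contains digits
```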
We happily accept any contributions and feedback. 😊
Hashes for random_string_detector-0.0.5.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 479450035eed1c96777cbb76caa13b89446ebe973cccbe070dc960bf30472eb2
MD5 | d8497103bdd40426bf40ef50c5331ddb
BLAKE2b-256 | a9603981850f4d6080f21f8fb9322183f8f4f005063e4968890bab226696fe4a
Hashes for random_string_detector-0.0.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0f35c166662a2301df1c39e95d2518d481ec9377cad636609095833aa0430b3f
MD5 | f4a278a0c0f98ce24ebfab397a360a4a
BLAKE2b-256 | 12998ecaa35fa9d17be0b3b9bc5ac2b25fb789fd79608737335c5ab087e4ed48