Anti-typosquatting suite
Project description
Andy
Andy* is an anti-typosquatting suite created to combat Steam Tradelink & Discord Nitro fraud links. With a correct configuration, Andy will achieve a ridiculous accuracy. In my tests, Andy achieved a 100% accuracy rate (0% false-negatives) with 0% false positives.
While Andy was developed with Steam and Discord in mind, it can hypothetically used for virtually any websites.
*any resemblance to persons, whether real or ficticious, is purely coincidental.
How it works
Given a URL, called test URL henceforth, it should be checked whether or not it is a fraudulent URL given the config provided.
In essence, Andy will compute the normalized Levenshtein distance between the test URL and configured valid domains.
If this distance shows high similarity, or it contains part of a valid domain (think discord-nitro
containing discord
from discord.com
) it will flag the URL as fraudulent.
Alternatively, it will attempt to perform similar comparisons on the domain contents, URL path and URL querystrings in order to compute a final verdict.
Configuration
Andy requires a valid configuration dictionary. This will consist of the following keys:
Key | Type | Description |
---|---|---|
domain |
dict of string keys and string array values | A dictionary of all legitimate domains (2nd/3rd level) with their associated TLDs. |
domain_threshold |
[0-1] | How similar the base domain should be to trigger suspicion. |
domain_keywords |
string array | A list of fraudulent keywords in the domain. |
domain_keywords_threshold |
[0-1] | How similar the base domain should be to trigger keyword processing. |
path |
string array | A list of fraudulent keywords in the path. |
path_threshold |
[0-1] | How similar strings in the path should be to trigger suspicion. |
path_split |
boolean | Whether to split on "-" in the path and check parts individually. |
query |
string array | A list of fraudulent keywords in a querystring. |
query_threshold |
[0-1] | How similar querystrings should be to trigger suspicion. |
query_split |
boolean | Whether to split on "-" in the querystring and check parts individually. |
Note: All categories except domain_keywords
use Levenshtein distance to trigger with "similar" strings.
For example, the keyword "gift" will cause "g1ft" to trigger too.
Additionally, there must be at least one value in domain
.
Warning: Setting any of the split values to true
may considerably increase your false positive rate if misconfigured.
Example
For example, using (not all config elements are shown):
"domain": {
"discord": ["com", "gg", "gift"],
"steamcommunity": ["com"]
}
The following URLs will be evaluated as follows:
discord.com
-> No fraudd1scorrd.com
-> Frauddiscord.biz
-> Fraudsteamcommunity.com
-> No fraudstreamcommmunity.com
-> Fraud
Installation
pip install andy-fraud
Usage (library)
Andy can be used as a library for other projects.
Loading the config
First, the config needs to be loaded. The config can be stored anywhere, from purely in-memory to persisting in a JSON file. For this example, let us assume the config is in a file, and we are parsing it as JSON.
config = json.load(config_file)
Getting a parsed URL
The algorithm requires a parsed URL.
This can be obtained easily, where string
is a variable that contains our link as a string.
parsed = urllib.parse.urlparse(string)
Testing if the URL is fraudulent
This snippet demonstrates how to actually tell if a URL is fraudulent. It includes the import of the package.
from andy import suite
if suite.is_scam(config, parsed):
print("Likely a scam link!")
else:
print("Probably not a scam link!")
Usage (command line)
Andy can be run from the command line to test the accuracy of detection. For this, two files are needed. Both files contain one domain per line (tailing and leading whitespace is ignored). The first file, referred to as the legitimate file, will contain only URLs that are not considered fraud. The second file, referred to as the scam file, will contain only URLs that are fraudulent. Andy will then test how many of the URLs from the legitimate file Andy flags as fraudulent, and vice versa.
Command line arguments
As such, 3 command line arguments are required:
- The path of the config file (must be stored in JSON format).
- The path of the legitimate file.
- The path of the scam file.
Running the script
Make sure that the package is installed (see above). This is an example, your file names may differ!
General: python -m andy config.json legit.txt scam.txt
Windows: py -m andy config.json legit.txt scam.txt
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file andy_fraud-1.0.1.tar.gz
.
File metadata
- Download URL: andy_fraud-1.0.1.tar.gz
- Upload date:
- Size: 20.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.8
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3ea537b2ba63ff610af3e40ff4838e8e3b0a806ed8001896359786f138d4c7f9 |
|
MD5 | 527f4b0dde3ec170d526b806494468e9 |
|
BLAKE2b-256 | ab57ccc3c0c9d79cb4790ab5d4f5cccca73a581bb9b4459c8071a76ba2967ca5 |
File details
Details for the file andy_fraud-1.0.1-py3-none-any.whl
.
File metadata
- Download URL: andy_fraud-1.0.1-py3-none-any.whl
- Upload date:
- Size: 19.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.8
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 143e4f6d1e0c7ac607b9b15aa3e1d5ca38e89715d23b4f0b8278a0050a3e94af |
|
MD5 | 9926d1ac98bdf8d1d42d2e31ea7b4915 |
|
BLAKE2b-256 | b3dd22a49957a67ed129a6bb0a8b1a9da3d15f5b56b1ff9b3e37e508234775b1 |