
v2d-cli generates new variations of an input based on the similarity matrix generated by v2d-similarity.


# V2D - Visual Unicode attacks with deep learning

Unlike classic tools for generating malicious domains, which rely on typographical errors, we have created a system that detects visually similar domains built from Unicode characters. The system has no static table of possible substitutions; domains are created from the similarity of the characters, computed by means of Deep Learning. Consequently, it provides a greater number of variations and can be updated over time as new characters are added.

## State of the Art

This project is based on the initial idea of capturing the differences between Unicode characters through their representation as images. Some existing projects rely on the Unicode standard, and several repositories have been created. Some interesting projects are:

* Standard:

* Personal project:

We built on these repositories to update our tool and try to get more accurate and complete results.

## Research

This tool is the result of the work of the Cybersecurity Lab I4S team within BlueIndico, where we started from the simple idea of comparing images of Unicode characters. First, we needed images of all these Unicode characters, and this was the first problem, as we could not find them on the Internet. We therefore created the first database of Unicode character images, which can be found in the Unicode images repository. It contains 38,880 characters, which we compare with the Basic Latin characters to select the most similar ones. This is the first public database of Unicode character images, and we would like to share it with the community in order to improve image recognition. Anyone can download the images from the repository. Any contribution that improves the character recognition algorithms would be appreciated.

![Image repository](/img/repository.png "Image repository.")

Once all the character images are available, the next step is to calculate the similarity between them. To accomplish that, we used Transfer Learning with Keras. The full project is available on GitHub. This code extracts image features, compares them, and creates a confusables file that is used by the CLI.
Finally, we created a CLI that uses the result of the previous step. This CLI generates all the possible combinations obtained by replacing each letter of the input with each of its similar Unicode characters. An attacker can use it to generate malicious web domains, emails, phishing content, etc. A defender can use it to check how these variations affect a website and, if they exist, block them or report them as fraud to the authorities.
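The similarity step described above can be illustrated with a minimal sketch. It is not the code from the similarity project: it assumes that each character image has already been reduced to a feature vector (for example by a pretrained network), that cosine similarity is the comparison metric, and that the confusables data is a plain mapping from a Latin character to its lookalikes. The names `cosine_similarity` and `build_confusables`, and the toy vectors, are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_confusables(latin_features, unicode_features, threshold=0.9):
    """Map each Latin character to the candidates whose similarity is at
    least `threshold`, ordered from most to least similar."""
    confusables = {}
    for latin_char, latin_vec in latin_features.items():
        scored = []
        for candidate, cand_vec in unicode_features.items():
            if candidate == latin_char:
                continue  # a character is not its own lookalike
            score = cosine_similarity(latin_vec, cand_vec)
            if score >= threshold:
                scored.append((score, candidate))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        confusables[latin_char] = [char for _, char in scored]
    return confusables


# Toy feature vectors (assumed values, for illustration only).
latin = {"e": [1.0, 0.0, 0.2]}
candidates = {
    "\u0435": [0.98, 0.02, 0.21],  # Cyrillic small letter ie
    "\u0258": [0.7, 0.6, 0.1],     # Latin small letter reversed e
    "x": [0.0, 1.0, 0.0],
}
table = build_confusables(latin, candidates, threshold=0.9)
```

Lowering the threshold admits more, and less convincing, lookalikes, which mirrors the trade-off behind the `-t` option of the CLI.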


This is the schema of the system:

![Alt text](/img/Architecture.png "Repositories system.")

## V2D - CLI

V2D is the first tool that uses Deep Learning, specifically Transfer Learning, to automatically create new variations of inputs using Unicode characters. It is a typical visual attack, but in this case the tool uses the power of machines to select the most similar characters among all the possible ones.
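As a rough illustration of how variations can be produced from a table of similar characters, the sketch below takes the Cartesian product of the choices available at each position. It is an assumption about the general technique, not the v2d implementation; `generate_variations` and the small `CONFUSABLES` table are hypothetical.

```python
from itertools import islice, product


def generate_variations(text, confusables, max_results=None):
    """Yield lookalike variants of `text`, skipping the unchanged original.

    `max_results=None` means no limit.
    """
    # Each position may keep its character or use any listed lookalike.
    choices = [[ch] + list(confusables.get(ch, [])) for ch in text]
    variants = ("".join(combo) for combo in product(*choices))
    variants = (v for v in variants if v != text)
    return islice(variants, max_results)


# Hypothetical table: "l" resembles "1" and "|", "e" resembles Cyrillic "е".
CONFUSABLES = {"l": ["1", "|"], "e": ["\u0435"]}
variants = list(generate_variations("le", CONFUSABLES))
```

The number of variants grows multiplicatively with the length of the input, so a cap such as `max_results` (comparable in spirit to the `-m` option) keeps the output manageable.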


### Prerequisites


### Installing

pip3 install v2d

### Getting started

#### Quick example

$ v2d -d -m 10 -c -v

oooooo oooo .oooo. oooooooooo.
`888. .8' .dP""Y88b `888' `Y8b
`888. .8' ]8P' 888 888
`888. .8' .d8P' 888 888
`888.8' .dP' 888 888
`888' .oP .o 888 d88'
`8' 8888888888 o888bood8P'

Visual Unicode attacks with Deep Learning
Version 1.1.0
Authors: José Ignacio Escribano
Miguel Hernández (MiguelHzBz)
Alfonso Muñoz (@mindcrypt)

Similar domains to
Checking if domains are up
The domain exampǀ does not exist
The domain examp1е.org does not exist
The domain examp1ɘ.org does not exist
The domain does not exist
The domain examp|е.org does not exist
The domain examp|ɘ.org does not exist
The domain exists
The domain examplе.org does not exist
The domain examp| does not exist
The domain examplɘ.org does not exist
Total similar domains to 10

##### Note

> Sometimes the output is not rendered correctly because the terminal lacks the required font, but the text is still correct if you copy it.
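One way to confirm that a copied variant really differs from the ASCII original, even when the terminal cannot draw it, is to list its non-ASCII code points. This is a minimal sketch using only the Python standard library; the `describe` helper is a hypothetical name and is not part of v2d.

```python
import unicodedata


def describe(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# The "e" in this string is the Cyrillic letter U+0435, not ASCII "e".
report = describe("exampl\u0435.org")
for ch, code_point, name in report:
    print(code_point, name)
```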

#### Getting help

$ v2d -h

oooooo oooo .oooo. oooooooooo.
`888. .8' .dP""Y88b `888' `Y8b
`888. .8' ]8P' 888 888
`888. .8' .d8P' 888 888
`888.8' .dP' 888 888
`888' .oP .o 888 d88'
`8' 8888888888 o888bood8P'

Visual Unicode attacks with Deep Learning
Version 1.1.0
Authors: José Ignacio Escribano
Miguel Hernández (MiguelHzBz)
Alfonso Muñoz (@mindcrypt)

usage: v2d [-h] [-d DOMAIN] [-v] [-c] [-w] [-vt] [-m MAX]
[-t 75,80,85,90,95,99] [-key API] [-o OUTPUT] [-i FILEINPUT]

v2d-cli: Visual Unicode attacks with Deep Learning - System based on the
similarity of the characters unicode by means of Deep Learning. This provides
a greater number of variations and a possible update over time

optional arguments:
-h, --help show this help message and exit
-d DOMAIN, --domain DOMAIN
check similar domains to this one
-v, --verbose
-c, --check check if this domain is alive
-w, --whois check whois
-vt, --virustotal check Virus Total
-m MAX, --max MAX maximum number of similar domains
-t 75,80,85,90,95,99, --threshold 75,80,85,90,95,99
Similarity threshold
-key API, --api-key API
VirusTotal API Key
-o OUTPUT, --output OUTPUT
Output file
List of targets. One input per line.


>$ v2d -d -o dominionsexample.txt
>$ v2d --domain -m 100 -t 85
>$ v2d -i fileexample.txt -c -w -v
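How v2d implements the `-c` check is not described here. As a hedged sketch of one possible approach, a Unicode domain can be converted to its ASCII (Punycode) form with Python's built-in `idna` codec and then passed to a DNS lookup. The helper names `to_ascii` and `is_resolvable` are hypothetical, and the resolver is injectable so the logic can be exercised without network access.

```python
import socket


def to_ascii(domain):
    """Convert a Unicode domain to its ASCII (Punycode) form."""
    return domain.encode("idna").decode("ascii")


def is_resolvable(domain, resolver=socket.gethostbyname):
    """Return True if `resolver` finds an address for the domain."""
    try:
        resolver(to_ascii(domain))
    except (OSError, UnicodeError):
        # OSError covers DNS failures; UnicodeError covers labels
        # that cannot be encoded as IDNA.
        return False
    return True


# The Cyrillic "е" makes the whole label non-ASCII, so it is encoded
# with the "xn--" prefix.
ascii_form = to_ascii("exampl\u0435.org")
```

A domain that resolves is only a hint that it is registered; a WHOIS lookup (the `-w` option) gives a more direct answer.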


## Authors

* José Ignacio Escribano Pablos
* Miguel Hernández Boza - @MiguelHzBz
* Alfonso Muñoz Muñoz - @mindcrypt

## Contributing

Any collaboration is welcome!

There are many tasks to do. You can check the Issues and send us a Pull Request.

## License

This project is licensed under the MIT License; see the license file for details.

