Extremely fast and accurate predictions of whether a domain name is genuine or DGA with deep learning.

DGA Intel

Using deep learning to detect DGA domains.


The DGAIntel Python module lets you use a CNN-LSTM model to predict whether a given domain name was generated by a domain generation algorithm (DGA) or is a genuine domain. The prediction features are also accessible through this website, but this package allows for direct integration into your workflow.


DGAIntel is designed for use with Python 3. It has only two requirements:

- TensorFlow 2.x
- NumPy


To install dgaintel, use pip to download it from PyPI.

$ pip install dgaintel

Alternatively, you could install from source.

$ git clone
$ cd dgaintel
$ python setup.py install

Verify your installation by running

>>> import dgaintel
>>> dgaintel.get_prediction('')
' is genuine with probability 0.00050'
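If the import fails, it can help to check which of the required packages is missing. The following is a minimal sketch using only the standard library. `missing_packages` is a hypothetical helper, not part of dgaintel, and the import names `tensorflow` and `numpy` are assumed to be the usual ones for the two requirements listed above.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # An empty list means all three packages can be imported.
    print(missing_packages(["dgaintel", "tensorflow", "numpy"]))
```

This only checks that the packages can be located; it does not load the model or verify version compatibility.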


Predict DGA

This is a simple way of determining whether a given domain, such as '', is DGA or not. It is mainly intended for cyber security analysts.

from dgaintel import get_prediction


' is genuine with probability 0.00050'

Predict DGA probability

This returns the probability that a domain is DGA, or the probabilities for a list of domains, which is more useful to data scientists.

from dgaintel import get_prob

# For single domain
prob = get_prob('')

# For multiple domains
probs = get_prob(['', '', ''])

# To get just the scores
raw_probs = list(get_prob(['', '', ''], raw=True))


[('', 0.00050), ('', 0.00033), ('', 0.97601)]

[0.00050845, 0.00033092, 0.00144754]
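Once you have (domain, probability) pairs in the shape shown above, turning them into labels is a one-liner. This is a minimal sketch, not part of dgaintel: `label_domains` is a hypothetical helper, the 0.5 cutoff is an assumption borrowed from the analysis example later in this document, and the sample scores are made up for illustration rather than produced by the model.

```python
def label_domains(scored, threshold=0.5):
    """Map (domain, probability) pairs to (domain, label) pairs.

    A probability at or above the threshold is labelled 'DGA'.
    """
    return [
        (domain, "DGA" if prob >= threshold else "genuine")
        for domain, prob in scored
    ]


if __name__ == "__main__":
    # Placeholder domains and made-up scores, not model output.
    sample = [("a.example", 0.01), ("b.example", 0.98)]
    print(label_domains(sample))
```

Raising the threshold trades missed DGA domains for fewer false alarms, so the right value depends on how the labels are used.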

Predict by file

This is for inputting a file containing a list of domains to get predictions on all of them at once, which is helpful for data analysts.

Say you have a domain file domains.txt.

Then, you can run the following code in the same directory.

from dgaintel import get_prediction

# Print to console
get_prediction('domains.txt')

# Write to file
get_prediction('domains.txt', to_file='domain_predictions.txt')

 is genuine with probability 0.00050
 is genuine with probability 0.00033
 is DGA with probability 0.97601

If you read the new file domain_predictions.txt, you will see the following.

 is genuine with probability 0.0005084535223431885
 is genuine with probability 0.00033092446392402053
 is DGA with probability 0.9760094285011292
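If you later want to load such a prediction file back into Python, the lines can be parsed with a regular expression. This is a sketch that assumes every line has the form `<domain> is genuine with probability <p>` or `<domain> is DGA with probability <p>`, as in the output above. `parse_prediction_line` is a hypothetical helper, not part of dgaintel. Judging by the examples, the number appears to be the model's DGA score even on "genuine" lines, so it is returned unchanged.

```python
import re

# <domain> is genuine|DGA with probability <number>
LINE_RE = re.compile(
    r"^(?P<domain>\S+) is (?P<label>genuine|DGA) with probability "
    r"(?P<prob>\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)$"
)


def parse_prediction_line(line):
    """Return (domain, label, probability), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match["domain"], match["label"], float(match["prob"])


if __name__ == "__main__":
    # Placeholder domain; the number is illustrative only.
    print(parse_prediction_line("a.example is DGA with probability 0.9"))
```

Returning `None` for unmatched lines lets the caller decide whether to skip or report them.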

Prediction analysis

This is an example function that integrates dgaintel with whois for performing basic prediction analysis, which is important for cyber security investigators.

from dgaintel import get_prob
from whois import query

def analyze(domain, verbose=True):
    # Probability that the domain is DGA, and its WHOIS record
    prob = get_prob(domain)
    record = query(domain)
    # Flag the domain as DGA at a 0.5 cutoff
    dga = bool(prob >= 0.5)

    domain_analysis = {
        'domain_name': domain,
        'dga': dga,
        'registrar': record.registrar,
        'creation date': record.creation_date,
        'expiration date': record.expiration_date,
    }

    # Print the analysis, or return it as a dictionary
    if verbose:
        for key, val in domain_analysis.items():
            print('{}: {}'.format(key, val))
        return None
    return domain_analysis


# Get analysis dictionary in python itself
analysis = analyze('', verbose=False)


dga: False
registrar: MarkMonitor Inc.
creation date: 1991-05-02 04:00:00
expiration date: 2021-05-03 04:00:00


DGAIntel accepts several input types: to input domains to run predictions on, you can use a single domain name, a list of domain names, or a text file of domain names. The text file contains one domain name per line.
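Domain lists collected from logs often contain blank lines, stray whitespace, or repeats. A small pre-cleaning step like the one sketched here can be run before handing the list to dgaintel. `clean_domains` is a hypothetical helper, not part of the package, and whether dgaintel already tolerates such input is not something this sketch assumes either way.

```python
def clean_domains(lines):
    """Strip whitespace, drop blank lines, and remove duplicates (order kept)."""
    stripped = (line.strip() for line in lines)
    # dict.fromkeys keeps the first occurrence of each domain, in order.
    return list(dict.fromkeys(s for s in stripped if s))


if __name__ == "__main__":
    # Placeholder domains standing in for the contents of a text file.
    raw = ["a.example\n", "\n", "  b.example  \n", "a.example\n"]
    print(clean_domains(raw))
```

Because the function accepts any iterable of strings, an open file object can be passed directly, for example `clean_domains(open('domains.txt'))`.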

Additionally, the TensorFlow Keras model running in the backend supports input batching, so running predictions on lists or files is significantly faster per domain than running them on individual domains. The timings below were measured in Jupyter.

from dgaintel import get_prob

# List of 10 domain names
l = ['', '', '', '', '', '', '', '', '', '']
# One domain

286 ms ± 4.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Ten domains

290 ms ± 7.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Hundred domains

333 ms ± 4.71 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Thousand domains

584 ms ± 14.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This demonstrates that increasing the number of domain names by 1000x only increases the inference time by roughly 2x (from 286 ms to 584 ms). Therefore, this model is easily adaptable to large-scale predictions.
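Since most of the cost is per call rather than per domain, a very large list is best processed in a few large chunks rather than one domain at a time, while still keeping each call to a bounded size. The sketch below shows one way to do that. `predict_in_chunks` and `fake_predict` are hypothetical names, and the `predict` argument is assumed to be any function that takes a list of domains and returns one result per domain, which is how the list form of `get_prob` behaves in the examples in this document.

```python
def predict_in_chunks(domains, predict, chunk_size=1000):
    """Call predict on successive slices of domains and concatenate the results."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results = []
    for start in range(0, len(domains), chunk_size):
        results.extend(predict(domains[start:start + chunk_size]))
    return results


def fake_predict(batch):
    # Stand-in for a real model call; the scores are placeholders, not predictions.
    return [(domain, 0.0) for domain in batch]


if __name__ == "__main__":
    domains = ["d{}.example".format(i) for i in range(5)]
    print(predict_in_chunks(domains, fake_predict, chunk_size=2))
```

With dgaintel installed, `get_prob` could be passed in place of `fake_predict`. The default chunk size of 1000 is an arbitrary choice that matches the largest case timed above.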


The get_prediction function will either print the predictions or write them to a user-specified file.

from dgaintel import get_prediction

get_prediction(['', '', ''])
get_prediction('domains.txt', to_file='domain_predictions.txt')

The get_prob function will perform the inference and provide the prediction floats. It is helpful if you want to use the prediction scores directly in your workflow.

from dgaintel import get_prob

get_prob('') # 0.00050851
get_prob(['', '', '']) # [('', 0.00050), ('', 0.00033), ('', 0.97601)]
get_prob('domains.txt') # [('', 0.00050), ('', 0.00033), ('', 0.97601)]
get_prob(['', '', ''], raw=True) # array([0.00050, 0.00033, 0.97601], dtype=float32)
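One way to use the scores directly in a workflow is to export them for a spreadsheet or another tool. The following sketch writes (domain, probability) pairs, in the shape returned for list input above, as CSV text. `scores_to_csv` and the column names are hypothetical and not part of dgaintel; the five-decimal formatting simply mirrors the precision printed in this document.

```python
import csv
import io


def scores_to_csv(scored):
    """Serialize (domain, probability) pairs as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["domain", "dga_probability"])
    for domain, prob in scored:
        writer.writerow([domain, "{:.5f}".format(prob)])
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder domain and a made-up score, not model output.
    print(scores_to_csv([("a.example", 0.5)]))
```

The returned string can be written to disk with an ordinary `open(..., 'w')` call.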


Files for dgaintel, version 1.0
Filename, size: dgaintel-1.0.tar.gz (1.2 MB)
File type: Source
Python version: None
