
Extendable scalable high-performance streaming test data generator


Avalon is an extendable, scalable, high-performance streaming data generator that can be used to simulate real-time input for various systems.

Installation

To install Avalon with all of its dependencies, you can use pip:

pip install avalon-generator[all]

Avalon supports many command-line arguments, so you will probably want to enable its argcomplete support for tab completion. Run the following command for a single session, or add it to your ~/.bashrc to keep it enabled in future sessions:

eval "$(avalon --completion-script=bash)"

If you install Avalon on Ubuntu using the PPA, command-line auto-completion is enabled automatically.

Installation on Ubuntu

There is a PPA for Avalon which you may prefer to use if you are using Ubuntu. You can install Avalon using the PPA with the following commands:

sudo add-apt-repository ppa:mrazavi/avalon
sudo apt update
sudo apt install avalon

Usage

In its simplest form, you can name a model as the command-line argument of avalon and it will produce data for that model on standard output. The following command uses the --textlog shortcut to generate logs similar to those of the Snort IDS:

avalon snort --textlog

Multiple models can be used at the same time. You can list the available models with the following command:

avalon --list-models

The default output format (without --textlog) is json-lines, which outputs one JSON document per line. Other formats such as csv are also supported. To see the supported formats, use the --help argument and check out the options for --output-format, or enable auto-completion and press the <tab> key to see the available options.
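Because every line of json-lines output is a complete JSON document, a consumer can process it one line at a time. The following is a minimal sketch of such a consumer in Python. The sample lines and their keys are invented for illustration and are not real Avalon output.

```python
import io
import json


def read_json_lines(stream):
    """Yield one dictionary per non-empty line of a JSON-lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample standing in for Avalon output; the keys are made up.
sample = io.StringIO('{"id": 1, "msg": "first"}\n{"id": 2, "msg": "second"}\n')
records = list(read_json_lines(sample))
```

In practice the stream could be sys.stdin (when piping from avalon) or an open file instead of the in-memory sample.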

Besides --output-format, the output media can be specified via --output-media. Many output media, such as file, http, grpc, kafka, and direct insert into SQL databases, are supported out of the box.

The number and the rate of the outputs can be controlled via the --number and --rate arguments.

For high rates, you might want to utilize your multiple CPU cores. To do so, just prefix your model name with the number of instances you want to run at the same time, e.g. 10snort to run 10 snort instances (with 10 Python processes that could utilize up to 10 CPU cores).

When you use multiple models at the same time, you can also provide a ratio for the output of each model, e.g. 10snort1000 5asa20. That means 10 instances of the snort model and 5 instances of the asa model, with a ratio of 1000 outputs from the snort producers to 20 from the asa producers.
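To make the notation concrete, here is a small sketch that splits such a specification into its parts and computes each model's share of the output. It is not Avalon's own parser. The regular expression, the function name parse_spec, and the reading of the ratios as relative weights are assumptions made for illustration.

```python
import re

# Assumed shape of a model spec: optional instance count, model name
# (letters and underscores only in this sketch), optional ratio.
SPEC = re.compile(r"^(\d*)([A-Za-z_]+)(\d*)$")


def parse_spec(spec):
    """Return (instances, name, ratio); omitted numbers become None."""
    match = SPEC.match(spec)
    if match is None:
        raise ValueError(f"unrecognized model spec: {spec!r}")
    instances, name, ratio = match.groups()
    return (
        int(instances) if instances else None,
        name,
        int(ratio) if ratio else None,
    )


specs = [parse_spec(s) for s in "10snort1000 5asa20".split()]

# Assuming the ratios act as relative weights, compute each model's share.
total = sum(ratio for _, _, ratio in specs)
shares = {name: ratio / total for _, name, ratio in specs}
```

Under that assumption, snort would account for 1000 out of every 1020 outputs and asa for the remaining 20.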

Another important parameter for achieving high resource utilization is the batch size, which can be increased with the --batch-size argument.

Also, the --output-writers argument determines the number of simultaneous writes to the output media. So if your sink is a file, an HTTP server, or any other medium that supports concurrent writes, you can use --output-writers to tune the parallelism.

Here is an example that uses multiple processes to write to a CSV file at 10000 items per second:

# You don't need to enter --output-media=file because
# Avalon will automatically infer it after you enter an
# argument such as --file-name
#
avalon 20snort 5asa \
    --batch-size=1000 --rate=10000 --number=1000000 --output-writers=25 \
    --output-format=headered-csv --file-name=test.csv
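As a rough sanity check of the numbers in this command, the arithmetic below estimates how long the run takes and how many batches it produces. It assumes that the requested rate is actually sustained for the whole run, which depends on the hardware and the sink.

```python
# Values taken from the command above.
number = 1_000_000   # --number: total items to generate
rate = 10_000        # --rate: items per second
batch_size = 1_000   # --batch-size: items per batch

duration_seconds = number / rate          # expected run time in seconds
total_batches = number // batch_size      # batches over the whole run
batches_per_second = rate / batch_size    # batches handed to the writers per second
```

With these values the run lasts about 100 seconds and produces 1000 batches, or 10 batches per second.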

The Avalon command line supports many more options, which you can explore via the --help argument or by pressing the <tab> key for auto-completion.

Architecture

Avalon architecture consists of several abstractions that give it great flexibility:

Model

Each model is responsible for generating a specific kind of data. For example, one model might generate data similar to the logs of a specific application or appliance, while another might generate network flows or packets.

Model output is usually an unlimited iteration of Python dictionaries.
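The idea of an unlimited iteration of dictionaries can be sketched with a plain Python generator. This is only an illustration of the concept. The function toy_model and its keys are hypothetical, and a real Avalon model extension has to follow the extension interface, which is not shown here.

```python
import itertools
import random


def toy_model(seed=0):
    """Yield dictionaries forever; the consumer decides when to stop."""
    rng = random.Random(seed)
    counter = itertools.count(1)
    while True:
        yield {"id": next(counter), "value": rng.randint(0, 100)}


# Take only the first three items from the endless stream.
first_three = list(itertools.islice(toy_model(), 3))
```

Because the generator never ends on its own, limits such as the number of items are applied by whatever consumes it.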

Mapping

Mappings can transform the model data for a different purpose. For example, one might want to use different key names in JSON or different column names in CSV or an SQL database. You can specify a chain of multiple mappings to achieve your goal.

Format

Each format (or formatter) is responsible for converting a batch of model data to a specific format, e.g. JSON or CSV.

Format output is usually a string or bytes array, although other types could also be used according to the output media.
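As an illustration of what a formatter does, the sketch below turns one batch of dictionaries into a JSON-lines string and into a CSV string with a header row. The function names are hypothetical and this is not Avalon's formatter code. It only uses the Python standard library.

```python
import csv
import io
import json


def to_json_lines(batch):
    """Serialize a batch as one JSON document per line."""
    return "".join(json.dumps(item) + "\n" for item in batch)


def to_headered_csv(batch):
    """Serialize a non-empty batch as CSV, using the first item's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(batch[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(batch)
    return buffer.getvalue()


batch = [{"id": 1, "msg": "a"}, {"id": 2, "msg": "b"}]
json_text = to_json_lines(batch)
csv_text = to_headered_csv(batch)
```

Both functions return a single string per batch, which is the kind of value a media can then write in one operation.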

Media

Each media is responsible for transferring the batched formatted data to a specific data sink. For example it could write data to a file or send it to a remote server via network.
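How the model, format, and media abstractions fit together can be sketched as a small pipeline: items are taken from a model, grouped into batches, formatted, and written to a sink. Everything here (the batches and run helpers, the in-memory sink) is a hypothetical illustration under those assumptions and not Avalon's implementation.

```python
import io
import itertools
import json


def batches(iterable, size):
    """Group an iterable into lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


def run(model, formatter, sink, batch_size, number):
    """Write `number` items from `model` to `sink`, one formatted batch at a time."""
    limited = itertools.islice(model, number)
    for chunk in batches(limited, batch_size):
        sink.write(formatter(chunk))


def json_lines(chunk):
    return "".join(json.dumps(item) + "\n" for item in chunk)


model = ({"n": i} for i in itertools.count())  # endless toy model
sink = io.StringIO()                           # stands in for a file or socket
run(model, json_lines, sink, batch_size=2, number=5)
```

Here the sink receives three writes (two full batches and one partial batch) containing five lines in total.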

Generic Extension

Generics, currently in beta, are a brand new type of extension that gives the user ultimate flexibility to modify input arguments or execute any task based on them.

Extension

Avalon supports third-party extensions. So, you can develop your own models, formats, etc. to generate data for your specific use cases or send them to a sink that Avalon does not support out of the box.

You can also publish your developed extensions publicly if you think they could benefit other users.

More information is available at EXTENSIONS.org.

Mappings

Although developing and running an Avalon extension is as simple as creating a specific directory structure and running the avalon command with a specific PYTHONPATH environment variable, there is an even simpler method that comes in handy when you want to use a user-defined mapping.

A mapping can modify the model output dictionary before it is used by the formatter. Avalon supports a couple of useful mappings out of the box, but new mappings can also be defined in a simple Python script by passing the file path as a URL on the avalon command line.

For example, the following script, if saved as mymap.py, can be used as a mapping:

# Any valid name for the class is acceptable.
class MyMap:
    def map(self, item):
        # Item is the dictionary generated by the models

        # Rename "foo" key to "bar"
        item["bar"] = item.pop("foo", None)

        item["new"] = "a whole new key value"

        # Don't forget to return the item
        return item

NOTE: Unlike regular extension mappings, which have to inherit from a specific base class, mappings passed to avalon as file:// URLs have no such requirement.

Now the mapping can be passed to Avalon with --map as a URL:

avalon --map=file:///path/to/mymap.py

Avalon also supports passing multiple --map arguments, and all the provided mappings are applied in the specified order. One particularly useful use case is to define many simple mappings and combine them to achieve the desired goal.
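The effect of applying several mappings in order can be tried out in plain Python before involving Avalon at all. The sketch below uses two small mapping classes with the same map(self, item) shape as mymap.py. The class names and the apply_mappings helper are hypothetical and only imitate the ordering behaviour described above. How Avalon itself loads and calls the mappings is not shown.

```python
class RenameFoo:
    def map(self, item):
        # Rename "foo" key to "bar"
        item["bar"] = item.pop("foo", None)
        return item


class AddTag:
    def map(self, item):
        # Add a constant key to every item
        item["tag"] = "demo"
        return item


def apply_mappings(item, mappings):
    """Apply each mapping in order, feeding the result of one into the next."""
    for mapping in mappings:
        item = mapping.map(item)
    return item


result = apply_mappings({"foo": 1}, [RenameFoo(), AddTag()])
```

After both mappings run, the item no longer has a "foo" key and carries the renamed "bar" value plus the added "tag" key.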

Also, using curly braces, you can apply a mapping to only a specific model when combining multiple models. Here is an example:

# mymap.py will be applied to the first snort, the internal jsoncolumn
# mapping will be applied to asa and the last snort will be used
# without any mappings.
avalon "snort{file:///path/to/mymap.py} asa{jsoncolumn} snort"

Etymology

The name Avalon is based on the name of a legendary island featured in the Arthurian legend, and it has nothing to do with the proprietary Spirent Avalanche traffic generator.

Authors

  • Mohammad Razavi

  • Mohammad Reza Moghaddas
