
A tool to convert network traffic into images for ML use cases.

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.


heiFIP stands for Heidelberg Flow Image Processor. It is a tool that extracts the essential parts of packets and converts them into images for deep learning. heiFIP supports different image formats and orientations. Currently, only offline network data analysis is supported. However, we plan to extend the library to online network data to enable live probing of models.

Supported Versions: python3, pypy3
Continuous Integration: Linux, MacOS, and Windows workflows

Motivation

The idea for heiFIP came from working with deep learning approaches that classify malware traffic from images. Many papers use image representations of network traffic, but reproducing their results was quite cumbersome. We found that there is currently no official library that supports reproducible images of network traffic. For this reason, we developed heiFIP to easily create images of network traffic and reproduce ML/DL results. Researchers can use this library as a baseline for their work so that others can easily recreate their findings.

Main Features

  • Different Images: Currently, we support a plain packet-to-byte representation and a flow-to-byte representation, each with one channel. Images are created with equal width and height, giving a square representation.
    • Flow Image: converts a set of packets into an image. It supports the following modifications:
      • Max image dimension: specifies the maximum image dimension. If the packet is larger than the specified size, the remaining pixels are cut off.
      • Min image dimension: specifies the minimum image dimension. If the packet is smaller than the specified size, the remaining pixels are filled with 0.
      • Remove duplicates: automatically removes identical traffic.
      • Append: append the packets of a flow to each other, or write each packet to a new row.
      • Tiled: each flow is tiled into a square image representation.
      • Min packets per flow: specifies the minimum number of packets per flow. If a flow contains fewer packets, no image is created.
      • Max packets per flow: specifies the maximum number of packets per flow. If a flow contains more packets, the remaining packets are discarded.
    • Packet Image: converts a single packet into an image.
    • Markov Transition Matrix Image: converts a packet or a flow into a Markov representation.
  • Header processing: allows you to customize the header fields of different protocols, with the aim of removing biasing fields. For more details, see header.py.
  • Remove payload: this option allows you to work on header data only.
  • Fast and flexible: We rely on Scapy for sniffing and header processing. Image preparation is based on raw bytes.
  • Machine learning orientation: heiFIP aims to make deep learning approaches that use network data as images reproducible and deployable. Using heiFIP as a common framework enables researchers to test and verify their models.
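
To make the flow image options above more concrete, here is a minimal pure-Python sketch of how packet bytes could be laid out as a square, single-channel image. It is an illustration under stated assumptions, not heiFIP's actual implementation: the names `square_dim`, `fit_square`, and `flow_to_image` are hypothetical, and the exact padding and cropping rules of the library may differ.

```python
import math
from typing import List, Optional


def square_dim(n: int) -> int:
    """Smallest side length whose square can hold n pixels."""
    dim = math.isqrt(n)
    return dim if dim * dim == n else dim + 1


def fit_square(rows: List[bytes], dim: int) -> List[List[int]]:
    """Crop or zero-pad the given rows to exactly dim x dim pixels."""
    image = []
    for r in range(dim):
        row = list(rows[r][:dim]) if r < len(rows) else []
        image.append(row + [0] * (dim - len(row)))
    return image


def flow_to_image(packets: List[bytes], append: bool = False,
                  min_dim: int = 0, max_dim: int = 0,
                  min_packets: int = 0, max_packets: int = 0
                  ) -> Optional[List[List[int]]]:
    """Hypothetical helper: turn the raw bytes of a flow into a square image."""
    if not packets or len(packets) < min_packets:
        return None  # too few packets: no image is created
    if max_packets:
        packets = packets[:max_packets]  # surplus packets are discarded

    data = b"".join(packets)
    if append:
        dim = square_dim(len(data))  # packets follow each other directly
    else:
        dim = max(max(len(p) for p in packets), len(packets))  # one row each
    if min_dim:
        dim = max(dim, min_dim)  # smaller images are padded with 0
    if max_dim:
        dim = min(dim, max_dim)  # larger images are cut off

    if append:
        rows = [data[i:i + dim] for i in range(0, len(data), max(dim, 1))]
    else:
        rows = list(packets)
    return fit_square(rows, dim)


flow = [b"\x01\x02\x03", b"\x04\x05"]
for row in flow_to_image(flow):
    print(row)
```

A packet image corresponds to the special case of a flow with a single packet and `append=True`. In this sketch a value of 0 for any limit means "no limit", mirroring the CLI defaults.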

Examples

Image Type | Description | Example
Packet | Converts a single packet into a square image. The size depends on the total length. | SMB Connection
Flow | Converts a flow of packets into a square image. | SMB Connection
Markov Transition Matrix Packet | Converts a packet into a Markov Transition Matrix. The size is fixed to 16x16. | SMB Connection
Markov Transition Matrix Flow | Converts a flow into a Markov Transition Matrix. It squares the image based on the number of packets. | SMB Connection
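
The fixed 16x16 size of the packet Markov image suggests 16 states. One plausible reading, which is an assumption here and not confirmed by this README, is that each byte is split into two 4-bit values and the matrix records how often one value is followed by another. The sketch below shows that idea in plain Python; `markov_transition_matrix` is a hypothetical name.

```python
from typing import List


def markov_transition_matrix(data: bytes) -> List[List[float]]:
    """Row-normalized 16x16 transition matrix over 4-bit values (assumed states)."""
    # Split every byte into its high and low nibble.
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)

    # Count transitions between consecutive nibbles.
    counts = [[0] * 16 for _ in range(16)]
    for current, following in zip(nibbles, nibbles[1:]):
        counts[current][following] += 1

    # Normalize each row to probabilities; rows without data stay at 0.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


matrix = markov_transition_matrix(b"\x12\x12")
print(matrix[1][2], matrix[2][1])
```

To render such a matrix as a grayscale image, each probability could be scaled to the 0-255 range.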

Getting Started

Install our package from PyPI:

pip install heifip

Now you can use the integrated CLI:

> fip
Usage: fip [OPTIONS] COMMAND [ARGS]...

Options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

Commands:
  extract

To extract images from PCAPs, we currently split the command into flow and packet:

> fip extract
Starting FlowImageProcessor CLI
Usage: fip extract [OPTIONS] COMMAND [ARGS]...

Options:
  -h, --help  Show this message and exit.

Commands:
  flow
  packet

# Show help information
> fip extract [flow/packet] -h
Starting FlowImageProcessor CLI
Usage: fip extract flow [OPTIONS]

Options:
  -w, --write PATH            Destination file path, stores result  [required]
  -r, --read PATH             [required]
  -t, --threads INTEGER       Number of parallel threads that can be used
                              [default: 4]
  --preprocess [NONE|HEADER]  Applies a preprocessing to the input data: none:
                              No preprocessing payload: Only payload data is
                              used header: Preprocesses headers
                              (DNS,HTTP,IP,IPv6,TCP,UDP supported) to remove
                              some biasing data  [default: NONE]
  --min_im_dim INTEGER        Minimum dim ouput images need to have, 0=No
                              minimum dim  [default: 0]
  --max_im_dim INTEGER        Maximum dim ouput images can have, 0=No maximum
                              dim  [default: 0]
  --remove_duplicates         Within a single output folder belonging to a
                              single input folder no duplicate images will be
                              produced if two inputs lead to the same image
  --min_packets INTEGER       Minimum packets that a FlowImage needs to have,
                              0=No minimum packets per flow  [default: 0]
  --max_packets INTEGER       Minimum packets that a FlowImage needs to have,
                              0=No minimum packets per flow  [default: 0]
  --append
  --tiled
  --width INTEGER             [default: 128]
  -h, --help                  Show this message and exit.

> fip extract flow -r /PATH/PCAPs -w /PATH/IMAGES
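
When the CLI is driven from a script, it can help to assemble the argument list programmatically. The following sketch uses only flags that appear in the help output above; the helper `build_fip_flow_command` is a hypothetical name and not part of heiFIP.

```python
import shlex
from typing import List


def build_fip_flow_command(read_path: str, write_path: str, threads: int = 4,
                           min_packets: int = 0, max_packets: int = 0,
                           append: bool = False, tiled: bool = False) -> List[str]:
    """Assemble an argument list for `fip extract flow` (hypothetical helper)."""
    cmd = ["fip", "extract", "flow", "-r", read_path, "-w", write_path,
           "-t", str(threads)]
    # A value of 0 means "no limit" in the CLI, so the flag is simply omitted.
    if min_packets:
        cmd += ["--min_packets", str(min_packets)]
    if max_packets:
        cmd += ["--max_packets", str(max_packets)]
    if append:
        cmd.append("--append")
    if tiled:
        cmd.append("--tiled")
    return cmd


cmd = build_fip_flow_command("/PATH/PCAPs", "/PATH/IMAGES", min_packets=2, tiled=True)
print(shlex.join(cmd))
# To execute it, pass the list to subprocess.run(cmd, check=True).
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.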

Import FIPExtractor to run it inside your program:

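# Import path assumed from the package layout; adjust it if your version differs.
from heifip.extractor import FIPExtractor

extractor = FIPExtractor()
img = extractor.create_image('./test/pcaps/dns/dns-binds.pcap')
extractor.save_image(img, './test/pcaps/dns/dns-binds.pcap')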
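
For more than one capture, the two calls above can be wrapped in a loop. The sketch below is one way to do this; `convert_directory` is a hypothetical helper, and it assumes that `create_image` and `save_image` accept plain path strings as in the snippet above and that `save_image` can write to any output path. The extractor is passed in as a parameter, so any object with these two methods works.

```python
from pathlib import Path
from typing import List


def convert_directory(extractor, pcap_dir: str, image_dir: str) -> List[Path]:
    """Create and save one image per .pcap file in pcap_dir (hypothetical helper)."""
    out_root = Path(image_dir)
    out_root.mkdir(parents=True, exist_ok=True)

    targets = []
    for pcap in sorted(Path(pcap_dir).glob("*.pcap")):
        target = out_root / pcap.stem  # output name derived from the capture name
        img = extractor.create_image(str(pcap))
        extractor.save_image(img, str(target))
        targets.append(target)
    return targets


# Usage, assuming FIPExtractor has been imported from heifip:
# convert_directory(FIPExtractor(), "./test/pcaps/dns", "./images/dns")
```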

Building from source

Simply run:

pip install .

Publications that use heiFIP

  • [A Generalizable Approach for Network Flow Image Representation for Deep Learning] - CSNet 23
  • [Explainable artificial intelligence for improving a session-based malware traffic classification with deep learning] - SSCI 23

Credits

NFStream for the inspiration of the README.md and workflow testing.

Authors

The following people contributed to heiFIP:

License

This project is licensed under the EUPL-1.2 License. See the License file for details.

