A tool for processing pcap files in parallel

pcap-parallel: split and parallel process a PCAP file

About

This package enables a PCAP file (for example, one produced by tcpdump) to be processed in parallel on a multi-core machine. It achieves this by first reading the entire file with the dpkt module, scanning each packet only as far as needed to identify where the packet boundaries fall within the file. Sections of the file are then loaded into io.BytesIO objects and handed to callback routines for processing. The callback routines are spun off as separate processes, enabling them to do deeper (that is, slower) packet processing.
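
The boundary scan described above can be illustrated with a minimal sketch. It is not the package's implementation: it assumes a classic little-endian pcap file, uses the standard struct module in place of dpkt, and the helper names split_pcap_bytes and make_test_pcap are hypothetical. Each chunk is prefixed with a copy of the global file header so that it could be read as a standalone capture.

```python
import io
import struct

GLOBAL_HEADER_LEN = 24  # fixed-size pcap file header
RECORD_HEADER_LEN = 16  # per-packet header: ts_sec, ts_usec, incl_len, orig_len


def split_pcap_bytes(data, packets_per_chunk):
    """Split little-endian pcap bytes into BytesIO chunks of whole packets."""
    header = data[:GLOBAL_HEADER_LEN]
    chunks = []
    offset = chunk_start = GLOBAL_HEADER_LEN
    count = 0
    while offset + RECORD_HEADER_LEN <= len(data):
        # Read only the captured length; the packet body is skipped, not parsed.
        (incl_len,) = struct.unpack_from("<I", data, offset + 8)
        offset += RECORD_HEADER_LEN + incl_len
        count += 1
        if count == packets_per_chunk:
            chunks.append(io.BytesIO(header + data[chunk_start:offset]))
            chunk_start, count = offset, 0
    if count:
        # Leftover packets that did not fill a whole chunk.
        chunks.append(io.BytesIO(header + data[chunk_start:offset]))
    return chunks


def make_test_pcap(payloads):
    """Build a tiny in-memory pcap (Ethernet link type) for demonstration."""
    out = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    for i, payload in enumerate(payloads):
        out += struct.pack("<IIII", i, 0, len(payload), len(payload)) + payload
    return out


# Five 60-byte packets split two at a time: chunks of 2, 2, and 1 packets.
chunks = split_pcap_bytes(make_test_pcap([b"a" * 60] * 5), packets_per_chunk=2)
print(len(chunks))
```

Only the 16-byte record headers are inspected here, which is why a scan of this kind is cheap compared with the per-packet parsing done later in the callbacks.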

Code: https://github.com/hardaker/pcap-parallel

Notes

  • It loads the entire file into memory!! You've been warned.
    • TODO: offer using file pointers instead with a length to read
  • Your code must not care about packet ordering, since one part of the file will be processed by one function call and a later part by another, even though a TCP stream or similar might be split across the multiple calls.
  • It will attempt to calculate a split size and a maximum_cores value for you, but it will not do a good job (especially on compressed files). You may (and should) specify your own values when creating an instance of the class.
  • This will not provide a huge speed benefit if you aren't doing fairly complex processing (the example below only does minimal processing). If you're using something like scapy, though, it will definitely help.
  • It can handle compressed files (gzip, bz2, and xz) assuming you have the needed decompression modules installed.
  • It returns a list of Future objects, so make sure to call .result() on each item in the list in order to ensure you get the actual results from your callback.
  • Because the results are run within a separate process, the contents to return from each callback should be pickleable.
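
The last note can be checked before handing a callback to the module. The following sketch assumes nothing about pcap-parallel itself: it only uses the standard pickle module, and is_pickleable is a hypothetical helper name. A Counter keyed by strings serializes cleanly, while a generator does not.

```python
import pickle
from collections import Counter


def is_pickleable(value):
    """Return True if value can be serialized with pickle."""
    try:
        pickle.dumps(value)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A Counter keyed by address strings is safe to return from a callback.
print(is_pickleable(Counter({"192.0.2.1": 3})))

# A generator cannot be pickled, so a callback should not return one.
print(is_pickleable(n for n in range(3)))
```

Plain containers of numbers and strings (dict, list, Counter) are usually the simplest return types for a callback.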

Installation

pip install pcap-parallel

Usage

The following example uses the dpkt module to count all the source IP addresses seen in a PCAP file and display the results. Note that this is not super intensive processing, but at least demonstrates how the module should work.

import dpkt
import ipaddress
from pcap_parallel import PCAPParallel
from collections import Counter

def process_partial_pcap(file_handle):
    """Process a chunk of a larger PCAP file

    Note: this will be launched multiple times in separate processes"""

    # store counters of data
    srcs = Counter()

    # read the pcap in and count all the sources
    pcap = dpkt.pcap.Reader(file_handle)
    for timestamp, packet in pcap:
        eth = dpkt.ethernet.Ethernet(packet)
        if isinstance(eth.data, dpkt.ip.IP):
            try:
                ip = eth.data
                srcs[str(ipaddress.ip_address(ip.src))] += 1
            except Exception:
                pass

    return srcs

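# Guard the entry point so that worker processes which re-import this
# module (as happens when processes are started by spawning) do not
# re-run the split themselves.
if __name__ == "__main__":
    ps = PCAPParallel(
        "test.pcap",
        callback=process_partial_pcap,
    )
    partial_results = ps.split()

    # merge the results
    total_counts = partial_results.pop(0).result()
    for partial in partial_results:
        next_counts = partial.result()
        for key in next_counts:
            total_counts[key] += next_counts[key]

    # print the results
    for key in total_counts:
        print(f"{key:<30} {str(total_counts[key]):>8}")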
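
The merge step can also be written without mutating the list of futures. This is a sketch, not part of the package: merge_counts is a hypothetical helper, and it assumes only that each item offers a .result() method returning a Counter, as the futures described in the Notes do. Because Counter addition is commutative, the totals do not depend on the order in which the chunks are merged.

```python
from collections import Counter
from concurrent.futures import Future


def merge_counts(futures):
    """Sum the Counter returned by each future into one Counter."""
    total = Counter()
    for future in futures:
        # .result() blocks until that chunk's callback has finished.
        total.update(future.result())
    return total


# Stand-in futures for demonstration; in real use the list would come from split().
done = []
for counts in (Counter({"192.0.2.1": 2}), Counter({"192.0.2.1": 1, "192.0.2.7": 4})):
    future = Future()
    future.set_result(counts)
    done.append(future)

print(merge_counts(done))
```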

License

See the LICENSE file for the details of the Apache 2.0 license.

Author

Wes Hardaker USC/ISI https://www.isi.edu/~hardaker

Acknowledgments

This module is a spin-off of a larger research project of Wes Hardaker's at USC/ISI that is funded by Comcast. We thank Comcast for their support in making this module possible.
