PCAP → ML tensor extraction for network intrusion detection research.

pcap2tensor

A fast, streaming, production-grade Python library for turning raw packet captures into training-ready tensors. Built for NIDS researchers tired of rolling their own extraction pipeline for every paper.

Why this exists

Every ML-based network intrusion detection paper reinvents the same pipeline:

  1. Parse a PCAP
  2. Extract per-packet features (size, inter-arrival time, direction, TCP flags, ...)
  3. Slide a window across the sequence
  4. Save as a tensor

Most implementations are one-file scripts that can't handle PCAPs larger than RAM, expose no clean extension points for custom features, and crash on the first malformed packet. pcap2tensor is that pipeline, packaged properly: streaming, extensible, and published on PyPI.
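To make steps 2 and 3 of the pipeline concrete, here is a minimal, library-independent sketch of per-packet feature extraction. It is not pcap2tensor's implementation: the `Packet` tuple, the `per_packet_features` helper, the IAT of 0.0 for the first packet, and the +1/-1 direction encoding relative to the first sender are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical simplified packet record: (timestamp_seconds, size_bytes, src_addr).
Packet = Tuple[float, int, str]


def per_packet_features(packets: Iterable[Packet]) -> List[List[float]]:
    """Return one [size, iat, direction] row per packet.

    Assumptions (not taken from pcap2tensor): the first packet gets IAT 0.0,
    and direction is +1.0 for packets sent by the first packet's source
    and -1.0 for everything else.
    """
    rows: List[List[float]] = []
    prev_ts: Optional[float] = None
    initiator: Optional[str] = None
    for ts, size, src in packets:
        if initiator is None:
            initiator = src  # treat the first sender as the "client"
        iat = 0.0 if prev_ts is None else ts - prev_ts
        direction = 1.0 if src == initiator else -1.0
        rows.append([float(size), iat, direction])
        prev_ts = ts
    return rows


packets = [
    (10.0, 60, "10.0.0.1"),
    (10.5, 1500, "10.0.0.2"),
    (10.75, 52, "10.0.0.1"),
]
rows = per_packet_features(packets)
for row in rows:
    print(row)
```

Each row corresponds to one packet, so a capture of N packets becomes an N × 3 matrix that the windowing step then slices.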

Install

pip install pcap2tensor

Requires Python ≥ 3.9. Depends on Scapy, PyTorch, NumPy, and tqdm.

Quickstart

from pcap2tensor import extract

tensor = extract("capture.pcap", features="aegis-6d", window_size=1000, stride=500)
print(tensor.shape)     # torch.Size([num_windows, 1000, 6])

Feed straight into any sequence model — Transformer, LSTM, SSM, CNN.

Large PCAPs

Chunked streaming never loads the full PCAP into memory:

from pcap2tensor import PCAPExtractor

extractor = PCAPExtractor(
    features="aegis-6d",
    window_size=1000,
    stride=500,
    chunk_size=2_000_000,   # flush every 2M packets
)

# Option A: save chunked .pt files
extractor.save("massive.pcap", output_dir="./tensors/")

# Option B: stream chunks into your training loop
for chunk in extractor.extract_chunks("massive.pcap"):
    train_step(chunk)

Parallel batch

from pcap2tensor import batch_extract

batch_extract("./pcaps/", output_dir="./tensors/", features="aegis-6d", workers=8)

From the CLI:

pcap2tensor batch ./pcaps/ -o ./tensors/ -n 8

Feature presets

Preset        Dim  Features
basic-3d      3    size, IAT, direction
aegis-6d      6    size, IAT, direction, TCP window, TCP flags, payload ratio
extended-10d  10   aegis-6d + protocol one-hot (TCP/UDP/ICMP/other)
full-13d      13   extended-10d + destination port category (well-known/registered/dynamic)

The aegis-6d preset matches the feature set in AEGIS (Ferrel, 2026) — a TVD-HL-SSM architecture achieving F1 0.9952 on encrypted traffic detection at 262 μs inference latency.
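As an illustration of the three-way destination port category listed for full-13d, the sketch below one-hot encodes a port using the standard IANA ranges (0 to 1023 well-known, 1024 to 49151 registered, 49152 to 65535 dynamic). The function name `port_category_one_hot` and the ordering of the output slots are assumptions; the library's own encoding may differ.

```python
from typing import List


def port_category_one_hot(port: int) -> List[float]:
    """One-hot encode a port as [well-known, registered, dynamic].

    Uses the IANA port ranges; the slot order is an assumption of this sketch.
    """
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if port <= 1023:
        return [1.0, 0.0, 0.0]  # well-known (system) ports
    if port <= 49151:
        return [0.0, 1.0, 0.0]  # registered (user) ports
    return [0.0, 0.0, 1.0]      # dynamic / ephemeral ports


for port in (443, 8080, 51000):
    print(port, port_category_one_hot(port))
```

Three slots here plus the four protocol slots account for the step from 6 dimensions (aegis-6d) to 10 (extended-10d) to 13 (full-13d).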

Custom features

A Feature is any stateful callable returning a float or a flat list of floats. Subclass Feature, implement __call__, optionally override reset if you hold state:

import math
from collections import Counter
from scapy.layers.inet import TCP
from pcap2tensor import PCAPExtractor, Feature, Size, IAT, Direction


class PayloadEntropy(Feature):
    name = "payload_entropy"
    dim = 1

    def __call__(self, pkt):
        payload = bytes(pkt[TCP].payload) if TCP in pkt else b""
        if not payload:
            return 0.0
        counts = Counter(payload)
        n = len(payload)
        return -sum((c / n) * math.log2(c / n) for c in counts.values()) / 8.0


extractor = PCAPExtractor(
    features=[Size(), IAT(), Direction(), PayloadEntropy()],
)
tensor = extractor.extract("capture.pcap")

Return a list[float] and set dim accordingly for multi-valued features (e.g. one-hots).
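The division by 8.0 in PayloadEntropy normalizes Shannon entropy over bytes, whose maximum is 8 bits, into the range [0, 1]. The standalone function below repeats the same arithmetic outside of Scapy so the bounds can be checked directly; `normalized_byte_entropy` is a name chosen for this sketch, not part of the library.

```python
import math
from collections import Counter


def normalized_byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, scaled from [0, 8] bits to [0, 1]."""
    if not payload:
        return 0.0
    counts = Counter(payload)
    n = len(payload)
    bits = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return bits / 8.0


print(normalized_byte_entropy(b"aaaaaaaa"))        # a single repeated byte: no uncertainty
print(normalized_byte_entropy(b"abababab"))        # two equally likely bytes: 1 bit of 8
print(normalized_byte_entropy(bytes(range(256))))  # every byte value once: the maximum
```

Encrypted or compressed payloads tend to sit near the top of this range, while plaintext protocols usually land noticeably lower.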

CLI

# Single PCAP
pcap2tensor extract capture.pcap -o ./tensors/

# Parallel batch over a directory
pcap2tensor batch ./pcaps/ -o ./tensors/ -n 8

# List presets
pcap2tensor presets

# Override everything
pcap2tensor extract capture.pcap -f extended-10d -w 2000 -s 1000 -c 5000000

Design

Concern            How it's handled
Memory             Streaming PcapReader, chunked flush every chunk_size packets
Malformed packets  Caught per-packet, silently skipped; a 4-hour run doesn't die on one packet
Flow state         Per-Feature instance, auto-reset between PCAPs
Parallelism        ProcessPoolExecutor for batch mode
IPv6               First-class (IPv6 src/dst, port extraction, protocol number)
Reproducibility    Same PCAP + same config = bit-identical tensor output
Output format      PyTorch .pt on disk, torch.Tensor in memory
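The per-packet error handling in the table can be pictured roughly as follows. This is a sketch under assumptions, not the library's code: `extract_rows` is a hypothetical helper, packets are plain dicts rather than Scapy objects, and the skipped-packet counter is an addition for visibility (the table describes the skip as silent).

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

Packet = Dict[str, Any]          # stand-in for a parsed packet
FeatureFn = Callable[[Packet], Any]  # returns a float or a flat list of floats


def extract_rows(packets: Iterable[Packet], features: Sequence[FeatureFn]) -> Tuple[List[List[float]], int]:
    """Build one flat feature row per packet, skipping packets that raise."""
    rows: List[List[float]] = []
    skipped = 0
    for pkt in packets:
        try:
            row: List[float] = []
            for feature in features:
                value = feature(pkt)
                # Multi-valued features return a list; scalars are wrapped.
                row.extend(value if isinstance(value, (list, tuple)) else [value])
        except Exception:
            skipped += 1  # one bad packet must not abort the whole run
            continue
        rows.append(row)
    return rows, skipped


def size(pkt: Packet) -> float:
    return float(pkt["size"])


def is_tcp_one_hot(pkt: Packet) -> List[float]:
    return [1.0, 0.0] if pkt["proto"] == 6 else [0.0, 1.0]


packets = [
    {"size": 60, "proto": 6},
    {"size": 40},                 # malformed: no protocol field
    {"size": 1500, "proto": 17},
]
rows, skipped = extract_rows(packets, [size, is_tcp_one_hot])
print(rows, skipped)
```

Building the row inside the `try` block means a packet that fails halfway through contributes nothing, so every emitted row has the full feature width.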

Performance

Single-core throughput with aegis-6d on a modern x86 machine is roughly 50–120k packets/sec, with TCP-heavy captures slower than UDP-heavy ones. With 8 workers in batch mode, processing 100 GB+ of PCAPs per hour is achievable.

Your bottleneck is Scapy parsing, not feature extraction.
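A back-of-the-envelope check relates the per-core packet rate to the hourly volume figure. The helper below is only arithmetic: it assumes throughput scales linearly with worker count and that the average packet size on disk is known, neither of which the figures above guarantee. The 100-byte average is an illustrative input, not a measurement.

```python
def estimated_gb_per_hour(packets_per_sec: float, workers: int, avg_packet_bytes: float) -> float:
    """Estimate PCAP volume processed per hour, in decimal gigabytes.

    Assumes perfect linear scaling across workers (an optimistic simplification).
    """
    bytes_per_hour = packets_per_sec * workers * avg_packet_bytes * 3600
    return bytes_per_hour / 1e9


# Low end of the quoted range, 8 workers, small (100-byte) packets on average.
low_end = estimated_gb_per_hour(50_000, workers=8, avg_packet_bytes=100)
print(low_end)  # 144.0
```

Under these assumptions even the low end of the per-core range clears 100 GB per hour; captures with larger average packet sizes move more bytes for the same packet rate.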

Output shape

Every extractor produces tensors of shape:

(num_windows, window_size, feature_dim)

where feature_dim = sum(f.dim for f in features). For aegis-6d, that's 6.
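The shape can be predicted ahead of time from the packet count and the windowing parameters. The sketch below assumes that a trailing partial window is dropped rather than padded; the source does not say which the library does, so treat `num_windows` here as an estimate. `expected_shape` is a hypothetical helper, not a library function.

```python
from typing import Sequence, Tuple


def expected_shape(
    num_packets: int, window_size: int, stride: int, feature_dims: Sequence[int]
) -> Tuple[int, int, int]:
    """Return (num_windows, window_size, feature_dim) for a sliding window.

    Assumption: only complete windows are kept, so
    num_windows = floor((num_packets - window_size) / stride) + 1.
    """
    feature_dim = sum(feature_dims)
    if num_packets < window_size:
        num_windows = 0
    else:
        num_windows = (num_packets - window_size) // stride + 1
    return (num_windows, window_size, feature_dim)


# Six scalar features (as in aegis-6d), 10,000 packets, Quickstart parameters.
shape = expected_shape(10_000, window_size=1000, stride=500, feature_dims=[1] * 6)
print(shape)  # (19, 1000, 6)
```

With a stride of half the window size, consecutive windows overlap by 50 percent, which roughly doubles the number of training samples relative to non-overlapping windows.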

Citation

If you use this library in research, please cite the companion paper:

@article{ferrel2026aegis,
  title   = {AEGIS: Adversarial Entropy-Guided Immune System --
             Thermodynamic State Space Models for Zero-Day Network
             Evasion Detection},
  author  = {Ferrel, Vickson},
  journal = {arXiv preprint arXiv:2604.02149},
  year    = {2026},
  url     = {https://arxiv.org/abs/2604.02149}
}

License

MIT © Vickson Ferrel — Vixero Technology Enterprise


Built in Sarawak. For network defenders everywhere. 🛡️
