Session Feature Extractor (sfe)

Requirements:

  • Python 3.10+
  • scapy (Python package, see pyproject.toml)
  • editcap (external tool, for PCAP/PCAPNG conversion)
  • For HPC/cluster use: apptainer (or Singularity) is recommended for containerized workflows.

A Python package for extracting, reconstructing, and visualizing session-based features from network traffic (PCAP files). Designed for research and practical applications in network intrusion detection, traffic analysis, and machine learning.


Features

  • Session & Packet Extraction: Extracts sessions and packets from PCAPs, supporting the TCP/IP stack and custom protocols.
  • Layer-wise Array & Header Extraction: Converts packets and sessions into numpy arrays, one per protocol layer, with separate arrays for the layer headers.
  • Reconstruction: Reconstructs packets and sessions from numpy arrays, enabling round-trip conversion.
  • Batch Processing CLI: Command-line interface for batch extraction, filtering, and output management.
  • Visualization: Generates grayscale images from session/packet arrays for ML and visualization.
  • Flexible Mapping: Supports dynamic column mapping from CSV label files and mapping.json.
  • Multiprocessing: Processes large datasets in parallel across multiple worker processes.
  • Logging: Detailed logging with Loguru.
  • Unit Tests: Unit tests cover the core extraction and reconstruction logic.
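To make the notion of a "session" concrete, here is a minimal sketch of one common definition: packets that share the same protocol and the same pair of endpoints, regardless of direction, belong to one session. This is an assumption for illustration; the package may group packets differently. `PacketInfo`, `session_key`, and `group_sessions` are hypothetical names, not part of the sfe API.

```python
from collections import defaultdict
from typing import NamedTuple


class PacketInfo(NamedTuple):
    """Hypothetical minimal packet record; not the package's Packet class."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    payload: bytes


def session_key(pkt: PacketInfo) -> tuple:
    """Return a direction-independent key so both directions share a session."""
    endpoint_a = (pkt.src_ip, pkt.src_port)
    endpoint_b = (pkt.dst_ip, pkt.dst_port)
    first, second = sorted([endpoint_a, endpoint_b])
    return (pkt.protocol, first, second)


def group_sessions(packets) -> dict:
    """Group packets by their bidirectional session key, preserving order."""
    sessions = defaultdict(list)
    for pkt in packets:
        sessions[session_key(pkt)].append(pkt)
    return dict(sessions)


packets = [
    PacketInfo("10.0.0.1", 51000, "10.0.0.2", 80, "TCP", b"GET /"),
    PacketInfo("10.0.0.2", 80, "10.0.0.1", 51000, "TCP", b"200 OK"),
    PacketInfo("10.0.0.3", 5353, "10.0.0.2", 53, "UDP", b"query"),
]
sessions = group_sessions(packets)
for key, pkts in sessions.items():
    print(key, len(pkts))
```

The request and the response end up under the same key because the two endpoints are sorted before the key is built.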

Installation

pip install .
  • Requires Python 3.10+
  • See pyproject.toml for dependencies (scapy, numpy, pandas, opencv-python, loguru, tqdm, etc.)

Quick Start

1. Extract Sessions & Features from PCAPs

python examples/extraction.py \
  --data_dir assets/sample_pcaps \
  --out_dir temp/my_output \
  --temp_dir temp/my_temp \
  --num_processes 2 \
  --write_array --write_image \
  --hours_to_subtract 3 \
  --min_labeled_pkts 5 \
  --max_labeled_pkts 100
  • PCAP/CSV pairs and a mapping.json are required in assets/sample_pcaps.
  • Output images, arrays, and CSVs will be saved in temp/my_output.
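The flag names above come from the command itself, but their exact semantics are not documented here. The sketch below shows only one plausible reading, stated as an assumption: `--hours_to_subtract` shifts label timestamps by a fixed number of hours (for example, to align time zones between the CSV and the capture), and `--min_labeled_pkts` / `--max_labeled_pkts` keep only sessions whose labeled-packet count falls within that range. The helpers `shift_timestamp` and `keep_session` are hypothetical.

```python
from datetime import datetime, timedelta


def shift_timestamp(ts: datetime, hours_to_subtract: int) -> datetime:
    """Assumed behaviour: move a label timestamp back by a fixed number of hours."""
    return ts - timedelta(hours=hours_to_subtract)


def keep_session(n_labeled: int, min_labeled_pkts: int, max_labeled_pkts: int) -> bool:
    """Assumed behaviour: keep sessions whose labeled-packet count is in range."""
    return min_labeled_pkts <= n_labeled <= max_labeled_pkts


label_time = datetime(2017, 7, 5, 12, 30, 0)
shifted = shift_timestamp(label_time, 3)
print(shifted)  # 2017-07-05 09:30:00

# Hypothetical labeled-packet counts per session.
counts = {"session_a": 3, "session_b": 42, "session_c": 250}
kept = [name for name, n in counts.items() if keep_session(n, 5, 100)]
print(kept)  # ['session_b']
```

Check the source of examples/extraction.py for the authoritative meaning of each flag before relying on this interpretation.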

2. Demo: Session/Packet Extraction & Reconstruction

See examples/session_packet.py for a full demonstration:

  • Extracts packets and sessions from a sample PCAP
  • Converts to numpy arrays (full, per-layer, per-header)
  • Reconstructs packets and sessions from arrays
  • Generates and saves images
  • Tested with the following datasets:

Sample Outputs

Below are sample images generated from the extraction pipeline, demonstrating the different representations of a network session:

Session Array:

This image shows the entire session as a 2D grayscale array, where each row represents a packet and each column represents a byte (padded as needed). This representation is useful for ML models that operate on raw session data.

[Image: Session Array]
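The row-per-packet layout described above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's implementation: the package works with numpy arrays and lists opencv-python among its dependencies, whereas this sketch uses nested lists and the simple PGM image format so it runs with the standard library alone. Zero is assumed as the padding value, and `session_to_grid` and `grid_to_pgm` are hypothetical names.

```python
def session_to_grid(packets, width=None):
    """Build one row per packet, zero-padded (or truncated) to a common width."""
    if width is None:
        width = max((len(p) for p in packets), default=0)
    grid = []
    for p in packets:
        row = list(p[:width])                 # byte values 0..255
        row.extend([0] * (width - len(row)))  # pad short packets with zeros
        grid.append(row)
    return grid


def grid_to_pgm(grid) -> bytes:
    """Encode the grid as a binary PGM (P5) grayscale image, one byte per pixel."""
    height = len(grid)
    width = len(grid[0]) if grid else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + b"".join(bytes(row) for row in grid)


# Three raw packets of different lengths (the last one is empty).
packets = [bytes([0x45, 0x00, 0x00, 0x28]), bytes([0xFF, 0x10]), b""]
grid = session_to_grid(packets)
print(grid)  # [[69, 0, 0, 40], [255, 16, 0, 0], [0, 0, 0, 0]]

pgm = grid_to_pgm(grid)
print(len(pgm))  # 23
```

Writing `pgm` to a file opened in binary mode gives an image that most viewers can open; each pixel's brightness is the value of the corresponding byte.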

Layer Arrays:

This image visualizes the extracted arrays for each protocol layer (e.g., Ethernet, IP, TCP) within the session. Each layer's bytes are shown separately, highlighting protocol structure.

[Image: Layer Arrays]
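As a rough illustration of per-layer splitting, the sketch below slices a raw Ethernet/IPv4/TCP frame into its layer headers and payload using the header-length fields defined by those protocols. It is a simplified stand-in written with the standard library only; the package itself relies on scapy, and the function name `split_layers` is hypothetical. VLAN tags, IPv6, and other protocols are deliberately not handled.

```python
import struct

ETH_HEADER_LEN = 14      # destination MAC (6) + source MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def split_layers(frame: bytes) -> dict:
    """Split a raw frame into Ethernet, IP, and TCP headers plus payload."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    layers = {"Ethernet": frame[:ETH_HEADER_LEN]}
    (ethertype,) = struct.unpack("!H", frame[12:14])
    rest = frame[ETH_HEADER_LEN:]
    if ethertype != ETHERTYPE_IPV4 or len(rest) < 20:
        layers["Payload"] = rest
        return layers
    ihl = (rest[0] & 0x0F) * 4           # IPv4 header length in bytes
    layers["IP"] = rest[:ihl]
    protocol = rest[9]                   # IPv4 protocol field
    rest = rest[ihl:]
    if protocol != IPPROTO_TCP or len(rest) < 20:
        layers["Payload"] = rest
        return layers
    data_offset = (rest[12] >> 4) * 4    # TCP header length in bytes
    layers["TCP"] = rest[:data_offset]
    layers["Payload"] = rest[data_offset:]
    return layers


# Hand-built sample frame: Ethernet + IPv4 + TCP + 5 payload bytes.
eth = bytes(6) + bytes(6) + struct.pack("!H", ETHERTYPE_IPV4)
ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 45, 0, 0, 64, IPPROTO_TCP, 0,
                 bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
tcp = struct.pack("!HHIIBBHHH", 51000, 80, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
frame = eth + ip + tcp + b"hello"

layers = split_layers(frame)
print({name: len(data) for name, data in layers.items()})
# {'Ethernet': 14, 'IP': 20, 'TCP': 20, 'Payload': 5}

headers_only = {name: data for name, data in layers.items() if name != "Payload"}
```

Dropping the `Payload` entry, as in `headers_only`, leaves just the header bytes of each layer.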

Header Arrays:

This image displays only the header bytes for each protocol layer, excluding payloads. It is useful for focusing on protocol metadata and structure, which can be important for traffic analysis and intrusion detection.

[Image: Header Arrays]


Directory Structure

  • sfe/core/packet/packet.py – Packet class, array/header extraction, reconstruction
  • sfe/core/session/session.py – Session class, aggregation, from_array
  • sfe/data/extractor.py – Main extraction pipeline, batch processing
  • examples/extraction.py – CLI entry point for batch extraction
  • examples/session_packet.py – Demo: extraction, array/image generation, reconstruction
  • assets/sample_pcaps/ – Sample PCAP/CSV pairs and mapping.json
  • docs/assets/sample_images/ – Example output images
  • temp/my_output/ – Output images, arrays, and CSVs

Mapping & Column Configuration

  • Place a mapping.json in your PCAP/CSV directory to map PCAP filenames to CSV label files.
  • The extractor reads the CSV column names at run time and maps them onto the fields of the ColumnMapping dataclass, so label files with different column layouts can be used.
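The following sketch shows how such a mapping could work in principle. Everything concrete in it is an assumption: the mapping.json layout (a flat object from PCAP filename to CSV filename), the file names, the column names, and the `ExampleColumnMapping` dataclass, which is a hypothetical stand-in and not the package's `ColumnMapping`.

```python
import csv
import io
import json
from dataclasses import dataclass, fields


@dataclass
class ExampleColumnMapping:
    """Hypothetical stand-in: each field holds the CSV column name to read."""
    timestamp: str = "Timestamp"
    src_ip: str = "Source IP"
    dst_ip: str = "Destination IP"
    label: str = "Label"


def build_column_mapping(csv_header, overrides) -> ExampleColumnMapping:
    """Apply per-dataset overrides and check that every column exists."""
    mapping = ExampleColumnMapping(**overrides)
    wanted = [getattr(mapping, f.name) for f in fields(mapping)]
    missing = [name for name in wanted if name not in csv_header]
    if missing:
        raise KeyError(f"columns not found in CSV header: {missing}")
    return mapping


# Assumed mapping.json content: PCAP filename -> CSV label filename.
mapping_json = '{"monday.pcap": "monday_labels.csv"}'
pcap_to_csv = json.loads(mapping_json)

# An in-memory label file whose timestamp column is named "ts".
csv_text = (
    "ts,Source IP,Destination IP,Label\n"
    "1499257800,10.0.0.1,10.0.0.2,BENIGN\n"
)
reader = csv.DictReader(io.StringIO(csv_text))
columns = build_column_mapping(reader.fieldnames, {"timestamp": "ts"})
row = next(reader)
print(pcap_to_csv["monday.pcap"], row[columns.label])  # monday_labels.csv BENIGN
```

Validating the header up front means a misnamed column fails immediately with a clear message rather than partway through a batch run.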

Testing

Run all unit tests:

python -m unittest discover tests

License

MIT
