Session Feature Extractor
Session Feature Extractor (sfe)
A Python package for extracting, reconstructing, and visualizing session-based features from network traffic (PCAP files). Designed for research and practical applications in network intrusion detection, traffic analysis, and machine learning.
Features
- Session & Packet Extraction: Extracts sessions and packets from PCAP files, supporting the TCP/IP stack and custom protocols.
- Layer-wise Array & Header Extraction: Converts packets and sessions into NumPy arrays, per protocol layer and per layer header.
- Reconstruction: Rebuilds packets and sessions from NumPy arrays, enabling round-trip conversion.
- Batch Processing CLI: A command-line interface for batch extraction, filtering, and output management.
- Visualization: Generates grayscale images from session and packet arrays, for ML input and visual inspection.
- Flexible Mapping: Supports dynamic column mapping from CSV label files and mapping.json.
- Multiprocessing: Processes large datasets in parallel.
- Logging: Detailed logging with Loguru.
- Unit Tests: Unit tests cover the core extraction and reconstruction logic.
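The list above does not say how packets are grouped into sessions. A common technique is to key each packet by a bidirectional 5-tuple, with the two endpoints put in a canonical order so that both directions of a conversation share one key. The sketch below assumes that approach. sfe's actual Session aggregation may use different rules, such as timeouts or TCP flags. `PacketMeta`, `session_key`, and `group_sessions` are hypothetical names and are not part of the sfe API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketMeta:
    """Hypothetical per-packet metadata (not sfe's Packet class)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    timestamp: float


def session_key(pkt: PacketMeta) -> tuple:
    """Direction-independent 5-tuple: A->B and B->A yield the same key."""
    endpoints = sorted([(pkt.src_ip, pkt.src_port), (pkt.dst_ip, pkt.dst_port)])
    return (pkt.proto, endpoints[0], endpoints[1])


def group_sessions(packets: list[PacketMeta]) -> dict[tuple, list[PacketMeta]]:
    """Group packets by session key; each session keeps its packets in time order."""
    sessions: dict[tuple, list[PacketMeta]] = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        sessions[session_key(pkt)].append(pkt)
    return dict(sessions)


# Example: a request/response pair plus one unrelated connection.
packets = [
    PacketMeta("10.0.0.1", "10.0.0.2", 40000, 80, "TCP", 0.0),
    PacketMeta("10.0.0.2", "10.0.0.1", 80, 40000, "TCP", 0.1),
    PacketMeta("10.0.0.1", "10.0.0.3", 40001, 443, "TCP", 0.2),
]
sessions = group_sessions(packets)
print(len(sessions))  # 2
```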
Installation
pip install .
- Requires Python 3.10+
- See pyproject.toml for dependencies (scapy, numpy, pandas, opencv-python, loguru, tqdm, etc.)
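Before you run the examples, you can check the interpreter version and look up each dependency by its import name. The sketch below does this and is not part of sfe. The module list mirrors the dependencies named above, noting that opencv-python is imported as `cv2`. The list is incomplete, because pyproject.toml may declare more packages.

```python
import importlib.util
import sys

# Import names for the dependencies named above; opencv-python installs as "cv2".
# This list is illustrative only: pyproject.toml may declare further packages.
REQUIRED_MODULES = ["scapy", "numpy", "pandas", "cv2", "loguru", "tqdm"]


def missing_modules(modules: list[str]) -> list[str]:
    """Return the import names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 10):
        sys.exit(f"Python 3.10+ required, found {sys.version.split()[0]}")
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```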
Quick Start
1. Extract Sessions & Features from PCAPs
python examples/extraction.py \
--data_dir assets/sample_pcaps \
--out_dir temp/my_output \
--temp_dir temp/my_temp \
--num_processes 2 \
--write_array --write_image \
--hours_to_subtract 3 \
--min_labeled_pkts 5 \
--max_labeled_pkts 100
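- The PCAP/CSV pairs and a mapping.json must be present in the --data_dir directory (assets/sample_pcaps in the command above).
- Output images, arrays, and CSVs are saved to the --out_dir directory (temp/my_output in the command above).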
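The README does not document how --hours_to_subtract, --min_labeled_pkts, and --max_labeled_pkts are applied. The sketch below shows one plausible reading, offered purely as an assumption:

- Label timestamps are shifted back by a fixed number of hours, for example to reconcile a timezone offset between CSV labels and PCAP capture times.
- A session is kept only when its count of labeled packets falls within an inclusive [min, max] range.

`shift_timestamp` and `filter_sessions` are hypothetical helpers, not sfe functions.

```python
from datetime import datetime, timedelta

# Hypothetical helpers illustrating ONE possible reading of the CLI flags
# --hours_to_subtract, --min_labeled_pkts and --max_labeled_pkts.


def shift_timestamp(ts: datetime, hours_to_subtract: int) -> datetime:
    """Move a label timestamp back by a fixed offset (e.g. a timezone difference)."""
    return ts - timedelta(hours=hours_to_subtract)


def filter_sessions(
    labeled_counts: dict[str, int], min_labeled_pkts: int, max_labeled_pkts: int
) -> list[str]:
    """Keep session IDs whose labeled-packet count lies in [min, max] (inclusive)."""
    return [
        sid
        for sid, count in labeled_counts.items()
        if min_labeled_pkts <= count <= max_labeled_pkts
    ]


# Flag values taken from the Quick Start command.
print(shift_timestamp(datetime(2017, 7, 3, 11, 0), 3))  # 2017-07-03 08:00:00
print(filter_sessions({"s1": 3, "s2": 5, "s3": 100, "s4": 250}, 5, 100))  # ['s2', 's3']
```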
2. Demo: Session/Packet Extraction & Reconstruction
See examples/session_packet.py for a full demonstration:
- Extracts packets and sessions from a sample PCAP
- Converts to numpy arrays (full, per-layer, per-header)
- Reconstructs packets and sessions from arrays
- Generates and saves images
Tested with the following datasets:
Sample Outputs
Below are sample images generated from the extraction pipeline, demonstrating the different representations of a network session:
Session Array:
This image shows the entire session as a 2D grayscale array. Each row is a packet and each column is a byte, with shorter packets padded to a common width. This representation suits ML models that operate on raw session data.
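The session-array layout described above can be sketched with NumPy: one row per packet, one column per byte, padded to a common width. This is an illustration, not sfe's implementation. Zero-padding, truncation to `width`, and the name `session_to_array` are assumptions.

```python
from typing import Optional

import numpy as np


def session_to_array(packets: list[bytes], width: Optional[int] = None) -> np.ndarray:
    """Stack raw packets into a 2D uint8 array: one row per packet, one column per byte.

    Packets shorter than `width` are zero-padded on the right and longer ones
    are truncated; `width` defaults to the longest packet in the session.
    """
    if width is None:
        width = max((len(p) for p in packets), default=0)
    arr = np.zeros((len(packets), width), dtype=np.uint8)
    for row, pkt in enumerate(packets):
        chunk = pkt[:width]
        if chunk:  # empty packets are skipped; their row stays all zeros
            arr[row, : len(chunk)] = np.frombuffer(chunk, dtype=np.uint8)
    return arr


session = [b"\x01\x02\x03", b"\xff"]
print(session_to_array(session).tolist())  # [[1, 2, 3], [255, 0, 0]]
```

The result is a 2D `uint8` array, so it can be written directly as a grayscale image, for example with OpenCV's `cv2.imwrite("session.png", arr)`.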
Layer Arrays:
This image visualizes the extracted arrays for each protocol layer (e.g., Ethernet, IP, TCP) within the session. Each layer's bytes are shown separately, highlighting protocol structure.
Header Arrays:
This image displays only the header bytes for each protocol layer, excluding payloads. It is useful for focusing on protocol metadata and structure, which can be important for traffic analysis and intrusion detection.
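To produce per-layer and header-only views like these, each packet's bytes must be split at layer boundaries. The sketch below does this with the standard `struct` module, covering Ethernet II, IPv4, and TCP/UDP only. It uses the header-length fields defined by those protocols: the IPv4 IHL and the TCP data offset, both counted in 32-bit words. It is not sfe's parser, which according to the feature list also handles custom protocols. `split_headers` is a hypothetical name.

```python
import struct


def split_headers(frame: bytes) -> tuple[list[tuple[str, bytes]], bytes]:
    """Split an Ethernet II / IPv4 / (TCP | UDP) frame into per-layer headers and payload.

    Only these layers are recognised; anything else is returned as payload.
    """
    headers: list[tuple[str, bytes]] = []
    if len(frame) < 14:
        return headers, frame
    headers.append(("Ethernet", frame[:14]))
    (ethertype,) = struct.unpack("!H", frame[12:14])
    rest = frame[14:]
    if ethertype != 0x0800 or len(rest) < 20:  # not IPv4
        return headers, rest
    ihl = (rest[0] & 0x0F) * 4  # IPv4 header length: IHL field x 4 bytes
    proto = rest[9]             # IPv4 protocol number
    headers.append(("IPv4", rest[:ihl]))
    rest = rest[ihl:]
    if proto == 6 and len(rest) >= 20:  # TCP
        data_offset = (rest[12] >> 4) * 4  # TCP header length: data offset x 4 bytes
        headers.append(("TCP", rest[:data_offset]))
        rest = rest[data_offset:]
    elif proto == 17 and len(rest) >= 8:  # UDP header is always 8 bytes
        headers.append(("UDP", rest[:8]))
        rest = rest[8:]
    return headers, rest


# Build a minimal Ethernet/IPv4/TCP frame carrying b"hello".
eth = b"\xaa" * 6 + b"\xbb" * 6 + struct.pack("!H", 0x0800)
ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 45, 0, 0, 64, 6, 0,
                 bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
tcp = struct.pack("!HHIIBBHHH", 40000, 80, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
frame = eth + ip + tcp + b"hello"

headers, payload = split_headers(frame)
print([(name, len(h)) for name, h in headers], payload)
# [('Ethernet', 14), ('IPv4', 20), ('TCP', 20)] b'hello'
assert b"".join(h for _, h in headers) + payload == frame  # lossless round trip
```

Concatenating the header slices with the payload returns the original frame, mirroring the round-trip idea behind the Reconstruction feature. Each slice can be turned into an array with `np.frombuffer(h, dtype=np.uint8)`.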
Directory Structure
- sfe/core/packet/packet.py – Packet class, array/header extraction, reconstruction
- sfe/core/session/session.py – Session class, aggregation, from_array
- sfe/data/extractor.py – Main extraction pipeline, batch processing
- examples/extraction.py – CLI entry point for batch extraction
- examples/session_packet.py – Demo: extraction, array/image generation, reconstruction
- assets/sample_pcaps/ – Sample PCAP/CSV pairs and mapping.json
- docs/assets/sample_images/ – Example output images
- temp/my_output/ – Output images, arrays, and CSVs
Mapping & Column Configuration
- Place a mapping.json in your PCAP/CSV directory to map PCAP filenames to their CSV label files.
- The extractor reads CSV columns dynamically and applies them to the ColumnMapping dataclass, allowing flexible workflows.
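The README shows neither the format of mapping.json nor the fields of ColumnMapping. The sketch below makes two assumptions:

- mapping.json is a flat JSON object from PCAP filename to CSV filename.
- `ColumnMapping` is a stand-in with hypothetical fields.

It illustrates one way to match CSV headers to dataclass fields by a normalized name. It is not sfe's actual logic.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ColumnMapping:
    """Hypothetical stand-in for sfe's ColumnMapping; real field names may differ."""
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    src_port: Optional[str] = None
    dst_port: Optional[str] = None
    timestamp: Optional[str] = None
    label: Optional[str] = None


def normalize(name: str) -> str:
    """Case- and separator-insensitive key, so 'Src IP' matches the field 'src_ip'."""
    return name.strip().lower().replace(" ", "").replace("_", "")


def build_column_mapping(csv_header: list[str]) -> ColumnMapping:
    """Map each dataclass field to the matching CSV column name (None if absent)."""
    by_key = {normalize(col): col for col in csv_header}
    return ColumnMapping(
        **{f.name: by_key.get(normalize(f.name)) for f in fields(ColumnMapping)}
    )


def load_mapping(path: str) -> dict[str, str]:
    """Read mapping.json, ASSUMED to look like {"capture.pcap": "capture.csv", ...}."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


header = ["Src IP", "Dst IP", "Src Port", "Dst Port", "Timestamp", "Label", "Flow Duration"]
mapping = build_column_mapping(header)
print(mapping.src_ip, mapping.label)  # Src IP Label
```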
Testing
Run all unit tests:
python -m unittest discover tests
License
MIT
Download files
File details
Details for the file session_feature_extractor-0.0.1.tar.gz.
File metadata
- Download URL: session_feature_extractor-0.0.1.tar.gz
- Upload date:
- Size: 40.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 32a827a32e28380bc356861b8efbbf74bf79bf437295244abf1f6f454ae79d7a |
| MD5 | 795677a8e5b419a1359be84e3ba9146c |
| BLAKE2b-256 | d7ef6176460ffc1384b02f2a41669cdc49f6ce4e239a1a419db47e11807b5fb7 |
File details
Details for the file session_feature_extractor-0.0.1-py3-none-any.whl.
File metadata
- Download URL: session_feature_extractor-0.0.1-py3-none-any.whl
- Upload date:
- Size: 45.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fee5b2e0c47a8dee42b7cf3f8fdb64129e3bb27122127d8684f9a5d0119c8d82 |
| MD5 | 71f432f44466dc7d2a6f5bd3b18bd16f |
| BLAKE2b-256 | 4fbbedaf906f93152c39ead5832209ed9e24ea3675a2ae5091115bff4e30cce0 |