An open-source tool for extracting features from TLS-encrypted traffic in pcaps
Pysharkfeat
Pysharkfeat is a Python tool that extracts features from TLS-encrypted traffic in pcaps, using tshark, Wireshark's command-line tool.
Pysharkfeat is derived from academic research on malicious encrypted traffic analysis. Compared with other feature extraction tools such as Flowmeter and Joy, Pysharkfeat is easier to set up and use while still providing rich features.
Features
- Parse a single pcap or a directory of pcaps to generate meta and statistical features
- Export features to JSON files
- Support logging
Traffic features include:
- Meta: 5-tuple (src ip, src port, dest ip, dest port, timestamp), duration, stream index
- Statistical:
  - Bidirectional packet length and inter-arrival time: sum/max/min/mean/std
  - SPLT (Markov sequence of packet length and time)
  - Byte distribution, payload std and entropy
- TLS: to do
The full feature list can be found in feat.py or in a feature JSON file.
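To make the statistical features concrete, here is a minimal sketch of how packet-length statistics and byte-distribution entropy can be computed. It is not taken from feat.py: the function names `length_stats` and `byte_entropy` are hypothetical, and the choice of population standard deviation is an assumption, so Pysharkfeat's own values may differ slightly.

```python
import math
import statistics
from collections import Counter


def length_stats(lengths):
    """Return sum/max/min/mean/std for a non-empty list of packet lengths."""
    return {
        "sum": sum(lengths),
        "max": max(lengths),
        "min": min(lengths),
        "mean": statistics.fmean(lengths),
        # Population standard deviation (an assumption; a tool may use
        # the sample standard deviation instead).
        "std": statistics.pstdev(lengths),
    }


def byte_entropy(payload):
    """Shannon entropy, in bits per byte, of the byte distribution of payload."""
    if not payload:
        return 0.0
    counts = Counter(payload)
    total = len(payload)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


stats = length_stats([100, 200, 300])
uniform_entropy = byte_entropy(bytes(range(256)))   # every byte value once
constant_entropy = byte_entropy(b"\x00" * 64)       # a single repeated byte
print(stats, uniform_entropy, constant_entropy)
```

The entropy of a byte distribution ranges from 0 (one repeated byte) to 8 bits (all 256 values equally likely), which is why encrypted payloads tend to score close to 8.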
Environment
- Language: Python 3.8, 3.9
- Dependency: Wireshark
Installation
Install pysharkfeat from pip
pip3 install pysharkfeat
Install Wireshark (tshark)
- Windows/Mac: https://www.wireshark.org/#download
- CentOS:
sudo yum install wireshark
- Ubuntu:
sudo apt-get install wireshark
Test tshark
tshark --version
On Windows, make sure tshark can be called from the command line by adding its install directory to the PATH environment variable.
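The same check can be done from Python before running an extraction. The sketch below uses only the standard library; the helper names `find_tshark` and `tshark_version` are hypothetical and are not part of Pysharkfeat.

```python
import shutil
import subprocess


def find_tshark():
    """Return the full path to tshark, or None if it is not on PATH."""
    return shutil.which("tshark")


def tshark_version(path):
    """Return the first line printed by `tshark --version`, or "" if none."""
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


tshark_path = find_tshark()
if tshark_path is None:
    print("tshark not found; add its install directory to PATH")
else:
    print(tshark_version(tshark_path))
```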
Use case
Pysharkfeat can be used for machine learning research and threat analysis.
Several feature files in tests/output were generated from pcaps at Malware Traffic Analysis, so you can start analyzing them immediately.
Example
This code snippet can be found in tests/demo.py.
import json
import os

from pysharkfeat.featextractor import FeatureExtractor

# Specify the input pcap and the output directory
pcap_dir = "./pcaps/2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap"
output_dir = "./output"

# Extract features and print the summary
extractor = FeatureExtractor(pcap_path=pcap_dir, output_dir=output_dir)
summary = extractor.main_extract_pcaps_feat()
print(summary)

# Read the feature file back
feat_file = os.path.join(output_dir, "2021-01-04-Emotet-infection-with-Trickbot-traffic.json")
with open(feat_file) as f:
    stream_feats = json.load(f)

# Print the stream index and byte distribution entropy of each stream
for feat in stream_feats:
    print("%s, stream_index:%s, byte dist entropy:%s" % (feat["pcap_name"], feat["stream_index"], feat["bd_entropy"]))
# Output: stream index and byte distribution entropy of each stream. The entropies are very close to one another.
2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap, stream_index:3, byte dist entropy:7.999464797314957
2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap, stream_index:7, byte dist entropy:7.903172099500442
2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap, stream_index:9, byte dist entropy:7.9876935373284805
...
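Once the features are loaded, ordinary Python is enough to explore them. As a small sketch, the records can be ranked by byte distribution entropy. The sample below copies the three output lines above into an inline JSON string so that it runs without a pcap; the helper name `rank_by_entropy` is hypothetical, and storing `stream_index` as an integer is an assumption about the file format.

```python
import json


def rank_by_entropy(stream_feats):
    """Sort stream feature records from highest to lowest bd_entropy."""
    return sorted(stream_feats, key=lambda feat: feat["bd_entropy"], reverse=True)


# Inline sample with only the three keys used in the demo; a real feature
# file would be read with json.load() and contains many more keys.
sample = json.loads("""
[
  {"pcap_name": "2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap",
   "stream_index": 3, "bd_entropy": 7.999464797314957},
  {"pcap_name": "2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap",
   "stream_index": 7, "bd_entropy": 7.903172099500442},
  {"pcap_name": "2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap",
   "stream_index": 9, "bd_entropy": 7.9876935373284805}
]
""")

ranked = rank_by_entropy(sample)
for feat in ranked:
    print(feat["stream_index"], feat["bd_entropy"])
```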
Performance consideration
Time
Pysharkfeat is built on tshark, which may incur substantial overhead. The following table shows some test results on a Mac running OS X (i5 CPU, 16 GB RAM).
pcap name | pcap size | num of TLS streams | time(sec) |
---|---|---|---|
2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap | 5.4MB | 10 | 10.8 |
2021-01-05-PurpleFox-EK-and-post-infection-traffic.pcap | 9.5MB | 8 | 11.5 |
2021-01-15-Emotet-epoch-1-infection-traffic.pcap | 5.9MB | 40 | 38.2 |
2021-02-24-Qakbot-infection-with-spambot-traffic.pcap | 21.1MB | 94 | 213.9 |
Storage
The features of a single TLS stream take approximately 16 KB. If a pcap has 100 TLS streams, the storage will be roughly 1.6 MB.
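For capacity planning, the estimate scales linearly with the number of streams. The sketch below simply restates that arithmetic; the function name `estimate_feature_storage_mb` is hypothetical, and the 16 KB figure is the approximation quoted above, not a guaranteed size.

```python
KB_PER_STREAM = 16  # approximate per-stream size quoted above


def estimate_feature_storage_mb(num_streams, kb_per_stream=KB_PER_STREAM):
    """Estimate feature storage in MB (1 MB = 1000 KB) for num_streams TLS streams."""
    return num_streams * kb_per_stream / 1000


estimate = estimate_feature_storage_mb(100)
print("%.1f MB" % estimate)
```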
Feedback
You are welcome to post an issue or feature request, or to email the author at zliucd66@gmail.com.
License
Pysharkfeat is open source and free to use under the GPL v3 license. See LICENSE for more details.