
An open-source tool for extracting TLS encrypted traffic features from pcaps


Pysharkfeat

Pysharkfeat is a Python tool that extracts features from TLS encrypted traffic in pcaps. It is built on tshark, Wireshark's command-line tool.

Pysharkfeat grew out of academic research on malicious encrypted traffic analysis. Compared with other feature extraction tools such as Flowmeter and Joy, Pysharkfeat is easier to set up and use while still providing rich features.

Features

  • Parse a single pcap or a directory of pcaps to generate meta and statistical features
  • Export features to JSON files
  • Support logging

Traffic features include:

  • Meta: 5-tuple (src ip, src port, dest ip, dest port, timestamp), duration, stream index
  • Statistical:
    • Bidirectional packet length and inter-arrival time: sum/max/min/mean/std
    • SPLT (Markov sequence of packet length and time)
    • Byte distribution, payload std and entropy.
  • TLS: todo.

The full feature list can be found in feat.py or in a generated feature JSON file.
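To make the statistical features listed above more concrete, here is a minimal sketch of how such values can be computed. It is not Pysharkfeat's implementation: the function names, the sample packet lengths and timestamps, and the choice of population standard deviation are all assumptions made for illustration.

```python
import math
import statistics
from collections import Counter


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0 to 8)."""
    if not payload:
        return 0.0
    counts = Counter(payload)
    total = len(payload)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def summarize(values):
    """sum/max/min/mean/std of a sequence of numbers (population std, an assumption)."""
    return {
        "sum": sum(values),
        "max": max(values),
        "min": min(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
    }


# Hypothetical packet lengths (bytes) and arrival timestamps (seconds).
packet_lengths = [60, 1500, 1500, 52]
timestamps = [0.00, 0.02, 0.05, 0.30]
inter_arrival = [b - a for a, b in zip(timestamps, timestamps[1:])]

print(summarize(packet_lengths))
print(summarize(inter_arrival))
print(byte_entropy(bytes(range(256))))  # every byte value exactly once: 8.0
```

Encrypted payloads tend to have a nearly uniform byte distribution, so their entropy sits close to the 8-bit maximum.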

Environment

  • Language: Python 3.8, 3.9
  • Dependency: Wireshark (tshark)

Installation

Install pysharkfeat with pip

pip3 install pysharkfeat

Install Wireshark (tshark)

Test tshark

tshark --version

On Windows, add the tshark directory to the PATH environment variable so that tshark can be called from the command line.
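The same check can be done from Python before running an extraction. This is a minimal sketch using only the standard library; the helper names find_tshark and tshark_version are hypothetical and are not part of Pysharkfeat.

```python
import shutil
import subprocess


def find_tshark():
    """Return the full path of tshark if it is on PATH, else None."""
    return shutil.which("tshark")


def tshark_version(path):
    """Return the first line printed by `tshark --version` (empty string if none)."""
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


path = find_tshark()
if path is None:
    print("tshark not found on PATH; install Wireshark or update PATH")
else:
    print(tshark_version(path))
```

If the first branch is taken, tshark is either not installed or its directory is missing from PATH.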

Use case

Pysharkfeat can be used for machine learning research and threat analysis.

The tests/output directory contains several feature files generated from pcaps published by Malware Traffic Analysis, so you can start analyzing them immediately.

Example

This code snippet can be found in tests/demo.py.

import json
import os

from pysharkfeat.featextractor import FeatureExtractor

# specify the input pcap and the output dir
pcap_dir = "./pcaps/2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap"
output_dir = "./output"

# extract features and write them to output_dir
extractor = FeatureExtractor(pcap_path=pcap_dir, output_dir=output_dir)
summary = extractor.main_extract_pcaps_feat()

print(summary)

# read the feature file; each entry holds the features of one stream
feat_file = os.path.join(output_dir, "2021-01-04-Emotet-infection-with-Trickbot-traffic.json")
with open(feat_file) as f:
    stream_feats = json.load(f)

for feat in stream_feats:
    print("%s,  stream_index:%s,  byte dist entropy:%s" % (feat["pcap_name"], feat["stream_index"], feat["bd_entropy"]))


The output shows the stream index and byte distribution entropy of each stream. The entropies are very close to one another, all near the 8-bit maximum:

    2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap,  stream_index:3,  byte dist entropy:7.999464797314957
    2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap,  stream_index:7,  byte dist entropy:7.903172099500442
    2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap,  stream_index:9,  byte dist entropy:7.9876935373284805
    ...
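Since a feature file is a JSON list with one entry per stream, summarizing a feature across streams takes only a few lines. The sketch below is an illustration, not part of Pysharkfeat: entropy_summary is a hypothetical helper, and the stand-in file holds only the three records shown in the output above, written to a temporary directory so that the snippet runs without any pcap.

```python
import json
import os
import statistics
import tempfile


def entropy_summary(feat_file):
    """Return (count, min, max, mean) of bd_entropy over all streams in a feature file."""
    with open(feat_file) as f:
        stream_feats = json.load(f)
    entropies = [feat["bd_entropy"] for feat in stream_feats]
    return len(entropies), min(entropies), max(entropies), statistics.fmean(entropies)


# Stand-in feature file with the three records printed above (other fields omitted).
pcap_name = "2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap"
sample = [
    {"pcap_name": pcap_name, "stream_index": 3, "bd_entropy": 7.999464797314957},
    {"pcap_name": pcap_name, "stream_index": 7, "bd_entropy": 7.903172099500442},
    {"pcap_name": pcap_name, "stream_index": 9, "bd_entropy": 7.9876935373284805},
]

with tempfile.TemporaryDirectory() as tmp:
    feat_file = os.path.join(tmp, "sample.json")
    with open(feat_file, "w") as f:
        json.dump(sample, f)
    count, lowest, highest, mean = entropy_summary(feat_file)

print(count, lowest, highest, mean)
```

To run it on real data, point entropy_summary at a feature file in your output directory.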

Performance consideration

Time

Pysharkfeat is built on tshark, which may incur substantial overhead. The following table shows some test results on a Mac running macOS (i5 CPU, 16 GB RAM).

| pcap name | pcap size | num of TLS streams | time (sec) |
|---|---|---|---|
| 2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap | 5.4 MB | 10 | 10.8 |
| 2021-01-05-PurpleFox-EK-and-post-infection-traffic.pcap | 9.5 MB | 8 | 11.5 |
| 2021-01-15-Emotet-epoch-1-infection-traffic.pcap | 5.9 MB | 40 | 38.2 |
| 2021-02-24-Qakbot-infection-with-spambot-traffic.pcap | 21.1 MB | 94 | 213.9 |
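For a rough planning figure, the measurements in the table above can be normalized per stream and per megabyte. The snippet below only redoes that arithmetic on the four rows; the helper names are hypothetical, and results on other machines or pcaps may differ.

```python
# (pcap name, size in MB, number of TLS streams, time in seconds), copied from the table above
results = [
    ("2021-01-04-Emotet-infection-with-Trickbot-traffic.pcap", 5.4, 10, 10.8),
    ("2021-01-05-PurpleFox-EK-and-post-infection-traffic.pcap", 9.5, 8, 11.5),
    ("2021-01-15-Emotet-epoch-1-infection-traffic.pcap", 5.9, 40, 38.2),
    ("2021-02-24-Qakbot-infection-with-spambot-traffic.pcap", 21.1, 94, 213.9),
]


def seconds_per_stream(seconds, streams):
    """Average extraction time per TLS stream."""
    return seconds / streams


def seconds_per_mb(seconds, size_mb):
    """Average extraction time per megabyte of pcap."""
    return seconds / size_mb


for name, size_mb, streams, seconds in results:
    print(
        f"{name}: "
        f"{seconds_per_stream(seconds, streams):.2f} s/stream, "
        f"{seconds_per_mb(seconds, size_mb):.2f} s/MB"
    )
```

On these four samples the cost works out to roughly 1 to 2.3 seconds per stream, so the stream count is a better first predictor of run time than the pcap size.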

Storage

The features of a single TLS stream take approximately 16 KB. A pcap with 100 TLS streams therefore needs roughly 1.6 MB of storage.
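The estimate above is a simple multiplication, sketched here so it can be reused for other stream counts. The function name is hypothetical, the 16 KB figure is the approximate value quoted above, and 1 MB is taken as 1000 KB.

```python
KB_PER_STREAM = 16  # approximate per-stream size quoted above


def estimated_storage_mb(num_streams, kb_per_stream=KB_PER_STREAM):
    """Rough feature storage in MB for a given number of TLS streams."""
    return num_streams * kb_per_stream / 1000


print(estimated_storage_mb(100))  # 1.6
```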

Feedback

You are welcome to post an issue or feature request, or to send an email to the author at zliucd66@gmail.com.

License

Pysharkfeat is open source and free to use under the GPLv3 license. See LICENSE for more details.
