A Flexible Network Data Analysis Framework

nfstream: a flexible network data analysis framework

nfstream is a Python package providing fast, flexible, and expressive data structures designed to make working with online or offline network data both easy and intuitive. It aims to be the fundamental high-level building block for practical, real-world network data analysis in Python. It also has the broader goal of becoming a common network data processing framework for researchers, providing data reproducibility across experiments.

Main Features

  • Performance: nfstream is designed to be fast (10x faster with PyPy support) with a small CPU and memory footprint.
  • Layer-7 visibility: nfstream's deep packet inspection engine is based on nDPI. It allows nfstream to perform reliable identification of encrypted applications and metadata extraction (e.g. TLS, QUIC, TOR, HTTP, SSH, DNS).
  • Flexibility: add a flow feature in 2 lines as an NFPlugin.
  • Machine Learning oriented: add your trained model as an NFPlugin.

How to use it?

  • Dealing with a big pcap file and just want to aggregate it into network flows? nfstream makes this easy in a few lines:
from nfstream import NFStreamer
my_awesome_streamer = NFStreamer(source="facebook.pcap", # or network interface (source="eth0")
                                 snaplen=65535,
                                 idle_timeout=30,
                                 active_timeout=300,
                                 plugins=(),
                                 dissect=True,
                                 max_tcp_dissections=10,
                                 max_udp_dissections=16,
                                 statistics=False,
                                 account_ip_padding_size=False,
                                 enable_guess=True,
                                 decode_tunnels=True,
                                 bpf_filter=None,
                                 promisc=True
)

for flow in my_awesome_streamer:
    print(flow)  # print it.
    print(flow.to_namedtuple()) # convert it to a namedtuple.
    print(flow.to_json()) # convert it to json.
    print(flow.keys()) # get flow keys.
    print(flow.values()) # get flow values.
NFEntry(id=0,
        bidirectional_first_seen_ms=1472393122365,
        bidirectional_last_seen_ms=1472393123665,
        src2dst_first_seen_ms=1472393122365,
        src2dst_last_seen_ms=1472393123408,
        dst2src_first_seen_ms=1472393122668,
        dst2src_last_seen_ms=1472393123665,
        src_ip='192.168.43.18',
        src_ip_type=1,
        dst_ip='66.220.156.68',
        dst_ip_type=0,
        version=4,
        src_port=52066,
        dst_port=443,
        protocol=6,
        vlan_id=4,
        bidirectional_packets=19,
        bidirectional_raw_bytes=5745,
        bidirectional_ip_bytes=5479,
        bidirectional_duration_ms=1300,
        src2dst_packets=9,
        src2dst_raw_bytes=1345,
        src2dst_ip_bytes=1219,
        src2dst_duration_ms=1300,
        dst2src_packets=10,
        dst2src_raw_bytes=4400,
        dst2src_ip_bytes=4260,
        dst2src_duration_ms=997,
        expiration_id=0,
        master_protocol=91,
        app_protocol=119,
        application_name='TLS.Facebook',
        category_name='SocialNetwork',
        client_info='facebook.com',
        server_info='*.facebook.com,*.facebook.net,*.fb.com,\
                     *.fbcdn.net,*.fbsbx.com,*.m.facebook.com,\
                     *.messenger.com,*.xx.fbcdn.net,*.xy.fbcdn.net,\
                     *.xz.fbcdn.net,facebook.com,fb.com,messenger.com',
        j3a_client='bfcc1a3891601edb4f137ab7ab25b840',
        j3a_server='2d1eb5817ece335c24904f516ad5da12')
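To make the idle_timeout and active_timeout parameters more concrete, here is a minimal pure-Python sketch of timeout-based flow aggregation. It is not nfstream's implementation: the Packet tuple, flow_key, and aggregate are hypothetical names, the timeouts are assumed to be in seconds, and a flow is only checked for expiry when its own next packet arrives.

```python
from collections import namedtuple

# Hypothetical packet record; nfstream's own packet objects differ.
Packet = namedtuple("Packet", "time_s src_ip src_port dst_ip dst_port protocol size")


def flow_key(pkt):
    """Direction-independent 5-tuple key so both directions share one flow."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (min(a, b), max(a, b), pkt.protocol)


def aggregate(packets, idle_timeout=30, active_timeout=300):
    """Group time-ordered packets into flows, expiring on idle/active timeouts."""
    active, expired = {}, []
    for pkt in packets:
        key = flow_key(pkt)
        flow = active.get(key)
        if flow is not None:
            idle = pkt.time_s - flow["last_seen"] >= idle_timeout
            too_long = pkt.time_s - flow["first_seen"] >= active_timeout
            if idle or too_long:
                # The old flow is exported and a new one starts with this packet.
                expired.append(active.pop(key))
                flow = None
        if flow is None:
            flow = {"key": key, "first_seen": pkt.time_s,
                    "last_seen": pkt.time_s, "packets": 0, "bytes": 0}
            active[key] = flow
        flow["last_seen"] = pkt.time_s
        flow["packets"] += 1
        flow["bytes"] += pkt.size
    # End of input: flush whatever is still active.
    expired.extend(active.values())
    return expired


packets = [
    Packet(0.0, "10.0.0.1", 1234, "10.0.0.2", 443, 6, 100),
    Packet(1.0, "10.0.0.2", 443, "10.0.0.1", 1234, 6, 200),
    Packet(40.0, "10.0.0.1", 1234, "10.0.0.2", 443, 6, 50),
]
flows = aggregate(packets)
print([(f["packets"], f["bytes"]) for f in flows])  # [(2, 300), (1, 50)]
```

The third packet arrives 39 seconds after the previous one, so with a 30-second idle timeout it opens a second flow instead of extending the first.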
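  • Need per-flow statistical features (packet sizes, inter-arrival times, TCP flags)? Enable them with the statistics parameter:
from nfstream import NFStreamer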
my_awesome_streamer = NFStreamer(source="facebook.pcap", statistics=True)
for flow in my_awesome_streamer:
    print(flow)
NFEntry(id=0,      
        bidirectional_first_seen_ms=1472393122365,
        bidirectional_last_seen_ms=1472393123665,
        src2dst_first_seen_ms=1472393122365,
        src2dst_last_seen_ms=1472393123408,
        dst2src_first_seen_ms=1472393122668,
        dst2src_last_seen_ms=1472393123665,
        src_ip='192.168.43.18',
        src_ip_type=1,
        dst_ip='66.220.156.68',
        dst_ip_type=0,
        version=4,
        src_port=52066,
        dst_port=443,
        protocol=6,
        vlan_id=4,
        bidirectional_packets=19,
        bidirectional_raw_bytes=5745,
        bidirectional_ip_bytes=5479,
        bidirectional_duration_ms=1300,
        src2dst_packets=9,
        src2dst_raw_bytes=1345,
        src2dst_ip_bytes=1219,
        src2dst_duration_ms=1300,
        dst2src_packets=10,
        dst2src_raw_bytes=4400,
        dst2src_ip_bytes=4260,
        dst2src_duration_ms=997,
        expiration_id=0,
        bidirectional_min_raw_ps=66,
        bidirectional_mean_raw_ps=302.36842105263156,
        bidirectional_stdev_raw_ps=425.53315715259754,
        bidirectional_max_raw_ps=1454,
        src2dst_min_raw_ps=66,
        src2dst_mean_raw_ps=149.44444444444446,
        src2dst_stdev_raw_ps=132.20354676701294,
        src2dst_max_raw_ps=449,
        dst2src_min_raw_ps=66,
        dst2src_mean_raw_ps=440.0,
        dst2src_stdev_raw_ps=549.7164925870628,
        dst2src_max_raw_ps=1454,
        bidirectional_min_ip_ps=52,
        bidirectional_mean_ip_ps=288.36842105263156,
        bidirectional_stdev_ip_ps=425.53315715259754,
        bidirectional_max_ip_ps=1440,
        src2dst_min_ip_ps=52,
        src2dst_mean_ip_ps=135.44444444444446,
        src2dst_stdev_ip_ps=132.20354676701294,
        src2dst_max_ip_ps=435,
        dst2src_min_ip_ps=52,
        dst2src_mean_ip_ps=426.0,
        dst2src_stdev_ip_ps=549.7164925870628,
        dst2src_max_ip_ps=1440,
        bidirectional_min_piat_ms=0,
        bidirectional_mean_piat_ms=72.22222222222223,
        bidirectional_stdev_piat_ms=137.34994188549086,
        bidirectional_max_piat_ms=398,
        src2dst_min_piat_ms=0,
        src2dst_mean_piat_ms=130.375,
        src2dst_stdev_piat_ms=179.72036811192467,
        src2dst_max_piat_ms=415,
        dst2src_min_piat_ms=0,
        dst2src_mean_piat_ms=110.77777777777777,
        dst2src_stdev_piat_ms=169.51458475436397,
        dst2src_max_piat_ms=1,
        bidirectional_syn_packets=2,
        bidirectional_cwr_packets=0,
        bidirectional_ece_packets=0,
        bidirectional_urg_packets=0,
        bidirectional_ack_packets=18,
        bidirectional_psh_packets=9,
        bidirectional_rst_packets=0,
        bidirectional_fin_packets=0,
        src2dst_syn_packets=1,
        src2dst_cwr_packets=0,
        src2dst_ece_packets=0,
        src2dst_urg_packets=0,
        src2dst_ack_packets=8,
        src2dst_psh_packets=4,
        src2dst_rst_packets=0,
        src2dst_fin_packets=0,
        dst2src_syn_packets=1,
        dst2src_cwr_packets=0,
        dst2src_ece_packets=0,
        dst2src_urg_packets=0,
        dst2src_ack_packets=10,
        dst2src_psh_packets=5,
        dst2src_rst_packets=0,
        dst2src_fin_packets=0,
        master_protocol=91,
        app_protocol=119,
        application_name='TLS.Facebook',
        category_name='SocialNetwork',
        client_info='facebook.com',
        server_info='*.facebook.com,*.facebook.net,*.fb.com,\
                     *.fbcdn.net,*.fbsbx.com,*.m.facebook.com,\
                     *.messenger.com,*.xx.fbcdn.net,*.xy.fbcdn.net,\
                     *.xz.fbcdn.net,facebook.com,fb.com,messenger.com',
        j3a_client='bfcc1a3891601edb4f137ab7ab25b840',
        j3a_server='2d1eb5817ece335c24904f516ad5da12')
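The statistical fields can be sanity-checked against the basic counters of the same flow. The short check below uses only values printed above. Reading the constant gap between raw and IP sizes as a 14-byte Ethernet header is an assumption, not something stated by nfstream.

```python
# Counters copied from the flow printed above.
packets = {"bidirectional": 19, "src2dst": 9, "dst2src": 10}
raw_bytes = {"bidirectional": 5745, "src2dst": 1345, "dst2src": 4400}
ip_bytes = {"bidirectional": 5479, "src2dst": 1219, "dst2src": 4260}

checks = {}
for direction in packets:
    count = packets[direction]
    mean_raw_ps = raw_bytes[direction] / count  # compare with <direction>_mean_raw_ps
    mean_ip_ps = ip_bytes[direction] / count    # compare with <direction>_mean_ip_ps
    gap = (raw_bytes[direction] - ip_bytes[direction]) / count  # bytes per packet
    checks[direction] = (mean_raw_ps, mean_ip_ps, gap)
    print(direction, mean_raw_ps, mean_ip_ps, gap)
```

The computed means agree with the mean_raw_ps and mean_ip_ps fields above, and the per-packet gap is 14 bytes in every direction.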
  • From pcap to Pandas DataFrame?
my_dataframe = NFStreamer(source='devil.pcap').to_pandas(ip_anonymization=False)
my_dataframe.head(5)
  • From pcap to csv file?
flows_rows_count = NFStreamer(source='devil.pcap').to_csv(path="devil.pcap.csv",
                                                          sep="|",
                                                          ip_anonymization=False)
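Once exported, the file can be read back with any CSV reader, as long as the same separator is used. The sketch below uses the standard csv module. It first writes a tiny stand-in file so that it runs without nfstream, and it assumes the export starts with a header row of field names (the columns shown are a subset of the flow fields printed earlier).

```python
import csv
import os
import tempfile

# Stand-in for a file produced by to_csv(sep="|"); the header row is an assumption.
sample = (
    "id|src_ip|dst_ip|bidirectional_packets|application_name\n"
    "0|192.168.43.18|66.220.156.68|19|TLS.Facebook\n"
)
path = os.path.join(tempfile.mkdtemp(), "devil.pcap.csv")
with open(path, "w", newline="") as handle:
    handle.write(sample)

# Read it back with the same separator that was passed to to_csv.
with open(path, newline="") as handle:
    rows = list(csv.DictReader(handle, delimiter="|"))

for row in rows:
    # Every value is read as a string, so numeric fields need explicit conversion.
    print(row["application_name"], int(row["bidirectional_packets"]))
```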
  • Didn't find a specific flow feature? Add a plugin to nfstream in a few lines:
from nfstream import NFStreamer, NFPlugin

class packet_with_666_size(NFPlugin):
    def on_init(self, pkt): # flow creation with the first packet
        if pkt.raw_size == 666:
            return 1
        else:
            return 0

    def on_update(self, pkt, flow): # flow update with each packet belonging to the flow
        if pkt.raw_size == 666:
            flow.packet_with_666_size += 1

streamer_awesome = NFStreamer(source='devil.pcap', plugins=[packet_with_666_size()])
for flow in streamer_awesome:
    print(flow.packet_with_666_size) # see your dynamically created metric in generated flows
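The plugin above relies on two hooks: on_init supplies the initial value when a flow is created, and on_update adjusts it for each later packet. The toy driver below mimics that call order without nfstream, so the plugin logic can be tested in isolation. ToyPlugin and run_plugin are hypothetical stand-ins, and storing the value under the plugin's class name is inferred from the example above rather than taken from nfstream's source.

```python
from types import SimpleNamespace


class ToyPlugin:
    """Hypothetical stand-in for NFPlugin: same two hooks, no nfstream needed."""

    def on_init(self, pkt):
        return 0

    def on_update(self, pkt, flow):
        pass


class packet_with_666_size(ToyPlugin):
    def on_init(self, pkt):
        return 1 if pkt.raw_size == 666 else 0

    def on_update(self, pkt, flow):
        if pkt.raw_size == 666:
            flow.packet_with_666_size += 1


def run_plugin(plugin, raw_sizes):
    """Feed the packet sizes of a single flow through the plugin hooks."""
    name = type(plugin).__name__
    flow = SimpleNamespace()
    for index, raw_size in enumerate(raw_sizes):
        pkt = SimpleNamespace(raw_size=raw_size)
        if index == 0:
            # First packet: the flow is created and the metric initialised.
            setattr(flow, name, plugin.on_init(pkt))
        else:
            # Later packets: the plugin updates the flow in place.
            plugin.on_update(pkt, flow)
    return flow


flow = run_plugin(packet_with_666_size(), [666, 100, 666, 1500])
print(flow.packet_with_666_size)  # 2
```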

Run your Machine Learning models

In the following, we want to run an early classification of flows based on a trained machine learning model that takes as features the sizes of the first 3 packets of a flow.

Computing required features

from nfstream import NFPlugin

class feat_1(NFPlugin):
    def on_init(self, obs):
        return obs.raw_size  # on_init has no entry yet: return the initial value of feat_1

class feat_2(NFPlugin):
    def on_update(self, obs, entry):
        if entry.bidirectional_packets == 2:
            entry.feat_2 = obs.raw_size

class feat_3(NFPlugin):
    def on_update(self, obs, entry):
        if entry.bidirectional_packets == 3:
            entry.feat_3 = obs.raw_size

Trained model prediction

class model_prediction(NFPlugin):
    def on_update(self, obs, entry):
        if entry.bidirectional_packets == 3:
            entry.model_prediction = self.user_data.predict_proba([entry.feat_1,
                                                                   entry.feat_2,
                                                                   entry.feat_3])
            # optionally, we can trigger NFStreamer to expire the flow immediately
            # entry.expiration_id = -1

Start your ML powered streamer

my_model = function_to_load_your_model() # or whatever
ml_streamer = NFStreamer(source='devil.pcap',
                         plugins=[feat_1(volatile=True),
                                  feat_2(volatile=True),
                                  feat_3(volatile=True),
                                  model_prediction(user_data=my_model)
                                  ])
for flow in ml_streamer:
     print(flow.model_prediction) # now you will see your trained model prediction.
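To try the pipeline before a trained model is available, any object exposing a predict_proba method can be passed as user_data. The class below is a hypothetical placeholder, not a real classifier: its name, threshold, and output format are assumptions chosen only to make the example runnable.

```python
class MeanSizeModel:
    """Hypothetical placeholder scoring a flow from its first three packet sizes."""

    def __init__(self, threshold=500):
        self.threshold = threshold

    def predict_proba(self, features):
        # Score for the "bulky" class: 1.0 at or above the threshold,
        # scaled linearly below it.
        mean_size = sum(features) / len(features)
        bulky = min(mean_size / self.threshold, 1.0)
        return [1.0 - bulky, bulky]


my_model = MeanSizeModel(threshold=500)
print(my_model.predict_proba([66, 66, 618]))  # [0.5, 0.5]
```

Passing this object as model_prediction(user_data=my_model) makes it available inside the plugin as self.user_data, as in the streamer above.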

Installation

Using pip

Binary installers for the latest released version are available:

python3 -m pip install nfstream
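
After installing, one quick way to confirm that the package is visible to the current interpreter is to look it up without importing it. This sketch uses only the standard library; the helper name is_installed is illustrative.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


print("nfstream importable:", is_installed("nfstream"))
```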

Build from sources

If you want to build nfstream from sources on your local machine:

Linux

sudo apt-get install autoconf automake libtool pkg-config libpcap-dev
sudo apt-get install libusb-1.0-0-dev libdbus-glib-1-dev libbluetooth-dev libnl-genl-3-dev flex bison
git clone https://github.com/aouinizied/nfstream.git
cd nfstream
python3 -m pip install -r requirements.txt
python3 setup.py bdist_wheel

macOS

brew install autoconf automake libtool pkg-config
git clone https://github.com/aouinizied/nfstream.git
cd nfstream
python3 -m pip install -r requirements.txt
python3 setup.py bdist_wheel

Contributing

Please read Contributing for details on our code of conduct, and the process for submitting pull requests to us.

Authors

Zied Aouini created nfstream and these fine people have contributed.

Ethics

nfstream is intended for network data research and forensics. Researchers and network data scientists can use this framework to build reliable datasets and to train and evaluate machine learning models applied to network data. As with any packet monitoring tool, nfstream could potentially be misused. Do not run it on any network of which you are not the owner or the administrator.

License

This project is licensed under the GPLv3 License. See the License file for details.
