Skip to main content
Join the official 2019 Python Developers SurveyStart the survey!

Python tool for network (TLS, etc.) fingerprinting

Project description

pmercury

pmercury provides a Python reference implementation for network fingerprinting and advanced analysis techniques. Specifically, the code can generate a TLS fingerprint given a network interface or packet capture file. The code can then leverage the provided fingerprint database to perform process identification and operating system identification (soon).

There are four distinct (but related) components:

  • protocols/tls.py - A Python library providing an API for fingerprint generation and inferencing
  • pmercury - A wrapper around protocols/tls.py that can process pcaps or listen to a network interface
  • ../src/python-inference/* - A Cython port of protocols/tls.py that can be called from C++14 or higher code
  • ../resources/fingerprint_db.json.gz - The star of the show; a detailed database associating billions of network and endpoint observations

Installation

pmercury depends on libpcap-dev:

sudo apt-get install libpcap-dev

To install pmercury with pip:

pip3 install pmercury

pmercury

pmercury is designed to highlight the functionality of tls.py and to provide a simple interface into the fingerprint database.

Dependencies

pmercury requires Python 3.6 along with the following packages:

pip3 install ujson
pip3 install pyasn
pip3 install hpack
pip3 install pypcap
pip3 install cryptography

pip3 install ujson pyasn hpack pypcap cryptography

pip3 can be installed with 'sudo apt install python3-pip’ on debian/ubuntu, or the equivalent command for your OS.

Usage

./pmercury [OPTIONS] [INPUT] [OUTPUT]

INPUT
   [-c or --capture] capture_interface          # live packet capture
   [-r or --read] read_file_name                # read packets from file
   [-d or --fp_db] fingerprint_database         # fingerprint database file (if you are not using the default)

OUTPUT
   [-f or --fingerprint] fingerprint_file_name  # write fingerprints to file (stdout is the default)

OPTIONS
   [-l or --lookup]                             # return database entry for a double quoted fingerprint string
   [-n or --num-procs]                          # return the top-n most probable processes
   [-s or --sslkeylogfile]                      # filename of sslkeylog output for decryption

FLAGS
   [-a or --analysis]                           # perform process identification
   [-w or --human-readable]                     # return human readable fingerprint string
   [-g or --group-flows]                        # aggregate packet-based fingerprints to flow-based
   [-e or --experimental]                       # turns on all experimental features
   [-h or --help]                               # help text

The input can be either a list of packet capture files or a network interface.

A default fingerprint database is supplied in the resources directory. Updating the repository is currently the only way to get the latest generated database. If you specify your own database, there may be some problems if the formatting is not as expected. Please raise an issue if this is an important use case for you.

The -a switch tells pmercury to perform inferencing on each observed ClientHello packet. The results are comprised of a 4-tuple specifying the name of the process (process) and a score representing the confidence of the algorithm in selecting that process (score).

Examples

Looking up a fingerprint string in the database:

~/ $: ./pmercury -l "(0301)(c014c01300390035002fc00ac00900380032000a001300050004)((0000)(000a0006000400170018)(000b00020100)(0017)(ff01))" | jq .
{
  "str_repr": "(0301)(c014c01300390035002fc00ac00900380032000a001300050004)((0000)(000a0006000400170018)(000b00020100)(0017)(ff01))",
  "first_seen": "2019-07-22",
  "last_seen": "2019-07-25",
  "max_implementation_date": "2015-09",
  "min_implementation_date": "1999-01",
  "total_count": 2,
  "process_info": [
    ...

Performing process identification:

~/ $: ./pmercury -r ../test/data/top_100_fingerprints.pcap -a | jq .
{
  "src_ip": "10.0.2.15",
  "dst_ip": "172.217.7.228",
  "src_port": 37582,
  "dst_port": 443,
  "protocol": 6,
  "server_name": "www.google.com",
  "timestamp": "2019-08-06 13:45:51.157055",
  "fingerprints": {
    "tls": "(0303)(c02c...)((0000)...)"
  }
  "analysis": {
    "process": "Microsoft Office (WinNT)",
    "score": 0.9811716395
  }
}
{
  ...

Experimental Features

TLS client fingerprint extraction and process identification is relatively mature. The following are additional pmercury features that either have less thought put into their development, undergone less testing, and/or do not have associated fingerprint databases:

  • TCP fingerprint extraction - Currently only based on TCP options; no fingerprint database.
  • TLS server fingerprint extraction - Currently only based on ServerHello; no fingerprint database.
  • HTTP/1.x client fingerprint extraction - Currently extracts all headers from the HTTP/1.x request; no fingerprint database.
  • HTTP/1.x server fingerprint extraction - Currently extracts all headers from the HTTP/1.x response; no fingerprint database.
  • TLS decryption and fingerprint extraction - Currently decrypts TLS sessions when supplied a file in NSS Key Log Format and extracts the internal HTTP/1.x and HTTP/2 requests and responses.
  • TLS server certificate extraction - Currently extracts metadata from the first certificate; no associated auxiliary data.
  • SSH server/client fingerprint extraction - Currently extracts metadata from the initial two messages from an SSH client or server.

These features can be turned on by invoking the -e option.

To perform decryption with HTTP/2 data extraction:

~/ $: ./pmercury -r ../test/data/test_decrypt.pcap -s ../test/data/sslkeylogfile.log -w -e
{
  "src_ip": "10.0.2.15",
  "dst_ip": "216.58.194.163",
  "src_port": 46362,
  "dst_port": 443,
  "protocol": 6,
  "event_start": "2019-09-04 16:03:52.933118",
  "fingerprints": {
    "tls_decrypt_h2": "(3a6d6574686f643a20474554)..."
  },
  "tls_decrypt_h2": [
    {":method": "GET"},
    {":authority": "clientservices.googleapis.com"},
    {":scheme": "https"},
    {":path": "/chrome-variations/seed?osname=linux&channel=beta&milestone=76"},
    {"if-none-match": "314f8267d4516ba24ac54575521acebdbe10d2ec"},
    {"a-im": "x-bm,gzip"},
    {"sec-fetch-site": "none"},
    {"user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.80 Safari/537.36"},
    {"accept-encoding": "gzip, deflate, br"}
  ]
}

protocols/tls.py

tls.py is designed to be a relatively self-contained Python library that provides a rich set of features w.r.t. TLS fingerprinting. The goal is for this library to be as easy as possible to integrate into any Python3 program. Please raise issues if it there are any awkward integration points.

Dependencies

tls.py requires Python 3.6 along with the following packages:

sudo pip3 install ujson
sudo pip3 install numpy
sudo pip3 install pyasn

Useful Functions

The TLS class can be instantiated with the following:

tls = TLS(database)

tls.py has the ability to extract a fingerprint string from the TCP data associated with a ClientHello. This function will return the fingerprint string and server_name. If the exact fingerprint string was not present in the database, this function will attempt to find and return an approximate match:

def fingerprint(self, data):
    ...
    return protocol_type, fp_str_, approx_str_, server_name

Process identification is also handled, and is implemented with the help of lru_cache to improve performance. The function takes a fingerprint string, approximate fingerprint string which can be None, SNI, destination address, destination port, and an optional integer specifying the number of potential processes to return. This function return a dictionary with the inferred process, score, malware indicator, probability of malware, and an optional list of probable processes (probable_processes):

def proc_identify(self, fp_str_, context_, dest_addr, dest_port, list_procs=5):
    server_name = None
    if context_ != None:
        for x_ in context_:
            if x_['name'] == 'server_name':
                server_name = x_['data']
                break
    result = self.identify(fp_str_, server_name, dest_addr, dest_port, list_procs)

    return result

Cython C++ Inferface

Fingerprint Database

The fingerprint database is a gzipped, 1 JSON object per line file. Each fingerprint contains the following metadata:

{
  "str_repr": "(0303)(003d...)((0000)...)", // String representation of the fingerprint
  "first_seen": "2018-06-05",               // Date the fingerprint was first observed
  "last_seen": "2019-08-10",                // Date the fingerprint was last observed
  "max_implementation_date": "2008-08",     // Maximum RFC date that is associated with parameters in the fingerprint
  "min_implementation_date": "2002-06",     // Minimum RFC date that is associated with parameters in the fingerprint
  "total_count": 123,                       // Total number of sessions observed using this fingerprint
  "process_info": [                         // Top-10 most common processes using this fingerprint
    ...
  ]
}

Each process object contains some metadata along with objects that represent the destinations the process was observed contacting. The destination information is represented as equivalence classes, e.g., IP addresses are generalized to autonomous systems. For this open source version, the destination information is computed from the top-100 most popular destinations per process.

{
  "process": "nmap.exe",         // Name of the process
  "sha256": "F78 ... 99B",       // SHA-256 hash of the process executable
  "count": 10,                   // Total number of sessions observed using this process/fingerprint pair
  "classes_ip_as": {             // Autonomous system equivalence class for IP addresses
    "109:Cisco_Systems": 6,
    ...
  }
  "classes_hostname_tlds": {     // Top level domain equivalence class for server name indication
    "com": 9,
    ...
  }
  "classes_hostname_domains": {  // Domain name equivalence class for server name indication
    "cisco.com": 6,
    ...
  }
  "classes_port_applications": { // Port application equivalence class for port number
    "https": 9,
    ...
  }
  "os_info": {                   // Top-5 most common operating systems observed with this process/fingerprint pair
    "(WinNT)...(10.0.17134)": 8, // (OS)(OS Edition)(OS Version) -> number of sessions
    ...
  }
}

Acknowledgments

This product includes GeoLite2 data created by MaxMind, available from https://www.maxmind.com.

We make use of Mozilla's Public Suffix List which is subject to the terms of the Mozilla Public License, v. 2.0.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for pmercury, version 0.2.1.3
Filename, size File type Python version Upload date Hashes
Filename, size pmercury-0.2.1.3-py3-none-any.whl (7.7 MB) File type Wheel Python version py3 Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page