Tool to Process Smart Search Results and Identify Top Senders

Proofpoint Sender Analyzer

This tool helps identify the top senders based on smart search outbound message exports or CSV data.

Requirements:

  • Python 3.9+

Installing the Package

You can install the tool directly from GitHub with the following command:

pip install git+https://github.com/pfptcommunity/senderstats.git

Alternatively, you can install the released package with pip:

# When testing on Ubuntu 24.04 the following will not work:
pip install senderstats

If you see an error similar to the following:

error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.

    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.

    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.

    See /usr/share/doc/python3.12/README.venv for more information.

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.

Install the tool with pipx instead, or create your own virtual environment and run the pip command above inside it.

pipx install senderstats

Use Cases

Outbound message volumes and data transferred by:

  • Envelope sender
  • Header From:
  • Return-Path:
  • Envelope header: From:, MessageID Host, MessageID Domain (helpful to identify original sender)
  • Envelope sender and header From: for SPF alignment purposes
  • Random subject line sampling to help understand the type of traffic
  • Peak Hourly Volumes

Summarize message volume information:

  • Estimated application email traffic based on sender volume threshold:
    • Estimated application data
    • Estimated application messages
    • Estimated application average size
  • Total outbound traffic:
    • Total outbound data
    • Total outbound messages
    • Total outbound average size
    • Total outbound peak hourly volume
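
The totals listed above can be illustrated with a minimal sketch. It is not the tool's own code: the `summarize` function and the sample records are hypothetical, and bucketing by clock hour is an assumption about how a peak hourly volume could be derived. The timestamp format is the documented default of --date-format.

```python
from collections import Counter
from datetime import datetime

# Documented default of --date-format.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def summarize(messages):
    """Summarize (timestamp string, size in bytes) pairs.

    Hypothetical helper: hourly buckets are an assumption, not the tool's method.
    """
    total_bytes = 0
    total_messages = 0
    per_hour = Counter()
    for stamp, size in messages:
        when = datetime.strptime(stamp, DATE_FORMAT)
        # Truncate to the start of the hour so messages share a bucket.
        per_hour[when.replace(minute=0, second=0, microsecond=0)] += 1
        total_bytes += size
        total_messages += 1
    return {
        "total_bytes": total_bytes,
        "total_messages": total_messages,
        "average_size": total_bytes / total_messages if total_messages else 0,
        "peak_hourly_volume": max(per_hour.values(), default=0),
    }


# Made-up sample records, two in the 17:00 hour and one in the 18:00 hour.
sample = [
    ("2024-03-04T17:05:10.000000+0000", 1000),
    ("2024-03-04T17:45:00.500000+0000", 3000),
    ("2024-03-04T18:01:00.000000+0000", 2000),
]
stats = summarize(sample)
print(stats)
```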

Processing Behavior

The primary purpose of this tool is to identify sender message volumes and calculate data transfer rates for legitimate emails.

Input Requirements

  • Expected Fields: The input CSV should include at least the envelope sender and message size fields.
  • Exclusions: Messages will be excluded if:
    • The envelope sender is empty (common for bounce replies or calendar actions).
    • The message size is missing or not a valid number (typically rejects that can skew reporting).
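
The two input exclusions above can be sketched as follows. This is an illustration under stated assumptions rather than the tool's implementation: `usable_rows` is a hypothetical helper, the sample data is made up, and the column names are the documented defaults of --mfrom and --size.

```python
import csv
import io


def usable_rows(reader, sender_field="Sender", size_field="Message_Size"):
    """Yield (sender, size) pairs, skipping rows that would be excluded.

    Hypothetical helper; field names default to the documented CSV defaults.
    """
    for row in reader:
        sender = (row.get(sender_field) or "").strip()
        if not sender:
            continue  # empty envelope sender (bounces, calendar actions)
        try:
            size = int((row.get(size_field) or "").strip())
        except ValueError:
            continue  # missing or non-numeric size (typically rejects)
        yield sender, size


# Made-up CSV: only the first data row has both a sender and a numeric size.
sample = io.StringIO(
    "Sender,Message_Size\n"
    "app@example.com,2048\n"
    ",512\n"
    "user@example.com,\n"
    "user@example.com,n/a\n"
)
rows = list(usable_rows(csv.DictReader(sample)))
print(rows)
```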

Exclusion Rules

  1. Domain-Based Exclusions:

    • Messages from system domains such as ppops.net, pphosted.com, and knowledgefront.com are omitted by default to filter out monitoring messages.
    • To include these messages, use the --no-default-exclude-domains flag.
  2. IP-Based Exclusions:

    • For messages from 127.0.0.1 (e.g., system reports and digests on Proofpoint Protection Gateway), use the --exclude-ips flag to exclude them.
    • This option requires sender IP addresses to be included in the CSV.

Each exclusion step ensures the accuracy of volume and average message size reporting by filtering out unnecessary data.
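
As a rough sketch of how the domain and IP rules combine, consider the check below. The `is_excluded` function is hypothetical, and matching subdomains of an excluded domain is an assumption; the source does not say how the tool compares domains. Passing an empty domain set mimics --no-default-exclude-domains.

```python
# Default system domains named in the exclusion rules above.
DEFAULT_EXCLUDED_DOMAINS = {"ppops.net", "pphosted.com", "knowledgefront.com"}


def is_excluded(sender, sender_ip, excluded_domains, excluded_ips):
    """Return True if a message should be dropped (hypothetical helper)."""
    domain = sender.rpartition("@")[2].lower()
    # Assumption: an excluded domain also covers its subdomains.
    domain_hit = any(
        domain == d or domain.endswith("." + d) for d in excluded_domains
    )
    return domain_hit or sender_ip in excluded_ips


print(is_excluded("monitor@ppops.net", "10.0.0.5", DEFAULT_EXCLUDED_DOMAINS, set()))
print(is_excluded("digest@example.com", "127.0.0.1", DEFAULT_EXCLUDED_DOMAINS, {"127.0.0.1"}))
print(is_excluded("app@example.com", "10.0.0.5", DEFAULT_EXCLUDED_DOMAINS, {"127.0.0.1"}))
```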

Usage Options

usage: senderstats [-h] [--version] -i <file> [<file> ...] -o <xlsx> [--ip IP]
                   [--mfrom MFrom] [--hfrom HFrom] [--rcpts Rcpts]
                   [--rpath RPath] [--msgid MsgID] [--subject Subject]
                   [--size MsgSz] [--date Date] [--gen-hfrom] [--gen-rpath]
                   [--gen-alignment] [--gen-msgid] [--expand-recipients]
                   [--no-display-name] [--remove-prvs] [--decode-srs]
                   [--no-empty-hfrom] [--sample-subject]
                   [--exclude-ips <ip> [<ip> ...]]
                   [--exclude-domains <domain> [<domain> ...]]
                   [--restrict-domains <domain> [<domain> ...]]
                   [--exclude-senders <sender> [<sender> ...]]
                   [--date-format DateFmt] [--no-default-exclude-domains]

This tool helps identify the top senders based on smart search outbound
message exports.

Input / Output arguments (required):
  -i <file> [<file> ...], --input <file> [<file> ...]  Smart search files to
                                                       read.
  -o <xlsx>, --output <xlsx>                           Output file

Field mapping arguments (optional):
  --ip IP                                              CSV field of the IP
                                                       address. (default=Sende
                                                       r_IP_Address)
  --mfrom MFrom                                        CSV field of the
                                                       envelope sender
                                                       address.
                                                       (default=Sender)
  --hfrom HFrom                                        CSV field of the header
                                                       From: address.
                                                       (default=Header_From)
  --rcpts Rcpts                                        CSV field of the header
                                                       recipient addresses.
                                                       (default=Recipients)
  --rpath RPath                                        CSV field of the
                                                       Return-Path: address.
                                                       (default=Header_Return-
                                                       Path)
  --msgid MsgID                                        CSV field of the
                                                       message ID.
                                                       (default=Message_ID)
  --subject Subject                                    CSV field of the
                                                       Subject, only used if
                                                       --sample-subject is
                                                       specified.
                                                       (default=Subject)
  --size MsgSz                                         CSV field of message
                                                       size.
                                                       (default=Message_Size)
  --date Date                                          CSV field of message
                                                       date/time.
                                                       (default=Date)

Reporting control arguments (optional):
  --gen-hfrom                                          Generate report showing
                                                       the header From: data
                                                       for messages being
                                                       sent.
  --gen-rpath                                          Generate report showing
                                                       return path for
                                                       messages being sent.
  --gen-alignment                                      Generate report showing
                                                       envelope sender and
                                                       header From: alignment
  --gen-msgid                                          Generate report showing
                                                       parsed Message ID.
                                                       Helps determine the
                                                       sending system

Parsing behavior arguments (optional):
  --expand-recipients                                  Expand recipients
                                                       counts messages by
                                                       destination. E.g. 1
                                                       message going to 3
                                                       people, is 3 messages
                                                       sent.
  --no-display-name                                    Remove display and use
                                                       address only. Converts
                                                       'Display Name
                                                       <user@domain.com>' to
                                                       'user@domain.com'
  --remove-prvs                                        Remove return path
                                                       verification strings
                                                       e.g. prvs=tag=sender@do
                                                       main.com
  --decode-srs                                         Convert sender rewrite
                                                       scheme, forwardmailbox+
                                                       srs=hash=tt=domain.com=
                                                       user to user@domain.com
  --no-empty-hfrom                                     If the header From: is
                                                       empty the envelope
                                                       sender address is used
  --sample-subject                                     Enable probabilistic
                                                       random sampling of
                                                       subject lines found
                                                       during processing
  --exclude-ips <ip> [<ip> ...]                        Exclude ips from
                                                       processing.
  --exclude-domains <domain> [<domain> ...]            Exclude domains from
                                                       processing.
  --restrict-domains <domain> [<domain> ...]           Constrain domains for
                                                       processing.
  --exclude-senders <sender> [<sender> ...]            Exclude senders from
                                                       processing.
  --date-format DateFmt                                Date format used to
                                                       parse the timestamps. (
                                                       default=%Y-%m-%dT%H:%M:
                                                       %S.%f%z)

Extended processing controls (optional):
  --no-default-exclude-domains                         Will not include the
                                                       default Proofpoint
                                                       excluded domains.

Usage:
  -h, --help                                           Show this help message
                                                       and exit
  --version                                            Show the program's
                                                       version and exit
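
To make the address rewriting options more concrete, here is a minimal sketch of the transformations that --no-display-name, --remove-prvs, and --decode-srs describe. The function names and regular expressions are hypothetical and cover only the address forms shown in the help text; the tool's own parsing may handle more variants.

```python
import re
from email.utils import parseaddr

# Patterns are assumptions that match only the examples in the help text.
PRVS_RE = re.compile(r"^prvs=[^=]+=", re.IGNORECASE)
SRS_RE = re.compile(r"srs=[^=]+=[^=]+=([^=]+)=([^@]+)", re.IGNORECASE)


def strip_display_name(value):
    """'Display Name <user@domain.com>' -> 'user@domain.com'."""
    return parseaddr(value)[1]


def remove_prvs(address):
    """'prvs=tag=sender@domain.com' -> 'sender@domain.com'."""
    return PRVS_RE.sub("", address)


def decode_srs(address):
    """'forwardmailbox+srs=hash=tt=domain.com=user' -> 'user@domain.com'."""
    match = SRS_RE.search(address)
    if not match:
        return address  # not an SRS address, leave it unchanged
    domain, user = match.groups()
    return f"{user}@{domain}"


print(strip_display_name("Display Name <user@domain.com>"))
print(remove_prvs("prvs=tag=sender@domain.com"))
print(decode_srs("forwardmailbox+srs=hash=tt=domain.com=user"))
```

Normalizing addresses this way groups messages from the same original sender under one key instead of splitting them across many tagged variants.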

Using the Tool with Proofpoint Smart Search

Export all outbound message traffic as a smart search CSV. You may need to export multiple CSVs if the data per time window exceeds 1M records. The tool can ingest multiple CSV files at once.

Once the files are downloaded to a target folder, run the following command with the path to the downloaded files, using a wildcard to match all of them.

The following example is the most basic usage:

# Windows
senderstats -i C:\path\to\downloaded\files\smart_search_results_cluster_hosted_2024_03_04_*.csv -o C:\path\to\output\file\my_cluster_hosted.xlsx
# Linux
senderstats -i /path/to/downloaded/files/smart_search_results_cluster_hosted_2024_03_04_*.csv -o /path/to/output/file/my_cluster_hosted.xlsx

For a more comprehensive report use the following command:

# Windows
senderstats -i C:\path\to\downloaded\files\smart_search_results_cluster_hosted_2024_03_04_*.csv -o C:\path\to\output\file\my_cluster_hosted.xlsx --remove-prvs --decode-srs --gen-hfrom --gen-alignment --gen-msgid --sample-subject
# Linux
senderstats -i /path/to/downloaded/files/smart_search_results_cluster_hosted_2024_03_04_*.csv -o /path/to/output/file/my_cluster_hosted.xlsx --remove-prvs --decode-srs --gen-hfrom --gen-alignment --gen-msgid --sample-subject

Use --expand-recipients to count messages by destination:

This is useful if you need to determine how many messages were sent to a destination, as a single message can be addressed to multiple recipients.

# Windows
senderstats -i C:\path\to\downloaded\files\smart_search_results_cluster_hosted_2024_03_04_*.csv -o C:\path\to\output\file\my_cluster_hosted.xlsx --remove-prvs --decode-srs --gen-hfrom --gen-alignment --gen-msgid --sample-subject --expand-recipients
# Linux
senderstats -i /path/to/downloaded/files/smart_search_results_cluster_hosted_2024_03_04_*.csv -o /path/to/output/file/my_cluster_hosted.xlsx --remove-prvs --decode-srs --gen-hfrom --gen-alignment --gen-msgid --sample-subject --expand-recipients
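
The counting difference can be sketched in a few lines. The `count_messages` function and the sample rows are hypothetical, and recipients are passed as already-split lists because the source does not state how the Recipients field is delimited.

```python
from collections import Counter


def count_messages(rows, expand_recipients=False):
    """Count messages per sender from (sender, recipient list) pairs.

    Hypothetical helper: with expand_recipients, one message to three
    recipients counts as three messages sent.
    """
    counts = Counter()
    for sender, recipients in rows:
        counts[sender] += len(recipients) if expand_recipients else 1
    return counts


# Made-up rows: one message to three recipients and one to a single recipient.
rows = [
    ("app@example.com", ["a@example.org", "b@example.org", "c@example.org"]),
    ("app@example.com", ["d@example.org"]),
]
default_counts = count_messages(rows)
expanded_counts = count_messages(rows, expand_recipients=True)
print(default_counts["app@example.com"], expanded_counts["app@example.com"])
```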

Sample Output

The execution results should look similar to the following, depending on the options you select.

Files to be processed:
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173552.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173855.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173656.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173754.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173834.csv

Domains excluded from processing:
knowledgefront.com
pphosted.com
ppops.net

Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173552.csv
Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173855.csv
Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173656.csv
Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173754.csv
Processing:  C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173834.csv

Messages excluded by empty senders: 1523573
Messages excluded by domain: 484664
Messages excluded by sender: 0
Messages excluded by constraint: 0

Generating report, please wait.

Please see report: C:\Users\ljerabek\Downloads\my_cluster_hosted.xlsx

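Sample Summary Statistics

[image: sample summary statistics]

Sample Details (Sender + From by Volume):

[image: sample details, sender and From by volume]

Sample Details (Message ID Inferencing):

[image: sample details, Message ID inferencing]

Sample Details (Hourly Metrics):

[image: sample details, hourly metrics]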

Current Class Hierarchy

[image: senderstats class hierarchy]

