Tool to Process Smart Search Results and Identify Top Senders
Proofpoint Sender Analyzer
This tool helps identify the top senders based on smart search outbound message exports or CSV data.
Requirements:
- Python 3.9+
Installing the Package
You can install the tool directly from GitHub using the following command.
pip install git+https://github.com/pfptcommunity/senderstats.git
Alternatively, install the tool from PyPI using pip.
# When testing on Ubuntu 24.04 the following will not work:
pip install senderstats
If you see an error similar to the following:
error: externally-managed-environment
× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
python3-xyz, where xyz is the package you are trying to
install.
If you wish to install a non-Debian-packaged Python package,
create a virtual environment using python3 -m venv path/to/venv.
Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
sure you have python3-full installed.
If you wish to install a non-Debian packaged Python application,
it may be easiest to use pipx install xyz, which will manage a
virtual environment for you. Make sure you have pipx installed.
See /usr/share/doc/python3.12/README.venv for more information.
note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
In that case, install the tool with pipx instead, or configure your own virtual environment and run the pip command above inside it.
pipx install senderstats
Use Cases
Outbound message volumes and data transferred by:
- Envelope sender
- Header From:
- Return-Path:
- Envelope header: From:, MessageID Host, MessageID Domain (helpful to identify original sender)
- Envelope sender and header From: for SPF alignment purposes
- Random subject line sampling to help understand the type of traffic
- Peak Hourly Volumes
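As a rough illustration of the alignment view, the sketch below counts envelope sender and header From: pairs and flags whether their domains match exactly. The function names and the exact-match rule are assumptions made for this example and are not taken from the senderstats source; relaxed (organizational domain) alignment is not covered.

```python
from collections import Counter
from email.utils import parseaddr


def domain_of(address):
    """Return the lowercased domain of an address, ignoring any display name."""
    return parseaddr(address)[1].rpartition("@")[2].lower()


def alignment_counts(pairs):
    """Count (envelope sender, header From, aligned) tuples.

    `aligned` is True only when both domains are identical (strict match).
    """
    counts = Counter()
    for mfrom, hfrom in pairs:
        aligned = domain_of(mfrom) == domain_of(hfrom)
        counts[(mfrom, hfrom, aligned)] += 1
    return counts


# Hypothetical sample data.
pairs = [
    ("bounce@mail.example.net", "Billing <billing@example.com>"),
    ("billing@example.com", "Billing <billing@example.com>"),
    ("billing@example.com", "Billing <billing@example.com>"),
]
for (mfrom, hfrom, aligned), count in alignment_counts(pairs).most_common():
    print(count, mfrom, hfrom, "aligned" if aligned else "not aligned")
```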
Summarize message volume information:
- Estimated application email traffic based on sender volume threshold:
  - Estimated application data
  - Estimated application messages
  - Estimated application average size
- Total outbound data:
  - Total outbound data
  - Total outbound messages
  - Total outbound average size
  - Total outbound peak hourly volume
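The summary figures above could be derived roughly as follows. This is a minimal sketch, not the tool's implementation: the `summarize` function, the record layout, and the rule that a sender counts as an application once it reaches `threshold` messages are all assumptions. Only the timestamp format is taken from the documented `--date-format` default.

```python
from collections import Counter
from datetime import datetime

DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"  # documented default of --date-format


def summarize(records, threshold):
    """Summarize (sender, size_in_bytes, timestamp) records.

    Assumption: a sender with at least `threshold` messages is treated
    as application traffic.
    """
    sender_messages = Counter()
    sender_bytes = Counter()
    hourly = Counter()
    for sender, size, stamp in records:
        sender_messages[sender] += 1
        sender_bytes[sender] += size
        when = datetime.strptime(stamp, DATE_FORMAT)
        # Bucket each message into the hour it was sent.
        hourly[when.replace(minute=0, second=0, microsecond=0)] += 1

    app_senders = [s for s, n in sender_messages.items() if n >= threshold]
    app_messages = sum(sender_messages[s] for s in app_senders)
    app_bytes = sum(sender_bytes[s] for s in app_senders)
    total_messages = sum(sender_messages.values())
    total_bytes = sum(sender_bytes.values())
    return {
        "app_messages": app_messages,
        "app_bytes": app_bytes,
        "app_avg_size": app_bytes / app_messages if app_messages else 0,
        "total_messages": total_messages,
        "total_bytes": total_bytes,
        "total_avg_size": total_bytes / total_messages if total_messages else 0,
        "peak_hourly": max(hourly.values(), default=0),
    }


# Hypothetical sample data.
records = [
    ("app@example.com", 1000, "2024-03-04T17:05:00.000000+0000"),
    ("app@example.com", 3000, "2024-03-04T17:35:00.000000+0000"),
    ("app@example.com", 2000, "2024-03-04T18:10:00.000000+0000"),
    ("user@example.com", 500, "2024-03-04T17:50:00.000000+0000"),
]
print(summarize(records, threshold=2))
```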
Processing Behavior
The primary purpose of this tool is to identify sender message volumes and calculate data transfer rates for legitimate emails.
Input Requirements
- Expected Fields: The input CSV should include at least the envelope sender and message size fields.
- Exclusions: Messages will be excluded if:
  - The envelope sender is empty (common for bounce replies or calendar actions).
  - The message size is missing or not a valid number (typically rejects that can skew reporting).
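The two checks above can be sketched as a small row filter. This is an illustration only: `valid_rows` is a hypothetical helper, and the column names are the documented defaults of `--mfrom` (Sender) and `--size` (Message_Size).

```python
import csv
import io


def valid_rows(rows, sender_field="Sender", size_field="Message_Size"):
    """Yield (sender, size) for rows with a sender and a numeric size."""
    for row in rows:
        sender = (row.get(sender_field) or "").strip()
        if not sender:
            continue  # empty envelope sender, e.g. a bounce
        try:
            size = int(row.get(size_field) or "")
        except ValueError:
            continue  # missing or non-numeric message size
        yield sender, size


# Hypothetical CSV content using the default column names.
sample = io.StringIO(
    "Sender,Message_Size\n"
    "alice@example.com,2048\n"
    ",512\n"
    "bob@example.com,n/a\n"
    "carol@example.com,1024\n"
)
kept = list(valid_rows(csv.DictReader(sample)))
print(kept)
```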
Exclusion Rules
- Domain-Based Exclusions:
  - Messages from system domains such as ppops.net, pphosted.com, and knowledgefront.com are omitted by default to filter out monitoring messages.
  - To include these messages, use the --no-default-exclude-domains flag.
- IP-Based Exclusions:
  - For messages from 127.0.0.1 (e.g., system reports and digests on Proofpoint Protection Gateway), use the --exclude-ips flag to exclude them.
  - This option requires sender IP addresses to be included in the CSV.
Each exclusion step ensures the accuracy of volume and average message size reporting by filtering out unnecessary data.
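A minimal sketch of how such exclusion checks might look is shown below. The `is_excluded` helper is hypothetical, and treating subdomains of an excluded domain as excluded is an assumption; the tool's actual matching rules may differ.

```python
DEFAULT_EXCLUDED_DOMAINS = frozenset({"ppops.net", "pphosted.com", "knowledgefront.com"})


def is_excluded(sender, sender_ip=None,
                excluded_domains=DEFAULT_EXCLUDED_DOMAINS,
                excluded_ips=frozenset()):
    """Return True when a message should be dropped from the statistics."""
    domain = sender.rpartition("@")[2].lower()
    # Assumption: an excluded domain also covers its subdomains.
    if any(domain == d or domain.endswith("." + d) for d in excluded_domains):
        return True
    # IP checks only apply when the CSV provided a sender IP address.
    return sender_ip is not None and sender_ip in excluded_ips


print(is_excluded("monitor@ppops.net"))
print(is_excluded("reports@example.com", "127.0.0.1", excluded_ips={"127.0.0.1"}))
print(is_excluded("user@example.com", "10.0.0.5"))
```

Passing an empty set for `excluded_domains` mirrors the effect described for --no-default-exclude-domains.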
Usage Options
usage: senderstats [-h] [--version] -i <file> [<file> ...] -o <xlsx> [--ip IP]
[--mfrom MFrom] [--hfrom HFrom] [--rcpts Rcpts]
[--rpath RPath] [--msgid MsgID] [--subject Subject]
[--size MsgSz] [--date Date] [--gen-hfrom] [--gen-rpath]
[--gen-alignment] [--gen-msgid] [--expand-recipients]
[--no-display-name] [--remove-prvs] [--decode-srs]
[--no-empty-hfrom] [--sample-subject]
[--exclude-ips <ip> [<ip> ...]]
[--exclude-domains <domain> [<domain> ...]]
[--restrict-domains <domain> [<domain> ...]]
[--exclude-senders <sender> [<sender> ...]]
[--date-format DateFmt] [--no-default-exclude-domains]
This tool helps identify the top senders based on smart search outbound
message exports.
Input / Output arguments (required):
  -i <file> [<file> ...], --input <file> [<file> ...]
                        Smart search files to read.
  -o <xlsx>, --output <xlsx>
                        Output file
Field mapping arguments (optional):
  --ip IP               CSV field of the IP address. (default=Sender_IP_Address)
  --mfrom MFrom         CSV field of the envelope sender address. (default=Sender)
  --hfrom HFrom         CSV field of the header From: address. (default=Header_From)
  --rcpts Rcpts         CSV field of the header recipient addresses. (default=Recipients)
  --rpath RPath         CSV field of the Return-Path: address. (default=Header_Return-Path)
  --msgid MsgID         CSV field of the message ID. (default=Message_ID)
  --subject Subject     CSV field of the Subject, only used if --sample-subject is specified. (default=Subject)
  --size MsgSz          CSV field of message size. (default=Message_Size)
  --date Date           CSV field of message date/time. (default=Date)
Reporting control arguments (optional):
  --gen-hfrom           Generate report showing the header From: data for messages being sent.
  --gen-rpath           Generate report showing return path for messages being sent.
  --gen-alignment       Generate report showing envelope sender and header From: alignment
  --gen-msgid           Generate report showing parsed Message ID. Helps determine the sending system
Parsing behavior arguments (optional):
  --expand-recipients   Expand recipients counts messages by destination. E.g. 1 message going to 3 people, is 3 messages sent.
  --no-display-name     Remove display and use address only. Converts 'Display Name <user@domain.com>' to 'user@domain.com'
  --remove-prvs         Remove return path verification strings e.g. prvs=tag=sender@domain.com
  --decode-srs          Convert sender rewrite scheme, forwardmailbox+srs=hash=tt=domain.com=user to user@domain.com
  --no-empty-hfrom      If the header From: is empty the envelope sender address is used
  --sample-subject      Enable probabilistic random sampling of subject lines found during processing
  --exclude-ips <ip> [<ip> ...]
                        Exclude ips from processing.
  --exclude-domains <domain> [<domain> ...]
                        Exclude domains from processing.
  --restrict-domains <domain> [<domain> ...]
                        Constrain domains for processing.
  --exclude-senders <sender> [<sender> ...]
                        Exclude senders from processing.
  --date-format DateFmt Date format used to parse the timestamps. (default=%Y-%m-%dT%H:%M:%S.%f%z)
Extended processing controls (optional):
  --no-default-exclude-domains
                        Will not include the default Proofpoint excluded domains.
Usage:
  -h, --help            Show this help message and exit
  --version             Show the program's version and exit
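To make the address-normalization options more concrete, here is a hedged sketch of what --no-display-name, --remove-prvs, and --decode-srs are described as doing. The function names and regular expressions are assumptions written for this example, based only on the sample strings in the help text; they are not the tool's own code and may not cover every real-world variant.

```python
import re
from email.utils import parseaddr

# Patterns inferred from the help text examples (assumptions).
PRVS_RE = re.compile(r"^prvs=[^=]+=(?P<addr>.+)$", re.IGNORECASE)
SRS_RE = re.compile(
    r"^(?:[^+@]+\+)?srs\d?=[^=]+=[^=]+=(?P<domain>[^=]+)=(?P<user>[^@]+)(?:@.*)?$",
    re.IGNORECASE,
)


def strip_display_name(value):
    """'Display Name <user@domain.com>' -> 'user@domain.com'."""
    return parseaddr(value)[1] or value


def remove_prvs(address):
    """'prvs=tag=sender@domain.com' -> 'sender@domain.com'."""
    match = PRVS_RE.match(address)
    return match.group("addr") if match else address


def decode_srs(address):
    """'forwardmailbox+srs=hash=tt=domain.com=user' -> 'user@domain.com'."""
    match = SRS_RE.match(address)
    if not match:
        return address
    return f"{match.group('user')}@{match.group('domain')}"


print(strip_display_name("Display Name <user@domain.com>"))
print(remove_prvs("prvs=tag=sender@domain.com"))
print(decode_srs("forwardmailbox+srs=hash=tt=domain.com=user"))
```

Addresses that do not match a pattern are returned unchanged, so the helpers can be chained safely.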
Using the Tool with Proofpoint Smart Search
Export all outbound message traffic as a smart search CSV. You may need to export multiple CSVs if the data per time window exceeds 1M records. The tool can ingest multiple CSV files at once.
Once the files are downloaded to a target folder, run the following command with the path to the downloaded files, using a wildcard to match them all.
The following example is the most basic usage:
# Windows
senderstats -i C:\path\to\downloaded\files\smart_search_results_cluster_hosted_2024_03_04_*.csv -o C:\path\to\output\file\my_cluster_hosted.xlsx
# Linux
senderstats -i /path/to/downloaded/files/smart_search_results_cluster_hosted_2024_03_04_*.csv -o /path/to/output/file/my_cluster_hosted.xlsx
For a more comprehensive report, use the following command:
# Windows
senderstats -i C:\path\to\downloaded\files\smart_search_results_cluster_hosted_2024_03_04_*.csv -o C:\path\to\output\file\my_cluster_hosted.xlsx --remove-prvs --decode-srs --gen-hfrom --gen-alignment --gen-msgid --sample-subject
# Linux
senderstats -i /path/to/downloaded/files/smart_search_results_cluster_hosted_2024_03_04_*.csv -o /path/to/output/file/my_cluster_hosted.xlsx --remove-prvs --decode-srs --gen-hfrom --gen-alignment --gen-msgid --sample-subject
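The --sample-subject option is described as probabilistic random sampling of subject lines. The tool's exact method is not documented here; reservoir sampling is one common way to keep a fixed-size random sample from a stream of unknown length, and the sketch below uses it purely as an illustration. The `sample_subjects` function and its parameters are hypothetical.

```python
import random


def sample_subjects(subjects, k, rng=None):
    """Keep a uniform random sample of up to k subjects from an iterable."""
    rng = rng if rng is not None else random.Random()
    reservoir = []
    for seen, subject in enumerate(subjects, start=1):
        if len(reservoir) < k:
            reservoir.append(subject)  # fill the reservoir first
        else:
            # Replace an existing entry with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = subject
    return reservoir


# Hypothetical subject lines.
subjects = [f"Invoice {n}" for n in range(1000)]
print(sample_subjects(subjects, 5))
```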
To count messages by destination, expand recipients with --expand-recipients:
This is useful when you need to determine how many messages were delivered to destinations, because a single message can be addressed to multiple recipients.
# Windows
senderstats -i C:\path\to\downloaded\files\smart_search_results_cluster_hosted_2024_03_04_*.csv -o C:\path\to\output\file\my_cluster_hosted.xlsx --remove-prvs --decode-srs --gen-hfrom --gen-alignment --gen-msgid --sample-subject --expand-recipients
# Linux
senderstats -i /path/to/downloaded/files/smart_search_results_cluster_hosted_2024_03_04_*.csv -o /path/to/output/file/my_cluster_hosted.xlsx --remove-prvs --decode-srs --gen-hfrom --gen-alignment --gen-msgid --sample-subject --expand-recipients
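The effect of --expand-recipients on the counts can be illustrated with a short sketch. The `count_by_sender` helper is hypothetical, and the assumption that the recipients field holds a comma-separated list is made only for this example.

```python
from collections import Counter


def count_by_sender(rows, expand_recipients=False, separator=","):
    """Count messages per sender from (sender, recipients) pairs.

    With expand_recipients=True, one message to three recipients
    counts as three messages sent.
    """
    counts = Counter()
    for sender, recipients in rows:
        if expand_recipients:
            counts[sender] += len([r for r in recipients.split(separator) if r.strip()])
        else:
            counts[sender] += 1
    return counts


# Hypothetical sample data.
rows = [
    ("app@example.com", "a@example.org,b@example.org,c@example.org"),
    ("app@example.com", "d@example.org"),
]
print(count_by_sender(rows))
print(count_by_sender(rows, expand_recipients=True))
```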
Sample Output
The execution results should look similar to the following, depending on the options you select.
Files to be processed:
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173552.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173855.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173656.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173754.csv
C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173834.csv
Domains excluded from processing:
knowledgefront.com
pphosted.com
ppops.net
Processing: C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173552.csv
Processing: C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173855.csv
Processing: C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173656.csv
Processing: C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173754.csv
Processing: C:\Users\ljerabek\Downloads\smart_search_results_cluster_hosted_2024_03_04_173834.csv
Messages excluded by empty senders: 1523573
Messages excluded by domain: 484664
Messages excluded by sender: 0
Messages excluded by constraint: 0
Generating report, please wait.
Please see report: C:\Users\ljerabek\Downloads\my_cluster_hosted.xlsx
Sample Summary Statistics
Sample Details (Sender + From by Volume):
Sample Details (Message ID) Inferencing:
Sample Details (Hourly Metrics):
Current Class Hierarchy