Data-grounded Threat Event Frequency estimation with vector decomposition

TEF Estimator

Important: This is an independent implementation of TEF estimation for the FAIR methodology. See FAIR_NOTICE.md for trademark information, data sources, and attributions.

Data-grounded Threat Event Frequency estimation with vector decomposition and multi-scenario support.

Produces defensible TEF estimates for cyber risk quantification by decomposing threat frequency into four initial access vectors (exploitation, credential, phishing, supply chain), each with independent data sources and floor/ceiling bounds. Currently supports ransomware, business email compromise (BEC), and custom analyst-defined scenarios.

See docs/user-guide.md for methodology details, docs/technical-reference.md for the full specification, and docs/api-reference.md for CLI and Python API.

Authors

Laura Voicu and Jack Jones.

Web UI

The web interface provides live estimation with sidebar profile inputs, vector breakdown charts, and natural-language interpretation. Additional panels expand inline:

Feature	Description
Analysis	Distribution parameters (lognormal mu/sigma, percentiles) and vector priority ranking with multiplier explanations
Sensitivity Analysis	Tornado chart showing which parameters drive the estimate most
Compare Mode	Side-by-side estimation for two organization profiles with delta
Audit Trail	Full validation checks, anchor convergence, floor/ceiling bounds

Installation

From PyPI:

pip install tef-estimator

With optional extras:

pip install "tef-estimator[ui]"          # Web interface (NiceGUI)
pip install "tef-estimator[telemetry]"   # Continuous monitoring (requests)
pip install "tef-estimator[viz]"         # Charts (matplotlib, plotly)
pip install "tef-estimator[all]"         # Everything

Or from source:

git clone https://github.com/security-decision-science/tef-estimator.git
cd tef-estimator
pip install -e ".[dev]"

Requires Python 3.10+. All reference data ships with the package — no external API keys or accounts needed.

Quick Start

Python API

from tef_estimator.engine import TEFEngine
from tef_estimator.profile import OrganizationProfile
from tef_estimator.data.common import Sector, RevenueBand, Geography, RemoteAccessType
from tef_estimator.data.scenarios.ransomware import RansomwareScenario
from tef_estimator.data.scenarios.bec import BECScenario

profile = OrganizationProfile(
    sector=Sector.MANUFACTURING,
    revenue_band=RevenueBand.R_100M_1B,
    geography=Geography.US,
    remote_access=[RemoteAccessType.FORTINET],
    employee_count=2000,
)

# Ransomware estimate
result = TEFEngine(scenario=RansomwareScenario()).estimate(profile)
print(result.brief_report())

# BEC estimate (same profile, different scenario)
bec_result = TEFEngine(scenario=BECScenario()).estimate(profile)
print(bec_result.brief_report())

Continuous Telemetry Monitoring

pip install "tef-estimator[telemetry]"
tef-estimator telemetry init
tef-estimator telemetry collect --force
tef-estimator telemetry baseline
tef-estimator telemetry compare
tef-estimator telemetry watch --interval 60

Collects from 7 sources: 5 live public APIs (DShield, CISA KEV, Ransomware.live, GreyNoise, annual report edition monitor) plus 2 bundled reference data importers (IRIS reference data, initial access vector benchmarks from DBIR/Unit42/Mandiant/Beazley/CrowdStrike/IBM). Collected values are integrated into rolling averages and compared against a stored baseline, and TEF is re-estimated when a significant shift is detected. Requires the requests package, which the telemetry extra installs.
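
The rolling-average and baseline comparison can be pictured with a minimal sketch. The window size, the 25% relative-shift threshold, and the function names are illustrative assumptions, not the package's actual telemetry logic.

```python
from collections import deque
from statistics import fmean


def rolling_mean(samples, window=7):
    """Mean of the most recent `window` samples (hypothetical window size)."""
    recent = deque(samples, maxlen=window)
    return fmean(recent)


def shift_detected(current, baseline, threshold=0.25):
    """True when the relative change from baseline exceeds the threshold.

    The 25% threshold is an assumption for illustration only.
    """
    if baseline == 0:
        return current != 0
    return abs(current - baseline) / abs(baseline) > threshold


# Hypothetical daily counts from one telemetry source.
daily_counts = [100, 104, 98, 101, 150, 160, 155, 162, 158, 161]
baseline = rolling_mean(daily_counts[:4], window=4)   # stored baseline
current = rolling_mean(daily_counts, window=4)        # latest rolling average
needs_reestimate = shift_detected(current, baseline)
```

In this made-up series the rolling average rises well above the baseline, so the sketch flags a re-estimate.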

Web UI

pip install "tef-estimator[ui]"
tef-estimator ui

Opens a browser-based interface with sidebar profile inputs, live estimation, vector breakdown and tornado sensitivity charts, and compare mode. Dark Material Design theme via NiceGUI.

CLI

# Ransomware TEF estimate
tef-estimator estimate --sector manufacturing --revenue 100m_1b --geo us \
    --remote-access fortinet --employees 2000

# BEC estimate
tef-estimator estimate --sector financial --revenue 100m_1b --geo us \
    --scenario bec

# Full calculation trace
tef-estimator explain --sector manufacturing --revenue 100m_1b --geo us

# Export to markdown file
tef-estimator estimate --sector manufacturing --revenue 100m_1b --geo us -o report.md

# Compare two profiles
tef-estimator compare \
    --sector manufacturing --revenue 100m_1b --geo us --remote-access fortinet \
    --b-sector manufacturing --b-revenue 100m_1b --b-geo us --b-remote-access none

# Sensitivity analysis
tef-estimator sensitivity --sector manufacturing --revenue 100m_1b --geo us

# Inspect embedded data
tef-estimator data multipliers
tef-estimator data base-rate --scenario bec
tef-estimator data vectors --scenario ransomware

How It Works

Three-Layer Estimation

Each vector estimate is built from three layers:

Floor (observed LEF) -- IRIS 2025 observed loss event frequencies by sector and revenue band. Since TEF >= LEF by definition, these are a logical minimum.
Ceiling (campaign-level contact frequency) -- derived from DShield scanning telemetry (exploitation) and operational tempo data (credential). Confirms constant bombardment but doesn't constrain the estimate.
Positioned estimate (base rate x profile multipliers) -- a triangulated base rate adjusted by sector, revenue band, technology exposure, and geography. The positioned estimate sits between floor and ceiling.
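
As a rough illustration of how the three layers relate, the sketch below scales a base rate by profile multipliers and keeps the result between a floor and a ceiling. The function names, the multiplier values, and the use of simple clamping are assumptions; how the engine actually enforces the bounds is described in docs/technical-reference.md, not here.

```python
from math import prod


def positioned_estimate(base_rate, multipliers):
    """Base rate scaled by the product of profile multipliers."""
    return base_rate * prod(multipliers.values())


def bound_estimate(estimate, floor, ceiling):
    """Keep the estimate inside [floor, ceiling]; TEF cannot fall below observed LEF."""
    if floor > ceiling:
        raise ValueError("floor must not exceed ceiling")
    return min(max(estimate, floor), ceiling)


# Illustrative numbers only; none come from the bundled reference data.
multipliers = {"sector": 1.5, "revenue_band": 1.2, "technology": 2.0, "geography": 1.0}
raw = positioned_estimate(0.01, multipliers)       # 0.01 * 3.6
bounded = bound_estimate(raw, floor=0.005, ceiling=50.0)
```

Because the ceiling reflects campaign-level contact frequency, it is typically far above the positioned estimate, so in practice only the floor is likely to bind.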

Vector Decomposition

Total TEF is decomposed into four initial access vectors, each estimated independently:

Vector	Ransomware Share	BEC Share	Primary Data
Exploitation	~20-25%	~3%	DShield scanning, CISA KEV, EPSS
Credential	~50-55%	~22%	Operational tempo, IAB market data
Phishing	~15-20%	~65%	Anti-phishing vendor reports
Supply chain	~5-8%	~10%	IR report third-party involvement rate

Vectors are summed with cross-vector dampening (k=0.85, empirically supported by VERIS analysis of 10,037 incidents).

Credibility Blending and Posterior Band Contraction

When organization-specific telemetry is provided (per-vector observed attempt rates, observation periods, detection coverage), the engine blends the population-level prior with the org's own data using Bühlmann credibility weighting. The point estimate shifts toward the observed rate in proportion to the credibility weight Z = n/(n+k).
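
A minimal sketch of the standard Bühlmann blend follows. It assumes n is the number of observation periods and uses a made-up credibility constant k; the engine's calibrated value and any detection-coverage adjustment are not reproduced here.

```python
def credibility_weight(n, k):
    """Bühlmann credibility weight Z = n / (n + k)."""
    if n < 0 or k <= 0:
        raise ValueError("n must be >= 0 and k must be > 0")
    return n / (n + k)


def blended_rate(prior_rate, observed_rate, n, k):
    """Credibility-weighted mean: Z * observed + (1 - Z) * prior."""
    z = credibility_weight(n, k)
    return z * observed_rate + (1.0 - z) * prior_rate


# Illustrative values: 4 observation periods, hypothetical constant k = 12.
z = credibility_weight(4, 12)                 # 0.25
rate = blended_rate(0.10, 0.30, n=4, k=12)    # 0.25 * 0.30 + 0.75 * 0.10
```

With no telemetry (n = 0) the weight is zero and the blend returns the prior unchanged; as n grows, Z approaches 1 and the observed rate dominates.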

The uncertainty band contracts via a Gamma-inspired mechanism: the PERT-derived band width is treated as a Gamma prior whose shape α is fit to the elicited range, then updated with observed pseudo-events. More observation periods produce a monotonically tighter output distribution. At zero telemetry, the band equals the prior. The mechanism uses two separately calibrated parameters (k for the mean, α_pert from the PERT range for the band) rather than a single Bayesian model. See docs/technical-reference.md §8.6 for the full derivation and known limitations.
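
The contraction can be illustrated with the textbook Gamma-Poisson conjugate update. This is a simplified stand-in, not the engine's mechanism, which the text above notes is not a single Bayesian model. The prior parameters and the choice to add pseudo-events at the prior mean rate are assumptions.

```python
from math import sqrt


def gamma_update(alpha, beta, events, periods):
    """Conjugate Gamma-Poisson update: shape += events, rate += periods."""
    return alpha + events, beta + periods


def coefficient_of_variation(alpha):
    """Relative spread of a Gamma distribution: sd / mean = 1 / sqrt(shape)."""
    return 1.0 / sqrt(alpha)


# Illustrative prior: shape 4, rate 40, so the mean is 0.1 events per period.
alpha0, beta0 = 4.0, 40.0
widths = []
for periods in (0, 10, 50, 120):
    events = 0.1 * periods          # pseudo-events at the prior mean rate
    a, b = gamma_update(alpha0, beta0, events, periods)
    widths.append(coefficient_of_variation(a))
```

In this sketch the relative width equals the prior's at zero periods and shrinks as pseudo-events accumulate.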

Three-Anchor Base Rate Triangulation

Each scenario's base rate is triangulated from three independent anchors:

Operational tempo -- active groups x campaigns x targets / addressable population
IRIS back-calculation -- observed LEF / susceptibility prior
Insurer market-adjusted -- claims frequency with bias correction

The consensus PERT is computed as the arithmetic mean of anchor modes, bounded by the minimum anchor low and a capped anchor high. A convergence check validates that all anchors are within an order of magnitude of one another. The full triangulation derivation appears in the audit trail (result.full_report() or --full).
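
The consensus and convergence rules can be sketched as follows. The anchor values, the cap, and the reading of "capped anchor high" as the largest anchor high limited by a fixed cap are assumptions made for illustration.

```python
from math import log10
from statistics import fmean


def consensus_pert(anchors, high_cap):
    """Combine anchor PERTs given as (low, mode, high) tuples.

    Mode: arithmetic mean of anchor modes.
    Low:  minimum anchor low.
    High: largest anchor high, capped at `high_cap` (cap rule is an assumption).
    """
    lows, modes, highs = zip(*anchors)
    return min(lows), fmean(modes), min(max(highs), high_cap)


def anchors_converge(anchors, max_orders=1.0):
    """True when anchor modes span no more than `max_orders` orders of magnitude."""
    modes = [mode for _, mode, _ in anchors]
    return log10(max(modes) / min(modes)) <= max_orders


# Illustrative anchors (low, mode, high); not taken from the reference data.
anchors = [
    (0.002, 0.010, 0.050),   # operational tempo
    (0.004, 0.020, 0.080),   # IRIS back-calculation
    (0.003, 0.030, 0.200),   # insurer market-adjusted
]
low, mode, high = consensus_pert(anchors, high_cap=0.10)
converged = anchors_converge(anchors)
```

Here the modes differ by a factor of three, well inside one order of magnitude, so the convergence check passes.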

Output Tiers

Tier	Method	Content
Tier 1 (summary)	`result.brief_report()` or `--brief`	Annual probability, recurrence interval, vector bar chart, one sentence, peer percentile
Tier 2 (analysis)	Default CLI output	Tier 1 + distribution parameters, sensitivity, per-vector ranges
Tier 3 (audit)	`result.full_report()` or `--full`	Complete calculation traces, validation checks, triangulation derivation, data sources, warnings
JSON	`result.to_dict()` or `--json`	All three tiers as structured data
Markdown	`result.to_markdown()` or `--output file.md`	Structured markdown with tables, traces, and sources — suitable for reports

Data Sources

All empirical parameters are loaded from bundled JSON files under data/reference/ with source citations. Nothing is hardcoded in the engine.

Source	What It Feeds
IRIS 2025 (Cyentia)	Sector/revenue multipliers, floor anchors, ransomware shares
DBIR 2025/2026 (Verizon)	Vector proportions, third-party rates
Unit 42 IR 2025/2026 (Palo Alto)	Vector proportions (500-750 engagements)
M-Trends 2026 (Mandiant)	Exploit dominance, vishing emergence
CrowdStrike GTR 2026	Credential proxy indicators, edge device targeting
Beazley Q3 2025	VPN credential proportion, RDP initial access
IBM CODB 2025	Vector proportions (600 organizations)
Coalition 2025	Claims frequency, bias correction anchor
FBI IC3 2024	BEC operational tempo, sector targeting
CISA KEV	Exploitation floor (unpatched CVEs)
DShield / SANS ISC	Exploitation ceiling (scanning telemetry)
EPSS (Cyentia)	Exploitation positioning
Ransomware.live	Operational tempo (victim claim counts)
VERIS / VCDB	Dampening coefficient empirical support

Data Refresh

Reference data ships with the package. To update, replace the extracted.json files under src/tef_estimator/data/reference/. The tef-estimator refresh check command validates data freshness:

tef-estimator refresh check      # Validate data freshness (reports per-source age and staleness warnings)

Data freshness warnings also appear automatically in estimation output when any source is >90 days old.
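
The 90-day staleness rule amounts to a date comparison per source. The sketch below assumes each source carries a single retrieval date; the source names, dates, and function name are hypothetical, and the real metadata layout in the bundled JSON files is not shown here.

```python
from datetime import date


def stale_sources(published, today, max_age_days=90):
    """Return {source: age_in_days} for sources older than `max_age_days`."""
    return {
        name: (today - when).days
        for name, when in published.items()
        if (today - when).days > max_age_days
    }


# Hypothetical retrieval dates for two placeholder sources.
published = {
    "source_a": date(2026, 1, 15),
    "source_b": date(2026, 6, 1),
}
warnings = stale_sources(published, today=date(2026, 7, 1))
```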

Scenarios

Scenarios are pluggable data definitions under tef_estimator/data/scenarios/. The engine is scenario-agnostic -- adding a new scenario requires only a JSON data file and a Python class implementing ScenarioDefinition.

Scenario	Slug	Typical TEF (mid-market, US)
Ransomware	`ransomware`	~0.7% (~1 in 146 years)
Business Email Compromise	`bec`	~14.9% (~1 in 7 years)
Custom	user-defined	varies
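
The percentages and "1 in N years" figures in the table are reciprocals of each other. A minimal sketch of that conversion, with a hypothetical function name: note that the rounded 0.7% would give roughly 143 years, so the 146-year figure presumably comes from an unrounded probability near 0.68%.

```python
def recurrence_interval_years(annual_probability):
    """Expected years between events: 1 / annual probability."""
    if not 0 < annual_probability <= 1:
        raise ValueError("annual probability must be in (0, 1]")
    return 1.0 / annual_probability


bec_years = recurrence_interval_years(0.149)            # about 6.7 years
ransomware_years = recurrence_interval_years(1 / 146)   # 1/146 is about 0.68%
```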

Custom scenarios are defined as JSON files specifying vector proportions across the four initial access vectors (exploitation, credential, phishing, supply chain), a base rate, and an overall incident share. Generate a template with tef-estimator scenario template, or use the visual builder in the web UI's Scenarios tab. See docs/user-guide.md for the full JSON spec.
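
To make the shape of such a file concrete, here is a sketch that parses and sanity-checks a scenario definition. Every key name in it is hypothetical, as is the rule that proportions sum to 1; the authoritative format is whatever tef-estimator scenario template produces and docs/user-guide.md documents. The example shares simply mirror the approximate BEC column from the Vector Decomposition table.

```python
import json
from math import isclose

VECTORS = ("exploitation", "credential", "phishing", "supply_chain")


def validate_scenario(raw_json):
    """Parse a scenario and check it names all four vectors with shares summing to 1.

    Key names are hypothetical; consult docs/user-guide.md for the real spec.
    """
    scenario = json.loads(raw_json)
    shares = scenario["vector_proportions"]
    missing = [v for v in VECTORS if v not in shares]
    if missing:
        raise ValueError(f"missing vectors: {missing}")
    if not isclose(sum(shares[v] for v in VECTORS), 1.0, abs_tol=1e-6):
        raise ValueError("vector proportions must sum to 1.0")
    if scenario["base_rate"] <= 0:
        raise ValueError("base_rate must be positive")
    return scenario


example = """
{
  "slug": "example-scenario",
  "base_rate": 0.05,
  "incident_share": 0.10,
  "vector_proportions": {
    "exploitation": 0.03, "credential": 0.22, "phishing": 0.65, "supply_chain": 0.10
  }
}
"""
scenario = validate_scenario(example)
```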

Limitations

The base rate is the weakest link. Despite three-anchor triangulation, all three anchors have wide uncertainty bands. The base rate drives the output more than any multiplier.
The floor is too low. IRIS observed LEF captures only publicly disclosed events. The disclosure gap varies by cohort.
Cross-vector dampening is a judgment call. k=0.85 is empirically supported by VERIS co-occurrence analysis (credential x phishing lift=8.3; exploitation largely stands alone at lift~0.2), but the bimodal structure means a single k is a simplification.
BEC sector/revenue data is less granular than ransomware. IRIS does not publish BEC-specific breakdowns; BEC shares are derived from FBI IC3 and the DBIR.
TEF is non-stationary. Output is labelled point-in-time; refresh on a regular basis (quarterly recommended).
This estimates how often adversaries TRY, not how often they succeed. Success probability depends on controls (which should be assessed separately).
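
For readers unfamiliar with the lift statistic quoted in the dampening limitation: lift compares how often two vectors co-occur with how often they would co-occur by chance. A value of 1 indicates statistical independence, values above 1 mean the pair appears together more often than chance, and values below 1 mean less often. The sketch below uses made-up counts, not the VERIS data.

```python
def lift(n_total, n_a, n_b, n_both):
    """Lift = P(A and B) / (P(A) * P(B)); 1.0 means A and B are independent."""
    if min(n_total, n_a, n_b) <= 0:
        raise ValueError("counts must be positive")
    p_a, p_b, p_both = n_a / n_total, n_b / n_total, n_both / n_total
    return p_both / (p_a * p_b)


# Made-up counts, not the VERIS figures behind the lift values quoted above.
positive = lift(n_total=1000, n_a=100, n_b=50, n_both=40)     # 0.04 / (0.1 * 0.05)
independent = lift(n_total=1000, n_a=100, n_b=50, n_both=5)   # 0.005 / (0.1 * 0.05)
```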

License

Code and data in this repository are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). See LICENSE and FAIR_NOTICE.md.

Release history

1.1.4 (this version): Jul 1, 2026
1.1.3: Jun 21, 2026
