High-performance URL reputation and phishing detection for MCP Gateway

URL Reputation (Rust)

Author: Matheus Cafalchio
Version: 0.1.0

Blocks URLs before a resource is fetched, based on configured blocked domains, regex patterns, and heuristics. Designed for fast, low-overhead checks.

Runtime Requirements

This plugin depends on cpex>=0.1.0rc1,<0.2 and imports hook models from cpex.framework. The compiled Rust extension is mandatory; there is no Python fallback implementation.

Hooks

  • resource_pre_fetch – triggered before any resource is fetched.

Config

config:
    whitelist_domains: ["ibm.com", "yourdomain.com"]
    allowed_patterns: ["^https://trusted\\.internal/.*"]
    blocked_domains: ["malicious.example.com"]
    blocked_patterns: ["casino", "crypto"]
    use_heuristic_check: true
    entropy_threshold: 3.65
    block_non_secure_http: true
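
As a rough illustration of how the two pattern lists in the config above behave, the sketch below uses Python's `re` module as a stand-in for the plugin's Rust regex engine. It assumes unanchored search semantics, so a pattern such as `casino` matches anywhere in the URL, while the `^`-anchored allowed pattern only matches from the start. The helper names `compile_patterns` and `matches_any` are hypothetical and not part of the plugin's API, and how the plugin itself reports an invalid pattern is not specified here.

```python
import re


def compile_patterns(patterns):
    """Compile each pattern; collect invalid ones instead of raising."""
    compiled, invalid = [], []
    for pattern in patterns:
        try:
            compiled.append(re.compile(pattern))
        except re.error:
            invalid.append(pattern)
    return compiled, invalid


def matches_any(compiled, url):
    """True if any pattern is found anywhere in the full URL (search, not full match)."""
    return any(rx.search(url) for rx in compiled)


# Patterns from the example config; "(" is added only to show an invalid regex.
allowed, _ = compile_patterns([r"^https://trusted\.internal/.*"])
blocked, bad = compile_patterns(["casino", "crypto", "("])
```

Under these assumptions, short unanchored entries in `blocked_patterns` are broad: they match the host, path, and query alike, so anchoring or escaping is worth considering when a narrower match is intended.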

Config Description

  • whitelist_domains

    • A set of domains that are allowed to be fetched without any checks.
  • allowed_patterns

    • A list of regex patterns matched against the full URL. If any pattern matches, the URL is allowed and skips all remaining checks — including the non-secure HTTP check. Evaluated after the whitelist, before scheme enforcement.
  • blocked_domains

    • A set of domains that will always be blocked.
  • blocked_patterns

    • A list of regex patterns matched against the full URL. If any pattern matches, the URL is blocked.
  • use_heuristic_check

    • Whether heuristic checks (entropy, TLD validity, unicode security) should be performed. Default: false.
  • entropy_threshold

    • Maximum allowed Shannon entropy for a domain. Higher entropy may indicate suspicious/malicious domains.
  • block_non_secure_http

    • Whether URLs using http (non-secure) should be blocked. Default: true.
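
To give a feel for `entropy_threshold`, here is a minimal sketch of character-level Shannon entropy, H = -Σ p_i · log2(p_i), where p_i is the relative frequency of each distinct character. It is an assumption that the plugin computes entropy over the whole host string in this way; the function name `shannon_entropy` is illustrative only.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # Sum -p * log2(p) over the frequency p of each distinct character.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())
```

With this definition a string of four distinct characters scores exactly 2.0 bits, an ordinary name such as `ibm.com` stays well below the example threshold of 3.65, and a 20-character host with no repeated characters scores about 4.32 and would exceed it.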

Architecture

flowchart LR
    Start([URL Input]) --> Parse{Parse & Extract Domain}
    Parse -->|Fail| Block1[❌ Parse Error]
    Parse -->|Success| DetectIP[Detect IP]

    DetectIP --> Whitelist{Whitelist?}
    Whitelist -->|Yes| Success[✅ Allow]
    Whitelist -->|No| AllowPat{Allowed Pattern?}

    AllowPat -->|Yes| Success
    AllowPat -->|No| HTTP{Scheme = HTTPS<br/>or not enforced?}

    HTTP -->|No| Block2[❌ Non-HTTPS]
    HTTP -->|Yes| BlockedDom{Blocked Domain<br/>or Pattern?}

    BlockedDom -->|Yes| Block3[❌ Blocked]
    BlockedDom -->|No| Heuristic{Heuristic Check<br/>Enabled & Not IP?}

    Heuristic -->|No| Success
    Heuristic -->|Yes| Checks{Pass Entropy,<br/>TLD & Unicode?}

    Checks -->|No| Block4[❌ Heuristic Fail]
    Checks -->|Yes| Success

    Block1 --> End([Return])
    Block2 --> End
    Block3 --> End
    Block4 --> End
    Success --> End

    style Start fill:#e1f5ff
    style End fill:#e1f5ff
    style Success fill:#c8e6c9
    style Block1 fill:#ffcdd2
    style Block2 fill:#ffcdd2
    style Block3 fill:#ffcdd2
    style Block4 fill:#ffcdd2

Logic workflow

  1. Parse & Normalize URL

    • Trim the input URL, then parse it (scheme and host are normalised to lowercase by the URL parser per RFC 3986; path and query retain original casing).
    • Fail → Violation: "Could not parse url".
  2. Extract Domain

    • Get the host string from the URL.
    • Fail → Violation: "Could not parse domain".
  3. Detect IP Address

    • Determine if domain is an IPv4 or IPv6 address.
    • Skip heuristic checks for IPs.
  4. Whitelist Check

    • If domain is in whitelist_domains → continue_processing = true, skip all further checks.
  5. Allowed Patterns Check

    • If URL matches any regex in allowed_patterns → continue_processing = true, skip all further checks.
    • Note: this check runs before scheme enforcement, so an allowed_patterns match can bypass the non-secure HTTP block.
  6. Block Non-Secure HTTP

    • If scheme ≠ "https" and block_non_secure_http is true → Violation: "Blocked non secure http url".
  7. Blocked Domains

    • If domain is in blocked_domains → Violation: "Domain in blocked set".
  8. Blocked Patterns

    • If URL matches any regex in blocked_patterns → Violation: "Blocked pattern".
  9. Heuristic Checks (only for non-IP domains and if use_heuristic_check = true)

    • 9.1 High Entropy Check: if Shannon entropy > entropy_threshold → Violation: "High entropy domain".
    • 9.2 TLD Validity Check: validate the top-level domain. Fail → Violation: "Illegal TLD".
    • 9.3 Unicode Security Check: validate the domain's Unicode. Fail → Violation: "Domain unicode is not secure".

  10. Final Outcome

    • If no violations → continue_processing = true.
    • If any check fails → return first PluginViolation and continue_processing = false.
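
The ordering of the steps above can be sketched in a few lines of Python. This is not the plugin's Rust implementation: it uses `urllib.parse` and `re` from the standard library as stand-ins, matches domains by exact comparison only, and reduces the heuristics to the entropy check (the TLD and Unicode checks are omitted). The names `check_url` and `CFG` are hypothetical, and 3.65 is the value from the example config rather than a documented default.

```python
import ipaddress
import math
import re
from collections import Counter
from urllib.parse import urlsplit


def _entropy(text):
    total = len(text)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(text).values())


def _is_ip(host):
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def check_url(url, cfg):
    """Return (continue_processing, violation_reason_or_None)."""
    url = url.strip()
    try:
        parts = urlsplit(url)                                   # step 1
    except ValueError:
        return False, "Could not parse url"
    if not parts.scheme:
        return False, "Could not parse url"
    host = parts.hostname                                       # step 2 (lowercased)
    if not host:
        return False, "Could not parse domain"
    is_ip = _is_ip(host)                                        # step 3
    if host in cfg.get("whitelist_domains", ()):                # step 4
        return True, None
    if any(re.search(p, url) for p in cfg.get("allowed_patterns", ())):   # step 5
        return True, None
    if cfg.get("block_non_secure_http", True) and parts.scheme != "https":  # step 6
        return False, "Blocked non secure http url"
    if host in cfg.get("blocked_domains", ()):                  # step 7
        return False, "Domain in blocked set"
    if any(re.search(p, url) for p in cfg.get("blocked_patterns", ())):   # step 8
        return False, "Blocked pattern"
    if cfg.get("use_heuristic_check", False) and not is_ip:     # step 9 (entropy only)
        if _entropy(host) > cfg.get("entropy_threshold", 3.65):
            return False, "High entropy domain"
    return True, None                                           # step 10


# Mirrors the example config from the Config section.
CFG = {
    "whitelist_domains": ["ibm.com", "yourdomain.com"],
    "allowed_patterns": [r"^https://trusted\.internal/.*"],
    "blocked_domains": ["malicious.example.com"],
    "blocked_patterns": ["casino", "crypto"],
    "use_heuristic_check": True,
    "entropy_threshold": 3.65,
    "block_non_secure_http": True,
}
```

Because the whitelist and allowed-pattern checks return early, `check_url("http://ibm.com/", CFG)` is allowed in this sketch even though plain http is otherwise blocked, which is the bypass noted in step 5.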

Limitations

- Static lists only; no external domain reputation providers are queried.
- The list of valid IANA TLDs is static and can become out of date.
- Ignores schemes other than http and https.

TODOs

- External threat-intel integration with cache – Query external feeds for known malicious domains.
- IP address handling policy – Decide rules for IPv4/IPv6 URLs.
- Dynamic TLD updates – Fetch latest IANA TLD list automatically.

Tests

Test Coverage (24 unit tests, all passing):

| Filename | Function Coverage | Line Coverage | Region Coverage |
| --- | --- | --- | --- |
| engine.rs | 96.55% (28/29) | 99.26% (533/537) | 98.60% (634/643) |
| filters/heuristic.rs | 100.00% (5/5) | 96.49% (55/57) | 97.53% (79/81) |
| filters/patterns.rs | 100.00% (5/5) | 100.00% (20/20) | 100.00% (38/38) |
| lib.rs | 0.00% (0/1) | 0.00% (0/5) | 0.00% (0/7) |
| types.rs | 50.00% (3/6) | 44.12% (15/34) | 23.94% (17/71) |
| TOTAL | 89.13% (41/46) | 95.43% (627/657) | 91.45% (770/842) |

Note: lib.rs and types.rs contain PyO3 bindings and module declarations not covered by unit tests.

New test coverage includes:

  • Invalid regex pattern handling (both allowed and blocked patterns)
  • Case-insensitive domain matching (whitelist and blocklist)
  • Subdomain matching validation
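
The list above mentions case-insensitive and subdomain matching without spelling out the rule. One plausible reading, sketched here purely as an assumption, is that a host matches a configured domain when it equals that domain or is a subdomain of it, compared after lowercasing. The function name `domain_matches` is hypothetical and the plugin's actual matching rule may differ.

```python
def domain_matches(host: str, configured: set[str]) -> bool:
    """True if `host` equals a configured domain or is a subdomain of one."""
    wanted = {d.lower().rstrip(".") for d in configured}
    labels = host.lower().rstrip(".").split(".")
    # Try the host itself, then each parent domain (a.b.c -> b.c -> c).
    return any(".".join(labels[i:]) in wanted for i in range(len(labels)))
```

Comparing whole labels rather than using a plain suffix test keeps `notmalicious.example.com` from matching an entry for `malicious.example.com`.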

Run tests:

cargo nextest run -p url_reputation  # Run Rust unit tests
cargo llvm-cov --lib --html   # Generate coverage report

Heuristic methods

The heuristics are based on the following research paper:

A. P. S. Bhadauria and M. Singh, "Domain‑Checker: A Classification of Malicious and Benign Domains Using Multitier Filtering," Springer Nature, 2023.
