High-performance URL reputation and phishing detection for MCP Gateway

URL Reputation (Rust)

Author: Matheus Cafalchio
Version: 0.1.0

Blocks URLs before a resource is fetched, based on configured blocked domains, regex patterns, and heuristics. The checks are implemented in Rust to keep them fast.

Runtime Requirements

This plugin depends on cpex>=0.1.0rc1,<0.2 and imports hook models from cpex.framework. The compiled Rust extension is mandatory; there is no Python fallback implementation.

Hooks

  • resource_pre_fetch – triggered before any resource is fetched.

Config

config:
    whitelist_domains: ["ibm.com", "yourdomain.com"]
    allowed_patterns: ["^https://trusted\\.internal/.*"]
    blocked_domains: ["malicious.example.com"]
    blocked_patterns: ["casino", "crypto"]
    use_heuristic_check: true
    entropy_threshold: 3.65
    block_non_secure_http: true
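To make the pattern semantics concrete, here is a minimal sketch that uses Python's `re` module as a stand-in for the plugin's Rust regex engine. That substitution is an assumption: the two syntaxes agree for simple patterns like these but are not identical. The sketch also assumes unanchored search semantics, which the bare `casino` and `crypto` examples suggest. The helper name `matches_any` is hypothetical. Note that in a YAML double-quoted string, `\\.` decodes to the regex `\.`, a literal dot.

```python
import re

# Regexes as they look after YAML decoding of the example config above.
ALLOWED = [re.compile(r"^https://trusted\.internal/.*")]
BLOCKED = [re.compile(p) for p in ("casino", "crypto")]


def matches_any(patterns, url):
    """True if any pattern is found anywhere in the URL (unanchored search)."""
    return any(p.search(url) for p in patterns)


print(matches_any(ALLOWED, "https://trusted.internal/api"))       # True
print(matches_any(ALLOWED, "https://trustedXinternal/api"))       # False: the dot is literal
print(matches_any(BLOCKED, "https://example.org/casino-bonus"))   # True
print(matches_any(BLOCKED, "https://cryptography.example.org/"))  # True: substring match
```

Under these assumptions, a short unanchored pattern such as `crypto` also blocks unrelated URLs that merely contain the substring, so anchoring or narrowing patterns is worth considering.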

Config Description

  • whitelist_domains

    • A set of domains that are allowed to be fetched without any checks.
  • allowed_patterns

    • A list of regex patterns matched against the full URL. If any pattern matches, the URL is allowed and skips all remaining checks — including the non-secure HTTP check. Evaluated after the whitelist, before scheme enforcement.
  • blocked_domains

    • A set of domains that will always be blocked.
  • blocked_patterns

    • A list of regex patterns matched against the full URL. If any pattern matches, the URL is blocked.
  • use_heuristic_check

    • Whether heuristic checks (entropy, TLD validity, unicode security) should be performed. Default: false.
  • entropy_threshold

    • Maximum allowed Shannon entropy for a domain. Higher entropy may indicate suspicious/malicious domains.
  • block_non_secure_http

    • Whether URLs using http (non-secure) should be blocked. Default: true.
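As an illustration of what entropy_threshold is compared against, the sketch below computes character-level Shannon entropy in bits per character. It is not the plugin's implementation: whether the plugin measures the whole host or individual labels is not stated here, and the function names `shannon_entropy` and `exceeds_threshold` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum of p * log2(1/p) over each distinct character, with p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def exceeds_threshold(domain, threshold=3.65):
    """True if the domain's entropy is strictly above the threshold."""
    return shannon_entropy(domain) > threshold


print(shannon_entropy("aaaa"))                # 0.0
print(shannon_entropy("abcd"))                # 2.0
print(exceeds_threshold("ibm.com"))           # False
print(exceeds_threshold("abcdefghijklmnop"))  # True (entropy is 4.0)
```

A string of n characters can never exceed log2(n) bits, so short domains cannot trip a threshold of 3.65; only longer, random-looking names can.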

Architecture

flowchart LR
    Start([URL Input]) --> Parse{Parse & Extract Domain}
    Parse -->|Fail| Block1[❌ Parse Error]
    Parse -->|Success| DetectIP[Detect IP]

    DetectIP --> Whitelist{Whitelist?}
    Whitelist -->|Yes| Success[✅ Allow]
    Whitelist -->|No| AllowPat{Allowed Pattern?}

    AllowPat -->|Yes| Success
    AllowPat -->|No| HTTP{Scheme = HTTPS<br/>or not enforced?}

    HTTP -->|No| Block2[❌ Non-HTTPS]
    HTTP -->|Yes| BlockedDom{Blocked Domain<br/>or Pattern?}

    BlockedDom -->|Yes| Block3[❌ Blocked]
    BlockedDom -->|No| Heuristic{Heuristic Check<br/>Enabled & Not IP?}

    Heuristic -->|No| Success
    Heuristic -->|Yes| Checks{Pass Entropy,<br/>TLD & Unicode?}

    Checks -->|No| Block4[❌ Heuristic Fail]
    Checks -->|Yes| Success

    Block1 --> End([Return])
    Block2 --> End
    Block3 --> End
    Block4 --> End
    Success --> End

    style Start fill:#e1f5ff
    style End fill:#e1f5ff
    style Success fill:#c8e6c9
    style Block1 fill:#ffcdd2
    style Block2 fill:#ffcdd2
    style Block3 fill:#ffcdd2
    style Block4 fill:#ffcdd2

Logic workflow

  1. Parse & Normalize URL

    • Trim the input URL, then parse it (scheme and host are normalised to lowercase by the URL parser per RFC 3986; path and query retain original casing).
    • Fail → Violation: "Could not parse url".
  2. Extract Domain

    • Get the host string from the URL.
    • Fail → Violation: "Could not parse domain".
  3. Detect IP Address

    • Determine if domain is an IPv4 or IPv6 address.
    • Skip heuristic checks for IPs.
  4. Whitelist Check

    • If domain is in whitelist_domains → continue_processing = true, skip all further checks.
  5. Allowed Patterns Check

    • If URL matches any regex in allowed_patterns → continue_processing = true, skip all further checks.
    • Note: this check runs before scheme enforcement, so an allowed_patterns match can bypass the non-secure HTTP block.
  6. Block Non-Secure HTTP

    • If scheme ≠ "https" and block_non_secure_http is true → Violation: "Blocked non secure http url".
  7. Blocked Domains

    • If domain is in blocked_domains → Violation: "Domain in blocked set".
  8. Blocked Patterns

    • If URL matches any regex in blocked_patterns → Violation: "Blocked pattern".
  9. Heuristic Checks (only for non-IP domains and if use_heuristic_check = true)

    • 9.1 High Entropy Check – If Shannon entropy > entropy_threshold → Violation: "High entropy domain".
    • 9.2 TLD Validity Check – Validate top-level domain. Fail → Violation: "Illegal TLD".
    • 9.3 Unicode Security Check – Validate domain unicode. Fail → Violation: "Domain unicode is not secure".

  10. Final Outcome

    • If no violations → continue_processing = true.
    • If any check fails → return first PluginViolation and continue_processing = false.
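The ordering of the steps above can be sketched in Python as follows. This is an illustrative model, not the Rust implementation: `Config`, `is_ip`, `check_url`, and the `heuristics` callback are hypothetical names, Python's `urlsplit` is more lenient than a strict URL parser, and domains are compared by exact lowercase match only (subdomain matching is not modelled). The violation messages are the ones listed in the workflow.

```python
import ipaddress
import re
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class Config:
    whitelist_domains: set = field(default_factory=set)
    allowed_patterns: list = field(default_factory=list)
    blocked_domains: set = field(default_factory=set)
    blocked_patterns: list = field(default_factory=list)
    use_heuristic_check: bool = False
    block_non_secure_http: bool = True


def is_ip(host):
    """True if host is an IPv4 or IPv6 literal."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def check_url(url, cfg, heuristics=None):
    """Return None to allow the URL, or the message of the first violation."""
    url = url.strip()
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased by urlsplit
    except ValueError:
        return "Could not parse url"
    if not parts.scheme:
        return "Could not parse url"
    if not host:
        return "Could not parse domain"
    # Steps 4 and 5: allow-lists short-circuit everything that follows,
    # including the scheme check.
    if host in cfg.whitelist_domains:
        return None
    if any(re.search(p, url) for p in cfg.allowed_patterns):
        return None
    # Steps 6 to 8: scheme enforcement, then block lists.
    if cfg.block_non_secure_http and parts.scheme != "https":
        return "Blocked non secure http url"
    if host in cfg.blocked_domains:
        return "Domain in blocked set"
    if any(re.search(p, url) for p in cfg.blocked_patterns):
        return "Blocked pattern"
    # Step 9: heuristics are skipped for IP literals; the callback returns
    # a violation message or None.
    if cfg.use_heuristic_check and not is_ip(host) and heuristics is not None:
        return heuristics(host)
    return None
```

With the example config, `check_url("http://ibm.com/", cfg)` returns None in this model because the whitelist is consulted before scheme enforcement, while `check_url("http://example.org/", cfg)` returns "Blocked non secure http url".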

Limitations

- Static lists only; no external threat-intelligence or domain reputation providers are queried.
- The list of valid IANA TLDs is static and will go out of date.
- Schemes other than http and https are ignored.

TODOs

- External threat-intel integration with cache – Query external feeds for known malicious domains.
- IP address handling policy – Decide rules for IPv4/IPv6 URLs.
- Dynamic TLD updates – Fetch latest IANA TLD list automatically.

Tests

Test Coverage (24 unit tests, all passing):

Filename               Function Coverage   Line Coverage       Region Coverage
engine.rs              96.55% (28/29)      99.26% (533/537)    98.60% (634/643)
filters/heuristic.rs   100.00% (5/5)       96.49% (55/57)      97.53% (79/81)
filters/patterns.rs    100.00% (5/5)       100.00% (20/20)     100.00% (38/38)
lib.rs                 0.00% (0/1)         0.00% (0/5)         0.00% (0/7)
types.rs               50.00% (3/6)        44.12% (15/34)      23.94% (17/71)
TOTAL                  89.13% (41/46)      95.43% (627/657)    91.45% (770/842)

Note: lib.rs and types.rs contain PyO3 bindings and module declarations not covered by unit tests.

New test coverage includes:

  • Invalid regex pattern handling (both allowed and blocked patterns)
  • Case-insensitive domain matching (whitelist and blocklist)
  • Subdomain matching validation

Run tests:

cargo nextest run -p url_reputation  # Run Rust unit tests
cargo llvm-cov --lib --html   # Generate coverage report

Heuristic methods

The heuristics are based on the following research paper:

A. P. S. Bhadauria and M. Singh, "Domain‑Checker: A Classification of Malicious and Benign Domains Using Multitier Filtering," Springer Nature, 2023.
