Skip to main content

High-performance URL reputation and phishing detection for MCP Gateway

Project description

URL Reputation (Rust)

Author: Matheus Cafalchio Version: 0.1.0

Blocks URLs based on configured blocked domains, patterns and heuristics before resource fetch. Designed for fast and efficient resource checks.

Runtime Requirements

This plugin depends on cpex>=0.1.0rc1,<0.2 and imports hook models from cpex.framework. The compiled Rust extension is mandatory; there is no Python fallback implementation.

Hooks

  • resource_pre_fetch – triggered before any resource is fetched.

Config

config:
    whitelist_domains: ["ibm.com", "yourdomain.com"]
    allowed_patterns: ["^https://trusted\\.internal/.*"]
    blocked_domains: ["malicious.example.com"]
    blocked_patterns: ["casino", "crypto"]
    use_heuristic_check: true
    entropy_threshold: 3.65
    block_non_secure_http: true

Config Description

  • whitelist_domains

    • A set of domains that are allowed to be fetched without any checks.
  • allowed_patterns

    • A list of regex patterns matched against the full URL. If any pattern matches, the URL is allowed and skips all remaining checks — including the non-secure HTTP check. Evaluated after the whitelist, before scheme enforcement.
  • blocked_domains

    • A set of domains that will always be blocked.
  • blocked_patterns

    • A list of regex patterns matched against the full URL. If any pattern matches, the URL is blocked.
  • use_heuristic_check

    • Whether heuristic checks (entropy, TLD validity, unicode security) should be performed. Default: false.
  • entropy_threshold

    • Maximum allowed Shannon entropy for a domain. Higher entropy may indicate suspicious/malicious domains.
  • block_non_secure_http

    • Whether URLs using http (non-secure) should be blocked. Default: true.

Architecture

flowchart LR
    Start([URL Input]) --> Parse{Parse & Extract Domain}
    Parse -->|Fail| Block1[❌ Parse Error]
    Parse -->|Success| DetectIP[Detect IP]

    DetectIP --> Whitelist{Whitelist?}
    Whitelist -->|Yes| Success[✅ Allow]
    Whitelist -->|No| AllowPat{Allowed Pattern?}

    AllowPat -->|Yes| Success
    AllowPat -->|No| HTTP{Scheme = HTTPS<br/>or not enforced?}

    HTTP -->|No| Block2[❌ Non-HTTPS]
    HTTP -->|Yes| BlockedDom{Blocked Domain<br/>or Pattern?}

    BlockedDom -->|Yes| Block3[❌ Blocked]
    BlockedDom -->|No| Heuristic{Heuristic Check<br/>Enabled & Not IP?}

    Heuristic -->|No| Success
    Heuristic -->|Yes| Checks{Pass Entropy,<br/>TLD & Unicode?}

    Checks -->|No| Block4[❌ Heuristic Fail]
    Checks -->|Yes| Success

    Block1 --> End([Return])
    Block2 --> End
    Block3 --> End
    Block4 --> End
    Success --> End

    style Start fill:#e1f5ff
    style End fill:#e1f5ff
    style Success fill:#c8e6c9
    style Block1 fill:#ffcdd2
    style Block2 fill:#ffcdd2
    style Block3 fill:#ffcdd2
    style Block4 fill:#ffcdd2

Logic workflow

  1. Parse & Normalize URL

    • Trim the input URL, then parse it (scheme and host are normalised to lowercase by the URL parser per RFC 3986; path and query retain original casing).
    • Fail → Violation: "Could not parse url".
  2. Extract Domain

    • Get the host string from the URL.
    • Fail → Violation: "Could not parse domain".
  3. Detect IP Address

    • Determine if domain is an IPv4 or IPv6 address.
    • Skip heuristic checks for IPs.
  4. Whitelist Check

    • If domain is in whitelist_domainscontinue_processing = true, skip all further checks.
  5. Allowed Patterns Check

    • If URL matches any regex in allowed_patternscontinue_processing = true, skip all further checks.
    • Note: this check runs before scheme enforcement, so an allowed_patterns match can bypass the non-secure HTTP block.
  6. Block Non-Secure HTTP

    • If scheme ≠ "https" and block_non_secure_httpViolation: "Blocked non secure http url".
  7. Blocked Domains

    • If domain is in blocked_domainsViolation: "Domain in blocked set".
  8. Blocked Patterns

    • If URL matches any regex in blocked_patternsViolation: "Blocked pattern".
  9. Heuristic Checks (only for non-IP domains and if use_heuristic_check = true): 9.1 High Entropy Check – If Shannon entropy > entropy_thresholdViolation: "High entropy domain". 9.2 TLD Validity Check – Validate top-level domain. Fail → Violation: "Illegal TLD". 9.3 Unicode Security Check – Validate domain unicode. Fail → Violation: "Domain unicode is not secure".

  10. Final Outcome

    • If no violations → continue_processing = true.
    • If any check fails → return first PluginViolation and continue_processing = false.

Limitations

- Static lists only; no external reputation providers.
- Ianna valid TLDs are static and will be out of date
- Ignores other schemes that are not http and https
- No external domain reputation checks

TODOs

- External threat-intel integration with cache – Query external feeds for known malicious domains.
- IP address handling policy – Decide rules for IPv4/IPv6 URLs.
- Dynamic TLD updates – Fetch latest IANA TLD list automatically.

Tests

Test Coverage (24 unit tests, all passing):

Filename Function Coverage Line Coverage Region Coverage
engine.rs 96.55% (28/29) 99.26% (533/537) 98.60% (634/643)
filters/heuristic.rs 100.00% (5/5) 96.49% (55/57) 97.53% (79/81)
filters/patterns.rs 100.00% (5/5) 100.00% (20/20) 100.00% (38/38)
lib.rs 0.00% (0/1) 0.00% (0/5) 0.00% (0/7)
types.rs 50.00% (3/6) 44.12% (15/34) 23.94% (17/71)
TOTAL 89.13% (41/46) 95.43% (627/657) 91.45% (770/842)

Note: lib.rs and types.rs contain PyO3 bindings and module declarations not covered by unit tests.

New test coverage includes:

  • Invalid regex pattern handling (both allowed and blocked patterns)
  • Case-insensitive domain matching (whitelist and blocklist)
  • Subdomain matching validation

Run tests:

cargo nextest run -p url_reputation  # Run Rust unit tests
cargo llvm-cov --lib --html   # Generate coverage report

Heuristic methods

The heuristics were based on a research paper.

A. P. S. Bhadauria and M. Singh, "Domain‑Checker: A Classification of Malicious and Benign Domains Using Multitier Filtering," Springer Nature, 2023.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cpex_url_reputation-0.3.0.tar.gz (103.2 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

cpex_url_reputation-0.3.0-cp311-abi3-win_amd64.whl (852.8 kB view details)

Uploaded CPython 3.11+Windows x86-64

cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_x86_64.whl (936.4 kB view details)

Uploaded CPython 3.11+manylinux: glibc 2.34+ x86-64

cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_s390x.whl (987.4 kB view details)

Uploaded CPython 3.11+manylinux: glibc 2.34+ s390x

cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_ppc64le.whl (967.6 kB view details)

Uploaded CPython 3.11+manylinux: glibc 2.34+ ppc64le

cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_aarch64.whl (878.2 kB view details)

Uploaded CPython 3.11+manylinux: glibc 2.34+ ARM64

cpex_url_reputation-0.3.0-cp311-abi3-macosx_11_0_arm64.whl (835.2 kB view details)

Uploaded CPython 3.11+macOS 11.0+ ARM64

File details

Details for the file cpex_url_reputation-0.3.0.tar.gz.

File metadata

  • Download URL: cpex_url_reputation-0.3.0.tar.gz
  • Upload date:
  • Size: 103.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cpex_url_reputation-0.3.0.tar.gz
Algorithm Hash digest
SHA256 9dd9c717b38e61246aaaf13087ea448082b445c08d6d80f3a46aceb19a54de1a
MD5 fd40f38f41b1f8029d81ad8310371c0c
BLAKE2b-256 ce040bcbceab5da41b8114403a1d1f27818eb6624c3acd12e83c3c321045e611

See more details on using hashes here.

Provenance

The following attestation bundles were made for cpex_url_reputation-0.3.0.tar.gz:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cpex_url_reputation-0.3.0-cp311-abi3-win_amd64.whl.

File metadata

File hashes

Hashes for cpex_url_reputation-0.3.0-cp311-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 1bd7960597150ceb43666cfcbbc3b5f55354bd6fc2a85fa4996d4dff7d978787
MD5 2ba67c825a6bece8023522ee774e841a
BLAKE2b-256 5667d6b4bd4e01897554d0156c395f3b5e46216966bff9c8821676ae07ad4c03

See more details on using hashes here.

Provenance

The following attestation bundles were made for cpex_url_reputation-0.3.0-cp311-abi3-win_amd64.whl:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 8067b7b7652ccabf834fc4797b64458518c7d5c28e150edf3f3f3e1910825c86
MD5 b138625b1fb3d8de8bf96789ff6f3c5e
BLAKE2b-256 f7c143ff30ffb2f8d294d9570d0d14a2a1bf16fd083259d0b0df0692223aa606

See more details on using hashes here.

Provenance

The following attestation bundles were made for cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_x86_64.whl:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_s390x.whl.

File metadata

File hashes

Hashes for cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_s390x.whl
Algorithm Hash digest
SHA256 1c11e15ca94ffa3fcb1e4f9344bb82653a089ac252ea483b9b14dc2e7c5f0453
MD5 5820050c4a04c6ec5c09f19b3aeb2178
BLAKE2b-256 546721635b0e0865d5b3843cce145f22f6c1fa582f0a03d5b8f4e5e4024af6dd

See more details on using hashes here.

Provenance

The following attestation bundles were made for cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_s390x.whl:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_ppc64le.whl.

File metadata

File hashes

Hashes for cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_ppc64le.whl
Algorithm Hash digest
SHA256 fde430c7124c1aa4746b8ad3fcf97bc7c9d1b4dec64cc81b085f01da7ed4a9f5
MD5 2791f946b312e4d1638c7969ce58b124
BLAKE2b-256 9b2299ee36b46600ba26a092221d6cb4b3af5dd690fb28248f55821c233d3413

See more details on using hashes here.

Provenance

The following attestation bundles were made for cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_ppc64le.whl:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_aarch64.whl.

File metadata

File hashes

Hashes for cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_aarch64.whl
Algorithm Hash digest
SHA256 7a1d45a5b8c4ee7ec8cef6be4936ba44ed4854595adb887fbef639ea144ac9a0
MD5 e9c6bd2ad17332703883d93acec679d6
BLAKE2b-256 af4a4e3da2a8245afa1ff769591f9a315528038576c71619dcde506b481eb33d

See more details on using hashes here.

Provenance

The following attestation bundles were made for cpex_url_reputation-0.3.0-cp311-abi3-manylinux_2_34_aarch64.whl:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cpex_url_reputation-0.3.0-cp311-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cpex_url_reputation-0.3.0-cp311-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a0a5fb53640a3ed21750b5b547b6115a998c42b1cd7e6d3f535270ed2c8712fb
MD5 f8b1f2da37c40e02260dbdcc2cbbccf3
BLAKE2b-256 0c16c25f41ce3e1d52216000988d807b9df997bb25a98be6c57284f45a551c3b

See more details on using hashes here.

Provenance

The following attestation bundles were made for cpex_url_reputation-0.3.0-cp311-abi3-macosx_11_0_arm64.whl:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page