High-performance URL reputation and phishing detection for MCP Gateway

URL Reputation (Rust)

Author: Matheus Cafalchio
Version: 0.1.0

Blocks URLs before a resource is fetched, based on configured blocked domains, regex patterns, and heuristics. Designed for fast, low-overhead checks.

Hooks

  • resource_pre_fetch – triggered before any resource is fetched.

Config

config:
    whitelist_domains: ["ibm.com", "yourdomain.com"]
    allowed_patterns: ["^https://trusted\\.internal/.*"]
    blocked_domains: ["malicious.example.com"]
    blocked_patterns: ["casino", "crypto"]
    use_heuristic_check: true
    entropy_threshold: 3.65
    block_non_secure_http: true

Config Description

  • whitelist_domains

    • A set of domains that are allowed to be fetched without any checks.
  • allowed_patterns

    • A list of regex patterns matched against the full URL. If any pattern matches, the URL is allowed and skips all remaining checks, including the non-secure HTTP check. Evaluated after the whitelist and before scheme enforcement.
  • blocked_domains

    • A set of domains that will always be blocked.
  • blocked_patterns

    • A list of regex patterns matched against the full URL. If any pattern matches, the URL is blocked.
  • use_heuristic_check

    • Whether heuristic checks (entropy, TLD validity, unicode security) should be performed. Default: false.
  • entropy_threshold

    • Maximum allowed Shannon entropy for a domain. Higher entropy may indicate suspicious/malicious domains.
  • block_non_secure_http

    • Whether URLs using http (non-secure) should be blocked. Default: true.
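
To make the entropy_threshold setting concrete, here is a minimal Python sketch of Shannon entropy computed over the characters of a domain string. The names shannon_entropy and exceeds_threshold are hypothetical, and computing over the whole host string (dots and TLD included) is an assumption; the plugin's Rust implementation may normalise or split the domain differently.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = -sum(p * log2(p)), where p is the frequency of each distinct character.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def exceeds_threshold(domain: str, threshold: float = 3.65) -> bool:
    """True when the domain's entropy is above the configured threshold.

    The default of 3.65 is taken from the example config; it is not
    necessarily the plugin's built-in default.
    """
    return shannon_entropy(domain) > threshold
```

Under these assumptions a short, repetitive name such as ibm.com scores well below 3.65, while a long string of mostly distinct characters (the kind often produced by domain generation algorithms) scores above it.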

Architecture

flowchart LR
    Start([URL Input]) --> Parse{Parse & Extract Domain}
    Parse -->|Fail| Block1[❌ Parse Error]
    Parse -->|Success| DetectIP[Detect IP]

    DetectIP --> Whitelist{Whitelist?}
    Whitelist -->|Yes| Success[✅ Allow]
    Whitelist -->|No| AllowPat{Allowed Pattern?}

    AllowPat -->|Yes| Success
    AllowPat -->|No| HTTP{Scheme = HTTPS<br/>or not enforced?}

    HTTP -->|No| Block2[❌ Non-HTTPS]
    HTTP -->|Yes| BlockedDom{Blocked Domain<br/>or Pattern?}

    BlockedDom -->|Yes| Block3[❌ Blocked]
    BlockedDom -->|No| Heuristic{Heuristic Check<br/>Enabled & Not IP?}

    Heuristic -->|No| Success
    Heuristic -->|Yes| Checks{Pass Entropy,<br/>TLD & Unicode?}

    Checks -->|No| Block4[❌ Heuristic Fail]
    Checks -->|Yes| Success

    Block1 --> End([Return])
    Block2 --> End
    Block3 --> End
    Block4 --> End
    Success --> End

    style Start fill:#e1f5ff
    style End fill:#e1f5ff
    style Success fill:#c8e6c9
    style Block1 fill:#ffcdd2
    style Block2 fill:#ffcdd2
    style Block3 fill:#ffcdd2
    style Block4 fill:#ffcdd2

Logic workflow

  1. Parse & Normalize URL

    • Trim the input URL, then parse it (scheme and host are normalised to lowercase by the URL parser per RFC 3986; path and query retain original casing).
    • Fail → Violation: "Could not parse url".
  2. Extract Domain

    • Get the host string from the URL.
    • Fail → Violation: "Could not parse domain".
  3. Detect IP Address

    • Determine if domain is an IPv4 or IPv6 address.
    • Skip heuristic checks for IPs.
  4. Whitelist Check

    • If domain is in whitelist_domains → continue_processing = true, skip all further checks.
  5. Allowed Patterns Check

    • If URL matches any regex in allowed_patterns → continue_processing = true, skip all further checks.
    • Note: this check runs before scheme enforcement, so an allowed_patterns match can bypass the non-secure HTTP block.
  6. Block Non-Secure HTTP

    • If scheme ≠ "https" and block_non_secure_http → Violation: "Blocked non secure http url".
  7. Blocked Domains

    • If domain is in blocked_domains → Violation: "Domain in blocked set".
  8. Blocked Patterns

    • If URL matches any regex in blocked_patterns → Violation: "Blocked pattern".
  9. Heuristic Checks (only for non-IP domains and if use_heuristic_check = true)

    • 9.1 High Entropy Check – If Shannon entropy > entropy_threshold → Violation: "High entropy domain".
    • 9.2 TLD Validity Check – Validate top-level domain. Fail → Violation: "Illegal TLD".
    • 9.3 Unicode Security Check – Validate domain unicode. Fail → Violation: "Domain unicode is not secure".

  10. Final Outcome

    • If no violations → continue_processing = true.
    • If any check fails → return first PluginViolation and continue_processing = false.
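
The ordering of the steps above can be sketched in Python as follows. This is an illustration of the documented order only, not the plugin's Rust code: Config, is_ip, and check_url are hypothetical names, the heuristics argument is a caller-supplied stand-in for the entropy, TLD, and unicode checks, and domain membership is treated as an exact match for simplicity.

```python
import ipaddress
import re
from dataclasses import dataclass, field
from typing import Callable, Optional
from urllib.parse import urlsplit


@dataclass
class Config:
    # Field names mirror the config keys described in this README.
    whitelist_domains: set = field(default_factory=set)
    allowed_patterns: list = field(default_factory=list)
    blocked_domains: set = field(default_factory=set)
    blocked_patterns: list = field(default_factory=list)
    use_heuristic_check: bool = False
    block_non_secure_http: bool = True


def is_ip(host: str) -> bool:
    """True when the host is an IPv4 or IPv6 literal."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def check_url(
    url: str,
    cfg: Config,
    heuristics: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return None to allow the URL, or the first violation reason."""
    url = url.strip()
    try:
        parts = urlsplit(url)  # lowercases the scheme
    except ValueError:
        return "Could not parse url"
    if not parts.scheme:
        return "Could not parse url"
    host = parts.hostname  # lowercased, IPv6 brackets removed
    if not host:
        return "Could not parse domain"

    # Allow-listing runs first, so it also bypasses scheme enforcement.
    if host in cfg.whitelist_domains:
        return None
    if any(re.search(p, url) for p in cfg.allowed_patterns):
        return None

    if cfg.block_non_secure_http and parts.scheme != "https":
        return "Blocked non secure http url"
    if host in cfg.blocked_domains:
        return "Domain in blocked set"
    if any(re.search(p, url) for p in cfg.blocked_patterns):
        return "Blocked pattern"

    # Heuristics are skipped for IP literals and when disabled.
    if cfg.use_heuristic_check and heuristics is not None and not is_ip(host):
        return heuristics(host)
    return None
```

With the example config from this README, a sketch like this allows http://ibm.com/ because the whitelist is consulted before the scheme check, while http://example.org/ is rejected with "Blocked non secure http url".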

Limitations

- Static lists only; no external reputation providers or domain reputation checks.
- The list of valid IANA TLDs is static and will become out of date.
- Does not handle schemes other than http and https.

TODOs

- External threat-intel integration with cache – Query external feeds for known malicious domains.
- IP address handling policy – Decide rules for IPv4/IPv6 URLs.
- Dynamic TLD updates – Fetch latest IANA TLD list automatically.

Tests

Test Coverage (24 unit tests, all passing):

| Filename | Function Coverage | Line Coverage | Region Coverage |
|---|---|---|---|
| engine.rs | 96.55% (28/29) | 99.26% (533/537) | 98.60% (634/643) |
| filters/heuristic.rs | 100.00% (5/5) | 96.49% (55/57) | 97.53% (79/81) |
| filters/patterns.rs | 100.00% (5/5) | 100.00% (20/20) | 100.00% (38/38) |
| lib.rs | 0.00% (0/1) | 0.00% (0/5) | 0.00% (0/7) |
| types.rs | 50.00% (3/6) | 44.12% (15/34) | 23.94% (17/71) |
| TOTAL | 89.13% (41/46) | 95.43% (627/657) | 91.45% (770/842) |

Note: lib.rs and types.rs contain PyO3 bindings and module declarations not covered by unit tests.

New test coverage includes:

  • Invalid regex pattern handling (both allowed and blocked patterns)
  • Case-insensitive domain matching (whitelist and blocklist)
  • Subdomain matching validation
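
The behaviours named in this list can be illustrated with a short Python sketch. The README does not spell out the exact matching rule or what happens on an invalid pattern, so everything here is an assumption: domain_matches treats a host as matching when it equals a listed domain or is a subdomain of one on a label boundary, compared in lowercase, and compile_patterns rejects the configuration on the first invalid regex. Both function names are hypothetical.

```python
import re


def domain_matches(host: str, listed: set) -> bool:
    """Case-insensitive match of `host` against listed domains and their subdomains."""
    host = host.lower().rstrip(".")
    normalised = {d.lower().rstrip(".") for d in listed}
    labels = host.split(".")
    # Try every suffix on a label boundary: a.b.example.com, b.example.com, ...
    return any(".".join(labels[i:]) in normalised for i in range(len(labels)))


def compile_patterns(patterns: list) -> list:
    """Compile regex patterns, raising ValueError that names the first invalid one."""
    compiled = []
    for pattern in patterns:
        try:
            compiled.append(re.compile(pattern))
        except re.error as exc:
            raise ValueError(f"invalid regex pattern {pattern!r}: {exc}") from exc
    return compiled
```

Matching on label boundaries means www.ibm.com matches a listed ibm.com, while look-alikes such as notibm.com or ibm.com.evil.org do not.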

Run tests:

cargo nextest run -p url_reputation  # Run Rust unit tests
cargo llvm-cov --lib --html   # Generate coverage report

Heuristic methods

The heuristics are based on the following research paper:

A. P. S. Bhadauria and M. Singh, "Domain‑Checker: A Classification of Malicious and Benign Domains Using Multitier Filtering," Springer Nature, 2023.
