High-performance URL reputation and phishing detection for MCP Gateway

URL Reputation (Rust)

Author: Matheus Cafalchio
Version: 0.1.0

Blocks URLs before a resource is fetched, based on configured blocked domains, regex patterns and optional heuristics. Designed for fast, efficient resource checks.

Hooks

  • resource_pre_fetch – triggered before any resource is fetched.

Config

config:
    whitelist_domains: ["ibm.com", "yourdomain.com"]
    allowed_patterns: ["^https://trusted\\.internal/.*"]
    blocked_domains: ["malicious.example.com"]
    blocked_patterns: ["casino", "crypto"]
    use_heuristic_check: true
    entropy_threshold: 3.65
    block_non_secure_http: true

Config Description

  • whitelist_domains

    • A set of domains that are allowed to be fetched without any further checks, including the non-secure HTTP check. Evaluated first.
  • allowed_patterns

    • A list of regex patterns matched against the full URL. If any pattern matches, the URL is allowed and skips all remaining checks, including the non-secure HTTP check. Evaluated after the whitelist and before scheme enforcement.
  • blocked_domains

    • A set of domains that are blocked, unless the domain is whitelisted or the URL matches an allowed pattern (both are evaluated first).
  • blocked_patterns

    • A list of regex patterns matched against the full URL. If any pattern matches, the URL is blocked (unless it was already allowed by whitelist_domains or allowed_patterns).
  • use_heuristic_check

    • Whether heuristic checks (entropy, TLD validity, Unicode security) should be performed. Default: false.
  • entropy_threshold

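    • Maximum allowed Shannon entropy for a domain; domains above this value fail the heuristic check. Only used when use_heuristic_check is true. High entropy (many distinct, random-looking characters) may indicate a suspicious or machine-generated domain.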
  • block_non_secure_http

    • Whether URLs using http (non-secure) should be blocked. Default: true.
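
For reference, the options above can be pictured as a typed Rust struct. This is a hypothetical sketch: the name UrlReputationConfig, the strings helper, the example() constructor and the use of Option<f64> for entropy_threshold are assumptions, not the crate's actual types. Only the defaults stated above (use_heuristic_check: false, block_non_secure_http: true) are encoded; no default is documented for entropy_threshold, so it is left unset.

```rust
/// Hypothetical typed view of the plugin's `config:` block; field names
/// follow the YAML keys, but this is not the crate's own type.
#[derive(Debug, Clone)]
pub struct UrlReputationConfig {
    pub whitelist_domains: Vec<String>,
    pub allowed_patterns: Vec<String>,
    pub blocked_domains: Vec<String>,
    pub blocked_patterns: Vec<String>,
    pub use_heuristic_check: bool,
    /// No default is documented, so it is optional here (assumption).
    pub entropy_threshold: Option<f64>,
    pub block_non_secure_http: bool,
}

/// Small helper to build owned string lists from literals.
fn strings(items: &[&str]) -> Vec<String> {
    items.iter().map(|s| s.to_string()).collect()
}

impl Default for UrlReputationConfig {
    fn default() -> Self {
        Self {
            whitelist_domains: Vec::new(),
            allowed_patterns: Vec::new(),
            blocked_domains: Vec::new(),
            blocked_patterns: Vec::new(),
            use_heuristic_check: false,  // documented default
            entropy_threshold: None,     // not documented
            block_non_secure_http: true, // documented default
        }
    }
}

impl UrlReputationConfig {
    /// The example configuration shown in the Config section.
    pub fn example() -> Self {
        Self {
            whitelist_domains: strings(&["ibm.com", "yourdomain.com"]),
            allowed_patterns: strings(&[r"^https://trusted\.internal/.*"]),
            blocked_domains: strings(&["malicious.example.com"]),
            blocked_patterns: strings(&["casino", "crypto"]),
            use_heuristic_check: true,
            entropy_threshold: Some(3.65),
            block_non_secure_http: true,
        }
    }
}
```

The YAML double-quoted string "^https://trusted\\.internal/.*" and the Rust raw string r"^https://trusted\.internal/.*" denote the same regex, because YAML turns \\ into a single backslash.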

Architecture

flowchart LR
    Start([URL Input]) --> Parse{Parse & Extract Domain}
    Parse -->|Fail| Block1[❌ Parse Error]
    Parse -->|Success| DetectIP[Detect IP]

    DetectIP --> Whitelist{Whitelist?}
    Whitelist -->|Yes| Success[✅ Allow]
    Whitelist -->|No| AllowPat{Allowed Pattern?}

    AllowPat -->|Yes| Success
    AllowPat -->|No| HTTP{Scheme = HTTPS<br/>or not enforced?}

    HTTP -->|No| Block2[❌ Non-HTTPS]
    HTTP -->|Yes| BlockedDom{Blocked Domain<br/>or Pattern?}

    BlockedDom -->|Yes| Block3[❌ Blocked]
    BlockedDom -->|No| Heuristic{Heuristic Check<br/>Enabled & Not IP?}

    Heuristic -->|No| Success
    Heuristic -->|Yes| Checks{Pass Entropy,<br/>TLD & Unicode?}

    Checks -->|No| Block4[❌ Heuristic Fail]
    Checks -->|Yes| Success

    Block1 --> End([Return])
    Block2 --> End
    Block3 --> End
    Block4 --> End
    Success --> End

    style Start fill:#e1f5ff
    style End fill:#e1f5ff
    style Success fill:#c8e6c9
    style Block1 fill:#ffcdd2
    style Block2 fill:#ffcdd2
    style Block3 fill:#ffcdd2
    style Block4 fill:#ffcdd2
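
The ordering in the flowchart can be sketched in plain Rust using only the standard library. Assumptions: the names Verdict, Rules, scheme_and_host and check are hypothetical; the URL splitter is a deliberately minimal stand-in for a real RFC 3986 parser; domain lists use exact, case-insensitive matching; and patterns are matched as plain substrings instead of the regular expressions the plugin uses. The heuristic stage is passed in as a closure.

```rust
use std::net::IpAddr;

/// Outcome of a check: allow, or block with the first violation's message.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Allow,
    Block(&'static str),
}

/// Hypothetical, simplified rule set. Patterns are plain substrings here,
/// whereas the plugin matches regular expressions.
pub struct Rules<'a> {
    pub whitelist_domains: &'a [&'a str],
    pub allowed_patterns: &'a [&'a str],
    pub blocked_domains: &'a [&'a str],
    pub blocked_patterns: &'a [&'a str],
    pub block_non_secure_http: bool,
    pub use_heuristic_check: bool,
}

/// Minimal splitter (not a full RFC 3986 parser): returns the lowercased
/// scheme and host, or the violation message for a parse failure.
fn scheme_and_host(url: &str) -> Result<(String, String), &'static str> {
    let (scheme, rest) = url.trim().split_once("://").ok_or("Could not parse url")?;
    if scheme.is_empty() {
        return Err("Could not parse url");
    }
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next().unwrap_or("");
    // Drop any "user:pass@" prefix.
    let host_port = authority.rsplit('@').next().unwrap_or("");
    let host = match host_port.strip_prefix('[') {
        Some(v6) => v6.split(']').next().unwrap_or(""), // bracketed IPv6 literal
        None => host_port.split(':').next().unwrap_or(""), // drop ":port"
    };
    if host.is_empty() {
        return Err("Could not parse domain");
    }
    Ok((scheme.to_ascii_lowercase(), host.to_ascii_lowercase()))
}

/// Applies the checks in flowchart order; `heuristics` stands in for the
/// entropy / TLD / Unicode stage and returns a violation message on failure.
pub fn check(url: &str, rules: &Rules<'_>, heuristics: impl Fn(&str) -> Option<&'static str>) -> Verdict {
    let (scheme, host) = match scheme_and_host(url) {
        Ok(parts) => parts,
        Err(msg) => return Verdict::Block(msg),
    };
    let is_ip = host.parse::<IpAddr>().is_ok();

    if rules.whitelist_domains.iter().any(|d| d.eq_ignore_ascii_case(&host)) {
        return Verdict::Allow;
    }
    // Runs before the scheme check, so it can bypass the HTTPS requirement.
    if rules.allowed_patterns.iter().any(|p| url.contains(*p)) {
        return Verdict::Allow;
    }
    if rules.block_non_secure_http && scheme != "https" {
        return Verdict::Block("Blocked non secure http url");
    }
    if rules.blocked_domains.iter().any(|d| d.eq_ignore_ascii_case(&host)) {
        return Verdict::Block("Domain in blocked set");
    }
    if rules.blocked_patterns.iter().any(|p| url.contains(*p)) {
        return Verdict::Block("Blocked pattern");
    }
    // Heuristics are skipped for IP hosts.
    if rules.use_heuristic_check && !is_ip {
        if let Some(msg) = heuristics(host.as_str()) {
            return Verdict::Block(msg);
        }
    }
    Verdict::Allow
}
```

Because the whitelist and allowed-pattern checks return early, an http:// URL that matches an allowed pattern is accepted even with block_non_secure_http enabled, and IP hosts never reach the heuristic stage.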

Logic workflow

  1. Parse & Normalize URL

    • Trim the input URL, then parse it (scheme and host are normalized to lowercase by the URL parser per RFC 3986; path and query retain their original casing).
    • Fail → Violation: "Could not parse url".
  2. Extract Domain

    • Get the host string from the URL.
    • Fail → Violation: "Could not parse domain".
  3. Detect IP Address

    • Determine whether the host is an IPv4 or IPv6 address.
    • Skip heuristic checks for IPs.
  4. Whitelist Check

    • If the domain is in whitelist_domains → continue_processing = true; skip all further checks.
  5. Allowed Patterns Check

    • If the URL matches any regex in allowed_patterns → continue_processing = true; skip all further checks.
    • Note: this check runs before scheme enforcement, so an allowed_patterns match can bypass the non-secure HTTP block.
  6. Block Non-Secure HTTP

    • If scheme ≠ "https" and block_non_secure_http is true → Violation: "Blocked non secure http url".
  7. Blocked Domains

    • If the domain is in blocked_domains → Violation: "Domain in blocked set".
  8. Blocked Patterns

    • If the URL matches any regex in blocked_patterns → Violation: "Blocked pattern".
  9. Heuristic Checks (only for non-IP domains and only if use_heuristic_check = true)

    • 9.1 High Entropy Check: if the domain's Shannon entropy > entropy_threshold → Violation: "High entropy domain".
    • 9.2 TLD Validity Check: validate the top-level domain. Fail → Violation: "Illegal TLD".
    • 9.3 Unicode Security Check: validate the domain's Unicode. Fail → Violation: "Domain unicode is not secure".

  10. Final Outcome

    • If no check fails → continue_processing = true.
    • If any check fails → return the first PluginViolation with continue_processing = false.
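
The High Entropy Check (9.1) compares the Shannon entropy of the domain, H = -Σ p(c) log2 p(c) over character frequencies, against entropy_threshold. A minimal sketch follows. It assumes the entropy is computed per character over the whole host string (the plugin may use a different segment of the domain), and the function names shannon_entropy and high_entropy are hypothetical.

```rust
use std::collections::HashMap;

/// Shannon entropy in bits per character: H = -Σ p(c) · log2 p(c),
/// where p(c) is the relative frequency of character c in `s`.
pub fn shannon_entropy(s: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    for c in s.chars() {
        *counts.entry(c).or_insert(0) += 1;
    }
    let len = s.chars().count() as f64;
    if len == 0.0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical wrapper for step 9.1: returns the violation message when the
/// domain's entropy exceeds the configured threshold.
pub fn high_entropy(domain: &str, threshold: f64) -> Option<&'static str> {
    if shannon_entropy(domain) > threshold {
        Some("High entropy domain")
    } else {
        None
    }
}
```

With the example threshold of 3.65, google.com scores about 2.65 bits per character and passes, while xk7qz9w2vbn3jh.com (18 distinct characters, each used once) scores log2(18) ≈ 4.17 and is flagged.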

Limitations

- Static lists only; no external domain reputation providers are queried.
- The list of valid IANA TLDs is static and will go out of date.
- Schemes other than http and https are ignored.
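
The static-TLD limitation is easiest to see in a sketch of the TLD Validity Check (9.2). The embedded list below is a hypothetical, heavily truncated stand-in for the plugin's IANA snapshot, and tld_is_valid is not the plugin's API. Any TLD delegated after a snapshot is taken would be rejected with "Illegal TLD" until the list is rebuilt.

```rust
/// Hypothetical, heavily truncated snapshot of IANA TLDs (illustration only).
const KNOWN_TLDS: &[&str] = &["com", "org", "net", "io", "uk", "de"];

/// True if the text after the last '.' is in the static list.
/// A trailing root dot ("example.com.") is ignored; single-label hosts fail.
pub fn tld_is_valid(domain: &str) -> bool {
    let trimmed = domain.trim_end_matches('.');
    match trimmed.rsplit_once('.') {
        Some((_, tld)) => KNOWN_TLDS.iter().any(|t| t.eq_ignore_ascii_case(tld)),
        None => false,
    }
}
```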

TODOs

- External threat-intel integration with cache – Query external feeds for known malicious domains.
- IP address handling policy – Decide rules for IPv4/IPv6 URLs.
- Dynamic TLD updates – Fetch latest IANA TLD list automatically.

Tests

Test Coverage (24 unit tests, all passing):

Filename              Function Coverage   Line Coverage      Region Coverage
engine.rs             96.55% (28/29)      99.26% (533/537)   98.60% (634/643)
filters/heuristic.rs  100.00% (5/5)       96.49% (55/57)     97.53% (79/81)
filters/patterns.rs   100.00% (5/5)       100.00% (20/20)    100.00% (38/38)
lib.rs                0.00% (0/1)         0.00% (0/5)        0.00% (0/7)
types.rs              50.00% (3/6)        44.12% (15/34)     23.94% (17/71)
TOTAL                 89.13% (41/46)      95.43% (627/657)   91.45% (770/842)

Note: lib.rs and types.rs contain PyO3 bindings and module declarations not covered by unit tests.

New test coverage includes:

  • Invalid regex pattern handling (both allowed and blocked patterns)
  • Case-insensitive domain matching (whitelist and blocklist)
  • Subdomain matching validation
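
The case-insensitive and subdomain matching tested above can be illustrated with a small matcher. This assumes that a list entry such as ibm.com also covers its subdomains (e.g. www.ibm.com) but not look-alikes such as notibm.com; the README does not say which rule the plugin applies, so treat domain_matches as a hypothetical helper.

```rust
/// Returns true if `host` equals `entry` or is a subdomain of it, comparing
/// ASCII case-insensitively. Assumption: entries also cover subdomains.
pub fn domain_matches(host: &str, entry: &str) -> bool {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    let entry = entry.trim_end_matches('.').to_ascii_lowercase();
    // Requiring a '.' before the entry rejects look-alikes like "notibm.com".
    host == entry || host.ends_with(&format!(".{}", entry))
}
```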

Run tests:

cargo test --lib              # Run all unit tests
cargo llvm-cov --lib --html   # Generate coverage report (requires cargo-llvm-cov)

Heuristic methods

The heuristics are based on the following research paper:

A. P. S. Bhadauria and M. Singh, "Domain‑Checker: A Classification of Malicious and Benign Domains Using Multitier Filtering," Springer Nature, 2023.

Download files

Source Distribution

  • cpex_url_reputation-0.2.0.tar.gz (63.0 kB): Source

Built Distributions

  • cpex_url_reputation-0.2.0-cp311-abi3-win_amd64.whl (872.8 kB): CPython 3.11+, Windows x86-64
  • cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_x86_64.whl (955.7 kB): CPython 3.11+, manylinux: glibc 2.34+ x86-64
  • cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_s390x.whl (1.0 MB): CPython 3.11+, manylinux: glibc 2.34+ s390x
  • cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_ppc64le.whl (988.8 kB): CPython 3.11+, manylinux: glibc 2.34+ ppc64le
  • cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_aarch64.whl (898.7 kB): CPython 3.11+, manylinux: glibc 2.34+ ARM64
  • cpex_url_reputation-0.2.0-cp311-abi3-macosx_11_0_arm64.whl (852.9 kB): CPython 3.11+, macOS 11.0+ ARM64

File details

Details for the file cpex_url_reputation-0.2.0.tar.gz.

File metadata

  • Download URL: cpex_url_reputation-0.2.0.tar.gz
  • Upload date:
  • Size: 63.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cpex_url_reputation-0.2.0.tar.gz
Algorithm Hash digest
SHA256 b0b07753dd9cf5c31b0611211f2c67f6b5be21a7202864c602868ace14d247f2
MD5 cc588d9738926408e3036afa610ec83b
BLAKE2b-256 f6fcb5e8e44734f953f58eb303a7c6470b8a104fad7de6eb7324a5bd21bd465a

See more details on using hashes here.

Provenance

The following attestation bundles were made for cpex_url_reputation-0.2.0.tar.gz:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cpex_url_reputation-0.2.0-cp311-abi3-win_amd64.whl.

File hashes

Hashes for cpex_url_reputation-0.2.0-cp311-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 69c1bce591c197ed3ce996ff4b24c4da82775c9087cab2fb3526c7130e6eb7e7
MD5 f54bf535d1a444cc9046593642ef04c1
BLAKE2b-256 0562d770be08148f482dcf3b4eb047f647704551b5f27fee9acce59941d078f5

Provenance

The following attestation bundles were made for cpex_url_reputation-0.2.0-cp311-abi3-win_amd64.whl:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

File details

Details for the file cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_x86_64.whl.

File hashes

Hashes for cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 e721e0d99bb5e95e725333fc4597812380ca63e8aca4eb2f063964fa0b3fefc2
MD5 25451df384d11635ab6b97d0acf5d273
BLAKE2b-256 8de00dfe8026584ef5ef679583a119a11f1fe8433026a2752665c8c78166a4dc

Provenance

The following attestation bundles were made for cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_x86_64.whl:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

File details

Details for the file cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_s390x.whl.

File hashes

Hashes for cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_s390x.whl
Algorithm Hash digest
SHA256 92a4c14c1afb754a2e54126777779c7c7e04eb8f7d48a41255690bd76a3ffc1c
MD5 17cb45e6257efcfc3eeb51ddfab71fe5
BLAKE2b-256 d998ffab241794ea092a4033ee2d7b8145ea8716eecf0529036e16503eb4b85b

Provenance

The following attestation bundles were made for cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_s390x.whl:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

File details

Details for the file cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_ppc64le.whl.

File hashes

Hashes for cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_ppc64le.whl
Algorithm Hash digest
SHA256 3c61770d96a317151b20870512fc41d8c9fe0719bdd4eb8985ed9fd33e585430
MD5 f49202a9e70869f14caa229747f533b3
BLAKE2b-256 5b683a5f2fd1f73dca8c81424eaaa675464c89667121446005d4438ba5e009dc

Provenance

The following attestation bundles were made for cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_ppc64le.whl:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

File details

Details for the file cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_aarch64.whl.

File hashes

Hashes for cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_aarch64.whl
Algorithm Hash digest
SHA256 62c6479076d9867e45291e12c3f93127d4adce0e2bc03bba4c8eb7230898dd4e
MD5 15a23ae51bb9e1e0e1c8beade8028a04
BLAKE2b-256 ba096ba1e001b710a588da0977e07944a00a90682606edae2edf63b0e48bc861

Provenance

The following attestation bundles were made for cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_aarch64.whl:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

File details

Details for the file cpex_url_reputation-0.2.0-cp311-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for cpex_url_reputation-0.2.0-cp311-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c9032fa14e94315bad39f5c3910f886a4a1ce262a8398aadd046bedfb0651525
MD5 d60e2ac7f2168b0c70a3aaa2f21b0c54
BLAKE2b-256 93be80d1e69b204683e721d1d5928e27668f6be80856633d83e1b85e4897b2d9

Provenance

The following attestation bundles were made for cpex_url_reputation-0.2.0-cp311-abi3-macosx_11_0_arm64.whl:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins
