High-performance URL reputation and phishing detection for MCP Gateway
URL Reputation (Rust)
Author: Matheus Cafalchio
Version: 0.1.0
Blocks URLs based on configured blocked domains, patterns and heuristics before resource fetch. Designed for fast and efficient resource checks.
Hooks
- resource_pre_fetch – triggered before any resource is fetched.
Config
config:
  whitelist_domains: ["ibm.com", "yourdomain.com"]
  allowed_patterns: ["^https://trusted\\.internal/.*"]
  blocked_domains: ["malicious.example.com"]
  blocked_patterns: ["casino", "crypto"]
  use_heuristic_check: true
  entropy_threshold: 3.65
  block_non_secure_http: true
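To illustrate how pattern entries like those in the sample might behave, here is a minimal Python sketch. It assumes unanchored search semantics (a pattern matches if it is found anywhere in the URL), which is consistent with the sample's explicit ^ anchor but is not confirmed here. Python's re module stands in for the plugin's regex engine, and matches_any is a hypothetical helper, not part of the plugin.

```python
import re

# Patterns copied from the sample config. Inside YAML double quotes, "\\." is a
# single escaped dot, so the regex itself is ^https://trusted\.internal/.*
ALLOWED_PATTERNS = [re.compile(r"^https://trusted\.internal/.*")]
BLOCKED_PATTERNS = [re.compile(p) for p in ("casino", "crypto")]


def matches_any(patterns, url):
    """Hypothetical helper: True if any pattern is found anywhere in the URL."""
    return any(p.search(url) for p in patterns)


# The anchored allow pattern only matches URLs that start with the trusted prefix,
# so embedding the prefix in a query string does not satisfy it.
print(matches_any(ALLOWED_PATTERNS, "https://trusted.internal/api"))
print(matches_any(ALLOWED_PATTERNS, "https://evil.example/?u=https://trusted.internal/"))

# Unanchored block patterns match substrings, including inside longer words.
print(matches_any(BLOCKED_PATTERNS, "https://example.org/docs/cryptography"))
```

Under that assumption, short block patterns such as crypto would also catch unrelated words like cryptography; anchoring the pattern or adding word boundaries narrows the match.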
Config Description
- whitelist_domains
  - A set of domains that are allowed to be fetched without any checks.
- allowed_patterns
  - A list of regex patterns matched against the full URL. If any pattern matches, the URL is allowed and skips all remaining checks, including the non-secure HTTP check. Evaluated after the whitelist, before scheme enforcement.
- blocked_domains
  - A set of domains that will always be blocked.
- blocked_patterns
  - A list of regex patterns matched against the full URL. If any pattern matches, the URL is blocked.
- use_heuristic_check
  - Whether heuristic checks (entropy, TLD validity, unicode security) should be performed. Default: false.
- entropy_threshold
  - Maximum allowed Shannon entropy for a domain. Higher entropy may indicate suspicious/malicious domains.
- block_non_secure_http
  - Whether URLs using http (non-secure) should be blocked. Default: true.
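For reference, the Shannon entropy of a string is H = -Σ p(c) · log2 p(c), where p(c) is the relative frequency of character c. The sketch below assumes the entropy is computed over the characters of the whole host string; the plugin's exact input (for example, whether the TLD is included) is not stated here, and shannon_entropy is a hypothetical name used only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character (0 for an empty string)."""
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


ENTROPY_THRESHOLD = 3.65  # value taken from the sample config

# Compare a short, repetitive host with a random-looking one (made-up example).
for host in ("ibm.com", "x7f9q2kz8w4b1m6c.com"):
    h = shannon_entropy(host)
    verdict = "block" if h > ENTROPY_THRESHOLD else "allow"
    print(f"{host}: {h:.2f} bits -> {verdict}")
```

Hosts with few repeated characters score higher, which is why random-looking generated domains tend to exceed the threshold while short, ordinary names stay below it.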
Architecture
flowchart LR
Start([URL Input]) --> Parse{Parse & Extract Domain}
Parse -->|Fail| Block1[❌ Parse Error]
Parse -->|Success| DetectIP[Detect IP]
DetectIP --> Whitelist{Whitelist?}
Whitelist -->|Yes| Success[✅ Allow]
Whitelist -->|No| AllowPat{Allowed Pattern?}
AllowPat -->|Yes| Success
AllowPat -->|No| HTTP{Scheme = HTTPS<br/>or not enforced?}
HTTP -->|No| Block2[❌ Non-HTTPS]
HTTP -->|Yes| BlockedDom{Blocked Domain<br/>or Pattern?}
BlockedDom -->|Yes| Block3[❌ Blocked]
BlockedDom -->|No| Heuristic{Heuristic Check<br/>Enabled & Not IP?}
Heuristic -->|No| Success
Heuristic -->|Yes| Checks{Pass Entropy,<br/>TLD & Unicode?}
Checks -->|No| Block4[❌ Heuristic Fail]
Checks -->|Yes| Success
Block1 --> End([Return])
Block2 --> End
Block3 --> End
Block4 --> End
Success --> End
style Start fill:#e1f5ff
style End fill:#e1f5ff
style Success fill:#c8e6c9
style Block1 fill:#ffcdd2
style Block2 fill:#ffcdd2
style Block3 fill:#ffcdd2
style Block4 fill:#ffcdd2
Logic workflow
1. Parse & Normalize URL
   - Trim the input URL, then parse it (scheme and host are normalised to lowercase by the URL parser per RFC 3986; path and query retain original casing).
   - Fail → Violation: "Could not parse url".
2. Extract Domain
   - Get the host string from the URL.
   - Fail → Violation: "Could not parse domain".
3. Detect IP Address
   - Determine if domain is an IPv4 or IPv6 address.
   - Skip heuristic checks for IPs.
4. Whitelist Check
   - If domain is in whitelist_domains → continue_processing = true, skip all further checks.
5. Allowed Patterns Check
   - If URL matches any regex in allowed_patterns → continue_processing = true, skip all further checks.
   - Note: this check runs before scheme enforcement, so an allowed_patterns match can bypass the non-secure HTTP block.
6. Block Non-Secure HTTP
   - If scheme ≠ "https" and block_non_secure_http → Violation: "Blocked non secure http url".
7. Blocked Domains
   - If domain is in blocked_domains → Violation: "Domain in blocked set".
8. Blocked Patterns
   - If URL matches any regex in blocked_patterns → Violation: "Blocked pattern".
9. Heuristic Checks (only for non-IP domains and if use_heuristic_check = true):
   - 9.1 High Entropy Check – If Shannon entropy > entropy_threshold → Violation: "High entropy domain".
   - 9.2 TLD Validity Check – Validate top-level domain. Fail → Violation: "Illegal TLD".
   - 9.3 Unicode Security Check – Validate domain unicode. Fail → Violation: "Domain unicode is not secure".
10. Final Outcome
    - If no violations → continue_processing = true.
    - If any check fails → return first PluginViolation and continue_processing = false.
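The ordering of these checks can be sketched in Python as follows. This is an illustration of the decision order only, not the plugin's Rust implementation: check_url, is_ip, SAMPLE_CFG, and the heuristics callback are hypothetical names; domains are compared by exact match and patterns use unanchored search on the trimmed input, both of which are assumptions; and the heuristic checks of step 9 are left to a caller-supplied function.

```python
import re
from ipaddress import ip_address
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit


def is_ip(host: str) -> bool:
    """True if host is an IPv4 or IPv6 literal."""
    try:
        ip_address(host)
        return True
    except ValueError:
        return False


def check_url(
    url: str,
    cfg: dict,
    heuristics: Optional[Callable[[str], Optional[str]]] = None,
) -> Tuple[bool, Optional[str]]:
    """Return (continue_processing, violation), applying checks in the listed order."""
    url = url.strip()
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased; brackets stripped for IPv6 literals
    except ValueError:
        return False, "Could not parse url"
    if not parts.scheme:
        return False, "Could not parse url"
    if not host:
        return False, "Could not parse domain"

    # Allow-list checks run first and short-circuit everything else,
    # including scheme enforcement.
    if host in cfg.get("whitelist_domains", ()):
        return True, None
    if any(re.search(p, url) for p in cfg.get("allowed_patterns", ())):
        return True, None

    if parts.scheme != "https" and cfg.get("block_non_secure_http", True):
        return False, "Blocked non secure http url"
    if host in cfg.get("blocked_domains", ()):
        return False, "Domain in blocked set"
    if any(re.search(p, url) for p in cfg.get("blocked_patterns", ())):
        return False, "Blocked pattern"

    # Heuristics are skipped for IP literals and when disabled in the config.
    if cfg.get("use_heuristic_check", False) and heuristics and not is_ip(host):
        violation = heuristics(host)
        if violation:
            return False, violation
    return True, None


SAMPLE_CFG = {
    "whitelist_domains": ["ibm.com", "yourdomain.com"],
    "allowed_patterns": [r"^https://trusted\.internal/.*"],
    "blocked_domains": ["malicious.example.com"],
    "blocked_patterns": ["casino", "crypto"],
    "use_heuristic_check": True,
    "block_non_secure_http": True,
}

for candidate in ("http://ibm.com/", "http://example.org/", "https://example.org/casino"):
    print(candidate, check_url(candidate, SAMPLE_CFG))
```

Because the whitelist and allowed-pattern checks return early, a whitelisted host is allowed even over plain http, while a non-whitelisted http URL is rejected before the block lists are consulted.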
Limitations
- Static lists only; no external domain reputation providers are queried.
- The bundled list of valid IANA TLDs is static and will go out of date.
- Ignores schemes other than http and https.
TODOs
- External threat-intel integration with cache – Query external feeds for known malicious domains.
- IP address handling policy – Decide rules for IPv4/IPv6 URLs.
- Dynamic TLD updates – Fetch latest IANA TLD list automatically.
Tests
Test Coverage (24 unit tests, all passing):
| Filename | Function Coverage | Line Coverage | Region Coverage |
|---|---|---|---|
| engine.rs | 96.55% (28/29) | 99.26% (533/537) | 98.60% (634/643) |
| filters/heuristic.rs | 100.00% (5/5) | 96.49% (55/57) | 97.53% (79/81) |
| filters/patterns.rs | 100.00% (5/5) | 100.00% (20/20) | 100.00% (38/38) |
| lib.rs | 0.00% (0/1) | 0.00% (0/5) | 0.00% (0/7) |
| types.rs | 50.00% (3/6) | 44.12% (15/34) | 23.94% (17/71) |
| TOTAL | 89.13% (41/46) | 95.43% (627/657) | 91.45% (770/842) |
Note: lib.rs and types.rs contain PyO3 bindings and module declarations not covered by unit tests.
New test coverage includes:
- Invalid regex pattern handling (both allowed and blocked patterns)
- Case-insensitive domain matching (whitelist and blocklist)
- Subdomain matching validation
Run tests:
cargo test --lib # Run all unit tests
cargo llvm-cov --lib --html # Generate coverage report
Heuristic methods
The heuristics are based on the following research paper:
A. P. S. Bhadauria and M. Singh, "Domain‑Checker: A Classification of Malicious and Benign Domains Using Multitier Filtering," Springer Nature, 2023.
Download files
File details
Details for the file cpex_url_reputation-0.2.0.tar.gz.
File metadata
- Download URL: cpex_url_reputation-0.2.0.tar.gz
- Upload date:
- Size: 63.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b0b07753dd9cf5c31b0611211f2c67f6b5be21a7202864c602868ace14d247f2 |
| MD5 | cc588d9738926408e3036afa610ec83b |
| BLAKE2b-256 | f6fcb5e8e44734f953f58eb303a7c6470b8a104fad7de6eb7324a5bd21bd465a |
Provenance
The following attestation bundles were made for cpex_url_reputation-0.2.0.tar.gz:
Publisher: release-rust-python-package.yaml on IBM/cpex-plugins
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: cpex_url_reputation-0.2.0.tar.gz
  - Subject digest: b0b07753dd9cf5c31b0611211f2c67f6b5be21a7202864c602868ace14d247f2
  - Sigstore transparency entry: 1342585457
  - Sigstore integration time:
- Permalink: IBM/cpex-plugins@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Branch / Tag: refs/tags/url-reputation-v0.2.0
- Owner: https://github.com/IBM
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-rust-python-package.yaml@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Trigger Event: push
File details
Details for the file cpex_url_reputation-0.2.0-cp311-abi3-win_amd64.whl.
File metadata
- Download URL: cpex_url_reputation-0.2.0-cp311-abi3-win_amd64.whl
- Upload date:
- Size: 872.8 kB
- Tags: CPython 3.11+, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 69c1bce591c197ed3ce996ff4b24c4da82775c9087cab2fb3526c7130e6eb7e7 |
| MD5 | f54bf535d1a444cc9046593642ef04c1 |
| BLAKE2b-256 | 0562d770be08148f482dcf3b4eb047f647704551b5f27fee9acce59941d078f5 |
Provenance
The following attestation bundles were made for cpex_url_reputation-0.2.0-cp311-abi3-win_amd64.whl:
Publisher: release-rust-python-package.yaml on IBM/cpex-plugins
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: cpex_url_reputation-0.2.0-cp311-abi3-win_amd64.whl
  - Subject digest: 69c1bce591c197ed3ce996ff4b24c4da82775c9087cab2fb3526c7130e6eb7e7
  - Sigstore transparency entry: 1342585468
  - Sigstore integration time:
- Permalink: IBM/cpex-plugins@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Branch / Tag: refs/tags/url-reputation-v0.2.0
- Owner: https://github.com/IBM
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-rust-python-package.yaml@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Trigger Event: push
File details
Details for the file cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 955.7 kB
- Tags: CPython 3.11+, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e721e0d99bb5e95e725333fc4597812380ca63e8aca4eb2f063964fa0b3fefc2 |
| MD5 | 25451df384d11635ab6b97d0acf5d273 |
| BLAKE2b-256 | 8de00dfe8026584ef5ef679583a119a11f1fe8433026a2752665c8c78166a4dc |
Provenance
The following attestation bundles were made for cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_x86_64.whl:
Publisher: release-rust-python-package.yaml on IBM/cpex-plugins
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_x86_64.whl
  - Subject digest: e721e0d99bb5e95e725333fc4597812380ca63e8aca4eb2f063964fa0b3fefc2
  - Sigstore transparency entry: 1342585485
  - Sigstore integration time:
- Permalink: IBM/cpex-plugins@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Branch / Tag: refs/tags/url-reputation-v0.2.0
- Owner: https://github.com/IBM
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-rust-python-package.yaml@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Trigger Event: push
File details
Details for the file cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_s390x.whl.
File metadata
- Download URL: cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_s390x.whl
- Upload date:
- Size: 1.0 MB
- Tags: CPython 3.11+, manylinux: glibc 2.34+ s390x
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 92a4c14c1afb754a2e54126777779c7c7e04eb8f7d48a41255690bd76a3ffc1c |
| MD5 | 17cb45e6257efcfc3eeb51ddfab71fe5 |
| BLAKE2b-256 | d998ffab241794ea092a4033ee2d7b8145ea8716eecf0529036e16503eb4b85b |
Provenance
The following attestation bundles were made for cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_s390x.whl:
Publisher: release-rust-python-package.yaml on IBM/cpex-plugins
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_s390x.whl
  - Subject digest: 92a4c14c1afb754a2e54126777779c7c7e04eb8f7d48a41255690bd76a3ffc1c
  - Sigstore transparency entry: 1342585474
  - Sigstore integration time:
- Permalink: IBM/cpex-plugins@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Branch / Tag: refs/tags/url-reputation-v0.2.0
- Owner: https://github.com/IBM
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-rust-python-package.yaml@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Trigger Event: push
File details
Details for the file cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_ppc64le.whl.
File metadata
- Download URL: cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_ppc64le.whl
- Upload date:
- Size: 988.8 kB
- Tags: CPython 3.11+, manylinux: glibc 2.34+ ppc64le
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3c61770d96a317151b20870512fc41d8c9fe0719bdd4eb8985ed9fd33e585430 |
| MD5 | f49202a9e70869f14caa229747f533b3 |
| BLAKE2b-256 | 5b683a5f2fd1f73dca8c81424eaaa675464c89667121446005d4438ba5e009dc |
Provenance
The following attestation bundles were made for cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_ppc64le.whl:
Publisher: release-rust-python-package.yaml on IBM/cpex-plugins
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_ppc64le.whl
  - Subject digest: 3c61770d96a317151b20870512fc41d8c9fe0719bdd4eb8985ed9fd33e585430
  - Sigstore transparency entry: 1342585487
  - Sigstore integration time:
- Permalink: IBM/cpex-plugins@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Branch / Tag: refs/tags/url-reputation-v0.2.0
- Owner: https://github.com/IBM
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-rust-python-package.yaml@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Trigger Event: push
File details
Details for the file cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_aarch64.whl.
File metadata
- Download URL: cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_aarch64.whl
- Upload date:
- Size: 898.7 kB
- Tags: CPython 3.11+, manylinux: glibc 2.34+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 62c6479076d9867e45291e12c3f93127d4adce0e2bc03bba4c8eb7230898dd4e |
| MD5 | 15a23ae51bb9e1e0e1c8beade8028a04 |
| BLAKE2b-256 | ba096ba1e001b710a588da0977e07944a00a90682606edae2edf63b0e48bc861 |
Provenance
The following attestation bundles were made for cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_aarch64.whl:
Publisher: release-rust-python-package.yaml on IBM/cpex-plugins
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: cpex_url_reputation-0.2.0-cp311-abi3-manylinux_2_34_aarch64.whl
  - Subject digest: 62c6479076d9867e45291e12c3f93127d4adce0e2bc03bba4c8eb7230898dd4e
  - Sigstore transparency entry: 1342585492
  - Sigstore integration time:
- Permalink: IBM/cpex-plugins@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Branch / Tag: refs/tags/url-reputation-v0.2.0
- Owner: https://github.com/IBM
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-rust-python-package.yaml@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Trigger Event: push
File details
Details for the file cpex_url_reputation-0.2.0-cp311-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: cpex_url_reputation-0.2.0-cp311-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 852.9 kB
- Tags: CPython 3.11+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c9032fa14e94315bad39f5c3910f886a4a1ce262a8398aadd046bedfb0651525 |
| MD5 | d60e2ac7f2168b0c70a3aaa2f21b0c54 |
| BLAKE2b-256 | 93be80d1e69b204683e721d1d5928e27668f6be80856633d83e1b85e4897b2d9 |
Provenance
The following attestation bundles were made for cpex_url_reputation-0.2.0-cp311-abi3-macosx_11_0_arm64.whl:
Publisher: release-rust-python-package.yaml on IBM/cpex-plugins
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: cpex_url_reputation-0.2.0-cp311-abi3-macosx_11_0_arm64.whl
  - Subject digest: c9032fa14e94315bad39f5c3910f886a4a1ce262a8398aadd046bedfb0651525
  - Sigstore transparency entry: 1342585479
  - Sigstore integration time:
- Permalink: IBM/cpex-plugins@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Branch / Tag: refs/tags/url-reputation-v0.2.0
- Owner: https://github.com/IBM
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-rust-python-package.yaml@79e59ee7e6aa5b6914dcddd7590f8d3bde2eeaad
- Trigger Event: push