Comprehensive DLP evasion test suite — scanner-agnostic, file-aware
evadex
A scanner-agnostic DLP evasion test suite. evadex generates hundreds of obfuscated variants of known-sensitive values and submits them to your DLP scanner to find what slips through — including through file extraction pipelines (DOCX, PDF, XLSX), not just plain-text API calls.
Built and tested with dlpscan; works with any scanner via its adapter interface. Detection rates vary by scanner, configuration, and ruleset — run evadex against your own deployment to see your results.
What it does
evadex takes a sensitive value (a credit card number, SSN, AWS key, etc.), runs it through every evasion technique it knows — unicode tricks, delimiter manipulation, encoding variants, regional digit scripts, homoglyphs, and more — and records which variants your scanner catches and which it misses.
Evasion categories:
| Generator | Techniques |
|---|---|
| unicode_encoding | Zero-width chars, fullwidth digits, homoglyphs, NFD/NFC/NFKC/NFKD normalization, HTML entities (decimal + hex), URL encoding (full, digits-only, mixed) |
| delimiter | Space, hyphen, dot, slash, tab, newline, mixed, doubled, none |
| splitting | Mid-value line break, HTML/CSS comment injection, prefix/suffix noise, JSON field split, whitespace padding, XML wrapping |
| leetspeak | Minimal, moderate, and aggressive substitution tiers |
| regional_digits | Arabic-Indic, Extended Arabic-Indic, Devanagari, Bengali, Thai, Myanmar, Khmer, Mongolian, NKo, Tibetan, plus mixed-script variants |
| structural | Left/right padding (spaces + zeros), noise embedding, partial values, case variation, repeated value |
| encoding | Base64 (standard, URL-safe, no-padding, MIME line-breaks, partial, double), ROT13, full/group reversal, double URL encoding, mixed NFD/NFC/NFKD normalization |
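To make a few of these techniques concrete, here is a minimal sketch of fullwidth digits, Arabic-Indic digits, and zero-width character insertion. The helper names are hypothetical and are not evadex's internal generator API; they only illustrate the kind of transform each row describes.

```python
import unicodedata

# Illustrative helpers only; these are NOT evadex's actual generators.
ZERO_WIDTH_SPACE = "\u200b"


def to_fullwidth_digits(value: str) -> str:
    """Map ASCII digits 0-9 to fullwidth digits U+FF10..U+FF19."""
    return "".join(
        chr(ord(ch) - ord("0") + 0xFF10) if "0" <= ch <= "9" else ch
        for ch in value
    )


def to_arabic_indic_digits(value: str) -> str:
    """Map ASCII digits 0-9 to Arabic-Indic digits U+0660..U+0669."""
    return "".join(
        chr(ord(ch) - ord("0") + 0x0660) if "0" <= ch <= "9" else ch
        for ch in value
    )


def insert_zero_width(value: str) -> str:
    """Insert a zero-width space between every pair of adjacent characters."""
    return ZERO_WIDTH_SPACE.join(value)


def fold_nfkc(value: str) -> str:
    """Apply NFKC normalization, which folds fullwidth digits back to ASCII."""
    return unicodedata.normalize("NFKC", value)
```

A scanner that matches only ASCII digits would likely miss the output of to_fullwidth_digits("4532015112830366"), while one that applies NFKC first would see the original digits again. NFKC does not remove zero-width spaces, so that variant needs separate handling.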
Submission strategies (for dlpscan-cli adapter):
Each variant is tested four ways by default: as plain text, embedded in a DOCX, embedded in a PDF, and embedded in an XLSX. This exercises your scanner's file extraction pipeline, not just its regex layer.
Built-in test payloads:
Payloads are classified as structured or heuristic — see Structured vs heuristic categories below.
| Label | Value | Category | Type |
|---|---|---|---|
| Visa 16-digit | 4532015112830366 | credit_card | structured |
| Amex 15-digit | 378282246310005 | credit_card | structured |
| Mastercard 16-digit | 5105105105105100 | credit_card | structured |
| Discover 16-digit | 6011111111111117 | credit_card | structured |
| JCB 16-digit | 3530111333300000 | credit_card | structured |
| UnionPay 16-digit | 6250941006528599 | credit_card | structured |
| Diners Club 14-digit | 30569309025904 | credit_card | structured |
| US SSN | 123-45-6789 | ssn | structured |
| Canada SIN | 046 454 286 | sin | structured |
| US Passport number | 340000136 | us_passport | structured |
| Australia TFN | 123 456 78 | au_tfn | structured |
| Germany Steuer-IdNr | 86095742719 | de_tax_id | structured |
| France INSEE (NIR) | 282097505604213 | fr_insee | structured |
| UK IBAN | GB82WEST12345698765432 | iban | structured |
| Germany IBAN | DE89370400440532013000 | iban | structured |
| France IBAN | FR7630006000011234567890189 | iban | structured |
| Spain IBAN | ES9121000418450200051332 | iban | structured |
| SWIFT/BIC code | DEUTDEDB | swift_bic | structured |
| ABA routing number | 021000021 | aba_routing | structured |
| Bitcoin legacy address | 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2 | bitcoin | structured |
| Ethereum address | 0x742d35Cc6634C0532925a3b844Bc454e4438f44e | ethereum | structured |
| Email address | test.user@example.com | email | structured |
| US phone number | +1-555-867-5309 | phone | structured |
| AWS Access Key ID | AKIAIOSFODNN7EXAMPLE | aws_key | heuristic |
| GitHub classic token | ghp_16C7e42F292c6912E7710c838347Ae178B4a | github_token | heuristic |
| Stripe test secret key | sk_test_4eC39HqLyjWDarjtT7en6bh8Xy9mPqZ | stripe_key | heuristic |
| Slack bot token | xoxb-EXAMPLE-BOTTOKEN-abc123def | slack_token | heuristic |
| Sample JWT | (compact JWT string) | jwt | heuristic |
| Top Secret classification label | TOP SECRET | classification | heuristic |
| HIPAA privacy label | HIPAA | classification | heuristic |
Heuristic payloads are excluded from the default scan. Use --include-heuristic to include them.
Structured vs heuristic categories
evadex classifies its built-in payload categories into two groups:
Structured — formats with well-defined, mathematically or syntactically validatable patterns. DLP scanners typically enforce these patterns precisely (e.g., Luhn check on credit cards, fixed-length digit groups for SSN/SIN, checksum-verified IBAN). Evasion results in this group reflect meaningful signal: a variant that evades detection is a real gap in coverage.
Categories: credit_card, ssn, sin, iban, swift_bic, aba_routing, bitcoin, ethereum, us_passport, au_tfn, de_tax_id, fr_insee, email, phone
Heuristic — formats where detection relies on fixed prefixes, high-entropy pattern matching, or loosely defined structure. DLP rules for these categories vary widely between scanners and configurations, and a "fail" result may simply reflect that the scanner never had a strong rule for that specific format variant — not that a real exfiltration path was found.
Categories: aws_key, jwt, github_token, stripe_key, slack_token, classification
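As an illustration of what "mathematically validatable" means for a structured category, the Luhn check used for credit card numbers can be sketched as below. This is a generic textbook version, not the code evadex or dlpscan actually runs, and the function name is an assumption.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the ASCII digits in `number` pass the Luhn checksum.

    Non-digit characters (spaces, hyphens) are skipped, so a delimited
    card number is validated on its digits alone.
    """
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if not digits:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

The built-in Visa payload 4532015112830366 passes this check, and changing its last digit to 7 makes it fail. That strictness is why an evaded variant in a structured category is a meaningful finding: the scanner had a precise rule and the transform still got past it.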
Heuristic categories are excluded from the default scan to avoid misleading results. Include them with:
evadex scan --tool dlpscan-cli --include-heuristic
A warning is printed to stderr whenever --include-heuristic is active, reminding you to interpret those results with caution.
Installation
Requires Python 3.10+.
pip install evadex
Or install from source:
git clone https://github.com/tbustenk/evadex
cd evadex
pip install -e ".[dev]"
Quick start
Run the full built-in suite against dlpscan (text strategy):
evadex scan --tool dlpscan-cli --strategy text
Test a single value:
evadex scan --tool dlpscan-cli --input "4532015112830366" --strategy text
Test with all file strategies (slower — exercises DOCX/PDF/XLSX extraction):
evadex scan --tool dlpscan-cli --input "4532015112830366"
Generate an HTML report:
evadex scan --tool dlpscan-cli --strategy text --format html -o report.html
Example output
Terminal summary
Running evadex scan against dlpscan-cli at http://localhost:8080...
Done. 590 tests — N detected, N evaded
Detection rates depend on your scanner, its version, and how it's configured.
JSON output (--format json, default)
{
  "meta": {
    "timestamp": "2026-04-01T22:01:36.172424+00:00",
    "scanner": "rust-2.0.0",
    "total": 590,
    "pass": 142,
    "fail": 448,
    "error": 0,
    "pass_rate": 24.1,
    "summary_by_category": {
      "credit_card": { "pass": 30, "fail": 90, "error": 0 },
      "ssn": { "pass": 12, "fail": 60, "error": 0 },
      "iban": { "pass": 10, "fail": 50, "error": 0 }
    }
  },
  "results": [
    {
      "payload": {
        "value": "5105105105105100",
        "category": "credit_card",
        "category_type": "structured",
        "label": "Mastercard 16-digit"
      },
      "variant": {
        "value": "5105105105105100",
        "generator": "delimiter",
        "technique": "no_delimiter",
        "transform_name": "All delimiters removed",
        "strategy": "text"
      },
      "detected": true,
      "severity": "pass",
      "duration_ms": 371.01,
      "error": null,
      "raw_response": { "detected": true }
    },
    {
      "payload": {
        "value": "046 454 286",
        "category": "sin",
        "category_type": "structured",
        "label": "Canada SIN"
      },
      "variant": {
        "value": "Ο4б 4Ƽ4 ΚȢб",
        "generator": "unicode_encoding",
        "technique": "homoglyph_substitution",
        "transform_name": "Visually similar Cyrillic/Greek characters substituted",
        "strategy": "text"
      },
      "detected": false,
      "severity": "fail",
      "duration_ms": 378.57,
      "error": null,
      "raw_response": { "detected": false }
    }
  ]
}
Severity values:
| Value | Meaning |
|---|---|
| pass | Scanner detected the variant (good) |
| fail | Scanner missed the variant (evasion succeeded) |
| error | Adapter error (network, timeout, malformed scanner response, etc.) |
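A report in this shape is straightforward to post-process. The sketch below counts evaded variants per generator and technique, using only the severity, variant.generator, and variant.technique fields shown in the JSON example. The function names and the report path passed to load_report are hypothetical.

```python
import json
from collections import Counter


def load_report(path: str) -> dict:
    """Load an evadex JSON report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def evaded_by_technique(report: dict) -> Counter:
    """Count results with severity 'fail', keyed by (generator, technique)."""
    counts: Counter = Counter()
    for result in report.get("results", []):
        if result.get("severity") == "fail":
            variant = result["variant"]
            counts[(variant["generator"], variant["technique"])] += 1
    return counts
```

Calling evaded_by_technique(load_report("report.json")).most_common(10) would then list the ten techniques that most often slipped past the scanner. Filtering on severity rather than detected keeps adapter errors out of the count.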
CLI reference
evadex scan [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| --tool, -t | dlpscan-cli | Adapter name to use |
| --input, -i | (all built-ins) | Single value to test. If omitted, runs all 23 structured built-in payloads (add --include-heuristic for all 30). Category is auto-detected (Luhn check, regex patterns for SSN/IBAN/AWS/JWT/email/phone). |
| --format, -f | json | Output format: json or html |
| --output, -o | stdout | Write report to file instead of stdout |
| --strategy | all four | Submission strategy: text, docx, pdf, xlsx. Repeat the flag for multiple. Omit to run all four. |
| --concurrency | 5 | Max concurrent requests |
| --timeout | 30.0 | Request timeout in seconds |
| --url | http://localhost:8080 | Base URL (for HTTP-based adapters) |
| --api-key | (env: EVADEX_API_KEY) | API key passed as Authorization: Bearer. Use the environment variable in preference to the CLI flag to avoid exposure in shell history and process listings. |
| --category | (all structured) | Filter built-in payloads by category. Repeat for multiple. Values: credit_card, ssn, sin, iban, swift_bic, aba_routing, bitcoin, ethereum, us_passport, au_tfn, de_tax_id, fr_insee, email, phone, aws_key, jwt, github_token, stripe_key, slack_token, classification |
| --variant-group | (all) | Limit to specific generator(s). Repeat for multiple. Values: unicode_encoding, delimiter, splitting, leetspeak, regional_digits, structural, encoding |
| --include-heuristic | off | Also run heuristic categories (aws_key, jwt, github_token, stripe_key, slack_token, classification). A warning is printed when enabled; see Structured vs heuristic categories. |
| --scanner-label | (empty) | Label recorded in the JSON meta.scanner field. Use to tag a specific scanner version, e.g. python-1.3.0 or rust-2.0.0. Useful when comparing results across scanner builds. |
| --exe | dlpscan | Path to the scanner executable (dlpscan-cli adapter only). Use when dlpscan is not on PATH or you need to target a specific build. |
| --cmd-style | python | Command format for dlpscan-cli: python (invokes dlpscan -f json <file>) or rust (invokes dlpscan --format json scan <file>). |
Examples
# Only test credit card payloads
evadex scan --tool dlpscan-cli --strategy text --category credit_card
# Only run unicode evasion techniques
evadex scan --tool dlpscan-cli --strategy text --variant-group unicode_encoding
# Only run unicode + delimiter techniques on SSN and IBAN
evadex scan --tool dlpscan-cli --strategy text \
--category ssn --category iban \
--variant-group unicode_encoding --variant-group delimiter
# Test a custom value (category auto-detected)
evadex scan --tool dlpscan-cli --input "AKIAIOSFODNN7EXAMPLE" --strategy text
# File strategy only — test DOCX extraction pipeline
evadex scan --tool dlpscan-cli --input "4532015112830366" --strategy docx
# Save HTML report
evadex scan --tool dlpscan-cli --strategy text --format html -o report.html
# Target a specific scanner binary, tag the output
evadex scan --tool dlpscan-cli --exe /opt/dlpscan/dlpscan --cmd-style rust \
--scanner-label "rust-2.0.0" --format json -o rust_results.json
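If you drive evadex from a Python script (for example in CI), the same invocations can be assembled as an argument list. This is a minimal sketch that uses only flags documented in the CLI reference; the helper names are hypothetical, and evadex's exit-code behaviour is not described here, so the sketch does not treat a non-zero exit as an exception.

```python
import subprocess
from typing import List, Optional, Sequence


def build_scan_command(
    tool: str = "dlpscan-cli",
    strategies: Sequence[str] = ("text",),
    categories: Sequence[str] = (),
    output: Optional[str] = None,
) -> List[str]:
    """Assemble an `evadex scan` argument list from documented flags."""
    cmd = ["evadex", "scan", "--tool", tool]
    for strategy in strategies:
        cmd += ["--strategy", strategy]  # the flag is repeated per strategy
    for category in categories:
        cmd += ["--category", category]  # the flag is repeated per category
    cmd += ["--format", "json"]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def run_scan(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the command, capturing stdout/stderr as text (needs evadex on PATH)."""
    return subprocess.run(list(cmd), capture_output=True, text=True, check=False)
```

Passing the arguments as a list avoids shell quoting issues with values such as "046 454 286".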
Adapters
Built-in: dlpscan-cli
Invokes the dlpscan CLI directly as a subprocess. evadex was built and tested with dlpscan as the reference scanner. Requires dlpscan to be installed and on PATH (or provide --exe).
evadex scan --tool dlpscan-cli
For file strategies, evadex builds the document in memory and writes it to a temp file, runs the scanner against it, then immediately deletes the temp file. No persistent disk footprint from test data. File extraction support in dlpscan requires pip install dlpscan[office].
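The write, scan, delete sequence described above can be sketched roughly as follows. This is not the adapter's actual code: the helper name and the scan callback are assumptions, and the real adapter invokes dlpscan as a subprocess at the point where the callback runs.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_file(data: bytes, suffix: str, scan: Callable[[str], T]) -> T:
    """Write `data` to a temp file, call scan(path), and always delete the file.

    `scan` is a hypothetical callback standing in for the subprocess call.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return scan(path)
    finally:
        # Runs even if scan() raises, so test data does not linger on disk.
        os.unlink(path)
```

Putting the delete in a finally block is what keeps the "no persistent disk footprint" property intact when the scanner crashes or times out.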
Built-in: dlpscan
Generic HTTP adapter for any DLP tool that exposes a REST API. Sends plain text to POST /scan with a {"content": "..."} body, and file uploads to POST /scan/file as multipart form data. Expects a JSON response with a detected boolean (configurable via the response_detected_key extra config option).
evadex scan --tool dlpscan --url http://my-dlpscan-server:8080 --api-key my-key
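For reference, the plain-text exchange described above looks roughly like this when built with the standard library. It is a sketch of the wire format only, not the adapter's implementation: the function names are hypothetical, and the Content-Type header is an assumption that follows from the JSON body.

```python
import json
import urllib.request
from typing import Optional


def build_text_request(
    base_url: str, content: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build a POST /scan request carrying {"content": ...} as JSON."""
    body = json.dumps({"content": content}).encode("utf-8")
    headers = {"Content-Type": "application/json"}  # assumed, not documented
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/scan", data=body, headers=headers, method="POST"
    )


def parse_detected(response_body: bytes, detected_key: str = "detected") -> bool:
    """Read the detection flag from a JSON response; missing key means False."""
    payload = json.loads(response_body)
    return bool(payload.get(detected_key, False))
```

The detected_key parameter mirrors the role of response_detected_key: a scanner that answers {"found": true} can be handled with parse_detected(body, "found"). Sending the request is then a matter of urllib.request.urlopen(request, timeout=30.0).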
Adding a custom adapter
- Create a file anywhere in your project, e.g. my_adapter.py.
- Subclass BaseAdapter and implement submit():
from evadex.adapters.base import BaseAdapter
from evadex.core.registry import register_adapter
from evadex.core.result import Payload, Variant, ScanResult


@register_adapter("my-tool")
class MyToolAdapter(BaseAdapter):
    name = "my-tool"

    async def submit(self, payload: Payload, variant: Variant) -> ScanResult:
        # Send variant.value to your scanner however it expects it.
        # variant.strategy is "text", "docx", "pdf", or "xlsx".
        # Return a ScanResult with detected=True/False.
        # call_my_scanner is a placeholder for your own client code.
        response = await call_my_scanner(variant.value)
        detected = response.get("found", False)
        return ScanResult(
            payload=payload,
            variant=variant,
            detected=detected,
            raw_response=response,
        )
- Import your adapter before invoking evadex (so the @register_adapter decorator fires), then use it:
python -c "import my_adapter" && evadex scan --tool my-tool
Or wire it up properly as a package with an entry point in pyproject.toml:
[project.entry-points."evadex.adapters"]
my-tool = "my_package.my_adapter"
Optional hooks:
async def setup(self):
    # Called once before the batch: open connections, authenticate, etc.
    self._session = await open_session()

async def teardown(self):
    # Called once after the batch: clean up connections.
    await self._session.close()

async def health_check(self) -> bool:
    # Optional: verify the scanner is reachable.
    return await ping_scanner()
File strategies: variant.strategy tells you which format evadex wants to use. If your scanner supports only some submission methods, handle just the strategies you need:
from evadex.adapters.dlpscan.file_builder import FileBuilder

async def submit(self, payload, variant):
    if variant.strategy == "text":
        raw = await self._scan_text(variant.value)
    else:
        data, mime = FileBuilder.build(variant.value, variant.strategy)
        raw = await self._scan_file(data, mime)
    ...
FileBuilder.build(text, fmt) returns (bytes, mime_type) entirely in memory — no disk writes.
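To show what building a document entirely in memory can look like, here is a standalone sketch that assembles a minimal DOCX with the standard library. It is not FileBuilder's implementation; build_docx and the three-part package layout are assumptions chosen to keep the example small, and real documents usually carry more parts (styles, properties).

```python
import io
import zipfile
from typing import Tuple
from xml.sax.saxutils import escape

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)

DOCUMENT_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:p><w:r><w:t xml:space="preserve">{text}</w:t></w:r></w:p></w:body>'
    "</w:document>"
)


def build_docx(text: str) -> Tuple[bytes, str]:
    """Return (docx_bytes, mime_type) for a one-paragraph document, no disk I/O."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", ROOT_RELS)
        # escape() keeps characters like < and & from breaking the XML.
        archive.writestr("word/document.xml", DOCUMENT_TEMPLATE.format(text=escape(text)))
    return buffer.getvalue(), DOCX_MIME
```

Because a DOCX is a ZIP archive of XML parts, the variant text ends up compressed inside word/document.xml, which is why a scanner has to unpack and parse the file before its patterns can match.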
Output schema
Top-level
{
  "meta": { ... },
  "results": [ ... ]
}
meta
| Field | Type | Description |
|---|---|---|
| timestamp | ISO 8601 string | When the scan ran (UTC) |
| scanner | string | Scanner label from --scanner-label (empty string if not set) |
| total | int | Total test cases run |
| pass | int | Variants detected by scanner |
| fail | int | Variants that evaded scanner |
| error | int | Adapter errors |
| pass_rate | float | pass / total * 100, rounded to one decimal |
| summary_by_category | object | Per-category pass/fail/error counts, sorted alphabetically by category name |
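The pass_rate formula can be checked against a report's own counts. The sketch below recomputes it from the meta fields. Two points are assumptions rather than documented behaviour: that pass + fail + error always equals total, and that an empty run reports a rate of 0.0.

```python
def pass_rate(passed: int, total: int) -> float:
    """pass / total * 100, rounded to one decimal (0.0 for an empty run)."""
    if total == 0:
        return 0.0  # assumed convention; not stated in the schema
    return round(passed / total * 100, 1)


def meta_is_consistent(meta: dict) -> bool:
    """Check that the counts add up and pass_rate matches the formula."""
    counts_ok = meta["pass"] + meta["fail"] + meta["error"] == meta["total"]
    rate_ok = meta["pass_rate"] == pass_rate(meta["pass"], meta["total"])
    return counts_ok and rate_ok
```

For the example report shown earlier, 142 / 590 * 100 is about 24.07, which rounds to the reported 24.1.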
results[]
| Field | Type | Description |
|---|---|---|
| payload.value | string | Original sensitive value |
| payload.category | string | Detected category enum value |
| payload.category_type | string | structured or heuristic; see Structured vs heuristic categories |
| payload.label | string | Human-readable label |
| variant.value | string | Transformed/obfuscated value submitted to scanner |
| variant.generator | string | Which generator produced this variant |
| variant.technique | string | Machine-readable technique name |
| variant.transform_name | string | Human-readable description of the transform |
| variant.strategy | string | Submission strategy: text, docx, pdf, xlsx |
| detected | bool | Whether the scanner flagged this variant. false for error results; check severity to distinguish |
| severity | string | pass (detected), fail (not detected), or error (adapter error) |
| duration_ms | float | Time for this test case in milliseconds |
| error | string or null | Error message if adapter threw; null otherwise |
| raw_response | object | Raw parsed response from the adapter. For dlpscan-cli this is {"matches": [...]}. May contain match objects that include the variant value; treat the output file accordingly. |
Security notes
- API keys: Prefer the EVADEX_API_KEY environment variable over the --api-key CLI flag. Command-line arguments are visible in process listings (ps aux) and may be saved in shell history.
- Output files: The JSON report's raw_response fields may contain scanner match objects that echo variant values (transformed versions of sensitive test data). Apply appropriate access controls to report files.
- Temp files: The dlpscan-cli adapter writes each test variant to a temp file for subprocess invocation and deletes it immediately after the scan. No persistent disk footprint from test data.
- Network isolation: Run evadex and the scanner on an isolated test network. Test variant values are obfuscated but structurally derived from real sensitive patterns.
License
MIT — see LICENSE.