Comprehensive DLP evasion test suite — scanner-agnostic, file-aware
evadex
A scanner-agnostic DLP evasion test suite. evadex generates hundreds of obfuscated variants of known-sensitive values and submits them to your DLP scanner to find what slips through — including through file extraction pipelines (DOCX, PDF, XLSX), not just plain-text API calls.
Built and tested with dlpscan; works with any scanner via its adapter interface. Detection rates vary by scanner, configuration, and ruleset — run evadex against your own deployment to see your results.
What it does
evadex takes a sensitive value (a credit card number, SSN, AWS key, etc.), runs it through every evasion technique it knows — unicode tricks, delimiter manipulation, encoding variants, regional digit scripts, homoglyphs, and more — and records which variants your scanner catches and which it misses.
Evasion categories:
| Generator | Techniques |
|---|---|
| unicode_encoding | Zero-width chars, fullwidth digits, homoglyphs, NFD/NFC/NFKC/NFKD normalization, HTML entities (decimal + hex), URL encoding (full, digits-only, mixed) |
| delimiter | Space, hyphen, dot, slash, tab, newline, mixed, doubled, none |
| splitting | Mid-value line break, HTML/CSS comment injection, prefix/suffix noise, JSON field split, whitespace padding, XML wrapping |
| leetspeak | Minimal, moderate, and aggressive substitution tiers |
| regional_digits | Arabic-Indic, Extended Arabic-Indic, Devanagari, Bengali, Thai, Myanmar, Khmer, Mongolian, NKo, Tibetan, plus mixed-script variants |
| structural | Left/right padding (spaces + zeros), noise embedding, partial values, case variation, repeated value |
| encoding | Base64 (standard, URL-safe, no-padding, MIME line-breaks, partial, double), ROT13, full/group reversal, double URL encoding, mixed NFD/NFC/NFKD normalization |
| context_injection | Value wrapped in email body, JSON record, XML element, CSV row, SQL snippet, and more |
| unicode_whitespace | Spaces replaced with NBSP, en-space, em-space, or a mixed pattern |
| bidirectional | Unicode bidirectional control characters (RLO, LRO, RLE, RLI, ALM) injected around or within the value |
| soft_hyphen | Soft hyphen (U+00AD) and word joiner (U+2060) inserted at group boundaries or between every character |
| morse_code | Digits encoded as International Morse Code (space-separated, slash-separated, concatenated, or newline-separated); applies to credit_card, ssn, sin, iban, phone, and related numeric categories |
| encoding_chains | Chained multi-step encodings: base64(rot13), base64(hex), hex(base64), rot13(base64), url(base64), base64(base64), and the triple chain base64(rot13(hex)); defeats scanners that only decode one layer |
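To make the table concrete, here is a minimal Python sketch of a few of the techniques listed above (zero-width insertion, fullwidth and Arabic-Indic digits, Morse digits, and the base64(rot13) chain). The function names are hypothetical and this is not evadex's own generator code; it only illustrates what such variants look like.

```python
import base64
import codecs
import unicodedata

ZERO_WIDTH_SPACE = "\u200b"

# International Morse Code for the digits 0-9.
MORSE_DIGITS = {
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}


def shift_digits(value: str, zero_codepoint: int) -> str:
    """Map ASCII digits onto another digit block; leave other characters alone."""
    return "".join(
        chr(zero_codepoint + ord(ch) - ord("0")) if "0" <= ch <= "9" else ch
        for ch in value
    )


def zero_width(value: str) -> str:
    """Insert a zero-width space (U+200B) between every character."""
    return ZERO_WIDTH_SPACE.join(value)


def fullwidth_digits(value: str) -> str:
    """Replace ASCII digits with fullwidth digits (U+FF10..U+FF19)."""
    return shift_digits(value, 0xFF10)


def arabic_indic_digits(value: str) -> str:
    """Replace ASCII digits with Arabic-Indic digits (U+0660..U+0669)."""
    return shift_digits(value, 0x0660)


def morse_digits(value: str, separator: str = " ") -> str:
    """Encode only the digits of the value as Morse, dropping other characters."""
    return separator.join(MORSE_DIGITS[ch] for ch in value if ch in MORSE_DIGITS)


def base64_of_rot13(value: str) -> str:
    """Two-layer chain: ROT13 first, then standard Base64."""
    rotated = codecs.encode(value, "rot_13")
    return base64.b64encode(rotated.encode("utf-8")).decode("ascii")


ssn = "123-45-6789"
variants = {
    "zero_width": zero_width(ssn),
    "fullwidth_digits": fullwidth_digits(ssn),
    "arabic_indic_digits": arabic_indic_digits(ssn),
    "morse_digits": morse_digits(ssn),
    "base64_of_rot13": base64_of_rot13(ssn),
}
# A scanner that applies NFKC normalization recovers the fullwidth variant.
recovered = unicodedata.normalize("NFKC", variants["fullwidth_digits"])
```

Fullwidth digits fold back to ASCII under NFKC normalization, while Arabic-Indic digits do not, which is one reason the two are worth testing separately.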
Submission strategies (for dlpscan-cli adapter):
Each variant is tested four ways by default: as plain text, embedded in a DOCX, embedded in a PDF, and embedded in an XLSX. This exercises your scanner's file extraction pipeline, not just its regex layer.
Built-in test payloads:
Payloads are classified as structured or heuristic — see Structured vs heuristic categories below.
554 payloads across 489 categories covering 489/557 sub-patterns (88%) of the dlpscan-rs pattern library, with 421 structured categories confirmed detected by seed scan. See Coverage for a breakdown by sub-pattern.
North America
| Label | Value | Category | Type |
|---|---|---|---|
| Visa 16-digit | 4532015112830366 | credit_card | structured |
| Amex 15-digit | 378282246310005 | credit_card | structured |
| Mastercard 16-digit | 5105105105105100 | credit_card | structured |
| Discover 16-digit | 6011111111111117 | credit_card | structured |
| JCB 16-digit | 3530111333300000 | credit_card | structured |
| UnionPay 16-digit | 6250941006528599 | credit_card | structured |
| Diners Club 14-digit | 30569309025904 | credit_card | structured |
| US SSN | 123-45-6789 | ssn | structured |
| US ITIN | 912-34-5678 | us_itin | structured |
| US EIN | 12-3456789 | us_ein | structured |
| US Medicare Beneficiary ID | 1EG4-TE5-MK72 | us_mbi | structured |
| US Passport | 340000136 | us_passport | structured |
| US state driver's licences (51) | one per state + DC | us_dl | structured |
| Canada SIN | 046 454 286 | sin | structured |
| Canadian passport | AB123456 | ca_passport | structured |
| Quebec RAMQ health card | BOUD 1234 5678 | ca_ramq | structured |
| Ontario health card | 1234-567-890-AB | ca_ontario_health | structured |
| BC CareCard | 9123456789 | ca_bc_carecard | structured |
| Alberta health card | 123456789 | ca_ab_health | structured |
| Manitoba health card | 987654321 | ca_mb_health | structured |
| Saskatchewan health card | 234567890 | ca_sk_health | structured |
| Nova Scotia health card | 1234 567 890 | ca_ns_health | structured |
| New Brunswick health card | 1234567890 | ca_nb_health | structured |
| PEI health card | 123456789012 | ca_pei_health | structured |
| Newfoundland health card | 9876543210 | ca_nl_health | structured |
| Quebec driver's licence | B123456789012 | ca_qc_drivers | structured |
| Ontario driver's licence | A1234-56789-01234 | ca_on_drivers | structured |
| BC driver's licence | 1234567 | ca_bc_drivers | structured |
| Manitoba driver's licence | AB-123-456-789 | ca_mb_drivers | structured |
| Saskatchewan driver's licence | 12345678 | ca_sk_drivers | structured |
| Nova Scotia driver's licence | AB1234567 | ca_ns_drivers | structured |
| New Brunswick driver's licence | 1234567 | ca_nb_drivers | structured |
| PEI driver's licence | 123456 | ca_pei_drivers | structured |
| Newfoundland driver's licence | A123456789 | ca_nl_drivers | structured |
| Canadian Business Number | 111222333 | ca_business_number | structured |
| Canadian GST/HST registration | 111222333RT0001 | ca_gst_hst | structured |
| Canadian transit/routing number | 12345-678 | ca_transit_number | structured |
| Canadian bank account | 12345678 | ca_bank_account | structured |
| Mexico CURP | BADD110313HCMLNS09 | mx_curp | structured |
Europe
| Label | Value | Category | Type |
|---|---|---|---|
| UK IBAN | GB82WEST12345698765432 | iban | structured |
| Germany IBAN | DE89370400440532013000 | iban | structured |
| France IBAN | FR7630006000011234567890189 | iban | structured |
| Spain IBAN | ES9121000418450200051332 | iban | structured |
| SWIFT/BIC code | DEUTDEDB | swift_bic | structured |
| ABA routing number | 021000021 | aba_routing | structured |
| UK National Insurance Number | AB123456C | uk_nin | structured |
| UK driving licence | MORGA753116SM9IJ | uk_dl | structured |
| German Personalausweis | L01X00T47 | de_id | structured |
| Germany Steuer-IdNr | 86095742719 | de_tax_id | structured |
| French CNI | 880692310285 | fr_cni | structured |
| France INSEE (NIR) | 282097505604213 | fr_insee | structured |
| Spanish DNI | 12345678Z | es_dni | structured |
| Italian Codice Fiscale | RSSMRA85T10A562S | it_cf | structured |
| Dutch BSN | 111222333 | nl_bsn | structured |
| Swedish Personnummer | 811228-9874 | se_pin | structured |
| Norwegian Fødselsnummer | 01010112345 | no_fnr | structured |
| Finnish Henkilötunnus | 131052-308T | fi_hetu | structured |
| Polish PESEL | 44051401458 | pl_pesel | structured |
| Swiss AHV | 756.1234.5678.97 | ch_ahv | structured |
| Austria social insurance | 1234-010150 | at_svn | structured |
| Belgium National Register Number | 85.01.01-234.56 | be_nrn | structured |
| Bulgaria EGN | 8501010001 | bg_egn | structured |
| Croatia OIB | 12345678901 | hr_oib | structured |
| Cyprus tax ID | 12345678A | cy_tin | structured |
| Czech birth number | 850101/1234 | cz_rc | structured |
| Denmark CPR | 010185-1234 | dk_cpr | structured |
| Estonia personal code | 38501010002 | ee_ik | structured |
| EU VAT number | DE123456789 | eu_vat | structured |
| Greece AMKA | 01018512345 | gr_amka | structured |
| Hungary TAJ | 123 456 789 | hu_taj | structured |
| Iceland kennitala | 010185-1234 | is_kt | structured |
| Ireland PPS number | 1234567A | ie_pps | structured |
| Latvia personal code | 010185-12345 | lv_pk | structured |
| Liechtenstein passport | A12345 | li_pp | structured |
| Lithuania personal code | 38501010002 | lt_ak | structured |
| Luxembourg national ID | 1985012312345 | lu_nin | structured |
| Malta identity card | 12345A | mt_id | structured |
| Portugal NIF | 123456789 | pt_nif | structured |
| Romania CNP | 1850101123456 | ro_cnp | structured |
| Slovakia birth number | 850101/1234 | sk_bn | structured |
| Slovenia EMSO | 0101850500003 | si_emso | structured |
| Turkey TC identity | 12345678901 | tr_tc | structured |
Asia-Pacific
| Label | Value | Category | Type |
|---|---|---|---|
| Australia TFN | 123 456 78 | au_tfn | structured |
| Australian Medicare card | 2123456701 | au_medicare | structured |
| Australian passport | PA1234567 | au_passport | structured |
| New Zealand IRD | 123456789 | nz_ird | structured |
| Singapore NRIC | S1234567D | sg_nric | structured |
| Hong Kong HKID | A123456(3) | hk_hkid | structured |
| Japanese My Number | 123456789012 | jp_my_number | structured |
| Indian Aadhaar | 2345 6789 0123 | in_aadhaar | structured |
| Indian PAN | ABCDE1234F | in_pan | structured |
| Bangladesh National ID | 1234567890 | bd_nid | structured |
| Indonesia NIK | 3201234567890001 | id_nik | structured |
| Malaysia MyKad | 850101-01-1234 | my_mykad | structured |
| Pakistan CNIC | 12345-1234567-1 | pk_cnic | structured |
| Philippines PhilSys | 1234-5678-9012 | ph_philsys | structured |
| South Korea RRN | 880101-1234567 | kr_rrn | structured |
| Sri Lanka NIC | 123456789V | lk_nic | structured |
| Thailand national ID | 1-1001-00001-85-1 | th_nid | structured |
| Vietnam CCCD | 001012345678 | vn_cccd | structured |
Latin America
| Label | Value | Category | Type |
|---|---|---|---|
| Brazilian CPF | 123.456.789-09 | br_cpf | structured |
| Brazilian CNPJ | 11.222.333/0001-81 | br_cnpj | structured |
| Argentine DNI | 12345678 | ar_dni | structured |
| Chilean RUT | 12.345.678-9 | cl_rut | structured |
| Colombia cédula | 123.456.789-0 | co_cedula | structured |
| Costa Rica cédula | 1-0123-0456 | cr_cedula | structured |
| Ecuador cédula | 1234567890 | ec_cedula | structured |
| Paraguay RUC | 12345678-9 | py_ruc | structured |
| Peru DNI | 12345678 | pe_dni | structured |
| Uruguay cédula | 1.234.567-8 | uy_ci | structured |
| Venezuela cédula | V-12345678 | ve_cedula | structured |
Middle East & Africa
| Label | Value | Category | Type |
|---|---|---|---|
| UAE Emirates ID | 784-1234-1234567-1 | uae_eid | structured |
| Saudi National ID | 1234567890 | sa_nid | structured |
| South African ID | 9202204720082 | za_id | structured |
| Israeli Teudat Zehut | 123456782 | il_id | structured |
| Bahrain CPR | 850101234 | bh_cpr | structured |
| Iran Melli code | 1234567890 | ir_melli | structured |
| Iraq national ID | 123456789012 | iq_nid | structured |
| Jordan national ID | 9001012345 | jo_nid | structured |
| Kuwait civil ID | 285010112345 | kw_civil | structured |
| Lebanon passport | RL123456 | lb_pp | structured |
| Qatar QID | 28501011234 | qa_qid | structured |
Africa
| Label | Value | Category | Type |
|---|---|---|---|
| Egypt National ID | 28503251234567 | eg_nid | structured |
| Ethiopia passport | EP1234567 | et_passport | structured |
| Ghana card | GHA-123456789-1 | gh_card | structured |
| Kenya KRA PIN | A123456789B | ke_kra | structured |
| Morocco CIN | AB12345 | ma_cin | structured |
| Nigeria BVN | 12345678901 | ng_bvn | structured |
| Tanzania NIDA | 12345678901234567890 | tz_nida | structured |
| Tunisia CIN | 12345678 | tn_cin | structured |
| Uganda NIN | CM12345678ABCD | ug_nin | structured |
Functional
| Label | Value | Category | Type |
|---|---|---|---|
| Session token (32-char hex) | abc123def456abc123def456abc123de | session_id | structured |
| PIN block (ISO format 0) | 0123456789ABCDEF | pin_block | structured |
| Biometric ID (UUID-style) | 12345678-ABCD-1234-EFGH-123456789ABC | biometric_id | structured |
| Card expiry | 12/26 | card_expiry | structured |
| Card track 1 | %B4532015112830366^SMITH/JOHN^2512101000000000? | card_track | structured |
| MICR check line | ⑈021000021⑈ 123456789012 1234 | micr | structured |
| Financial amount | USD 12,345.67 | financial_amount | structured |
| ISO 8601 date | 2024-01-15 | date_iso | structured |
| SIM ICCID | 89014103211118510720 | iccid | structured |
| Educational email | john.smith@mit.edu | edu_email | structured |
| Employee ID | EMP1234567 | employee_id | structured |
| GPS coordinates | 40.7128,-74.0060 | gps_coords | structured |
| Insurance policy number | POL123456789 | insurance_policy | structured |
| Bank reference | ACCT12345678 | bank_ref | structured |
| Legal case number | 1:24-cv-12345 | legal_case | structured |
| Loan/mortgage number | ABCD00123456789012345678 | loan_number | structured |
| National Drug Code | 0069-3190-03 | ndc_code | structured |
| Date of birth | 01/15/1985 | dob | structured |
| Postal code | SW1A 1AA | postal_code | structured |
| Masked PAN | 4532 XXXX XXXX 0366 | masked_pan | structured |
| Property parcel number | 123-456-789 | parcel_number | structured |
| AML case ID | AML-123456789 | aml_case_id | structured |
| ISIN | US0378331005 | isin | structured |
| Twitter/X handle | @johnsmith | twitter_handle | structured |
| URL with embedded credentials | https://admin:password123@example.com/api | url_with_creds | structured |
| Vehicle Identification Number | 1HGBH41JXMN109186 | vin | structured |
| Fedwire IMAD | 20240101AAAA12345678001234 | fedwire_imad | structured |
Global
| Label | Value | Category | Type |
|---|---|---|---|
| Bitcoin legacy address | 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2 | bitcoin | structured |
| Ethereum address | 0x742d35Cc6634C0532925a3b844Bc454e4438f44e | ethereum | structured |
| Email address | test.user@example.com | email | structured |
| US phone number | +1-555-867-5309 | phone | structured |
| AWS Access Key ID | AKIAIOSFODNN7EXAMPLE | aws_key | heuristic |
| GitHub classic token | ghp_16C7e42F292c6912E7710c838347Ae178B4a | github_token | heuristic |
| Stripe test secret key | sk_test_4eC39HqLyjWDarjtT7en6bh8Xy9mPqZ | stripe_key | heuristic |
| Slack bot token | xoxb-EXAMPLE-BOTTOKEN-abc123def | slack_token | heuristic |
| Sample JWT | (compact JWT string) | jwt | heuristic |
| Top Secret classification label | TOP SECRET | classification | heuristic |
| HIPAA privacy label | HIPAA | classification | heuristic |
| Corporate confidential label | Company Confidential | corp_classification | heuristic |
| MNPI label | MNPI | mnpi | heuristic |
| Cardholder name (PCI) | John Smith | cardholder_name | heuristic |
| Privacy/compliance label | PCI-DSS | privacy_label | heuristic |
| Attorney-client privilege marker | Attorney-Client Privileged | attorney_client | heuristic |
| Confidential supervisory info | Confidential Supervisory Information | supervisory_info | heuristic |
| Random 32-char API key | xK9mP2nL4qR7vT1w… | random_api_key | heuristic (entropy) |
| Random 48-char base64url token | eyJhbGciOiJIUzI1NiJ9.dGVzdHBheWxvYWQ… | random_token | heuristic (entropy) |
| Random 64-char hex secret | a3f8c2e1d4b7a9f0… | random_secret | heuristic (entropy) |
| Base64-encoded credential | dXNlcm5hbWU6… | encoded_credential | heuristic (entropy) |
| Assignment-form secret | DATABASE_PASSWORD=xK9mP2nL4qR7vT1w… | assignment_secret | heuristic (entropy) |
| Gated secret | api_key: xK9mP2nL4qR7vT1w… | gated_secret | heuristic (entropy) |
Heuristic payloads are excluded from the default scan. Use --include-heuristic to include them. The entropy-labeled categories also have their own dedicated test harness: see evadex entropy.
Canadian French support
evadex generates test content in Canadian French (fr-CA) so you can verify that your DLP scanner catches sensitive data when surrounded by French-language business text — a common real-world condition in Canadian financial institutions.
French keyword context
The following French Canadian keywords are used as surrounding context in generated documents and evasion variants:
| Category | Keywords |
|---|---|
| credit_card | carte de crédit, numéro de carte, mon numéro de carte est, carte bancaire, numéro de carte bancaire, paiement par carte |
| sin | numéro d'assurance sociale, NAS, mon NAS est, assurance sociale |
| iban | numéro de compte, virement bancaire, coordonnées bancaires, relevé bancaire |
| email | courriel, adresse courriel, mon courriel est |
| phone | numéro de téléphone, composez le, téléphone, cellulaire |
| all categories | renseignements personnels, données confidentielles, informations personnelles, vie privée |
French keywords are active in two places:
- context_injection variants: 10 additional French CA sentence templates are generated alongside the standard English ones during evadex scan.
- splitting variants: French noise text is prepended/appended in fr_ca_prefix_noise and fr_ca_suffix_noise variants.
--language fr-CA
Pass --language fr-CA to the generate command to produce test documents with French keyword context sentences:
evadex generate --format docx --category credit_card --category sin \
--count 200 --language fr-CA --output test_fr_ca.docx
evadex generate --format csv --category ca_ramq --count 500 \
--language fr-CA --output ramq_fr.csv
Without --language, the default is English (en).
False positive rate and the --require-context tradeoff
The evadex falsepos command generates structurally-plausible but provably-invalid values (Luhn-failing credit card numbers, SSNs with reserved area codes, IBANs with wrong check digits, etc.) and submits them to your scanner. Any match is a false positive.
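As an illustration of what "provably invalid" means for card numbers, the sketch below implements the standard Luhn check and derives a Luhn-failing value from a valid one by bumping the check digit. The helper names are hypothetical, and this is not how evadex falsepos generates its values; it only shows why such a value can never be a real card number.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits of `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def break_luhn(valid_number: str) -> str:
    """Bump the check digit by one so a Luhn-valid number becomes invalid."""
    bumped = (int(valid_number[-1]) + 1) % 10
    return valid_number[:-1] + str(bumped)


valid_card = "4532015112830366"        # Visa test payload from the table above
invalid_card = break_luhn(valid_card)  # same length and prefix, wrong checksum
```

A scanner that matches only on length and prefix treats both values the same; one that runs the checksum rejects the second.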
What we measured
Three conditions were tested against dlpscan-rs with 100 values per category (7 categories, 700 total):
| Condition | What the scanner receives | What the scanner does |
|---|---|---|
| Baseline | Bare invalid value: 4123456789012341 | Matches on structure alone |
| + --require-context | Bare invalid value: 4123456789012341 | Requires surrounding keywords |
| + --wrap-context + --require-context | Invalid value inside a keyword sentence: "Please charge my credit card number 4123456789012341 for..." | Has both pattern match and keyword context |
Results — false positive rates
| Category | Baseline | --require-context | --wrap-context + --require-context |
|---|---|---|---|
| credit_card | 100.0% | 100.0% | 100.0% |
| ssn | 100.0% | 100.0% | 100.0% |
| sin | 100.0% | 100.0% | 100.0% |
| iban | 100.0% | 100.0% | 100.0% |
| phone | 100.0% | 100.0% | 100.0% |
| email | 95.0% | 98.0% | 100.0% |
| ca_ramq | 99.0% | 99.0% | 100.0% |
| Overall | 99.1% | 99.6% | 100.0% |
100 values per category, seed=default, dlpscan-rs rust adapter, text strategy.
Key findings
--require-context does not reduce false positives for structurally-similar invalid values.
The FP rate is statistically unchanged between the baseline (99.1%) and require-context (99.6%) runs — the difference is within normal statistical noise. dlpscan-rs is matching on value structure (digit count, prefix, format), not on semantic validity (Luhn check, reserved area codes, mod-97 checksum). The context requirement does not gate out pattern-matched values when the pattern match itself is very confident.
Adding keyword context makes it worse, not better.
When invalid values are embedded in realistic keyword sentences (--wrap-context), the FP rate rises to 100.0%. This is the most realistic production scenario — real documents that contain a string resembling a credit card number will almost always have surrounding financial language — and it confirms the scanner flags all structurally-plausible values regardless of validity.
The FP problem is in the pattern layer, not the context layer.
Reducing false positives against dlpscan-rs requires the scanner to perform checksum validation (Luhn for credit cards and SINs, mod-97 for IBANs, reserved-code filtering for SSNs), not keyword-context gating. --require-context is an effective tool for reducing noisy matches in free-form text, but it cannot help when the pattern match itself is the source of the false positive.
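For reference, the mod-97 check mentioned above is small enough to sketch in a few lines of Python. This follows the standard IBAN validation procedure (move the first four characters to the end, map letters to 10..35, take the remainder modulo 97). The function name is hypothetical, and the sketch deliberately skips per-country length rules.

```python
def iban_checksum_valid(iban: str) -> bool:
    """Return True if the IBAN passes the mod-97 check (country lengths not verified)."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps 0-9 to 0..9 and A-Z to 10..35, as the procedure requires.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


good = iban_checksum_valid("GB82 WEST 1234 5698 7654 32")  # payload from the Europe table
bad = iban_checksum_valid("GB83 WEST 1234 5698 7654 32")   # check digits off by one
```

The second value has the right length, country code, and character classes, so a structure-only pattern matches it even though the checksum fails.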
Detection rate tradeoff
To quantify the cost of enabling --require-context on real evasion testing, we ran the evadex evasion suite (credit card, SSN, SIN, IBAN — text strategy) under both conditions:
| | Baseline | --require-context | Delta |
|---|---|---|---|
| Overall detection rate | 94.1% | 94.0% | −0.1 pp |
Per-technique breakdown:
| Technique | Baseline DR | --require-context DR | Delta |
|---|---|---|---|
| bidirectional | 100.0% | 100.0% | 0.0 pp |
| context_injection | 100.0% | 100.0% | 0.0 pp |
| delimiter | 100.0% | 99.1% | −0.9 pp |
| encoding | 85.0% | 90.3% | +5.3 pp |
| encoding_chains | 72.5% | 65.9% | −6.6 pp |
| morse_code | 65.4% | 55.8% | −9.6 pp |
| regional_digits | 100.0% | 100.0% | 0.0 pp |
| soft_hyphen | 100.0% | 100.0% | 0.0 pp |
| splitting | 100.0% | 100.0% | 0.0 pp |
| structural | 94.2% | 92.8% | −1.4 pp |
| unicode_encoding | 94.6% | 95.4% | +0.8 pp |
| unicode_whitespace | 100.0% | 100.0% | 0.0 pp |
--require-context costs the most on the most heavily obfuscated forms. Morse code (−9.6 pp) and encoding chains (−6.6 pp) suffer the largest drops: these techniques produce output that contains no recognizable keyword context, so the scanner's context requirement causes it to skip matches it would otherwise make. Conversely, single-layer encoding improves (+5.3 pp), possibly because the decoded context now satisfies the keyword requirement.
Recommendation for compliance teams
dlpscan-rs's ~99% false positive rate on structurally-plausible invalid values is a fundamental property of its pattern-first detection model. It is intentional: the scanner is tuned for high recall (catch everything) rather than high precision (avoid flagging invalid data).
For production deployments:
- Do not rely on --require-context to reduce false positives on free-form document content. It has negligible effect on FP rates when the values are structurally valid-looking, and it costs real detection rate on obfuscated variants (especially morse code and multi-layer encoding).
- If false positive rate is a concern, the appropriate mitigation is downstream triage (review queue, confidence thresholding) rather than scanner-level context gating.
- For evasion testing specifically, run evadex scan without --require-context. The baseline detection rate (94.1% on these categories) represents the scanner's real-world behavior for the majority of documents.
- --require-context is most useful when scanning large repositories of generic text where you want to reduce noise from coincidental pattern matches, not when testing against structured financial data.
Reproducing the results
# Baseline FP test
evadex falsepos --tool dlpscan-cli \
--exe /path/to/dlpscan --cmd-style rust \
--count 100 --format json -o falsepos_baseline.json
# With require-context (scanner-side flag)
evadex falsepos --tool dlpscan-cli \
--exe /path/to/dlpscan --cmd-style rust \
--count 100 --require-context --format json -o falsepos_require_context.json
# Most realistic: invalid values embedded in keyword context, with require-context
evadex falsepos --tool dlpscan-cli \
--exe /path/to/dlpscan --cmd-style rust \
--count 100 --wrap-context --require-context --format json -o falsepos_full_context.json
# Evasion scan detection rate without require-context
evadex scan --tool dlpscan-cli \
--exe /path/to/dlpscan --cmd-style rust \
--strategy text --category credit_card --category ssn --format json -o evasion_baseline.json
# Evasion scan with require-context (detection rate tradeoff)
evadex scan --tool dlpscan-cli \
--exe /path/to/dlpscan --cmd-style rust \
--strategy text --category credit_card --category ssn \
--require-context --format json -o evasion_require_context.json
Structured vs heuristic categories
evadex classifies its built-in payload categories into two groups:
Structured — formats with well-defined, mathematically or syntactically validatable patterns. DLP scanners typically enforce these patterns precisely (e.g., Luhn check on credit cards, fixed-length digit groups for SSN/SIN, checksum-verified IBAN). Evasion results in this group reflect meaningful signal: a variant that evades detection is a real gap in coverage.
Categories: credit_card, ssn, sin, us_itin, us_ein, us_mbi, us_dl, us_passport, iban, swift_bic, aba_routing, bitcoin, ethereum, au_tfn, au_medicare, au_passport, de_tax_id, de_id, fr_insee, fr_cni, uk_nin, uk_dl, es_dni, it_cf, nl_bsn, se_pin, no_fnr, fi_hetu, pl_pesel, ch_ahv, at_svn, be_nrn, bg_egn, hr_oib, cy_tin, cz_rc, dk_cpr, ee_ik, eu_vat, gr_amka, hu_taj, is_kt, ie_pps, lv_pk, li_pp, lt_ak, lu_nin, mt_id, pt_nif, ro_cnp, sk_bn, si_emso, tr_tc, nz_ird, sg_nric, hk_hkid, jp_my_number, in_aadhaar, in_pan, bd_nid, id_nik, my_mykad, pk_cnic, ph_philsys, kr_rrn, lk_nic, th_nid, vn_cccd, br_cpf, br_cnpj, mx_curp, ar_dni, cl_rut, co_cedula, cr_cedula, ec_cedula, py_ruc, pe_dni, uy_ci, ve_cedula, uae_eid, sa_nid, za_id, il_id, bh_cpr, ir_melli, iq_nid, jo_nid, kw_civil, lb_pp, qa_qid, eg_nid, et_passport, gh_card, ke_kra, ma_cin, ng_bvn, tz_nida, tn_cin, ug_nin, email, phone, ca_ramq, ca_ontario_health, ca_bc_carecard, ca_ab_health, ca_qc_drivers, ca_on_drivers, ca_bc_drivers, ca_passport, ca_mb_health, ca_sk_health, ca_ns_health, ca_nb_health, ca_pei_health, ca_nl_health, ca_mb_drivers, ca_sk_drivers, ca_ns_drivers, ca_nb_drivers, ca_pei_drivers, ca_nl_drivers, ca_business_number, ca_gst_hst, ca_transit_number, ca_bank_account, session_id, pin_block, biometric_id, card_expiry, card_track, micr, financial_amount, date_iso, iccid, edu_email, employee_id, gps_coords, insurance_policy, bank_ref, legal_case, loan_number, ndc_code, dob, postal_code, masked_pan, parcel_number, aml_case_id, isin, twitter_handle, url_with_creds, vin, fedwire_imad
Heuristic — formats where detection relies on fixed prefixes, high-entropy pattern matching, or loosely defined structure. DLP rules for these categories vary widely between scanners and configurations, and a "fail" result may simply reflect that the scanner never had a strong rule for that specific format variant — not that a real exfiltration path was found.
Categories: aws_key, jwt, github_token, stripe_key, slack_token, classification, corp_classification, mnpi, cardholder_name, privacy_label, attorney_client, supervisory_info
Heuristic categories are excluded from the default scan to avoid misleading results. Include them with:
evadex scan --tool dlpscan-cli --include-heuristic
A warning is printed to stderr whenever --include-heuristic is active reminding you to interpret those results with caution.
Installation
Requires Python 3.10+.
pip install evadex
Or install from source:
git clone https://github.com/tbustenk/evadex
cd evadex
pip install -e ".[dev]"
Optional extras:
# Barcode / QR image generation for scanners that decode images (Siphon, etc.)
pip install evadex[barcodes]
# Parquet generation (pyarrow). SQLite is stdlib — no extra needed.
# Pair with a scanner built with data-format extractors, e.g. Siphon
# compiled with `--features data-formats`.
pip install evadex[data-formats]
# 7-Zip archive generation (py7zr). ZIP / nested ZIP / mbox / ics / warc
# all use stdlib only — no extra needed for those. Pair 7z with a scanner
# built with archive extractors, e.g. Siphon compiled with `--features archives`.
pip install evadex[archives]
For reproducible installs with pinned, hash-verified dependencies (recommended for regulated environments):
pip install -r requirements.txt # runtime only
pip install -r requirements-dev.txt # runtime + test dependencies
These lockfiles are generated with pip-compile --generate-hashes and updated with each release.
Quick start
By default evadex runs the banking tier — ~80 payloads optimised for Canadian banking and RBC's compliance surface. No flags required:
evadex scan --tool dlpscan-cli --strategy text
Tiers
| Tier | Payloads | Est. runtime (text strategy) | When to use |
|---|---|---|---|
| banking (default) | ~80 Canadian banking focused | ~5 min | Daily checks, quick patches, RBC production testing |
| core | ~150 broader PII and financial | ~10 min | Weekly benchmarks, broader compliance checks |
| regional | ~350 international coverage | ~20 min | Pre-release validation, international deployments |
| full | All 554 payloads | ~30–40 min | Major releases, compliance audits, onboarding new scanners |
# default — banking tier
evadex scan --tool dlpscan-cli --strategy text
# explicit tiers
evadex scan --tool dlpscan-cli --strategy text --tier core
evadex scan --tool dlpscan-cli --strategy text --tier regional
evadex scan --tool dlpscan-cli --strategy text --tier full
Test a single value:
evadex scan --tool dlpscan-cli --input "4532015112830366" --strategy text
Test with all file strategies (slower — exercises DOCX/PDF/XLSX extraction):
evadex scan --tool dlpscan-cli --input "4532015112830366"
Generate an HTML report:
evadex scan --tool dlpscan-cli --strategy text --format html -o report.html
Configuration
evadex supports an optional evadex.yaml config file. Config file values are defaults — any CLI flag you pass overrides the corresponding config value.
Generating a starter config
evadex init
Creates evadex.yaml in the current directory:
# evadex configuration file
# Run 'evadex scan --config evadex.yaml' to use this file.
# CLI flags take precedence over values in this file.
tool: dlpscan-cli
strategy: text
min_detection_rate: 85
scanner_label: production
exe: null
cmd_style: python
# tier: banking # banking (default) | core | regional | full
# categories: # explicit category list — overrides tier when set
# - credit_card
# - ssn
# - iban
include_heuristic: false
concurrency: 20
timeout: 30.0
output: results.json
format: json
Using a config file
Pass it explicitly:
evadex scan --config evadex.yaml
Or drop evadex.yaml in the current directory and evadex will pick it up automatically — no flag needed.
CLI flags always win. To override a config value for one run:
# Config says scanner_label: production — this run uses "staging" instead
evadex scan --config evadex.yaml --scanner-label staging
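The precedence rule can be pictured as a simple dictionary merge. The sketch below is an illustration only: the function name is hypothetical, and it assumes that an unset CLI flag is represented as None, which may not match how evadex resolves options internally.

```python
from typing import Any


def merge_settings(config: dict[str, Any], cli_flags: dict[str, Any]) -> dict[str, Any]:
    """Start from config-file defaults and let explicitly passed CLI flags win."""
    merged = dict(config)
    for key, value in cli_flags.items():
        if value is not None:  # None means "flag not passed on the command line"
            merged[key] = value
    return merged


config_file = {"tool": "dlpscan-cli", "strategy": "text", "scanner_label": "production"}
cli = {"scanner_label": "staging", "strategy": None}
effective = merge_settings(config_file, cli)
```

Here the effective scanner_label is "staging", while strategy keeps the config-file value because no --strategy flag was passed.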
Config keys
| Key | Type | CLI equivalent | Description |
|---|---|---|---|
| tool | string | --tool | Adapter name (dlpscan-cli, dlpscan, siphon, presidio) |
| strategy | string or list | --strategy | Submission strategy: text, docx, pdf, xlsx. Use a list for multiple. |
| min_detection_rate | number | --min-detection-rate | CI/CD gate threshold (0–100) |
| scanner_label | string | --scanner-label | Label recorded in JSON meta.scanner |
| exe | string or null | --exe | Path to scanner executable |
| cmd_style | python or rust | --cmd-style | Command format for dlpscan-cli |
| tier | string | --tier | Payload tier: banking (default), core, regional, full. Ignored when categories is set. |
| categories | list of strings | --category | Payload categories to test (overrides tier) |
| include_heuristic | boolean | --include-heuristic | Include heuristic categories |
| concurrency | integer | --concurrency | Max concurrent requests (default: 20) |
| timeout | number | --timeout | Request timeout in seconds |
| output | string or null | --output | Output file path (null = stdout) |
| format | json or html | --format | Output format |
| audit_log | string or null | --audit-log | Append-only audit log file (see Audit log) |
| c2_url | string or null | --c2-url | Siphon-C2 admin-dashboard URL to push results to. See Siphon-C2 integration. |
| c2_key | string or null | --c2-key | API key sent as x-api-key to Siphon-C2. Same format as Siphon's core API key. |
Validation
evadex validates the config file on load and exits with a clear error for invalid values:
Error: Config 'min_detection_rate' must be between 0 and 100, got: 150.0
Error: Invalid strategy value(s): foobar. Valid: docx, pdf, text, xlsx
Error: Unknown config key(s): bad_key. Valid keys: categories, cmd_style, ...
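If you generate evadex.yaml from a template or a CI pipeline, it can help to run the same kinds of checks before invoking evadex. The following is a rough sketch of such a pre-check, not evadex's validator: the function name is hypothetical, the key list is copied from the Config keys table and may be incomplete, and the message wording only approximates the errors shown above.

```python
VALID_STRATEGIES = {"text", "docx", "pdf", "xlsx"}
# Keys taken from the Config keys table; the real list may contain more.
VALID_KEYS = {
    "tool", "strategy", "min_detection_rate", "scanner_label", "exe", "cmd_style",
    "tier", "categories", "include_heuristic", "concurrency", "timeout", "output",
    "format", "audit_log", "c2_url", "c2_key",
}


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    unknown = sorted(set(config) - VALID_KEYS)
    if unknown:
        problems.append(f"Unknown config key(s): {', '.join(unknown)}")

    rate = config.get("min_detection_rate")
    if rate is not None and not 0 <= rate <= 100:
        problems.append(f"'min_detection_rate' must be between 0 and 100, got: {rate}")

    # strategy may be a single string or a list of strings.
    strategy = config.get("strategy") or []
    strategies = [strategy] if isinstance(strategy, str) else list(strategy)
    invalid = sorted(s for s in strategies if s not in VALID_STRATEGIES)
    if invalid:
        problems.append(f"Invalid strategy value(s): {', '.join(invalid)}")

    return problems


issues = check_config({"strategy": ["text", "foobar"], "min_detection_rate": 150.0, "bad_key": 1})
```

Collecting every problem before reporting, rather than stopping at the first, tends to shorten the fix-and-retry loop in CI.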
Example output
Terminal summary
Running evadex scan against dlpscan-cli at http://localhost:8080...
Done. 590 tests — N detected, N evaded
Detection rates depend on your scanner, its version, and how it's configured.
JSON output (--format json, default)
{
"meta": {
"timestamp": "2026-04-01T22:01:36.172424+00:00",
"scanner": "rust-2.0.0",
"total": 590,
"pass": 514,
"fail": 76,
"error": 0,
"pass_rate": 87.1,
"summary_by_category": {
"credit_card": { "pass": 109, "fail": 15, "error": 0 },
"ssn": { "pass": 43, "fail": 10, "error": 0 },
"iban": { "pass": 36, "fail": 8, "error": 0 }
},
"summary_by_generator": {
"delimiter": { "pass": 72, "fail": 10, "error": 0 },
"unicode_encoding": { "pass": 54, "fail": 13, "error": 0 }
}
},
"results": [
{
"payload": {
"value": "5105105105105100",
"category": "credit_card",
"category_type": "structured",
"label": "Mastercard 16-digit"
},
"variant": {
"value": "5105105105105100",
"generator": "delimiter",
"technique": "no_delimiter",
"transform_name": "All delimiters removed",
"strategy": "text"
},
"detected": true,
"severity": "pass",
"duration_ms": 371.01,
"error": null,
"raw_response": { "matches": [{ "type": "credit_card", "value": "5105105105105100" }] }
},
{
"payload": {
"value": "046 454 286",
"category": "sin",
"category_type": "structured",
"label": "Canada SIN"
},
"variant": {
"value": "Ο4б 4Ƽ4 ΚȢб",
"generator": "unicode_encoding",
"technique": "homoglyph_substitution",
"transform_name": "Visually similar Cyrillic/Greek characters substituted",
"strategy": "text"
},
"detected": false,
"severity": "fail",
"duration_ms": 378.57,
"error": null,
"raw_response": { "matches": [] }
}
]
}
Severity values:
| Value | Meaning |
|---|---|
| pass | Scanner detected the variant (good) |
| fail | Scanner missed the variant (evasion succeeded) |
| error | Adapter error (network, timeout, malformed scanner response, etc.) |
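A report in this shape is easy to post-process. The sketch below assumes only the fields shown in the sample JSON above (results[].severity, results[].variant.generator, results[].variant.technique); the function name summarize is hypothetical, and leaving error results out of the rate's denominator is an assumption of this example, not a statement about how evadex computes pass_rate.

```python
from collections import Counter


def summarize(report: dict) -> dict:
    """Tally severities and count which techniques evaded detection."""
    results = report["results"]
    severities = Counter(r["severity"] for r in results)

    # Group misses by (generator, technique) to see where the gaps cluster.
    evaded = Counter(
        (r["variant"]["generator"], r["variant"]["technique"])
        for r in results
        if r["severity"] == "fail"
    )

    # Assumption: adapter errors are ignored when computing the rate.
    scored = severities["pass"] + severities["fail"]
    rate = round(100 * severities["pass"] / scored, 1) if scored else 0.0
    return {"counts": dict(severities), "detection_rate": rate, "evaded": evaded}
```

Load a saved report with json.load and pass the resulting dict to summarize; evaded.most_common() then lists the techniques that slipped through most often.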
CLI reference
evadex scan
Run DLP evasion tests against a scanner.
evadex scan [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| --config | (auto-discovered) | Path to evadex.yaml config file. Auto-discovered from current directory if present. CLI flags always override config values. |
| --tool, -t | dlpscan-cli | Adapter to use. Built-in adapters: dlpscan-cli, dlpscan, siphon, presidio. |
| --input, -i | (banking tier) | Single value to test. If omitted, runs the banking tier (~80 payloads). Use --tier to change. Category is auto-detected (Luhn check, regex patterns for SSN/IBAN/AWS/JWT/email/phone). |
| --format, -f | json | Output format: json or html |
| --output, -o | stdout | Write report to file instead of stdout |
| --strategy | all four | Submission strategy: text, docx, pdf, xlsx. Repeat the flag for multiple. Omit to run all four. |
| --tier | banking | Payload tier: banking (default), core, regional, full. Ignored when --category is specified. |
| --concurrency | 20 | Max concurrent requests |
| --timeout | 30.0 | Request timeout in seconds |
| --url | http://localhost:8080 | Base URL (for HTTP-based adapters: dlpscan, siphon, presidio) |
| --api-key | (env: EVADEX_API_KEY) | API key passed as Authorization: Bearer. Use the environment variable in preference to the CLI flag to avoid exposure in shell history and process listings. |
| --category | (overrides --tier) | Filter built-in payloads by category. Repeat for multiple. When set, --tier is ignored. |
| --variant-group | (all) | Limit to specific generator(s). Repeat for multiple. Values: unicode_encoding, delimiter, splitting, leetspeak, regional_digits, structural, encoding, context_injection, unicode_whitespace, bidirectional, soft_hyphen, morse_code |
| --include-heuristic | off | Also run heuristic categories (aws_key, jwt, github_token, stripe_key, slack_token, classification). A warning is printed when enabled; see Structured vs heuristic categories. |
| --scanner-label | (empty) | Label recorded in the JSON meta.scanner field. Use to tag a specific scanner version, e.g. python-1.3.0 or rust-2.0.0. Useful when comparing results across scanner builds. |
| --exe | dlpscan | Path to the scanner executable (dlpscan-cli adapter only). Use when dlpscan is not on PATH or you need to target a specific build. |
| --cmd-style | python | Command format for dlpscan-cli: python (invokes dlpscan -f json <file>) or rust (invokes dlpscan --format json scan <file>). |
| --min-detection-rate | (off) | Exit with code 1 if the detection rate falls below this threshold (0–100). Intended for CI/CD pipeline gating. Report is always written before the exit. |
| --baseline | (off) | Save this run's JSON results to a file for future comparison. |
| --compare-baseline | (off) | Compare this run against a previously saved baseline and print a regression summary to stderr. |
| --audit-log | (off) | Append a one-line JSON audit record for this run to a file. Parent directories are created if they do not exist. Can also be set via audit_log in evadex.yaml. |
| --feedback-report | (off) | Save a structured JSON feedback report to PATH. Contains per-technique evasion counts with example variant values, actionable fix suggestions, and the generated regression test code as a string field. Always written when specified, even if there are no evasions. |
| --require-context | off | Pass --require-context to dlpscan-rs: only flag matches when surrounding keywords are present. Reduces false positives but may reduce detection rate for obfuscated variants lacking keyword context. Requires --cmd-style rust. Can also be set via require_context in evadex.yaml. |
| --wrap-context | auto (rust) | Embed every variant value in a realistic keyword sentence before submission. Automatically enabled when --cmd-style rust is used, because dlpscan-rs requires surrounding context keywords to flag most matches; submitting a bare value produces artificially low detection rates. Pass --no-wrap-context to suppress. Can also be set via wrap_context in evadex.yaml. |
| --no-wrap-context | off | Explicitly disable context wrapping even when --cmd-style rust is active. |
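To make the two --cmd-style values concrete, the sketch below builds the argument list for each invocation form quoted in the table. The helper name build_scan_command is hypothetical; only the two command shapes (dlpscan -f json <file> and dlpscan --format json scan <file>) come from the documentation above.

```python
def build_scan_command(exe: str, cmd_style: str, path: str) -> list[str]:
    """Return the argv used to scan one file with the given command style."""
    if cmd_style == "python":
        # python style: dlpscan -f json <file>
        return [exe, "-f", "json", path]
    if cmd_style == "rust":
        # rust style: dlpscan --format json scan <file>
        return [exe, "--format", "json", "scan", path]
    raise ValueError(f"Unknown cmd_style: {cmd_style!r} (expected 'python' or 'rust')")
```

Returning a list rather than a single string means the result can be handed to subprocess.run without shell quoting concerns.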
Note for dlpscan-rs users: dlpscan-rs requires surrounding context keywords (e.g. "credit card", "SSN", "IBAN") to be present near a matched value before it will flag it. Submitting a bare value like 4532015112830366 without context will produce a false no-match: the scanner can see the number but will not fire without contextual evidence. evadex scan auto-enables --wrap-context when --cmd-style rust is used so every variant is embedded in a realistic business sentence. Use --no-wrap-context only if you are deliberately testing bare-value behaviour or your dlpscan-rs build is configured with context matching disabled.
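Context wrapping amounts to placing the value inside a sentence that carries a category keyword. The sketch below is only an illustration: the sentence templates and the names CONTEXT_TEMPLATES and wrap_context are invented for this example, and evadex's real wording is not documented here.

```python
# Hypothetical templates, keyed by payload category.
CONTEXT_TEMPLATES = {
    "credit_card": "Please charge the credit card {value} for the outstanding invoice.",
    "ssn": "The employee's SSN on file is {value}.",
    "iban": "Wire the refund to IBAN {value} by Friday.",
}
# Fallback for categories without a dedicated template.
DEFAULT_TEMPLATE = "The reference value on record is {value}."


def wrap_context(value: str, category: str) -> str:
    """Embed a (possibly obfuscated) value in a keyword-bearing sentence."""
    template = CONTEXT_TEMPLATES.get(category, DEFAULT_TEMPLATE)
    return template.format(value=value)
```

The variant value is passed through unchanged, so whatever obfuscation was applied to it survives inside the sentence.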
evadex generate
Generate test documents filled with synthetic sensitive data for DLP scanner testing. Values are embedded in realistic business sentences, tables, and paragraphs. Evasion variants use the same obfuscation techniques as evadex scan.
evadex generate (--format FORMAT | --formats FMT,FMT,...) --output PATH [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| --format | (one of format/formats required) | Single output format: xlsx, docx, pdf, csv, txt, eml, msg, json, xml, sql, log, png, jpg, multi_barcode_png, edm_json, parquet, sqlite, zip, zip_nested, 7z, mbox, ics, warc |
| --formats | (one of format/formats required) | Comma-separated list of formats. Output is a path stem; extensions are appended. --formats xlsx,docx,pdf --output dir/test → test.xlsx, test.docx, test.pdf |
| --barcode-type | qr | Barcode encoding for png/jpg/multi_barcode_png: qr (unicode, up to 4296 chars), code128 (ASCII 1D), ean13 (13 digits, zero-padded), pdf417 (2D, requires optional pdf417gen), datamatrix (2D, requires optional pylibdmtx), or random. |
| --output | (required) | Output file path (with --format) or path stem (with --formats) |
| --tier | banking | Payload tier when --category is not set: banking (default), core, regional, full |
| --category | (overrides --tier) | Payload category to include. Repeat for multiple. |
| --count | 100 | Number of test values to generate per category |
| --evasion-rate | 0.5 | Fraction of values that are evasion variants (0.0–1.0) |
| --keyword-rate | 0.5 | Fraction of values wrapped in keyword context sentences (0.0–1.0) |
| --technique | (all) | Limit evasion variants to specific technique names. Repeat for multiple. |
| --language | en | Language for keyword context sentences: en (English) or fr-CA (Canadian French) |
| --random | off | Randomise categories, evasion rate, and keyword rate |
| --seed | (none) | Integer seed for reproducible output |
| --include-heuristic | off | Also include heuristic categories (AWS keys, tokens, JWT, etc.) |
| --count-per-category | (uses --count) | Override count for a specific category. Repeat for multiple. Example: --count-per-category credit_card:200 --count-per-category sin:50 |
| --total | (off) | Generate exactly N records distributed evenly across selected categories. Example: --total 1000 |
| --density | medium | How frequently sensitive values appear in filler text: low (one per paragraph), medium (one per 2-3 sentences), high (almost every sentence) |
| --technique-group | (all) | Limit evasion variants to a specific generator family. Repeat for multiple. Example: --technique-group unicode_encoding |
| --technique-mix | (off) | Exact proportion per technique group, comma-separated. Proportions must sum to 1.0. Example: --technique-mix unicode_encoding:0.4,encoding:0.3,splitting:0.3 |
| --evasion-per-category | (uses --evasion-rate) | Override evasion rate for a specific category. Repeat for multiple. Example: --evasion-per-category credit_card:0.7 --evasion-per-category sin:0.2 |
| --template | generic | Document template controlling structure and tone: generic, invoice, statement, hr_record, audit_report, source_code, config_file, chat_log, medical_record, env_file, secrets_file, code_with_secrets (entropy-focused: .env / YAML secrets / bare-value source code; pair with entropy categories) |
| --noise-level | medium | Ratio of filler text to sensitive values: low (mostly values), medium (balanced), high (lots of business text) |
Format details:
- xlsx: Multiple sheets, with one Summary sheet plus one sheet per category. Columns include embedded text, plain value, variant value, technique, and generator. Evasion rows are highlighted yellow.
- docx: Title page with disclaimer; one heading per category; two-thirds prose paragraphs, one-third tabular layout. Supports --template for alternate document structures.
- pdf: Sections per category with header/footer; evasion rows highlighted.
- csv: Flat CSV with columns category, plain_value, variant_value, technique, generator, transform_name, has_keywords, embedded_text.
- txt: Plain-text document with section headings and numbered entry list. Supports --template for alternate document structures.
- eml: RFC 2822 email file with From/To/Subject headers, realistic names, and sensitive values in the body. Example: "Please find attached the statement for card 4532015112830366".
- msg: Outlook message format. Currently generates EML-format content with .msg extension (DLP scanners extract text identically).
- json: Structured JSON data export. Array of records with realistic field names (customer_id, card_number, name, email) plus filler fields. Pretty-printed.
- xml: Financial messaging format resembling ISO 20022 (pain.001) payment messages. Sensitive values in appropriate XML elements (<IBAN>, <BIC>, <Ustrd>).
- sql: Database dump format with CREATE TABLE and INSERT INTO statements. Example: INSERT INTO customers (id, name, sin, card_number) VALUES (1, 'John Smith', '046 454 286', '4532015112830366');
- log: Application log format with timestamps, log levels, and services. Mixes plaintext, structured, and JSON log formats.
- png / jpg: Image grid of barcodes/QR codes, one per entry. Targets scanners that extract text from images via barcode decoding (e.g. Siphon's extract_barcode pipeline, which decodes QR, Data Matrix, PDF417, Code 128, EAN-13, etc.). Capped at 60 barcodes per image for decompression-bomb safety and to stay under Siphon's 100-codes-per-image decode cap. Value is rendered with a quiet zone plus a human-readable label. Requires pip install evadex[barcodes].
- multi_barcode_png: PNG styled like a scanned form with a header bar, body text, and a mixed grid of QR, Code 128, and EAN-13 codes carrying different sensitive values. Exercises multi-format decoding in one pass. Requires pip install evadex[barcodes].
- edm_json: Flat JSON file ({"values": [{"value", "category", "label"}, …]}) matching the shape of Siphon's POST /v1/edm/register request body. Use for bulk EDM registration; see EDM testing.
- parquet: Apache Parquet with a flat customer/banking schema (customer_id, name, email, phone, sin, card_number, iban, swift_bic, …) and snappy compression. Each sensitive payload lands in its category-appropriate column; remaining columns are filled with realistic fake data. Written in 1000-row row groups so large files exercise multi-group Parquet readers. Targets scanners with Parquet extractors (e.g. Siphon built with --features data-formats, which reads the first 10,000 rows). Requires pip install evadex[data-formats].
- sqlite: SQLite database with three realistic banking tables (customers, transactions, accounts). Payloads route to whichever table owns their category. Uses Python's stdlib sqlite3 so no extra install is needed on evadex's side; the scanner still needs its own SQLite support (Siphon's extract_sqlite requires the data-formats feature and reads up to 5,000 rows per table).
- zip: ZIP archive containing 4–12 inner files (customer_data.csv, transactions_q1.csv, audit_log.txt, config.json, …) with sensitive payloads spread across them, plus a manifest.xml index. Stdlib zipfile. Note: Siphon's plain-ZIP extractor in crates/siphon-core/src/extractors.rs only walks *.xml entries; text inside non-OOXML ZIPs is currently not extracted, so detection on this format mostly serves to document the gap. Use 7z (below) when you need detection to actually fire end to end.
- zip_nested: ZIP-inside-ZIP-inside-ZIP, three levels deep, with sensitive data only in the innermost archive. Tests recursive-archive extraction (which Siphon does not currently perform). Stdlib zipfile.
- 7z: 7-Zip / LZMA2 archive with the same banking-filename inner structure as zip. Siphon's extract_7z does read txt/csv/json content (1 MB per file, 100 KB content cap), so this is the right choice when you want detection to fire on the archive contents. Requires pip install evadex[archives].
- mbox: Unix mailbox file with one realistic email per entry (sensible From/To/Subject/Date headers, banking-domain prose). Roughly one in three messages uses Content-Transfer-Encoding: base64 so Siphon's extract_mbox decode path gets exercised. Stdlib mailbox/email.
- ics: iCalendar (RFC 5545) file with one VEVENT per entry. Sensitive payloads land in SUMMARY, DESCRIPTION, and ATTENDEE properties, which are exactly the properties Siphon's extract_ics walks. CRLF-terminated and 75-octet line-folded so any conformant calendar parser will read it.
- warc: Web ARChive (ISO 28500 / WARC 1.1) with one warcinfo record plus one HTTP response record per entry. Sensitive values are embedded in synthetic HTML banking-portal bodies inside the captured responses, which exercises Siphon's extract_warc.
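The ics entry mentions 75-octet line folding. As a rough illustration of what that means under RFC 5545 (this is not evadex's writer code, and the helper names fold_ics_line and unfold_ics are hypothetical), a long content line is split so that no physical line exceeds 75 octets, and each continuation line begins with a single space:

```python
def fold_ics_line(line: str, limit: int = 75) -> str:
    """Fold one content line so no physical line exceeds `limit` octets."""
    chunks = []
    current = b""
    for ch in line:
        # Encode per character so a multi-byte UTF-8 sequence is never split.
        encoded = ch.encode("utf-8")
        if len(current) + len(encoded) > limit:
            chunks.append(current)
            # A continuation line starts with one space, which counts
            # toward that line's octet limit.
            current = b" "
        current += encoded
    chunks.append(current)
    return b"\r\n".join(chunks).decode("utf-8")


def unfold_ics(text: str) -> str:
    """Reverse folding: remove each CRLF that is followed by a single space."""
    return text.replace("\r\n ", "")
```

A scanner that does not unfold before matching may see a card number broken across two physical lines, which is one reason folded calendar files make a useful test fixture.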
Template details:
- generic (default): Mixed prose and table format (existing behaviour).
- invoice: Payment invoice layout with line items, amounts, HST, and totals.
- statement: Bank statement with account details, transaction history, and balance.
- hr_record: HR employee records with personal information fields grouped per employee.
- audit_report: Internal audit report with executive summary, detailed findings (severity-rated), and recommendations.
- source_code: Realistic source code with sensitive values as hardcoded strings, variable assignments, and comments. Mixes Python, JavaScript, and generic syntax.
- config_file: Application config (randomly INI, YAML, or ENV format) with sensitive values as configuration parameters.
- chat_log: Messaging/chat export with timestamps, participant names, and sensitive values shared in conversation.
- medical_record: Clinical notes and patient records with MRN, DOB, diagnoses, medications, and sensitive identifiers.
Examples:
# Banking tier (default) — all three formats in one pass
evadex generate --formats xlsx,docx,pdf --tier banking --count 100 \
--evasion-rate 0.3 --output reports/banking_en
# Canadian French — banking tier
evadex generate --formats xlsx,docx,pdf --tier banking --count 100 \
--evasion-rate 0.3 --language fr-CA --output reports/banking_frca
# Large CSV for bulk testing
evadex generate --format csv --tier banking --count 500 \
--evasion-rate 0.5 --output reports/banking_large.csv
# 100 credit cards, 40% evasion variants → XLSX
evadex generate --format xlsx --category credit_card --count 100 \
--evasion-rate 0.4 --output test_cards.xlsx
# Mixed categories → DOCX
evadex generate --format docx \
--category credit_card --category ssn --category iban \
--count 50 --evasion-rate 0.5 --output test_mixed.docx
# Reproducible random document
evadex generate --format xlsx --random --count 500 --seed 42 --output random.xlsx
# CSV for programmatic inspection
evadex generate --format csv --category ssn --count 1000 \
--evasion-rate 0.3 --output ssn_variants.csv
# New formats — email, JSON, XML, SQL, log
evadex generate --format eml --tier banking --count 50 --output test_email.eml
evadex generate --format json --tier banking --count 200 --output export.json
evadex generate --format xml --category iban --category credit_card --count 100 --output payments.xml
evadex generate --format sql --tier banking --count 500 --output dump.sql
evadex generate --format log --tier banking --count 1000 --output app.log
evadex generate --formats eml,json,xml,sql,log --tier banking --output reports/multi
# Parquet / SQLite (Parquet requires: pip install evadex[data-formats]; SQLite is stdlib)
evadex generate --format parquet --tier banking --count 1000 --evasion-rate 0.3 --output test.parquet
evadex generate --format sqlite --tier banking --count 1000 --evasion-rate 0.3 --output test.db
# French-Canadian column/table names
evadex generate --format parquet --tier banking --count 100 --language fr-CA --output test_frca.parquet
evadex generate --format sqlite --tier banking --count 100 --language fr-CA --output test_frca.db
# Archive and message formats (zip / mbox / ics / warc are stdlib;
# 7z requires: pip install evadex[archives])
evadex generate --format zip --tier banking --count 100 --output test_output/test.zip
evadex generate --format zip_nested --tier banking --count 50 --output test_output/test_nested.zip
evadex generate --format 7z --tier banking --count 100 --output test_output/test.7z
evadex generate --format mbox --tier banking --count 50 --output test_output/test.mbox
evadex generate --format ics --tier banking --count 30 --output test_output/test.ics
evadex generate --format warc --tier banking --count 20 --output test_output/test.warc
# Archive evasion variants (password, double-extension, deep nesting, mixed formats)
evadex generate --format zip --category credit_card --count 20 --evasion-rate 1.0 \
--technique-group archive_evasion --output archive_evasion.zip
# Barcode / QR code images (requires: pip install evadex[barcodes])
evadex generate --format png --category credit_card --count 10 --barcode-type qr --output qr.png
evadex generate --format png --category credit_card --count 10 --barcode-type code128 --output code128.png
evadex generate --format png --category credit_card --count 12 --barcode-type ean13 --output ean13.png
evadex generate --format jpg --category credit_card --count 10 --evasion-rate 0.3 --output cards.jpg
evadex generate --format multi_barcode_png --category credit_card --category ssn --count 10 --output form.png
# Barcode image evasions (split across two codes, noise overlay, rotation, embed-in-document)
evadex generate --format png --category credit_card --count 4 --evasion-rate 1.0 \
--technique-group barcode_evasion --output evasion.png
# Per-category count overrides
evadex generate --format xlsx --tier banking --count 100 \
--count-per-category credit_card:500 --count-per-category sin:50 --output overrides.xlsx
# Total record distribution
evadex generate --format json --tier banking --total 1000 --output distributed.json
# Evasion technique control
evadex generate --format xlsx --tier banking --evasion-rate 0.5 \
--technique-group unicode_encoding --output unicode_only.xlsx
evadex generate --format xlsx --tier banking --evasion-rate 0.5 \
--technique-mix unicode_encoding:0.4,encoding:0.3,splitting:0.3 --output mixed.xlsx
# Per-category evasion rates
evadex generate --format csv --tier banking --evasion-rate 0.3 \
--evasion-per-category credit_card:0.9 --evasion-per-category sin:0.1 --output targeted.csv
# Document templates
evadex generate --format docx --tier banking --template statement --count 100 --output statement.docx
evadex generate --format txt --tier banking --template invoice --output invoice.txt
evadex generate --format txt --category credit_card --template source_code --count 50 --output leaked_code.txt
evadex generate --format txt --category credit_card --template chat_log --count 20 --output chat_export.txt
evadex generate --format txt --tier banking --template audit_report --noise-level high --output audit.txt
# Density and noise control
evadex generate --format docx --tier banking --density high --count 100 --output dense.docx
evadex generate --format pdf --tier banking --noise-level high --count 100 --output noisy.pdf
Value generation:
evadex generates values two ways:
- Synthetic generators (preferred, unlimited): Produce structurally valid values algorithmically, so --count 1000 always returns 1000 distinct values. Registered for:
  - credit_card: Luhn-valid numbers for Visa, Mastercard, Amex, Discover
  - sin: Valid Canadian SINs (Luhn checksum, NNN NNN NNN format)
  - iban: Valid IBANs for GB, DE, and FR (ISO 13616 mod-97 checksum)
  - phone: Canadian E.164 numbers (+1-NPA-NXX-XXXX) from real area codes
  - email: Realistic addresses with common Canadian and international domains
  - ca_ramq: Quebec RAMQ health card numbers (XXXX YYMM DDSS format)
  - ca_mb_health, ca_sk_health: 9-digit Manitoba/Saskatchewan health cards
  - ca_ns_health: Nova Scotia 10-digit health card (NNNN NNN NNN format)
  - ca_nb_health, ca_nl_health: 10-digit NB/NL health cards
  - ca_pei_health: 12-digit PEI health card
  - ca_mb_drivers: Manitoba licence (LL-NNN-NNN-NNN format)
  - ca_sk_drivers: Saskatchewan 8-digit licence
  - ca_ns_drivers: Nova Scotia licence (2 letters + 7 digits)
  - ca_nb_drivers: New Brunswick 7-digit licence
  - ca_pei_drivers: PEI 6-digit licence
  - ca_nl_drivers: Newfoundland licence (1 letter + 9 digits)
  - ca_business_number: Canadian Business Number (9 digits, CRA)
  - ca_gst_hst: GST/HST registration (9-digit BN + RT + 4 digits)
  - ca_transit_number: Transit/routing number (NNNNN-NNN format)
  - ca_bank_account: Bank account (7–12 random digits)
  - ssn: Valid US Social Security Numbers (AAA-BB-CCCC, no reserved area / group / serial blocks). (v3.13.0)
  - uk_nin: Valid UK National Insurance Numbers (XX NNNNNN X, HMRC-compliant prefix and suffix rules). (v3.13.0)
  - br_cpf: Valid Brazilian CPFs (NNN.NNN.NNN-DD, two-pass Receita Federal checksum, all-same-digit base rejected). (v3.13.0)
  - au_medicare: Valid Australian Medicare cards (NNNN NNNNN N, weighted check digit per Services Australia). (v3.13.0)
  - de_tax_id: Valid German Steuer-IdNr (11 digits, ISO 7064 MOD 11,10 check digit, exactly-twice duplicate-digit rule). (v3.13.0)
  - us_dl: US driver-licence numbers cycling through all 50 state + DC formats (shape only; most state DLs have no public checksum). (v3.13.0)
- Seed rotation fallback: Categories without a synthetic generator rotate through the built-in seed values.

Evasion variants are drawn from all 12 evadex generators (same techniques as evadex scan). Use --technique to restrict to specific techniques.
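As an illustration of what "structurally valid" means for credit_card, the sketch below generates a 16-digit number with a correct Luhn check digit. It is a standalone example under stated assumptions, not evadex's generator: the function names are hypothetical and only a bare "4" (Visa-style) prefix is used by default.

```python
import random


def luhn_check_digit(partial: str) -> str:
    """Return the digit that makes partial + digit pass the Luhn check."""
    total = 0
    # Walk right to left. The rightmost digit of `partial` sits next to the
    # check digit, so it is the first one to be doubled.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if the last digit is the correct Luhn check digit for the rest."""
    return luhn_check_digit(number[:-1]) == number[-1]


def synthetic_card(prefix: str = "4", length: int = 16, rng=None) -> str:
    """Random digits after the prefix, then a computed check digit."""
    rng = rng or random.Random()
    body_len = length - len(prefix) - 1
    body = prefix + "".join(str(rng.randrange(10)) for _ in range(body_len))
    return body + luhn_check_digit(body)
```

Passing a seeded random.Random instance as rng gives reproducible output, in the same spirit as the --seed flag.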
evadex compare
Diff two evadex scan result JSON files and report what changed between them.
evadex compare [OPTIONS] FILE_A FILE_B
| Flag | Default | Description |
|---|---|---|
| --format, -f | json | Output format: json or html |
| --output, -o | stdout | Write report to file instead of stdout |
| --label-a | (from JSON meta.scanner) | Override the label for the first file |
| --label-b | (from JSON meta.scanner) | Override the label for the second file |
The compare report includes:
- Overall delta in detection rate (percentage points)
- Per-category detection rate changes
- Per-technique detection rate changes (only techniques where the rate changed)
- Per-variant diff list (variants where severity changed between the two runs)
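The first and last items of that list can be approximated in a few lines. This sketch assumes the result schema from the sample JSON output and identifies a variant by (payload value, technique, strategy); that key, and the function names, are assumptions of this example rather than evadex's own matching logic.

```python
def variant_key(result: dict) -> tuple:
    """Hypothetical identity for a variant across two runs."""
    return (
        result["payload"]["value"],
        result["variant"]["technique"],
        result["variant"]["strategy"],
    )


def detection_rate(report: dict) -> float:
    """Percentage of results with severity 'pass'."""
    results = report["results"]
    if not results:
        return 0.0
    return 100 * sum(r["severity"] == "pass" for r in results) / len(results)


def compare(report_a: dict, report_b: dict) -> dict:
    """Overall delta in percentage points plus variants whose severity changed."""
    a = {variant_key(r): r["severity"] for r in report_a["results"]}
    b = {variant_key(r): r["severity"] for r in report_b["results"]}
    # Only variants present in both runs can be diffed.
    changed = {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
    delta = round(detection_rate(report_b) - detection_rate(report_a), 1)
    return {"delta_pp": delta, "changed": changed}
```

A positive delta_pp means the second run detected more variants; each entry in changed maps a variant to its (before, after) severity pair.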
evadex init
Generate a default evadex.yaml config file in the current directory.
evadex init
Creates evadex.yaml with sensible defaults. Edit the file and run evadex scan --config evadex.yaml, or drop it in the working directory for auto-discovery.
evadex falsepos
Measure the scanner's false positive rate using values that look like sensitive data but are provably invalid.
Generates structurally plausible but mathematically invalid values (Luhn-failing credit card numbers, SSNs with reserved area codes, SINs with wrong checksums, IBAN-shaped strings with invalid mod-97 checks, etc.) and submits them to the scanner. Any value the scanner flags is a false positive.
evadex falsepos [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| --tool, -t | dlpscan-cli | Adapter to use |
| --category | (all) | Category to test. Repeat for multiple. Supported: credit_card, ssn, sin, iban, email, phone, ca_ramq |
| --count | 100 | Number of false positive values per category |
| --format, -f | table | Output format: table (summary to stderr) or json (full report) |
| --output, -o | stdout | Write JSON report to file |
| --exe | dlpscan | Path to scanner executable (dlpscan-cli only) |
| --cmd-style | python | Command format for dlpscan-cli: python or rust |
| --timeout | 30.0 | Request timeout in seconds |
| --concurrency | 5 | Max concurrent scanner requests |
| --seed | (random) | Integer seed for reproducible false positive values |
| --require-context | off | Pass --require-context to dlpscan-rs: only flag matches when surrounding keywords are present. Requires --cmd-style rust. See False positive rate and the --require-context tradeoff for measured impact. |
| --wrap-context | off | Embed each invalid value in a realistic category-specific sentence before submitting. Simulates how sensitive data appears in real documents. Use with --require-context for the most realistic false positive measurement. |
Examples:
# Test false positive rate for credit cards
evadex falsepos --tool dlpscan-cli --category credit_card --count 100
# All categories
evadex falsepos --tool dlpscan-cli --count 100
# Save JSON report
evadex falsepos --tool dlpscan-cli --count 100 --format json -o falsepos_report.json
Output:
credit_card 0/100 flagged (0.0%)
ssn 2/100 flagged (2.0%)
sin 0/100 flagged (0.0%)
...
Overall false positive rate: 0.3% (2/700)
The JSON report includes per-category rates, overall rate, and the list of specific values that were incorrectly flagged:
{
"tool": "dlpscan-cli",
"count_per_category": 100,
"total_tested": 700,
"total_flagged": 2,
"overall_false_positive_rate": 0.3,
"by_category": {
"credit_card": {
"total": 100,
"flagged": 0,
"false_positive_rate": 0.0,
"flagged_values": []
},
"ssn": {
"total": 100,
"flagged": 2,
"false_positive_rate": 2.0,
"flagged_values": ["000-12-3456", "666-99-0001"]
}
}
}
False positive generators by category:
| Category | Generation strategy |
|---|---|
| credit_card | 16-digit numbers with card-like prefixes (4, 51, 37, 6011) that fail the Luhn check |
| ssn | NNN-NN-NNNN with reserved area codes: 000, 666, 900–999 |
| sin | NNN NNN NNN with valid first digit (1–7) but wrong Luhn check digit |
| iban | IBAN-shaped strings (GB/DE/FR) with a deliberately wrong mod-97 check digit |
| email | user@domain.invalid, using IANA-reserved TLDs (.invalid, .test, .example, .localhost) |
| phone | +1-NPA-NXX-XXXX with invalid NANP area codes (000, 555, 911, etc.) |
| ca_ramq | RAMQ-shaped XXXX YYMM DDSS with invalid birth month codes (00, 13–50, 63–99) |
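Two of these strategies are sketched below: an SSN with a reserved area code and an IBAN whose check digits are deliberately off by one. This is an illustrative sketch with hypothetical function names, not evadex's generator code; the mod-97 arithmetic follows the standard IBAN check-digit procedure.

```python
import random

RESERVED_SSN_AREAS = ["000", "666"] + [str(n) for n in range(900, 1000)]


def invalid_ssn(rng: random.Random) -> str:
    """SSN-shaped value whose area number is never issued."""
    area = rng.choice(RESERVED_SSN_AREAS)
    return f"{area}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}"


def _to_digits(text: str) -> int:
    # Letters map to 10..35 (A=10, ..., Z=35); digits map to themselves.
    return int("".join(str(int(ch, 36)) for ch in text))


def iban_check_digits(country: str, bban: str) -> str:
    """Correct two-digit IBAN check value for a country code and BBAN."""
    return f"{98 - _to_digits(bban + country + '00') % 97:02d}"


def iban_valid(iban: str) -> bool:
    """Standard validation: move the first four chars to the end, mod 97 == 1."""
    return _to_digits(iban[4:] + iban[:4]) % 97 == 1


def invalid_iban(country: str, bban: str) -> str:
    """IBAN-shaped string whose check digits are off by one."""
    good = int(iban_check_digits(country, bban))
    # Correct check digits fall in 02..98, so stay inside that range.
    bad = good + 1 if good < 98 else good - 1
    return f"{country}{bad:02d}{bban}"
```

Shifting the check digits by one changes the mod-97 remainder by one, so the result can never validate, yet it keeps the exact length and character classes a pattern-only scanner looks for.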
evadex list-payloads
List all built-in test payloads with their categories and types.
evadex list-payloads [--type structured|heuristic]
| Flag | Default | Description |
|---|---|---|
| --type | (all) | Filter to structured or heuristic payloads only |
evadex list-techniques
List all registered evasion generators and the techniques each one applies. Generator names shown here can be used with evadex generate --technique-group and --technique-mix.
evadex list-techniques [--generator NAME]
| Flag | Default | Description |
|---|---|---|
| --generator, -g | (all) | Show techniques for a specific generator only |
evadex techniques
Show per-technique scanner-detection rates from the audit log. These rates power the --evasion-mode weighted and --evasion-mode adversarial selections, and the command is also useful on its own for spotting which techniques the scanner has been letting through. (v3.13.0)
evadex techniques [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| --audit-log | results/audit.jsonl | Source audit log to aggregate. |
| --last | 10 | Aggregate only the most recent N audit entries. |
| --top | (all) | Show only the top N techniques by latest scanner-detection rate. |
| --category | (all) | Substring filter against technique name (e.g. unicode, encoding). |
| --min-runs | 1 | Require at least N data points before showing a technique. |
Sample output:
Technique scanner-detection rates (last 10 runs, 3 techniques)
┌──────────────────────┬────────┬────────┬──────┬──────────┐
│ Technique │ Latest │ Avg │ Runs │ Trend │
├──────────────────────┼────────┼────────┼──────┼──────────┤
│ unicode_zwsp │ 9.1% │ 12.3% │ 4 │ ↓ -3.2% │
│ homoglyph_substitute │ 18.4% │ 17.6% │ 4 │ → +0.4% │
│ base64_of_rot13 │ 23.5% │ 26.7% │ 4 │ ↓ -2.1% │
└──────────────────────┴────────┴────────┴──────┴──────────┘
"Latest" / "Avg" are scanner-detection rates — lower is better evasion. Cold-start (no history yet) prints a hint and exits cleanly.
--evasion-mode (in evadex scan and evadex generate)
Control how techniques are chosen for evasion variants based on what's worked historically. (v3.13.0)
| Mode | Behaviour |
|---|---|
| random (generate default) | Uniform random across applicable techniques. |
| exhaustive (scan default) | Every applicable variant is run / generated. |
| weighted | Bias selection by 1 − historical_detection. Techniques that have evaded best are picked more often. Falls back to random if no audit history exists. |
| adversarial | Restrict to techniques whose historical detection is ≤ 50 %. In evadex scan, the variant-group filter narrows accordingly. Falls back to the full pool if the filter leaves no candidates. |
Both weighted and adversarial read history from --audit-log (defaults to results/audit.jsonl). Run a few normal scans with --audit-log set first to build the history. Until then, evadex techniques shows a cold-start hint and --evasion-mode weighted/adversarial falls back to random with a warning.
# Build history with a few baseline runs
evadex scan --tool dlpscan-cli --strategy text --tier banking \
--audit-log results/audit.jsonl
# Now bias toward techniques that have evaded
evadex scan --tool dlpscan-cli --strategy text --tier banking \
--evasion-mode adversarial --audit-log results/audit.jsonl
# In generate, focus regression fixtures on the hardest evasions
evadex generate --format xlsx --tier banking \
--evasion-mode weighted --count 100 --output test_weighted.xlsx
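The weighted and adversarial modes can be sketched as follows. This is an illustration of the selection rules described in the mode table, not evadex's implementation: the function name pick_techniques is hypothetical, history is assumed to map technique names to detection rates in the range 0 to 1, and treating a technique with no history as "never detected" is an assumption of this example.

```python
import random


def pick_techniques(applicable, history, mode, k, rng=None):
    """Choose k technique names from `applicable` according to `mode`."""
    rng = rng or random.Random()
    names = list(applicable)

    if mode == "adversarial":
        # Keep techniques detected at most half the time; if none qualify,
        # fall back to the full pool.
        pool = [n for n in names if history.get(n, 0.0) <= 0.5] or names
        return [rng.choice(pool) for _ in range(k)]

    if mode == "weighted":
        # Weight = 1 - historical detection rate, so frequent evaders win.
        weights = [1.0 - history.get(n, 0.0) for n in names]
        if sum(weights) > 0:
            return rng.choices(names, weights=weights, k=k)
        # Every technique is always detected: no signal, so go uniform.

    # "random" mode, and the fallback for weighted without usable weights.
    return [rng.choice(names) for _ in range(k)]
```

With an empty history every weight is 1.0, so weighted degrades to uniform random selection, which matches the documented cold-start behaviour.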
Examples
# Only test credit card payloads
evadex scan --tool dlpscan-cli --strategy text --category credit_card
# Only run unicode evasion techniques
evadex scan --tool dlpscan-cli --strategy text --variant-group unicode_encoding
# Only run unicode + delimiter techniques on SSN and IBAN
evadex scan --tool dlpscan-cli --strategy text \
--category ssn --category iban \
--variant-group unicode_encoding --variant-group delimiter
# Test a custom value (category auto-detected)
evadex scan --tool dlpscan-cli --input "AKIAIOSFODNN7EXAMPLE" --strategy text
# File strategy only — test DOCX extraction pipeline
evadex scan --tool dlpscan-cli --input "4532015112830366" --strategy docx
# Save HTML report
evadex scan --tool dlpscan-cli --strategy text --format html -o report.html
# Target a specific scanner binary, tag the output
evadex scan --tool dlpscan-cli --exe /opt/dlpscan/dlpscan --cmd-style rust \
--scanner-label "rust-2.0.0" --format json -o rust_results.json
# Compare two scanner builds
evadex scan --tool dlpscan-cli --scanner-label "python-1.3.0" -o python.json
evadex scan --tool dlpscan-cli --exe /opt/rust-dlpscan --cmd-style rust \
--scanner-label "rust-2.0.0" -o rust.json
evadex compare python.json rust.json --format html -o comparison.html
Performance and recommended limits
Benchmarks captured on a Windows / Python 3.13 / 32 GB host running the banking tier with --evasion-rate 0.5. Times include the full evadex pipeline — payload selection, evasion variant generation, and writer I/O.
| Format | count=100 | count=1 000 | count=10 000 | Peak RSS (1 k) | Notes |
|---|---|---|---|---|---|
| `csv` | ~1.5 s | ~3 s | ~20 s, 92 MB output | 103 MB | Linear scaling; recommended for large fixtures. |
| `xlsx` | ~3 s | ~13 s | not recommended | 259 MB | openpyxl materialises every cell in memory. Linear extrapolation puts 10 k at ~2.5 GB peak. Use csv or sqlite for larger volumes. |
| `sqlite` | ~1.6 s | ~4 s | ~24 s, 114 MB output | 143 MB / 309 MB at 10 k | Prior to v3.13.0, the customer table was built in Python before insert and 10 k pushed RSS over 500 MB. Now uses 1000-row chunked executemany. |
| `parquet` | n/a | n/a | n/a | n/a | Generation works, but Siphon's extractor hangs on every Parquet file ≥ 1 KB (see results/format_detection_matrix.md). Skipped from perf testing. |
Concurrency tuning (evadex scan --concurrency N) on Windows against the dlpscan binary: --concurrency 10 → 17.8 variants/s, --concurrency 20 → 18.9, --concurrency 50 → 20.6. Raising concurrency fivefold (10 → 50) buys only about 16 % more throughput, because process-spawn overhead dominates on Windows. The sweet spot is 20–50; going higher rarely pays for itself.
Recommended --count ceilings per format to stay under 500 MB peak RSS without further optimisation:
| Format | Safe ceiling |
|---|---|
| `csv`, `txt`, `json`, `xml`, `sql`, `log`, `mbox`, `ics`, `warc` | 50 000 + |
| `sqlite`, `7z` | 25 000 |
| `xlsx`, `docx`, `pdf` | 2 000 (memory-heavy formats; chunk via --formats + multiple runs for larger volumes) |
| `parquet` | unlimited generation, but skip if you intend to scan with Siphon ≤ 22f7971 |
CI/CD integration
evadex supports a --min-detection-rate flag that exits with code 1 if the scanner's detection rate falls below a threshold. Use it as a pipeline gate to prevent deploying a scanner configuration that regresses detection coverage.
evadex scan --tool dlpscan-cli \
--strategy text \
--scanner-label "$(dlpscan --version)" \
--format json -o results.json \
--min-detection-rate 90
Exit code 0 means the threshold was met; exit code 1 means it was not. The report is always written before the exit check.
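If you want to re-check the gate outside evadex (for example in a later pipeline stage that only has `results.json`), the decision can be reproduced from the report's `meta` block. The sketch below assumes the report follows the Output schema documented further down (`meta.pass`, `meta.total`); the function names and the handling of an empty run are illustrative assumptions.

```python
import json


def detection_rate(meta):
    """pass / total * 100, rounded to one decimal (0.0 for an empty run)."""
    total = meta["total"]
    return round(meta["pass"] / total * 100, 1) if total else 0.0


def gate_exit_code(report, min_detection_rate):
    """Return 0 when the detection rate meets the threshold, 1 when it falls below."""
    return 0 if detection_rate(report["meta"]) >= min_detection_rate else 1


# Example: 514 of 590 variants detected is 87.1 %, which fails a 90 % gate.
report = json.loads('{"meta": {"total": 590, "pass": 514, "fail": 76, "error": 0}}')
exit_code = gate_exit_code(report, 90)
```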
To track regressions against a known-good baseline:
# Save a baseline from the current production scanner
evadex scan --tool dlpscan-cli --scanner-label "prod-baseline" \
--baseline baseline.json
# In CI: compare the candidate scanner against the baseline
evadex scan --tool dlpscan-cli --scanner-label "candidate" \
--compare-baseline baseline.json \
--min-detection-rate 90
The --compare-baseline flag prints a regression summary to stderr listing any variants that were previously detected and are now missed, and any improvements.
GitHub Actions workflows
evadex ships ready-to-drop-in GitHub Actions workflows for the Siphon repo at docs/github-actions/. Both build Siphon with --features full, start its API server, and run the evadex banking-tier suite against the binary.
| File | Trigger | What it does |
|---|---|---|
| `evadex-regression.yml` | every push to main and every PR | banking-tier scan + false-positive suite, baseline diff if evadex_baseline.json is committed, posts a per-category breakdown back to the PR |
| `evadex-daily.yml` | cron `0 6 * * *` (06:00 UTC) | full banking-tier scan + false-positive suite, posts a one-line summary to Slack if the SLACK_WEBHOOK secret is set, fails when detection drops below 85 % |
To install:
# In Siphon's repo
mkdir -p .github/workflows
curl -O https://raw.githubusercontent.com/tbustenk/evadex/main/docs/github-actions/evadex-regression.yml
curl -O https://raw.githubusercontent.com/tbustenk/evadex/main/docs/github-actions/evadex-daily.yml
mv evadex-regression.yml evadex-daily.yml .github/workflows/
git add .github/workflows/
git commit -m "ci: add evadex DLP evasion regression workflows"
To set the regression baseline (commit it to the Siphon repo so the workflow has something to diff against):
evadex scan --tool siphon --url http://localhost:8080 --api-key $KEY \
--tier banking --strategy text \
--baseline evadex_baseline.json
git add evadex_baseline.json
git commit -m "ci: refresh evadex DLP detection baseline"
To tune the gating threshold (default 85 %), edit the --min-detection-rate 85 line in either workflow file. A failing scan exits non-zero and fails the workflow, so the threshold doubles as a deploy gate.
Slack notifications (daily workflow only): create an incoming webhook in Slack, then add the URL as a repo secret named SLACK_WEBHOOK. Without the secret the Slack step skips silently.
Audit log
evadex can append a one-line JSON record to a log file after every scan. This gives you a durable, append-only history of what was tested, when, and what the result was — useful for compliance reviews, trend tracking, and demonstrating that regular scans are being performed.
evadex scan --tool dlpscan-cli \
--scanner-label "rust-2.0.0" \
--strategy text \
--audit-log /var/log/evadex/audit.jsonl
Or set it in evadex.yaml so it fires automatically on every run:
audit_log: /var/log/evadex/audit.jsonl
Audit record format
Each run appends exactly one line. Fields:
| Field | Type | Description |
|---|---|---|
| `timestamp` | ISO 8601 string | When the scan ran (UTC) |
| `evadex_version` | string | Installed evadex version |
| `operator` | string | OS username of the person who ran the scan |
| `scanner_label` | string | Value of --scanner-label (empty if not set) |
| `tool` | string | Adapter used |
| `strategies` | array | Submission strategies used |
| `categories` | array | Categories filtered to (empty = all structured) |
| `include_heuristic` | bool | Whether heuristic categories were included |
| `total` | int | Total test cases run |
| `pass` | int | Variants detected |
| `fail` | int | Variants that evaded scanner |
| `error` | int | Adapter errors |
| `pass_rate` | float | Detection rate percentage |
| `output_file` | string \| null | Path of the report file written, or null |
| `baseline_saved` | string \| null | Path of baseline saved, or null |
| `compare_baseline` | string \| null | Path of baseline compared against, or null |
| `min_detection_rate` | float \| null | Gate threshold used, or null |
| `exit_code` | int | 0 if scan succeeded, 1 if detection-rate gate failed |
Notes
- The log file is opened in append mode — existing entries are never modified or deleted.
- Parent directories are created automatically if they do not exist.
- A write failure (permissions, disk full, bad path) is silently ignored. The scan result and exit code are never affected by audit log errors.
- The log contains detection rates and category breakdowns but not variant values. It is safe to store in shared log aggregation systems.
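Because each record is a single JSON line, trend tracking needs nothing beyond the standard library. The following sketch groups `pass_rate` by `scanner_label` using only the fields listed in the record format above; the function name is hypothetical, and it assumes every timestamp is written in the same ISO 8601 UTC form so that string order matches time order.

```python
import json
from collections import defaultdict


def pass_rate_trend(lines):
    """Group (timestamp, pass_rate) pairs by scanner_label, oldest first."""
    trend = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        trend[record["scanner_label"]].append(
            (record["timestamp"], record["pass_rate"])
        )
    for points in trend.values():
        points.sort()  # same-format ISO 8601 strings sort chronologically
    return dict(trend)
```

Pass it an open file handle, for example `pass_rate_trend(open("results/audit.jsonl", encoding="utf-8"))`.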
Feedback loop
evadex Phase 2 implements a GAN-inspired feedback cycle: evadex is the adversarial fuzzer and your DLP scanner is the discriminator. When the fuzzer finds an evasion that works, the system automatically surfaces what failed and how to close the gap — without requiring manual triage.
After any scan that produces evasions, evadex does the following:
- Prints fix suggestions to stderr: one concrete, actionable normalisation step per unique bypass technique.
- Writes `evadex_regressions.py` to the current directory: a pytest file with one test function per evasion, using dlpscan's `InputGuard` API. These tests fail until the scanner is fixed.
- Optionally writes a structured JSON feedback report via `--feedback-report PATH`.
Fix suggestions
Suggestions are printed to stderr after the scan summary whenever evasions are found:
=== Fix Suggestions ===
• homoglyph_substitution (unicode_encoding)
Add Cyrillic/Greek lookalikes to homoglyph normalisation map: О→0, З→3, ο→0, Α→A, Ζ→Z.
Apply NFKC normalisation then a homoglyph table lookup before scanning
• zero_width_zwsp (unicode_encoding)
Strip U+200B (Zero Width Space) from input in the normalisation pipeline before pattern matching
• base64_standard (encoding)
Add a base64 decode pass to the normalisation pipeline; scan the decoded content
Each suggestion names the technique, the generator group it belongs to, and a specific normalisation step to add to the scanner's input pipeline.
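To make the first two suggestions concrete, here is a minimal sketch of a scanner-side normalisation pass: strip zero-width characters, apply NFKC, then run a homoglyph table lookup. It is not dlpscan's code. The `normalise` function and both tables are hypothetical, and the lookalike map contains only the five pairs named in the example suggestion; a real table would be far larger.

```python
import unicodedata

# Lookalike pairs from the example suggestion above (illustrative, not exhaustive).
HOMOGLYPHS = {
    "\u041e": "0",  # Cyrillic capital O
    "\u0417": "3",  # Cyrillic capital Ze
    "\u03bf": "0",  # Greek small omicron
    "\u0391": "A",  # Greek capital Alpha
    "\u0396": "Z",  # Greek capital Zeta
}
ZERO_WIDTH = {"\u200b"}  # Zero Width Space


def normalise(text):
    """Strip zero-width characters, apply NFKC, then map known homoglyphs."""
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    text = unicodedata.normalize("NFKC", text)
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)
```

Running pattern matching on `normalise(input)` instead of the raw input would let an existing card-number rule see `4532015112830366` again when the variant was `4532\u041e15112830366`.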
Regression test file
evadex_regressions.py is written to the current directory whenever there are evasions. Each test function:
- Is named after the payload label and evasion technique (`test_visa_16_digit_homoglyph_substitution`)
- Imports and invokes dlpscan's `InputGuard` with the appropriate preset (`PCI_DSS`, `PII`, or `CREDENTIALS`)
- Scans the exact obfuscated variant value that evaded detection
- Asserts `not result.is_clean`, so the test passes once the scanner is fixed
def test_visa_16_digit_homoglyph_substitution():
    """Visa 16-digit evaded via homoglyph_substitution — should be detected"""
    from dlpscan import InputGuard, Preset
    guard = InputGuard(presets=[Preset.PCI_DSS])
    result = guard.scan('4532\u041e15112830366')  # Visually similar Cyrillic/Greek characters substituted
    assert not result.is_clean

def test_canada_sin_zero_width_zwsp():
    """Canada SIN evaded via zero_width_zwsp — should be detected"""
    from dlpscan import InputGuard, Preset
    guard = InputGuard(presets=[Preset.PII])
    result = guard.scan('0\u200b4\u200b6\u200b \u200b4\u200b5\u200b4\u200b \u200b2\u200b8\u200b6')  # Zero-width ZWSP between every character
    assert not result.is_clean
Run the generated file with:
pytest evadex_regressions.py
Tests fail until the scanner is patched. Each time you fix a technique and re-run evadex, failing tests disappear and the regression file is regenerated to reflect the remaining gaps.
--feedback-report PATH
Saves a structured JSON report containing everything in one file:
evadex scan --feedback-report feedback.json
Report structure:
{
"meta": {
"timestamp": "2026-04-07T14:22:01.123456+00:00",
"scanner": "python-1.6.0",
"total_tests": 590,
"total_evasions": 76
},
"techniques": [
{
"technique": "homoglyph_substitution",
"generator": "unicode_encoding",
"count": 23,
"example_variants": ["4532\u041e15112830366", "4\u03bf32015112830366"]
},
{
"technique": "zero_width_zwsp",
"generator": "unicode_encoding",
"count": 18,
"example_variants": ["0\u200b4\u200b6 4\u200b5\u200b4 2\u200b8\u200b6"]
}
],
"fix_suggestions": [
{
"technique": "homoglyph_substitution",
"generator": "unicode_encoding",
"description": "Sensitive values bypassed detection by substituting ASCII digits/letters with visually identical Unicode characters from Cyrillic, Greek, or other scripts",
"suggested_fix": "Add Cyrillic/Greek lookalikes to homoglyph normalisation map: О→0, З→3, ο→0, Α→A, Ζ→Z. Apply NFKC normalisation then a homoglyph table lookup before scanning"
}
],
"regression_test_code": "\"\"\"Regression tests generated by evadex.\n...\"\"\"\nimport pytest\n\n\ndef test_visa_16_digit_homoglyph_substitution():\n ..."
}
The report is always written, even when there are no evasions (techniques and fix_suggestions will be empty arrays, regression_test_code will be an empty string).
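The report is plain JSON, so triage scripts can consume it directly. The sketch below ranks techniques by evasion count and derives the overall evasion rate from `meta`; it relies only on the keys shown in the report structure above, and the helper names are hypothetical.

```python
import json


def top_techniques(report, limit=5):
    """Return (technique, generator, count) tuples, most frequent evasion first."""
    ranked = sorted(report["techniques"], key=lambda t: t["count"], reverse=True)
    return [(t["technique"], t["generator"], t["count"]) for t in ranked[:limit]]


def evasion_rate(report):
    """Share of test cases that evaded detection, as a percentage."""
    meta = report["meta"]
    if not meta["total_tests"]:
        return 0.0
    return round(meta["total_evasions"] / meta["total_tests"] * 100, 1)
```

Load the file with `json.load(open("feedback.json", encoding="utf-8"))` and pass the resulting dict to either helper.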
Three-phase design
| Phase | Role | Status |
|---|---|---|
| Phase 1 | Adversarial fuzzer — evasion generators test known-sensitive values against the scanner | ✅ Done |
| Phase 2 | Feedback generator — surfaces fix suggestions, regression tests, and structured reports when evasions succeed | ✅ Done |
| Phase 3 | False-positive adversary — generates values that look sensitive but aren't, to measure scanner precision | ✅ Done (evadex falsepos) |
Together, Phase 1 measures false negatives (sensitive values the scanner misses) and Phase 3 measures false positives (non-sensitive values the scanner incorrectly flags). Both are needed for a complete picture of scanner accuracy.
Adapters
Built-in: dlpscan-cli
Invokes the dlpscan CLI directly as a subprocess. evadex was built and tested with dlpscan as the reference scanner. Requires dlpscan to be installed and on PATH (or provide --exe).
evadex scan --tool dlpscan-cli
For file strategies, evadex builds the document in memory and writes it to a temp file, runs the scanner against it, then immediately deletes the temp file. No persistent disk footprint from test data. File extraction support in dlpscan requires pip install dlpscan[office].
Built-in: dlpscan
Generic HTTP adapter for any DLP tool that exposes a REST API. Sends plain text to POST /scan with a {"content": "..."} body, and file uploads to POST /scan/file as multipart form data. Expects a JSON response with a detected boolean (configurable via the response_detected_key extra config option).
evadex scan --tool dlpscan --url http://my-dlpscan-server:8080 --api-key my-key
Built-in: siphon
Native adapter for dlpscan-rs / Siphon via its HTTP API. Use this in production environments where the CLI isn't available — for example, when Siphon runs as a sidecar or dedicated DLP service. Talks to POST /v1/scan and parses Siphon's full response (findings, confidence scores, and — when present — BIN brand/country, entropy classification, and validator name).
Start the Siphon API server:
# Bind to localhost:8000 with an API key. Use `0.0.0.0` to expose to other hosts.
DLPSCAN_API_HOST=127.0.0.1 \
DLPSCAN_API_PORT=8000 \
DLPSCAN_API_KEY=$SIPHON_API_KEY \
dlpscan serve
Run evadex against it:
# --api-key is sent via the `x-api-key` header; EVADEX_API_KEY works too.
evadex scan --tool siphon \
--url http://localhost:8000 \
--api-key $SIPHON_API_KEY
Adapter extras (via evadex.yaml):
tool: siphon
url: http://localhost:8000
api_key: ${EVADEX_API_KEY}
presets: [pci_dss, pii] # compliance presets to enable
categories: [] # optional category allowlist
min_confidence: 0.5 # confidence floor forwarded to Siphon
require_context: false # require surrounding keywords to flag a match
When the Siphon adapter reports a match, the result also carries Siphon-specific detail in the JSON output:
| Field | Description |
|---|---|
| `confidence` | Recognizer confidence from 0.0 – 1.0 |
| `bin_brand` | Card-network brand (Visa, Mastercard, …) for credit card findings |
| `bin_country` | Issuing country from the BIN lookup |
| `entropy_classification` | High-entropy heuristic label (e.g. api_key) |
| `validator` | Which validator accepted the match (luhn, mod97, …) |
evadex compare surfaces confidence score changes between two Siphon runs alongside severity transitions, so per-variant regressions are visible even when the pass/fail outcome is unchanged.
Entropy-mode testing
Siphon's scanner has four high-entropy-token detection modes, each gating the 4.5 bits/char threshold differently:
| Mode | Gate | When to use |
|---|---|---|
| `gated` | Keyword (secret, key, token, api_key, password, bearer, …) within 80 chars | Default; highest precision, lowest recall |
| `assignment` | Token preceded by an assignment (KEY=, "key":, export KEY=) within 60 chars | Catches .env and config-file leaks |
| `all` | Any high-entropy token ≥16 chars passes | Highest recall, noisy; source-code audits |
| `off` | Disabled | Default for Siphon; entropy adds latency |
Siphon's token floor is 16 characters and the Shannon threshold is 4.5 bits/char. A pure-hex secret draws from a 16-symbol alphabet, so its entropy can never exceed log2(16) = 4.0 bits/char; it stays under the threshold and goes undetected in every mode. This is a real gap to be aware of.
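The hex ceiling is easy to verify with a per-character Shannon entropy function. The sketch below assumes entropy is computed from character frequencies within the token, which is the standard definition; how Siphon computes it internally is not shown here, and `shannon_entropy` is an illustrative name.

```python
import math
from collections import Counter


def shannon_entropy(token):
    """Shannon entropy of a string in bits per character."""
    if not token:
        return 0.0
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in Counter(token).values())


THRESHOLD = 4.5  # bits/char, as quoted above

# A hex digest uses at most 16 distinct symbols, so it cannot exceed 4.0.
hex_secret = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
hex_is_flagged = shannon_entropy(hex_secret) >= THRESHOLD

# 32 distinct characters, each used once, gives exactly log2(32) = 5.0.
mixed_is_flagged = shannon_entropy("abcdefghijklmnopqrstuvwxyzABCDEF") >= THRESHOLD
```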
evadex entropy targets all detection modes at once:
# Sanity-check every mode against a Siphon instance
evadex entropy --tool siphon --url http://localhost:8000 --api-key $EVADEX_API_KEY
# Score coverage against a specific configured mode
evadex entropy --tool siphon --mode gated # only gated contexts expected to hit
evadex entropy --tool siphon --mode assignment
evadex entropy --tool siphon --mode all
The command submits each high-entropy payload in three contexts — bare (value alone), gated (value next to api_key:), and assignment (SECRET_TOKEN=value) — and reports which context each category was caught in. It also runs the entropy_evasion generator and lists which evasion techniques defeated detection (split, comment-injection, concatenation, low-entropy mixing, double encoding, space breaking).
Adding a custom adapter
- Create a file anywhere in your project, e.g. `my_adapter.py`.
- Subclass `BaseAdapter` and implement `submit()`:
from evadex.adapters.base import BaseAdapter
from evadex.core.registry import register_adapter
from evadex.core.result import Payload, Variant, ScanResult
@register_adapter("my-tool")
class MyToolAdapter(BaseAdapter):
    name = "my-tool"

    async def submit(self, payload: Payload, variant: Variant) -> ScanResult:
        # Send variant.value to your scanner however it expects it.
        # variant.strategy is "text", "docx", "pdf", or "xlsx".
        # Return a ScanResult with detected=True/False.
        # call_my_scanner is a placeholder for your own client code.
        response = await call_my_scanner(variant.value)
        detected = response.get("found", False)
        return ScanResult(
            payload=payload,
            variant=variant,
            detected=detected,
            raw_response=response,
        )
- Import your adapter before invoking evadex (so the `@register_adapter` decorator fires), then use it:
python -c "import my_adapter" && evadex scan --tool my-tool
Or wire it up properly as a package with an entry point in pyproject.toml:
[project.entry-points."evadex.adapters"]
my-tool = "my_package.my_adapter"
Optional hooks:
async def setup(self):
    # Called once before the batch: open connections, authenticate, etc.
    self._session = await open_session()

async def teardown(self):
    # Called once after the batch: clean up connections.
    await self._session.close()

async def health_check(self) -> bool:
    # Optional: verify the scanner is reachable.
    return await ping_scanner()
File strategies: variant.strategy tells you which format evadex wants to use. If your scanner only supports one method, handle what you need:
from evadex.adapters.dlpscan.file_builder import FileBuilder
async def submit(self, payload, variant):
    if variant.strategy == "text":
        raw = await self._scan_text(variant.value)
    else:
        data, mime = FileBuilder.build(variant.value, variant.strategy)
        raw = await self._scan_file(data, mime)
    ...
FileBuilder.build(text, fmt) returns (bytes, mime_type) entirely in memory — no disk writes.
EDM testing
Siphon's Exact Data Match (EDM) engine catches specific known values — real SSNs, account numbers, tokens — rather than just structurally-plausible patterns. Values are HMAC-SHA256 hashed after normalisation and stored as a hash set; scan tokens are hashed the same way and constant-time compared. EDM complements pattern matching: patterns catch the shape of sensitive data, EDM catches the actual records.
Normalisation applied before hashing (from crates/siphon-core/src/edm.rs :: normalize_value, in order):
- NFKC Unicode normalisation
- Lowercase
- Trim leading/trailing whitespace
- Remove every character matching `[\s\-./()]+` (whitespace, hyphens, dots, slashes, parens)
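That's what EDM absorbs, along with compatibility forms that NFKC itself folds (fullwidth digits, NBSP). Cyrillic/Greek homoglyphs and zero-width characters still defeat EDM: NFKC does not fold distinct Unicode scripts together, and zero-width characters are neither whitespace nor one of the stripped separators, so they survive normalisation.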
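The four normalisation steps and the keyed hash can be mirrored in Python to predict which variants EDM will absorb. This is a sketch of the documented behaviour, not a port of `edm.rs`: the key handling, the hex digest encoding, and the `matches` helper are assumptions made for illustration.

```python
import hashlib
import hmac
import re
import unicodedata

_SEPARATORS = re.compile(r"[\s\-./()]+")


def normalize_value(value):
    """NFKC, lowercase, trim, then remove whitespace and - . / ( ) characters."""
    value = unicodedata.normalize("NFKC", value)
    value = value.lower()
    value = value.strip()
    return _SEPARATORS.sub("", value)


def edm_hash(value, key):
    """HMAC-SHA256 of the normalised value (hex digest is an assumption)."""
    digest = hmac.new(key, normalize_value(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def matches(candidate, registered_hashes, key):
    """Compare against every registered hash without short-circuiting."""
    digest = edm_hash(candidate, key)
    found = False
    for registered in registered_hashes:
        found |= hmac.compare_digest(digest, registered)
    return found
```

With `4532015112830366` registered, the dashed form `4532-0151-1283-0366` and an NBSP-separated form both match, while `4532\u041e15112830366` (Cyrillic О) and a variant containing U+200B do not.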
The evadex edm command:
# Register built-in payloads with Siphon EDM, verify each one, then probe
# which evasion transforms its normaliser absorbs.
evadex edm --url http://localhost:8000 --api-key $SIPHON_KEY
# Restrict to specific categories
evadex edm --category credit_card --category sin --limit 25
# Corpus generation only — no Siphon contact — write a bulk-import file
evadex edm --generate-corpus --output edm_corpus.json
evadex edm --generate-corpus --corpus-format csv --count 1000 --output edm_corpus.csv
# Dry run: print what would be registered without sending anything
evadex edm --dry-run --category credit_card
What the command reports:
| Section | Meaning |
|---|---|
| EDM exact-value detection table | Each registered value resubmitted verbatim — should be 100% if EDM is configured correctly |
| EDM evasion probe table | Detection rate per transform (exact, uppercase, dashes, spaces, dots, slashes, nbsp_spaces, homoglyph_0, homoglyph_o, zero_width). yes = absorbed by normaliser; no = defeats EDM; partial = depends on the value |
Registration uses the category namespace evadex_test_<original_category> so test hashes never collide with production EDM categories. Note: Siphon's HTTP API exposes POST /v1/edm/register and GET /v1/edm/categories but no delete endpoint, so true cleanup requires clearing the server's EDM state file or restarting the server. The namespace prefix keeps stray hashes clearly identifiable.
Performance note. Siphon's EDM does a constant-time scan over every registered hash per token to prevent timing leaks — above MAX_CONSTANT_TIME_HASHES = 50,000 total hashes the scan cost grows linearly. evadex edm prints a warning when a registration run would cross that threshold.
EDM bulk-registration corpus format (--format edm_json on evadex generate):
{
"values": [
{"value": "4532015112830366", "category": "credit_card", "label": "Visa test"},
{"value": "046 454 286", "category": "sin", "label": "SIN test"}
]
}
The shape matches Siphon's POST /v1/edm/register request body (flat, one values[] array). Split by category and POST each slice, or replay the whole file through a small wrapper — the field names line up with Siphon's API.
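Splitting the corpus by category is a short grouping step. The sketch below only reshapes the file into one `{"values": [...]}` body per category and leaves the HTTP call to you; `split_by_category` is a hypothetical helper, and whether Siphon expects one category per request is not confirmed here.

```python
import json
from collections import defaultdict


def split_by_category(corpus):
    """Return {category: {"values": [...]}} with entries kept in file order."""
    slices = defaultdict(list)
    for entry in corpus["values"]:
        slices[entry["category"]].append(entry)
    return {category: {"values": entries} for category, entries in slices.items()}


corpus = json.loads("""
{
  "values": [
    {"value": "4532015112830366", "category": "credit_card", "label": "Visa test"},
    {"value": "046 454 286", "category": "sin", "label": "SIN test"}
  ]
}
""")
bodies = split_by_category(corpus)
```

Each value in `bodies` can then be serialised with `json.dumps` and sent as its own request.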
Siphon-C2 integration
Siphon-C2 is the admin web UI and management plane described in the dlpscan-rs architecture docs — it aggregates operational metrics plus test results so detection quality is visible alongside live scanning. evadex can push its scan, false-positive, comparison, and history reports to C2 in one line of extra flags.
Setup. Point evadex at your C2 deployment via CLI flags, environment variables, or evadex.yaml:
# CLI flags
evadex scan --tool siphon --tier banking \
--c2-url http://c2.internal:9090 --c2-key $C2_API_KEY
# Environment variables (picked up by scan / falsepos / compare / history)
export EVADEX_C2_URL=http://c2.internal:9090
export EVADEX_C2_KEY=$C2_API_KEY
evadex scan --tool siphon --tier banking
Or in evadex.yaml:
c2_url: http://c2.internal:9090
c2_key: ${C2_API_KEY}
Endpoints pushed to:
| Command | C2 endpoint | Payload |
|---|---|---|
| `evadex scan` | `POST /v1/evadex/scan` | counts, pass rate, per-category/per-technique breakdown, top 50 failing variants |
| `evadex falsepos` | `POST /v1/evadex/falsepos` | per-category FP rate + flagged-value list |
| `evadex compare` | `POST /v1/evadex/compare` | full comparison dict (overall delta, per-technique diffs, confidence changes) |
| `evadex history --push-c2` | `POST /v1/evadex/history` | batched audit-log entries for dashboard backfill |
Every push is authenticated via an x-api-key header — the same format the core Siphon HTTP API uses — so C2 can reuse one key-management surface. Requests also carry a User-Agent: evadex/<version> header and an evadex_version field so C2 can surface client-version mix on the dashboard.
Backfill on first connect:
# One-shot push of every historical audit-log entry to a fresh C2
evadex history --push-c2 --c2-url http://c2.internal:9090 --c2-key $C2_API_KEY
Graceful degradation. Siphon-C2 is explicitly documented as not critical path — evadex honours that contract:
- A failed push (network error, 4xx/5xx, timeout, auth failure) prints a single-line warning to stderr and continues.
- The scan / falsepos / compare exit code is never affected by a C2 push failure.
- The `--min-detection-rate` CI/CD gate still fires based on the actual scan result.
- The on-disk output file (`--output`), audit log, regression tests, and baseline comparison all complete normally regardless of C2 reachability.
The only exception is evadex history --push-c2 without --c2-url / EVADEX_C2_URL set — that's a user error (no target URL to push to) and exits non-zero.
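For anyone writing their own C2 client or a test double, the push contract described above (an `x-api-key` header, a `User-Agent: evadex/<version>` header, an `evadex_version` field, and warn-and-continue on failure) can be sketched with the standard library. The function names, the `Content-Type` header, the timeout, and the warning text are assumptions, not evadex internals.

```python
import json
import sys
import urllib.error
import urllib.request


def build_push_request(c2_url, api_key, endpoint, payload, version):
    """Build the POST request; evadex_version is added to the JSON body."""
    body = dict(payload, evadex_version=version)
    return urllib.request.Request(
        c2_url.rstrip("/") + endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "User-Agent": f"evadex/{version}",
            "Content-Type": "application/json",  # assumption
        },
        method="POST",
    )


def push(request, timeout=5.0):
    """Send the request; on any failure print one warning line and return False."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError) as exc:
        print(f"warning: C2 push failed: {exc}", file=sys.stderr)
        return False
```

The caller ignores the return value of `push` when computing its exit code, which is what keeps C2 off the critical path.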
Output schema
Top-level
{
"meta": { ... },
"results": [ ... ]
}
meta
| Field | Type | Description |
|---|---|---|
| `timestamp` | ISO 8601 string | When the scan ran (UTC) |
| `scanner` | string | Scanner label from --scanner-label (empty string if not set) |
| `total` | int | Total test cases run |
| `pass` | int | Variants detected by scanner |
| `fail` | int | Variants that evaded scanner |
| `error` | int | Adapter errors |
| `pass_rate` | float | pass / total * 100, rounded to one decimal |
| `summary_by_category` | object | Per-category pass/fail/error counts, sorted alphabetically by category name |
| `summary_by_generator` | object | Per-generator pass/fail/error counts, sorted alphabetically by generator name |
results[]
| Field | Type | Description |
|---|---|---|
| `payload.value` | string | Original sensitive value |
| `payload.category` | string | Detected category enum value |
| `payload.category_type` | string | structured or heuristic (see Structured vs heuristic categories) |
| `payload.label` | string | Human-readable label |
| `variant.value` | string | Transformed/obfuscated value submitted to scanner |
| `variant.generator` | string | Which generator produced this variant |
| `variant.technique` | string | Machine-readable technique name |
| `variant.transform_name` | string | Human-readable description of the transform |
| `variant.strategy` | string | Submission strategy: text, docx, pdf, xlsx |
| `detected` | bool | Whether the scanner flagged this variant. false for error results; check severity to distinguish |
| `severity` | string | pass (detected), fail (not detected), or error (adapter error) |
| `duration_ms` | float | Time for this test case in milliseconds |
| `error` | string \| null | Error message if adapter threw; null otherwise |
| `raw_response` | object | Raw parsed response from the adapter. For dlpscan-cli this is {"matches": [...]}. May contain match objects that include the variant value, so treat the output file accordingly. |
Coverage
evadex payload coverage relative to the dlpscan-rs pattern library (557 individual sub-patterns across 126 categories).
Each row shows coverage at the sub-pattern level — e.g. "Credit Card Numbers — 7/7" means all seven card-network variants (Visa, Amex, Mastercard, Discover, JCB, UnionPay, Diners) have a dedicated seed payload.
Identity documents
| Region / Category | dlpscan-rs sub-patterns | evadex coverage | Notes |
|---|---|---|---|
| Credit Card Numbers | 7 | 7/7 ✓ | Visa, Amex, Mastercard, Discover, JCB, UnionPay, Diners |
| US Driver's Licences | 51 + 1 generic | 52/52 ✓ | All 50 states + DC + generic |
| US — other identifiers | 12 | 12/12 ✓ | SSN, ITIN, EIN, MBI, Passport, Passport Card, NPI, DoD ID, KTN, DEA, USA Routing Number, US Phone Number — completed this release |
| North America — Canada | 29 | 29/29 ✓ | All provincial health/DL/corporate/BN/SIN; 3 DL payloads corrected this release |
| North America — Mexico | 7 | 7/7 ✓ | CURP, RFC, Clave Elector, INE CIC, INE OCR, NSS, Passport — all added this release |
| Europe — United Kingdom | 7 | 7/7 ✓ | NIN, DL, NHS, Passport, Phone, Sort Code, UTR — completed this release |
| Europe — Germany | 6 | 6/6 ✓ | Tax ID, ID, IBAN, Social Insurance, DL, Passport — completed this release |
| Europe — France | 5 | 5/5 ✓ | NIR, CNI, IBAN, DL, Passport — completed this release |
| Europe — Spain | 5 | 5/5 ✓ | DNI, IBAN, NIE, NSS, Passport — completed this release |
| Europe — Italy | 5 | 5/5 ✓ | Codice Fiscale/SSN, DL, Partita IVA, Passport — completed this release |
| Europe — Netherlands | 4 | 4/4 ✓ | BSN, DL, IBAN, Passport — completed this release |
| Europe — Poland | 6 | 6/6 ✓ | PESEL, NIP, REGON, DL, ID Card, Passport — completed this release |
| Europe — Sweden | 4 | 4/4 ✓ | PIN, Org Number, DL, Passport — completed this release |
| Europe — Norway | 4 | 4/4 ✓ | FNR, D-Number, DL, Passport — completed this release |
| Europe — Switzerland | 4 | 4/4 ✓ | AHV, UID, DL, Passport — completed this release |
| Europe — Finland | 3 | 3/3 ✓ | HETU, DL, Passport — completed this release |
| Europe — Austria | 5 | 5/5 ✓ | SVN, Tax, DL, ID Card, Passport — completed this release |
| Europe — Belgium | 4 | 4/4 ✓ | NRN, VAT, DL, Passport — completed this release |
| Europe — (19 other EU/EEA countries) | ~75 | ~75/75 ✓ | Bulgaria, Croatia, Cyprus, Czech, Denmark, EU-ETD, Estonia, Greece, Hungary, Iceland, Ireland, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Portugal, Romania, Slovakia, Slovenia, Turkey — all sub-patterns added this release |
| Asia-Pacific — Australia | 11 | 11/11 ✓ | TFN, Medicare, Passport, 8 state DL variants — completed this release |
| Asia-Pacific — China / HK / Macau / TW | 5 | 5/5 ✓ | Resident ID, Passport, HK ID, Macau ID, TW NID — completed this release |
| Asia-Pacific — India | 6 | 6/6 ✓ | Aadhaar, PAN, DL, Passport, Ration Card, Voter ID — completed this release |
| Asia-Pacific — Japan | 6 | 6/6 ✓ | My Number, DL, Health Ins, Juminhyo, Passport, Residence Card — completed this release |
| Asia-Pacific — Singapore | 4 | 4/4 ✓ | NRIC, FIN, DL, Passport — completed this release |
| Asia-Pacific — South Korea | 3 | 3/3 ✓ | RRN, DL, Passport — completed this release |
| Asia-Pacific — New Zealand | 4 | 4/4 ✓ | IRD, NHI, DL, Passport — completed this release |
| Asia-Pacific — Philippines | 6 | 6/6 ✓ | PhilSys, PhilHealth, SSS, TIN, UMID, Passport — completed this release |
| Asia-Pacific — (7 other AP countries) | ~24 | ~24/24 ✓ | Bangladesh, Indonesia, Malaysia, Pakistan, Sri Lanka, Thailand, Vietnam — all sub-patterns added this release |
| Latin America — Brazil | 6 | 6/6 ✓ | CPF, CNPJ, CNH, RG, SUS, Passport — completed this release |
| Latin America — Argentina | 3 | 3/3 ✓ | DNI, CUIL/CUIT, Passport — completed this release |
| Latin America — Chile | 2 | 2/2 ✓ | RUT, Passport — completed this release |
| Latin America — Colombia | 4 | 4/4 ✓ | Cedula, NIT, NUIP, Passport — completed this release |
| Latin America — (8 other LatAm countries) | ~27 | ~27/27 ✓ | Costa Rica, Ecuador, Paraguay, Peru, Uruguay, Venezuela — all sub-patterns added this release |
| Middle East — UAE | 3 | 3/3 ✓ | Emirates ID, Passport, Visa — completed this release |
| Middle East — (10 other ME countries) | ~21 | ~21/21 ✓ | Bahrain, Iran, Iraq, Israel, Jordan, Kuwait, Lebanon, Qatar, Saudi Arabia — all sub-patterns added this release |
| Africa — South Africa | 3 | 3/3 ✓ | ID, DL, Passport — completed this release |
| Africa — (9 other African countries) | ~27 | ~27/27 ✓ | Egypt, Ethiopia, Ghana, Kenya, Morocco, Nigeria, Tanzania, Tunisia, Uganda — all sub-patterns added this release |
Financial, secrets, and functional
| Category | dlpscan-rs sub-patterns | evadex coverage | Notes |
|---|---|---|---|
| Banking & Financial | 5 | 5/5 ✓ | IBAN, SWIFT, ABA, Canada Transit, US Bank Account |
| IBAN (country-specific) | 4 named | 4/4 ✓ | UK, DE, FR, ES, NL IBANs all represented |
| Banking Authentication | 3 | 3/3 ✓ | PIN Block, Encryption Key, HSM Key — completed this release |
| Cryptocurrency | 7 | 7/7 ✓ | Bitcoin (legacy + Bech32), Ethereum, Bitcoin Cash, Litecoin, Monero, Ripple — completed this release |
| Card Track Data | 2 | 2/2 ✓ | Track 1, Track 2 — completed this release |
| Check & MICR | 3 | 3/3 ✓ | MICR, Cashier Check, Check Number — completed this release |
| Cloud Secrets | 3 | 3/3 ✓ | AWS Access Key, AWS Secret Key, Google API Key — completed this release |
| Code Platform Secrets | 5 | 5/5 ✓ | GitHub Classic, OAuth, Fine-Grained PAT, NPM Token, PyPI Token — completed this release |
| Messaging Secrets | 6 | 6/6 ✓ | Slack Bot, Slack User, Slack Webhook, Mailgun, SendGrid, Twilio — completed this release |
| Generic Secrets | 4 | 4/4 ✓ | JWT, Bearer Token, DB Connection String, Private Key — completed this release |
| Payment Secrets | 2 | 2/2 ✓ | Stripe Secret Key, Stripe Publishable Key — completed this release |
| Contact Information | 5 | 5/5 ✓ | Email, Phone (E.164), IPv4, IPv6, MAC Address — completed this release |
| Device Identifiers | 5 | 5/5 ✓ | ICCID, IDFA/IDFV, IMEI, IMEISV, MEID — completed this release |
| Geolocation | 2 | 2/2 ✓ | GPS Coordinates, Geohash — completed this release |
| Securities Identifiers | 6 | 6/6 ✓ | ISIN, CUSIP, FIGI, LEI, SEDOL, Ticker Symbol — completed this release |
| Medical Identifiers | 4 | 4/4 ✓ | NDC Code, DEA Number, Health Plan ID, ICD-10 Code — completed this release |
| Loan & Mortgage | 4 | 4/4 ✓ | Loan Number, ULI, LTV Ratio, MERS MIN — completed this release |
| Legal Identifiers | 2 | 2/2 ✓ | US Federal Case Number, Court Docket Number — completed this release |
| Regulatory Identifiers | 6 | 6/6 ✓ | AML Case ID, CTR, Compliance Case, FinCEN, OFAC SDN, SAR — completed this release |
| Insurance Identifiers | 2 | 2/2 ✓ | Policy Number, Claim Number — completed this release |
| Internal Banking Refs | 2 | 2/2 ✓ | Internal Account Ref, Teller ID — completed this release |
| Property Identifiers | 2 | 2/2 ✓ | Parcel Number, Title Deed — completed this release |
| Social Media | 2 | 2/2 ✓ | Twitter Handle, Hashtag — completed this release |
| Employment | 2 | 2/2 ✓ | Employee ID, Work Permit — completed this release |
| Education | 1 | 1/1 ✓ | EDU Email |
| Dates | 3 | 3/3 ✓ | ISO, US, EU date formats |
| Postal Codes | 5 | 5/5 ✓ | UK, US ZIP+4, Canada, Brazil CEP, Japan — completed this release |
| Personal Identifiers | 2 | 2/2 ✓ | Date of Birth, Gender Marker — completed this release |
| Primary Account Numbers | 2 | 2/2 ✓ | PAN (via credit cards), Masked PAN |
| Customer Financial Data | 4 | 4/4 ✓ | Balance with Currency, Account Balance, DTI Ratio, Income Amount — completed this release |
| Authentication Tokens | 1 | 1/1 ✓ | Session ID |
| Biometric Identifiers | 2 | 2/2 ✓ | Template ID, Biometric Hash (via IDFA payload) |
| VIN | 1 | 1/1 ✓ | Vehicle Identification Number |
| Wire Transfer | 6 | 6/6 ✓ | Fedwire IMAD, CHIPS UID, Wire Reference Number, ACH Trace Number, ACH Batch Number, SEPA Reference — completed this release |
Classification & governance labels
| Category | dlpscan-rs sub-patterns | evadex coverage | Notes |
|---|---|---|---|
| Corporate Classification | 9 | 9/9 ✓ | Confidential, DND, Embargoed, Eyes Only, Highly Conf, Internal Only, NTK, Proprietary, Restricted — completed this release |
| Data Classification Labels | 8 | 8/8 ✓ | Top Secret, CUI, Classified Conf, FOUO, LES, NOFORN, SBU, Secret — completed this release |
| Privacy Classification | 10 | 10/10 ✓ | HIPAA, PCI-DSS, CCPA, FERPA, GDPR, GLBA, NPI, PHI, PII, SOX — completed this release |
| Financial Regulatory Labels | 7 | 7/7 ✓ | MNPI, Draft-Not-for-Circ, Info Barrier, Inside Info, Invest Restricted, Market Sensitive, Pre-Decisional — completed this release |
| Privileged Information | 7 | 7/7 ✓ | Attorney-Client, Legal Privilege, Litigation Hold, Privileged Info, P&C, Protected by Priv, Work Product — completed this release |
| Supervisory Information | 6 | 6/6 ✓ | CSI, Exam Findings, Non-Public, Restricted, Supervisory Conf, Supervisory Ctrl — completed this release |
| URLs with Credentials | 2 | 2/2 ✓ | URL with Password, URL with Token — completed this release |
| PCI Sensitive Data | 1 | 1/1 ✓ | Cardholder Name |
Summary: evadex covers 489 of 557 sub-patterns (88%) across all 126 dlpscan-rs categories, using 554 seed payloads. Of those 489, 421 fall in structured categories and were confirmed detected by a direct dlpscan-rs seed scan; the other 68 fall in heuristic categories (JWT, API keys, labels) that are excluded from scanner verification by design. The remaining 68 unrepresented sub-patterns are low-specificity numeric patterns (e.g. 6–9 digit sequences) for which the same dlpscan regex already fires on dozens of existing payloads, so no distinct seed value is feasible without a context keyword. Per-category seed-scan results against dlpscan-rs are in new_cat_verification.json.
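The figures in the summary can be cross-checked with a few lines of arithmetic. This is only a sanity check on the numbers quoted above; the variable names are illustrative and are not part of evadex.

```python
# Figures quoted in the summary (variable names are illustrative only).
total_subpatterns = 557
covered = 489
scanner_verified = 421      # structured, confirmed by seed scan
heuristic_excluded = 68     # heuristic, excluded from verification by design

# The two groups should add up to the covered total.
assert scanner_verified + heuristic_excluded == covered

unrepresented = total_subpatterns - covered
coverage_pct = round(100 * covered / total_subpatterns)

print(f"coverage: {covered}/{total_subpatterns} = {coverage_pct}%")
print(f"unrepresented sub-patterns: {unrepresented}")
```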
Security notes
- API keys: Prefer the `EVADEX_API_KEY` environment variable over the `--api-key` CLI flag. Command-line arguments are visible in process listings (`ps aux`) and may be saved in shell history.
- Output files: The JSON report's `raw_response` fields may contain scanner match objects that echo variant values (transformed versions of sensitive test data). Apply appropriate access controls to report files.
- Temp files: The `dlpscan-cli` adapter writes each test variant to a temp file for subprocess invocation and deletes it immediately after the scan. No persistent disk footprint from test data.
- Network isolation: Run evadex and the scanner on an isolated test network. Test variant values are obfuscated but structurally derived from real sensitive patterns.
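The first three notes can be illustrated with a minimal Python sketch. It is not evadex's own code: the function names `read_api_key`, `scan_variant`, and `write_private_report` are hypothetical, and the "scanner" in the demo is a stand-in command that simply echoes the path it receives. Only the `EVADEX_API_KEY` variable name comes from the notes above.

```python
import os
import subprocess
import sys
import tempfile


def read_api_key(env_var: str = "EVADEX_API_KEY") -> str:
    """Read the key from the environment so it never appears in argv."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


def scan_variant(variant: str, scanner_cmd: list[str]) -> subprocess.CompletedProcess:
    """Write one variant to a temp file, run the scanner on it, then delete it."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(variant)
        # The temp file path is appended as the scanner's last argument.
        return subprocess.run(
            scanner_cmd + [path], capture_output=True, text=True, check=False
        )
    finally:
        # Runs even if the scanner call raises, so no test data is left on disk.
        os.remove(path)


def write_private_report(path: str, body: str) -> None:
    """Create the report with owner-only permissions (effective on POSIX).

    The mode applies only when the file is newly created; an existing
    file keeps whatever permissions it already has.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(body)


if __name__ == "__main__":
    # Stand-in "scanner": a Python one-liner that prints the path it was given.
    fake_scanner = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
    result = scan_variant("4111 1111 1111 1111", fake_scanner)
    print("scanner exit code:", result.returncode)
```

Creating the report with `os.open` and an explicit mode avoids the short window in which a file written with default permissions would be readable by other users before a later `chmod`.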
License
MIT — see LICENSE.