WordList For Hacking — Unified wordlist generation toolkit for pentest and red team operations
WordListsForHacking (WFH)
Unified wordlist generation toolkit for pentest and red team operations — 44 subcommands in a single CLI. Charset/mask generation, personal & corporate target profiling, web scraping (JS/CSS/PDF extraction), OCR, document parsing (PDF/XLSX/DOCX), leet speak permutations, XOR crypto, DNS/subdomain fuzzing, phone number generation, corporate user enumeration, retail/pharmacy chain credential patterns, default credential databases (IoT/ICS/SCADA/PLC/HMI), ISP WiFi keyspace generation, password-DNA behavioral analysis, keyword combiner, word mangling, merge & sanitize, ML-based ranking with SecLists corpus training, statistical analysis, PCFG probabilistic grammar generation, OMEN-style Markov chain generation, keyboard walk generation, automatic hashcat rule generation, PRINCE combinatorial chaining, wordlist quality benchmarking, phrase-initials acrostic generation, existing-password mutation engine, digit-to-text variants (EN/PT/BR/ES), global length filters, and disk-space safety checks.
Full documentation: Wiki
DISCLAIMER: This tool is intended exclusively for authorized security testing, penetration testing, and educational purposes. Unauthorized use against systems you do not own or have explicit written permission to test is illegal and unethical. The author assumes no liability for misuse.
Quick Start
Install via pip (optional)
pip install wfh-wordlist # core
pip install wfh-wordlist[full] # all extras (OCR, docs, scrape)
Clone from source
git clone https://github.com/mrhenrike/WordListsForHacking.git
cd WordListsForHacking
pip install -r requirements.txt pyyaml
# Linux / macOS / Termux (optional venv)
chmod +x setup_venv.sh && ./setup_venv.sh && source .venv/bin/activate
# Windows PowerShell
.\setup_venv.ps1; .\.venv\Scripts\Activate.ps1
Run
python wfh.py # interactive menu
python wfh.py --help # full CLI help (44 subcommands)
wfh --help # after pip install
Packaging: the PyPI package wfh-wordlist mirrors this repo. The local pyproject.toml is not tracked in git; copy pyproject.toml.example for editable installs.
OS prerequisites (OCR only): see the Installation wiki page.
Subcommands
| # | Command | Description |
|---|---|---|
| 1 | charset | Charset/mask generation (crunch-style + hashcat masks) |
| 2 | pattern | Template-based generation with variables |
| 3 | profile | Personal target profiling (CUPP-style) |
| 4 | corp | Corporate target profiling |
| 5 | corp-users | Corporate domain user/password generation (50+ patterns) |
| 6 | phone | Phone number wordlists (BR, US, UK) |
| 7 | scrape | Web scraping (CeWL/CeWLeR-style) with JS/CSS/PDF extraction |
| 8 | ocr | OCR text extraction from images |
| 9 | extract | Extract words from PDF/XLSX/DOCX |
| 10 | leet | Leet speak permutations |
| 11 | xor | XOR encrypt/decrypt/brute-force |
| 12 | analyze | Statistical analysis (pipal-style) |
| 13 | merge | Merge & deduplicate wordlists |
| 14 | dns | DNS/subdomain fuzzing (alterx-style) |
| 15 | pharma | Healthcare/pharmacy credential patterns |
| 16 | sanitize | Clean & normalize wordlists |
| 17 | reverse | Reverse line order |
| 18 | corp-prefixes | Corporate prefix usernames (MSP/SOC/DevOps) |
| 19 | train | Train ML pattern model (local + SecLists corpus) |
| 20 | sysinfo | Hardware & compute info |
| 21 | mangle | Word mangling rules |
| 22 | default-creds | Query default credentials database (IoT/routers/printers/ICS) |
| 23 | isp-keygen | ISP default WiFi password keyspace generator |
| 24 | combiner | Keyword combiner (intelligence-wordlist-generator style) |
| 25 | password-dna | Analyze password patterns and generate behavioral variants |
| 26 | pcfg | PCFG probabilistic grammar: train and generate (Weir et al.) |
| 27 | markov | OMEN-style positional Markov chain generator |
| 28 | kwalk | Keyboard walk password generator (kwprocessor-style) |
| 29 | rulegen | Auto-generate hashcat .rule files from password analysis |
| 30 | benchmark | Wordlist quality benchmarking (MAYA-inspired metrics) |
| 31 | prince | PRINCE attack: chained element combination |
| 32 | phrase | Phrase-initials acrostic password generator (@0x90 / hacker-suffix style) |
| 33 | mutate | Existing-password mutation engine (case / leet / prefix / suffix) |
| 34 | pharma | Retail/pharmacy chain credential patterns (brand+id, system+taxid, usernames) |
| 35 | br-names | Brazilian name-based username generator |
| 36 | num2text | Digit-to-text wordlist generator with case/leet/separator variants (EN/PT/BR/ES) |
Detailed syntax and examples for each subcommand: Wiki — Subcommands
Global Flags
python wfh.py --threads 20 --compute cuda --no-ml --min-len 8 --max-len 20 <subcommand>
| Flag | Default | Description |
|---|---|---|
| --threads N | 5 | Thread count (1–300) |
| --compute MODE | auto | auto / cpu / gpu / cuda / rocm / mps / hybrid |
| --no-ml | off | Disable ML ranking |
| --min-len N | 0 | Global minimum word length filter (applied to all commands) |
| --max-len N | 0 | Global maximum word length filter (applied to all commands) |
| -v | off | Verbose logging |
Common Usage Examples
Corporate pentest — generate users + passwords
python wfh.py corp-users --domain acme.com.br --file employees.txt --passwords --combo -o acme_combo.lst
Personal target profiling
python wfh.py profile --name "João Silva" --nick joao --birth 15/03/1990 --leet aggressive -o target.lst
Charset with hashcat mask
python wfh.py charset 8 8 --mask "?u?l?l?l?d?d?d?s" -o passwords.lst
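Before writing a mask's output to disk, it can help to estimate how many candidates the mask expands to. The following is a minimal sketch, not WFH's own code: it assumes the standard hashcat built-in charsets (?l, ?u, ?d, ?s), and `mask_keyspace` is a hypothetical helper name.

```python
import string

# Standard hashcat built-in charsets; ?s is the space plus ASCII punctuation.
HASHCAT_CHARSETS = {
    "l": string.ascii_lowercase,
    "u": string.ascii_uppercase,
    "d": string.digits,
    "s": " " + string.punctuation,
}


def mask_keyspace(mask: str) -> int:
    """Return how many candidates a hashcat-style mask expands to."""
    total = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?" and i + 1 < len(mask):
            key = mask[i + 1]
            if key != "?":  # "??" is a literal question mark (factor 1)
                if key not in HASHCAT_CHARSETS:
                    raise ValueError(f"unsupported placeholder ?{key}")
                total *= len(HASHCAT_CHARSETS[key])
            i += 2
        else:
            i += 1  # literal character, contributes a factor of 1
    return total


if __name__ == "__main__":
    print(mask_keyspace("?u?l?l?l?d?d?d?s"))  # 15080208000
```

At 9 bytes per line (8 characters plus a newline), that single mask would need roughly 136 GB, so a limit or a narrower mask is usually the better choice.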
Template-based patterns
python wfh.py pattern -t "{company}{year}!" --vars company=acme,globex year=2020-2026 -o patterns.lst
DNS subdomain fuzzing
python wfh.py dns -d acme.com.br --words dev staging api admin portal -o subdomains.lst
Analyze an existing wordlist
python wfh.py analyze passwords.lst --top 30 --masks --format json -o analysis.json
Default credentials lookup
python wfh.py default-creds --list-vendors
python wfh.py default-creds --vendor mikrotik --format combo -o mikrotik_creds.lst
python wfh.py default-creds --protocol snmp --format user -o snmp_users.lst
ISP WiFi keyspace generation
python wfh.py isp-keygen --list
python wfh.py isp-keygen --isp xfinity_comcast --estimate
python wfh.py isp-keygen --isp xfinity_comcast --limit 100000 -o xfinity.lst
Web scraping with JS/CSS/PDF
python wfh.py scrape https://target.com --include-js --include-css --include-pdf --lowercase -o words.lst
python wfh.py scrape https://target.com --emails --output-emails emails.txt --output-urls urls.txt
python wfh.py scrape https://target.com --subdomain-strategy children --stream -o stream.lst
Merge & sanitize
python wfh.py merge list1.lst list2.lst --min-len 6 --sort -o merged.lst
python wfh.py sanitize merged.lst --inplace
More examples and scenarios: Wiki — Quick Start
Password DNA
Analyze password patterns and generate behavioral variants. The password-dna subcommand extracts structural "DNA" from known passwords (uppercase, lowercase, digit, symbol positions) and produces new candidates that follow the same behavioral patterns.
# Analyze a leaked/known password list and generate variants
python wfh.py password-dna --input known_passwords.lst --depth 2 -o dna_variants.lst
# Generate variants from a single seed with aggressive expansion
python wfh.py password-dna --seed "Company2024!" --depth 3 --leet -o seed_variants.lst
# DNA analysis report only (no generation)
python wfh.py password-dna --input known_passwords.lst --analyze-only --format json -o dna_report.json
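The structural "DNA" idea described above can be illustrated with a small sketch. This is an assumption about the general technique, not the password-dna implementation: the class letters U/L/D/S and the function names `dna` and `dna_histogram` are hypothetical.

```python
from collections import Counter


def char_class(ch: str) -> str:
    """Map one character to its class: Upper, Lower, Digit, or Symbol."""
    if ch.isupper():
        return "U"
    if ch.islower():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def dna(password: str) -> str:
    """Return the per-position class string of a password."""
    return "".join(char_class(ch) for ch in password)


def dna_histogram(passwords):
    """Count how often each structural pattern occurs in a list."""
    return Counter(dna(p) for p in passwords)


if __name__ == "__main__":
    print(dna("Company2024!"))  # ULLLLLLDDDDS
    print(dna_histogram(["Summer2024!", "Winter2023?", "admin"]).most_common(1))
```

A generator can then fill the most frequent patterns with new words, years, and symbols, which is one way to produce candidates that follow the same habits without copying the originals.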
PCFG Grammar Engine
Train a Probabilistic Context-Free Grammar from a password corpus and generate candidates in probability order (most likely first). Based on Weir et al. (IEEE S&P 2009).
# Train a grammar from a password corpus
python wfh.py pcfg train --wordlist rockyou.txt
# Generate candidates (probability-ordered)
python wfh.py pcfg generate -o candidates.lst --limit 1000000
# Fine-tune with structure/terminal limits
python wfh.py pcfg generate --top-structures 50 --top-terminals 100 --min-len 8
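To make the probability ordering concrete, here is a simplified sketch of the PCFG idea: split each password into runs of letters (L), digits (D), and symbols (S), count the resulting base structures and the terminals that fill them, and score a candidate as the product of those frequencies. The function names are hypothetical and the model omits details of the published approach (such as separate case handling); it is not the pcfg subcommand's code.

```python
from collections import Counter, defaultdict
from itertools import groupby


def char_class(ch: str) -> str:
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def segments(password: str):
    """Split into maximal same-class runs, e.g. [("L7", "Company"), ("D4", "2024")]."""
    out = []
    for cls, run in groupby(password, key=char_class):
        text = "".join(run)
        out.append((f"{cls}{len(text)}", text))
    return out


def train(corpus):
    """Count base structures and the terminals seen for each segment tag."""
    structures = Counter()
    terminals = defaultdict(Counter)
    for pw in corpus:
        segs = segments(pw)
        structures["".join(tag for tag, _ in segs)] += 1
        for tag, text in segs:
            terminals[tag][text] += 1
    return structures, terminals


def probability(password, structures, terminals) -> float:
    """P(structure) times the product of P(terminal | tag)."""
    segs = segments(password)
    struct = "".join(tag for tag, _ in segs)
    if structures[struct] == 0:
        return 0.0  # structure never seen in training
    p = structures[struct] / sum(structures.values())
    for tag, text in segs:
        p *= terminals[tag][text] / sum(terminals[tag].values())
    return p


if __name__ == "__main__":
    model = train(["pass12", "word12", "pass99", "admin!"])
    # "word99" never appears in the corpus but still gets a nonzero score.
    print(probability("word99", *model))
```

The point of the example is that unseen combinations of seen parts receive a probability, so candidates can be emitted from most to least likely.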
Markov Chain Generator
OMEN-style positional Markov chain generator. Learns per-position character transitions and generates in ascending cost order (most probable first).
# Train a Markov model (order 3)
python wfh.py markov train --wordlist leaked.txt --order 3
# Generate candidates with cost threshold
python wfh.py markov generate --min-len 6 --max-len 12 --max-cost 30 --limit 500000
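The relationship between per-position transitions and cost can be sketched as follows. This is a deliberately small order-1 model with hypothetical function names; it assumes cost is the sum of negative log2 transition probabilities, whereas OMEN itself uses discretized levels, and WFH's actual model may differ.

```python
import math
from collections import Counter, defaultdict

START = "^"  # marker for "no previous character"


def train(corpus):
    """Count next-character frequencies keyed by (position, previous char)."""
    counts = defaultdict(Counter)
    for word in corpus:
        prev = START
        for pos, ch in enumerate(word):
            counts[(pos, prev)][ch] += 1
            prev = ch
    return counts


def cost(word: str, counts) -> float:
    """Lower cost means more probable; unseen transitions cost infinity."""
    total = 0.0
    prev = START
    for pos, ch in enumerate(word):
        seen = counts.get((pos, prev))
        if not seen or seen[ch] == 0:
            return math.inf
        total += -math.log2(seen[ch] / sum(seen.values()))
        prev = ch
    return total


if __name__ == "__main__":
    model = train(["ab", "ab", "ac", "bb"])
    for candidate in sorted(["ac", "ab", "ba"], key=lambda w: cost(w, model)):
        print(candidate, cost(candidate, model))
```

Sorting by this cost yields the ascending order described above, and a threshold comparable to `--max-cost` would simply drop candidates above a chosen value.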
Keyboard Walk Generator
Generate passwords based on physical keyboard adjacency walks. Supports QWERTY, AZERTY, QWERTZ, Dvorak, and numpad layouts.
# Generate QWERTY walks (length 4-10)
python wfh.py kwalk --min-len 4 --max-len 10 -o walks.lst
# Multiple layouts, no shift layer
python wfh.py kwalk --layout qwerty,numpad --no-shift --max-changes 2
# List available layouts
python wfh.py kwalk --list-layouts
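A keyboard walk is a path through a graph of physically adjacent keys. The sketch below builds such a graph for the three QWERTY letter rows and enumerates walks by depth-first search. The staggered-row geometry, the offsets, and the function names are assumptions for illustration; the shift layer, other layouts, and direction-change limits are not modelled.

```python
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# (row delta, column delta) of neighbours on a staggered keyboard:
# left, right, two keys above, two keys below.
OFFSETS = [(0, -1), (0, 1), (-1, 0), (-1, 1), (1, -1), (1, 0)]


def build_adjacency(rows=ROWS):
    """Map each key to the list of keys that physically touch it."""
    adjacency = {}
    for r, row in enumerate(rows):
        for c, key in enumerate(row):
            near = []
            for dr, dc in OFFSETS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < len(rows) and 0 <= cc < len(rows[rr]):
                    near.append(rows[rr][cc])
            adjacency[key] = near
    return adjacency


def walks(start: str, length: int, adjacency):
    """Yield every walk of `length` keys from `start` without revisiting a key."""

    def extend(path):
        if len(path) == length:
            yield "".join(path)
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:
                yield from extend(path + [nxt])

    yield from extend([start])


if __name__ == "__main__":
    adj = build_adjacency()
    print(sorted(adj["s"]))            # ['a', 'd', 'e', 'w', 'x', 'z']
    print(list(walks("q", 3, adj)))    # includes 'qwe' and 'qaz'
```

The number of walks grows quickly with length, which is why length bounds such as `--min-len` and `--max-len` matter for this generator.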
Hashcat Rule Auto-Generation
Analyze real passwords and automatically generate hashcat-compatible .rule files by reverse-engineering transformation patterns.
# Generate a .rule file from password analysis
python wfh.py rulegen --wordlist leaked.txt -o rules.rule --top-rules 200
# With a dictionary for better base-word matching
python wfh.py rulegen --wordlist passwords.lst --dictionary english.txt -o optimized.rule
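Reverse-engineering a transformation means finding a rule that turns a dictionary word into an observed password. The sketch below handles only a few standard hashcat rule functions (`:` no-op, `l`, `u`, `c`, `^X` prepend, `$X` append) and assumes the base word for each password is already known. The function names are hypothetical, and this is not how rulegen necessarily works internally.

```python
from collections import Counter


def derive_rule(base: str, password: str):
    """Return a hashcat rule turning `base` into `password`, or None."""
    if not base:
        return None
    transforms = [
        (":", base),               # unchanged
        ("l", base.lower()),       # lowercase all
        ("u", base.upper()),       # uppercase all
        ("c", base.capitalize()),  # capitalize first, lowercase rest
    ]
    for op, form in transforms:
        idx = password.find(form)
        if idx == -1:
            continue
        prefix, suffix = password[:idx], password[idx + len(form):]
        # Drop the no-op when other functions are present.
        parts = [] if op == ":" and (prefix or suffix) else [op]
        # ^X prepends one char at a time, so emit the prefix in reverse.
        parts += [f"^{ch}" for ch in reversed(prefix)]
        parts += [f"${ch}" for ch in suffix]
        return " ".join(parts)
    return None


def top_rules(pairs, n=200):
    """Rank derived rules by how many (base, password) pairs they explain."""
    counts = Counter()
    for base, password in pairs:
        rule = derive_rule(base, password)
        if rule is not None:
            counts[rule] += 1
    return [rule for rule, _ in counts.most_common(n)]


if __name__ == "__main__":
    print(derive_rule("summer", "Summer2024!"))  # c $2 $0 $2 $4 $!
    print(derive_rule("admin", "12admin"))       # ^2 ^1
```

Passwords containing spaces in the prefix or suffix would need extra care when the rules are written to a file, which this sketch does not attempt.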
PRINCE Attack Mode
PRINCE (PRobability INfinite Chained Elements) generates passwords by chaining multiple words from a single wordlist, which lets it reach multi-word passwords such as correcthorsebatterystaple.
# Chain 2-4 elements from a base wordlist
python wfh.py prince --wordlist top1000.txt --min-elem 2 --max-elem 4 -o prince.lst
# With separator and case permutations
python wfh.py prince --wordlist words.txt --separator "-" --case-permute --min-len 8
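The chaining step can be approximated with a Cartesian product over the wordlist. This sketch uses hypothetical parameter names that mirror the flags above and does not reproduce the ordering strategy of princeprocessor, which sorts chains by keyspace; it only shows the combination itself.

```python
from itertools import product


def chain(words, min_elem=2, max_elem=3, separator="", min_len=0, max_len=None):
    """Yield every ordered combination of min_elem..max_elem words."""
    for n in range(min_elem, max_elem + 1):
        for combo in product(words, repeat=n):
            candidate = separator.join(combo)
            if len(candidate) < min_len:
                continue
            if max_len is not None and len(candidate) > max_len:
                continue
            yield candidate


if __name__ == "__main__":
    base = ["correct", "horse", "battery", "staple"]
    four_word = list(chain(base, min_elem=4, max_elem=4))
    print(len(four_word))                               # 256
    print("correcthorsebatterystaple" in four_word)     # True
```

Output size is the sum of len(words) ** n over the element counts, so a 1,000-word base with four elements already implies on the order of 10^12 candidates.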
Wordlist Quality Benchmark
Measure the effectiveness of a generated wordlist against a reference set. Reports hit rate, efficiency, diversity, coverage by length/charset, and estimated crack times.
# Benchmark a wordlist against a known password set
python wfh.py benchmark --wordlist generated.lst --reference rockyou_sample.txt
# Save JSON report
python wfh.py benchmark --wordlist output.lst --reference test_set.txt --json report.json
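Two of the reported metrics can be illustrated with set arithmetic. The definitions below are assumptions made for the example (hit rate as the share of reference passwords found, efficiency as hits per unique guess); the benchmark subcommand may define them differently, and `benchmark` here is a hypothetical function.

```python
def benchmark(generated, reference):
    """Compare a generated wordlist against a reference password set."""
    gen = set(generated)
    ref = set(reference)
    hits = gen & ref
    return {
        "hits": len(hits),
        # Share of the reference set that the wordlist covers.
        "hit_rate": len(hits) / len(ref) if ref else 0.0,
        # Hits per unique candidate, i.e. how little guessing is wasted.
        "efficiency": len(hits) / len(gen) if gen else 0.0,
    }


if __name__ == "__main__":
    report = benchmark(
        generated=["admin", "Summer2024!", "qwerty", "letmein"],
        reference=["admin", "qwerty", "dragon", "monkey", "shadow"],
    )
    print(report)  # {'hits': 2, 'hit_rate': 0.4, 'efficiency': 0.5}
```

A large wordlist can score a high hit rate with poor efficiency, so the two numbers are best read together.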
Default Credentials Database
Query the built-in database of 1,329+ factory-default credentials covering 88 vendors and 14 protocols — routers, switches, printers, IP cameras, ICS/SCADA (PLCs, HMIs, RTUs), IoT gateways, and more.
# List all supported vendors
python wfh.py default-creds --list-vendors
# Export credentials for a specific vendor
python wfh.py default-creds --vendor siemens --format combo -o siemens_creds.lst
# Filter by protocol (telnet, ssh, http, snmp, modbus, s7comm, etc.)
python wfh.py default-creds --protocol modbus --format user -o modbus_users.lst
# Search by device category
python wfh.py default-creds --category ics --format combo -o ics_defaults.lst
# Export full database as JSON
python wfh.py default-creds --export-all --format json -o all_defaults.json
Wordlists
| File | Description | Entries |
|---|---|---|
| passwords/wlist_brasil.lst | Brazilian password corpus: cultural word banks, corporate patterns, leet speak, keyboard walks. Company names and CNPJs are public OSINT data. | ~3.88M |
| passwords/default-creds-combo.lst | Default credential user:password combos (routers, printers, ICS/SCADA) | ~3K |
| data/default_credentials.json | Structured default credentials database (1,329 entries, 88 vendors, 14 protocols) | - |
| fuzzing/discovery_br.lst | Brazilian web discovery & API fuzzing paths | ~900 |
| usernames/username_br.lst | Brazilian + global username patterns | ~1.6K |
| labs/*.lst | Workshop & training wordlists | - |
Details: Wiki — Brazilian Wordlist
Is My Password in This List?
# Linux/macOS
grep -qxF 'YourPassword' passwords/wlist_brasil.lst && echo "FOUND!" || echo "Not found"
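# Windows PowerShell (the anchors need regex mode, so the password is escaped instead of using -SimpleMatch)
Select-String -Path passwords\wlist_brasil.lst -Pattern ('^' + [regex]::Escape('YourPassword') + '$') -CaseSensitive -Quiet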
If found: change it immediately, enable MFA/2FA, use a password manager, and never reuse passwords.
Full guide: Wiki — Password Check
ML Model
WFH includes a lightweight ML model that ranks generated candidates by structural pattern probability. Train it with local data or the SecLists corpus:
python wfh.py train --auto # local wordlists only
python wfh.py train --seclists # SecLists corpus (auto-discover)
python wfh.py train --auto --seclists # combined (recommended)
python wfh.py train --seclists /path/to/SecLists --seclists-categories password frequency
The model stores only structural patterns — no PII, passwords, or company names.
Details: Wiki — ML Model
New in v2.6 — Additional Generators
Phrase-Initials Password Generator
Generate passwords from the first letter of each word in a phrase, with case mutations, leet substitutions, and hacker-style suffixes.
# Phrase → acrostic + variations
python wfh.py phrase "my secret corporate phrase" -o phrase.lst
# With custom prefixes and suffixes
python wfh.py phrase "my secret corporate phrase" --prefixes _,__ --suffixes @0x90,#0x90 -o phrase.lst
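The core of the phrase-initials technique is short enough to sketch. The function names are hypothetical and only case variants plus prefixes and suffixes are shown; leet substitutions and the subcommand's default suffix set are left out.

```python
def acrostic(phrase: str) -> str:
    """Join the first letter of each whitespace-separated word."""
    return "".join(word[0] for word in phrase.split())


def variants(phrase: str, prefixes=("",), suffixes=("",)):
    """Combine case variants of the acrostic with each prefix and suffix."""
    base = acrostic(phrase)
    # dict.fromkeys removes duplicates while keeping insertion order.
    forms = dict.fromkeys([base, base.lower(), base.upper(), base.capitalize()])
    return [p + form + s for form in forms for p in prefixes for s in suffixes]


if __name__ == "__main__":
    print(acrostic("my secret corporate phrase"))  # mscp
    print(variants("my secret corporate phrase", suffixes=("@0x90",)))
    # ['mscp@0x90', 'MSCP@0x90', 'Mscp@0x90']
```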
Existing Password Mutation Engine
Generate an exhaustive set of variants from an existing base password.
# Mutate a known password
python wfh.py mutate "Summer2024" -o mutated.lst
# Control leet depth and length range
python wfh.py mutate "password123" --leet-mode aggressive --min-len 10 --max-len 25 -o mutated.lst
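A mutation engine of this kind is essentially a product of independent choices: case form, per-character leet substitution, and suffix. The sketch below uses a small illustrative leet table and hypothetical function names; the real substitution tables, prefix handling, and leet modes of the mutate subcommand are not known from this document.

```python
from itertools import product

# Illustrative substitutions only; not WFH's actual table.
LEET = {"a": "4@", "e": "3", "i": "1", "o": "0", "s": "5$"}


def leet_variants(word: str):
    """Yield every combination of keeping or substituting each character."""
    options = [ch + LEET.get(ch.lower(), "") for ch in word]
    return ("".join(combo) for combo in product(*options))


def mutate(word, suffixes=("", "!", "123"), min_len=0, max_len=None):
    """Yield unique case x leet x suffix variants within the length bounds."""
    seen = set()
    for cased in (word, word.lower(), word.upper(), word.capitalize()):
        for leeted in leet_variants(cased):
            for suffix in suffixes:
                candidate = leeted + suffix
                if len(candidate) < min_len:
                    continue
                if max_len is not None and len(candidate) > max_len:
                    continue
                if candidate not in seen:
                    seen.add(candidate)
                    yield candidate


if __name__ == "__main__":
    results = list(mutate("password"))
    print(len(results), "p4ssw0rd!" in results)
```

Because each substitutable character multiplies the output, long base words grow exponentially, which is where length bounds and a leet depth setting become useful.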
Retail / Pharmacy Chain Credential Generator
Generates passwords and usernames following patterns common in retail environments: brand + store-id, system + tax-id, internal login prefixes.
# Both passwords and usernames for a brand
python wfh.py pharma --brand AcmePharma --ids 1200-1210 -o pharma.lst
# Passwords only, with tax ID (CNPJ)
python wfh.py pharma --brand RetailCo --abbrevs RC,RET --cnpj 01234567890123 --mode passwords
# Usernames only, custom domain
python wfh.py pharma --brand BrandX --ids 1000-2000 --domains corp.com.br --mode usernames
Digit-to-Text Wordlist Generator
Converts numbers (up to 12 digits) into their text word representations with full variant generation. Supports EN, PT, BR (with feminine forms), and ES.
# Single number in English (default)
python wfh.py num2text --number 123
# → onetwothree, ONETWOTHREE, OneTwoThree, 0n37w07hr33, one-two-three, ...
# Brazilian Portuguese (includes feminine variants: uma, duas)
python wfh.py num2text --number 12 --lang br
# → umdois, umaduas, Um-Duas, um_duas, ...
# Spanish
python wfh.py num2text --number 123 --lang es
# → unodostres, UNODOSTRES, uno-dos-tres, una-dos-tres, ...
# Batch range, saved to file
python wfh.py num2text --range 0-9999 --lang en -o number_words.lst
python wfh.py num2text --range 2000-2030 --lang pt -o years_pt.lst
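The examples above spell each digit separately (123 becomes one, two, three) rather than reading the number as a quantity. A minimal English-only sketch of that behaviour follows; the function names and the separator set are assumptions, and the leet and non-English variants are omitted.

```python
EN_DIGITS = ["zero", "one", "two", "three", "four",
             "five", "six", "seven", "eight", "nine"]


def digit_words(digits: str):
    """Spell each digit of a numeric string as an English word."""
    if not digits.isdigit():
        raise ValueError("expected a string of digits")
    return [EN_DIGITS[int(d)] for d in digits]


def variants(digits: str, separators=("", "-", "_")):
    """Return lower, upper, and capitalized forms for each separator."""
    words = digit_words(digits)
    out = []
    for sep in separators:
        joined = sep.join(words)
        for form in (joined, joined.upper(),
                     sep.join(w.capitalize() for w in words)):
            if form not in out:
                out.append(form)
    return out


if __name__ == "__main__":
    print(variants("123")[:3])  # ['onetwothree', 'ONETWOTHREE', 'OneTwoThree']
```

Taking the input as a string rather than an integer keeps leading zeros, which matters for values such as PINs.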
Accepted --lang aliases:
| Code | Also accepts | Language |
|---|---|---|
| en | en-us, en-gb | English (default) |
| pt | pt-pt | European Portuguese |
| br | pt-br | Brazilian Portuguese |
| es | es-es, es-mx, es-la | Spanish |
Global Length Filters
Apply minimum/maximum word length filtering to any subcommand output.
python wfh.py --min-len 8 --max-len 20 charset 8 12 -o filtered.lst
python wfh.py --min-len 10 mutate "admin" -o long_variants.lst
Credits & Inspiration
| Project | Inspiration |
|---|---|
| CUPP | Personal target profiling |
| Crunch | Charset-based generation |
| CeWL | Web scraping for wordlists |
| CeWLeR | Modern Python web scraping (JS/CSS/PDF) |
| routersploit | Default credentials for IoT/routers |
| alterx | DNS/subdomain fuzzing |
| pipal | Statistical analysis |
| SecLists | Curated security lists |
| elpscrk | Permutation-based generation |
| BEWGor | Biographical wordlist generator |
| pnwgen | Phone number generation |
| intelligence-wordlist-generator | Keyword combiner |
| SCaDAPass | ICS/SCADA default credentials |
| pcfg_cracker | PCFG probabilistic grammar (Weir et al.) |
| OMEN | Ordered Markov ENumerator |
| kwprocessor | Keyboard walk generation |
| PACK | Password Analysis and Cracking Kit (rulegen) |
| princeprocessor | PRINCE attack mode |
| MAYA | Wordlist quality benchmarking framework |
Contact
- Support / general inquiries: suporte@uniaogeek.com.br
- Security issues: SECURITY.md
- Organization: União Geek
Contributing
Contributions welcome. See CONTRIBUTING.md.
License
MIT License — Copyright (c) 2026 André Henrique (@mrhenrike)
Created by André Henrique (@mrhenrike) — União Geek
suporte@uniaogeek.com.br
File details
Details for the file wfh_wordlist-2.7.2.tar.gz.
File metadata
- Download URL: wfh_wordlist-2.7.2.tar.gz
- Upload date:
- Size: 321.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 34c64597f29e255e6a942f98c1c814671f8e930c35db843d176e8eb28107f870 |
| MD5 | 6190b9ae0851514e32944f25681342c7 |
| BLAKE2b-256 | e2b4337e45e65f776c3bb6bc15d467984d8254fe06f5841e3b52b7e39d402ca9 |
File details
Details for the file wfh_wordlist-2.7.2-py3-none-any.whl.
File metadata
- Download URL: wfh_wordlist-2.7.2-py3-none-any.whl
- Upload date:
- Size: 341.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d82e70d7056f62857c8fc6f602e6368a3a8b2f520d395d01df0dc3d26d2ef749 |
| MD5 | a5b6316bfef6dc64f617e6e2a3d58fcd |
| BLAKE2b-256 | a31ce842e2703d5a9fab31ca09b6248a6006c062080978ce05cdd182b2e4e5eb |