WordList For Hacking — Unified wordlist generation toolkit for pentest and red team operations

WordListsForHacking (WFH)

Requires Python 3.8+.

Unified wordlist generation toolkit for pentest and red team operations — 44 subcommands in a single CLI. Charset/mask generation, personal & corporate target profiling, web scraping (JS/CSS/PDF extraction), OCR, document parsing (PDF/XLSX/DOCX), leet speak permutations, XOR crypto, DNS/subdomain fuzzing, phone number generation, corporate user enumeration, retail/pharmacy chain credential patterns, default credential databases (IoT/ICS/SCADA/PLC/HMI), ISP WiFi keyspace generation, password-DNA behavioral analysis, keyword combiner, word mangling, merge & sanitize, ML-based ranking with SecLists corpus training, statistical analysis, PCFG probabilistic grammar generation, OMEN-style Markov chain generation, keyboard walk generation, automatic hashcat rule generation, PRINCE combinatorial chaining, wordlist quality benchmarking, phrase-initials acrostic generation, existing-password mutation engine, digit-to-text variants (EN/PT/BR/ES), global length filters, and disk-space safety checks.

Full documentation: Wiki


DISCLAIMER: This tool is intended exclusively for authorized security testing, penetration testing, and educational purposes. Unauthorized use against systems you do not own or have explicit written permission to test is illegal and unethical. The author assumes no liability for misuse.


Quick Start

Install via pip (optional)

pip install wfh-wordlist                # core
pip install "wfh-wordlist[full]"       # all extras (OCR, docs, scrape); quoted so shells such as zsh do not expand the brackets

Clone from source

git clone https://github.com/mrhenrike/WordListsForHacking.git
cd WordListsForHacking

pip install -r requirements.txt pyyaml

# Linux / macOS / Termux (optional venv)
chmod +x setup_venv.sh && ./setup_venv.sh && source .venv/bin/activate

# Windows PowerShell
.\setup_venv.ps1; .\.venv\Scripts\Activate.ps1

Run

python wfh.py              # interactive menu
python wfh.py --help       # full CLI help (44 subcommands)
wfh --help                 # after pip install

Packaging: the PyPI package wfh-wordlist mirrors this repo. The local pyproject.toml is not tracked in git; copy pyproject.toml.example to pyproject.toml before doing an editable install.

OS prerequisites (OCR only): see the Installation wiki page.


Subcommands

| # | Command | Description |
|---|---------|-------------|
| 1 | charset | Charset/mask generation (crunch-style + hashcat masks) |
| 2 | pattern | Template-based generation with variables |
| 3 | profile | Personal target profiling (CUPP-style) |
| 4 | corp | Corporate target profiling |
| 5 | corp-users | Corporate domain user/password generation (50+ patterns) |
| 6 | phone | Phone number wordlists (BR, US, UK) |
| 7 | scrape | Web scraping (CeWL/CeWLeR-style) with JS/CSS/PDF extraction |
| 8 | ocr | OCR text extraction from images |
| 9 | extract | Extract words from PDF/XLSX/DOCX |
| 10 | leet | Leet speak permutations |
| 11 | xor | XOR encrypt/decrypt/brute-force |
| 12 | analyze | Statistical analysis (pipal-style) |
| 13 | merge | Merge & deduplicate wordlists |
| 14 | dns | DNS/subdomain fuzzing (alterx-style) |
| 15 | pharma | Healthcare/pharmacy credential patterns |
| 16 | sanitize | Clean & normalize wordlists |
| 17 | reverse | Reverse line order |
| 18 | corp-prefixes | Corporate prefix usernames (MSP/SOC/DevOps) |
| 19 | train | Train ML pattern model (local + SecLists corpus) |
| 20 | sysinfo | Hardware & compute info |
| 21 | mangle | Word mangling rules |
| 22 | default-creds | Query default credentials database (IoT/routers/printers/ICS) |
| 23 | isp-keygen | ISP default WiFi password keyspace generator |
| 24 | combiner | Keyword combiner (intelligence-wordlist-generator style) |
| 25 | password-dna | Analyze password patterns and generate behavioral variants |
| 26 | pcfg | PCFG probabilistic grammar: train and generate (Weir et al.) |
| 27 | markov | OMEN-style positional Markov chain generator |
| 28 | kwalk | Keyboard walk password generator (kwprocessor-style) |
| 29 | rulegen | Auto-generate hashcat .rule files from password analysis |
| 30 | benchmark | Wordlist quality benchmarking (MAYA-inspired metrics) |
| 31 | prince | PRINCE attack: chained element combination |
| 32 | phrase | Phrase-initials acrostic password generator (@0x90 / hacker-suffix style) |
| 33 | mutate | Existing-password mutation engine (case / leet / prefix / suffix) |
| 34 | pharma | Retail/pharmacy chain credential patterns (brand+id, system+taxid, usernames) |
| 35 | br-names | Brazilian name-based username generator |
| 36 | num2text | Digit-to-text wordlist generator with case/leet/separator variants (EN/PT/BR/ES) |

Detailed syntax and examples for each subcommand: Wiki — Subcommands

Global Flags

python wfh.py --threads 20 --compute cuda --no-ml --min-len 8 --max-len 20 <subcommand>

| Flag | Default | Description |
|------|---------|-------------|
| --threads N | 5 | Thread count (1–300) |
| --compute MODE | auto | auto / cpu / gpu / cuda / rocm / mps / hybrid |
| --no-ml | off | Disable ML ranking |
| --min-len N | 0 | Global minimum word length filter (applied to all commands) |
| --max-len N | 0 | Global maximum word length filter (applied to all commands) |
| -v | off | Verbose logging |

Common Usage Examples

Corporate pentest — generate users + passwords

python wfh.py corp-users --domain acme.com.br --file employees.txt --passwords --combo -o acme_combo.lst

Personal target profiling

python wfh.py profile --name "João Silva" --nick joao --birth 15/03/1990 --leet aggressive -o target.lst

Charset with hashcat mask

python wfh.py charset 8 8 --mask "?u?l?l?l?d?d?d?s" -o passwords.lst
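
Before writing a mask-based list to disk, it helps to estimate how many candidates the mask expands to. The sketch below is illustrative only and is not WFH's implementation: it assumes hashcat's built-in charset sizes for ?l, ?u, ?d, and ?s, and the helper name mask_keyspace is hypothetical.

```python
import string

# Hashcat built-in charsets; ?s is a space plus ASCII punctuation (33 characters).
CHARSETS = {
    "l": string.ascii_lowercase,
    "u": string.ascii_uppercase,
    "d": string.digits,
    "s": " " + string.punctuation,
}


def mask_keyspace(mask: str) -> int:
    """Return the number of candidates a mask such as '?u?l?d' expands to."""
    total = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?" and i + 1 < len(mask):
            key = mask[i + 1]
            if key == "?":
                size = 1  # '??' is a literal question mark
            elif key in CHARSETS:
                size = len(CHARSETS[key])
            else:
                raise ValueError(f"unsupported placeholder: ?{key}")
            total *= size
            i += 2
        else:
            i += 1  # a literal character contributes a factor of 1
    return total


if __name__ == "__main__":
    n = mask_keyspace("?u?l?l?l?d?d?d?s")
    # Each candidate is 8 characters plus a newline when written to a file.
    print(f"{n:,} candidates, roughly {n * 9 / 1e9:.1f} GB on disk")
```

For the mask above this gives 26^4 × 10^3 × 33 candidates, which is why a size estimate is worth running before generating the full list.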

Template-based patterns

python wfh.py pattern -t "{company}{year}!" --vars company=acme,globex year=2020-2026 -o patterns.lst
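
Template expansion of this kind is a Cartesian product over the variable values. The following is a minimal sketch, assuming that comma-separated values are alternatives and that N-M denotes an inclusive numeric range; the function names are hypothetical and WFH's own parsing may differ.

```python
from itertools import product
from typing import Dict, Iterator, List


def parse_values(spec: str) -> List[str]:
    """Turn 'acme,globex' into a list and '2020-2026' into an inclusive range."""
    lo, sep, hi = spec.partition("-")
    if sep and lo.isdigit() and hi.isdigit():
        return [str(n) for n in range(int(lo), int(hi) + 1)]
    return spec.split(",")


def expand_template(template: str, variables: Dict[str, str]) -> Iterator[str]:
    """Yield one candidate per combination of variable values."""
    names = list(variables)
    value_lists = [parse_values(variables[name]) for name in names]
    for combo in product(*value_lists):
        yield template.format(**dict(zip(names, combo)))


candidates = list(
    expand_template("{company}{year}!", {"company": "acme,globex", "year": "2020-2026"})
)
```

Two companies and seven years yield 14 candidates, starting with acme2020!.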

DNS subdomain fuzzing

python wfh.py dns -d acme.com.br --words dev staging api admin portal -o subdomains.lst

Analyze an existing wordlist

python wfh.py analyze passwords.lst --top 30 --masks --format json -o analysis.json

Default credentials lookup

python wfh.py default-creds --list-vendors
python wfh.py default-creds --vendor mikrotik --format combo -o mikrotik_creds.lst
python wfh.py default-creds --protocol snmp --format user -o snmp_users.lst

ISP WiFi keyspace generation

python wfh.py isp-keygen --list
python wfh.py isp-keygen --isp xfinity_comcast --estimate
python wfh.py isp-keygen --isp xfinity_comcast --limit 100000 -o xfinity.lst

Web scraping with JS/CSS/PDF

python wfh.py scrape https://target.com --include-js --include-css --include-pdf --lowercase -o words.lst
python wfh.py scrape https://target.com --emails --output-emails emails.txt --output-urls urls.txt
python wfh.py scrape https://target.com --subdomain-strategy children --stream -o stream.lst

Merge & sanitize

python wfh.py merge list1.lst list2.lst --min-len 6 --sort -o merged.lst
python wfh.py sanitize merged.lst --inplace
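
Conceptually, merging is an order-preserving deduplication with an optional length filter. The sketch below is an assumption about the general technique, not WFH's code; merge_wordlists is a hypothetical name.

```python
from typing import Iterable, List


def merge_wordlists(
    lists: Iterable[Iterable[str]], min_len: int = 0, sort: bool = False
) -> List[str]:
    """Merge several wordlists, dropping blanks, short words, and duplicates."""
    seen = set()
    merged = []
    for words in lists:
        for raw in words:
            word = raw.strip()
            if not word or len(word) < min_len or word in seen:
                continue
            seen.add(word)
            merged.append(word)
    return sorted(merged) if sort else merged
```

Because each argument only needs to be iterable, open file handles can be passed directly so the inputs are streamed rather than loaded whole; the set of seen words still has to fit in memory.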

More examples and scenarios: Wiki — Quick Start


Password DNA

Analyze password patterns and generate behavioral variants. The password-dna subcommand extracts structural "DNA" from known passwords (uppercase, lowercase, digit, symbol positions) and produces new candidates that follow the same behavioral patterns.

# Analyze a leaked/known password list and generate variants
python wfh.py password-dna --input known_passwords.lst --depth 2 -o dna_variants.lst

# Generate variants from a single seed with aggressive expansion
python wfh.py password-dna --seed "Company2024!" --depth 3 --leet -o seed_variants.lst

# DNA analysis report only (no generation)
python wfh.py password-dna --input known_passwords.lst --analyze-only --format json -o dna_report.json
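
The structural "DNA" described above can be pictured as a per-character class mask. This is a minimal sketch that assumes hashcat-style placeholders as the notation; the function names are hypothetical and the real subcommand may track more than character classes.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def structure_mask(password: str) -> str:
    """Map each character to ?u, ?l, ?d, or ?s by its character class."""
    parts = []
    for ch in password:
        if ch.isupper():
            parts.append("?u")
        elif ch.islower():
            parts.append("?l")
        elif ch.isdigit():
            parts.append("?d")
        else:
            parts.append("?s")
    return "".join(parts)


def top_masks(passwords: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n most common masks with their counts."""
    return Counter(structure_mask(p) for p in passwords if p).most_common(n)
```

Under this scheme Company2024! becomes ?u?l?l?l?l?l?l?d?d?d?d?s, so any new candidate with one uppercase letter, six lowercase letters, four digits, and a symbol shares its structure.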

PCFG Grammar Engine

Train a Probabilistic Context-Free Grammar from a password corpus and generate candidates in probability order (most likely first). Based on Weir et al. (IEEE S&P 2009).

# Train a grammar from a password corpus
python wfh.py pcfg train --wordlist rockyou.txt

# Generate candidates (probability-ordered)
python wfh.py pcfg generate -o candidates.lst --limit 1000000

# Fine-tune with structure/terminal limits
python wfh.py pcfg generate --top-structures 50 --top-terminals 100 --min-len 8
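
The training step of a PCFG approach starts by reducing each password to a base structure and counting how often each structure occurs. The sketch below shows only that first step, under the assumption of three classes (L for letters, D for digits, S for everything else); it omits terminal probabilities and the probability-ordered generation, and the function names are hypothetical.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, Iterable


def char_class(ch: str) -> str:
    """Classify a character as letter (L), digit (D), or symbol (S)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def base_structure(password: str) -> str:
    """Encode a password as run-length classes, e.g. 'password123!' -> 'L8D3S1'."""
    return "".join(
        f"{cls}{len(list(run))}" for cls, run in groupby(password, key=char_class)
    )


def structure_probabilities(corpus: Iterable[str]) -> Dict[str, float]:
    """Estimate each structure's probability from its relative frequency."""
    counts = Counter(base_structure(p) for p in corpus if p)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.most_common()}
```

A limit such as --top-structures would then correspond to keeping only the highest-probability entries of this mapping.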

Markov Chain Generator

OMEN-style positional Markov chain generator. Learns per-position character transitions and generates in ascending cost order (most probable first).

# Train a Markov model (order 3)
python wfh.py markov train --wordlist leaked.txt --order 3

# Generate candidates with cost threshold
python wfh.py markov generate --min-len 6 --max-len 12 --max-cost 30 --limit 500000
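
To make the notion of "cost" concrete, the sketch below trains a plain n-gram character model and scores a word as the sum of negative log-probabilities, so lower cost means more probable. It is a simplification under stated assumptions: it is not positional, it does not discretize probabilities into levels as OMEN does, and the smoothing constant and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable

START, END = "\x02", "\x03"  # padding markers that cannot appear in normal input


def train(words: Iterable[str], order: int = 2) -> Dict[str, Counter]:
    """Count which character follows each context of `order` characters."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for word in words:
        padded = START * order + word + END
        for i in range(order, len(padded)):
            model[padded[i - order:i]][padded[i]] += 1
    return model


def cost(model: Dict[str, Counter], word: str, order: int = 2,
         alphabet_size: int = 96) -> float:
    """Sum of -log2 P(char | context) with add-one smoothing; lower is likelier."""
    padded = START * order + word + END
    total = 0.0
    for i in range(order, len(padded)):
        followers = model.get(padded[i - order:i], Counter())
        p = (followers[padded[i]] + 1) / (sum(followers.values()) + alphabet_size)
        total -= math.log2(p)
    return total
```

A threshold such as --max-cost can be read as a cap on this kind of score: candidates whose accumulated cost exceeds it are never emitted.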

Keyboard Walk Generator

Generate passwords based on physical keyboard adjacency walks. Supports QWERTY, AZERTY, QWERTZ, Dvorak, and numpad layouts.

# Generate QWERTY walks (length 4-10)
python wfh.py kwalk --min-len 4 --max-len 10 -o walks.lst

# Multiple layouts, no shift layer
python wfh.py kwalk --layout qwerty,numpad --no-shift --max-changes 2

# List available layouts
python wfh.py kwalk --list-layouts
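
A keyboard walk is a path across physically adjacent keys. The sketch below covers only straight horizontal and vertical runs on an unshifted QWERTY grid and ignores row stagger, direction changes, and the shift layer; the layout table and function names are illustrative assumptions rather than WFH's data.

```python
from typing import List, Set

QWERTY_ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]


def segments(line: str, length: int) -> Set[str]:
    """All contiguous runs of `length` keys along a line, in both directions."""
    found = set()
    for start in range(len(line) - length + 1):
        run = line[start:start + length]
        found.add(run)
        found.add(run[::-1])
    return found


def horizontal_walks(rows: List[str], length: int) -> Set[str]:
    """Walks that stay within a single keyboard row."""
    walks = set()
    for row in rows:
        walks |= segments(row, length)
    return walks


def vertical_walks(rows: List[str], length: int) -> Set[str]:
    """Walks straight down or up a column; zip stops at the shortest row."""
    walks = set()
    for column in zip(*rows):
        walks |= segments("".join(column), length)
    return walks
```

Even this reduced model reproduces familiar patterns such as qwerty and 1qaz; a full generator would walk an adjacency graph and allow a bounded number of direction changes.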

Hashcat Rule Auto-Generation

Analyze real passwords and automatically generate hashcat-compatible .rule files by reverse-engineering transformation patterns.

# Generate a .rule file from password analysis
python wfh.py rulegen --wordlist leaked.txt -o rules.rule --top-rules 200

# With a dictionary for better base-word matching
python wfh.py rulegen --wordlist passwords.lst --dictionary english.txt -o optimized.rule
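
Reverse-engineering a rule means finding the transformation that turns a dictionary word into an observed password. The sketch below handles only the simplest case, a case change plus prepended and appended characters, using the standard hashcat rule functions `:` (no-op), `c`, `u`, `l`, `^X` (prepend), and `$X` (append). The function name is hypothetical and this is not how WFH or PACK necessarily do it.

```python
from typing import Optional


def derive_rule(base: str, password: str) -> Optional[str]:
    """Return a hashcat rule turning `base` into `password`, or None if no simple match."""
    variants = (
        (base, ":"),               # unchanged
        (base.capitalize(), "c"),  # first letter upper, rest lower
        (base.upper(), "u"),
        (base.lower(), "l"),
    )
    for variant, op in variants:
        idx = password.find(variant)
        if idx == -1:
            continue
        prefix = password[:idx]
        suffix = password[idx + len(variant):]
        ops = [] if op == ":" else [op]
        # '^' prepends one character at a time, so feed the prefix in reverse.
        ops += [f"^{ch}" for ch in reversed(prefix)]
        ops += [f"${ch}" for ch in suffix]
        return " ".join(ops) if ops else ":"
    return None
```

Counting how often each derived rule appears across a corpus, for example with collections.Counter, gives a ranking from which the most frequent rules can be kept.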

PRINCE Attack Mode

PRINCE (PRobability INfinite Chained Elements) generates passwords by chaining multiple words from a single wordlist. This makes it suited to discovering multi-word passwords such as correcthorsebatterystaple.

# Chain 2-4 elements from a base wordlist
python wfh.py prince --wordlist top1000.txt --min-elem 2 --max-elem 4 -o prince.lst

# With separator and case permutations
python wfh.py prince --wordlist words.txt --separator "-" --case-permute --min-len 8
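
The chaining idea can be sketched as a product of the wordlist with itself for each allowed element count. This is a plain enumeration for illustration: the real PRINCE algorithm orders chains so that likelier candidates come first, which this sketch does not attempt, and the function name and parameters are hypothetical.

```python
from itertools import product
from typing import Iterable, Iterator


def chain_elements(
    words: Iterable[str],
    min_elem: int = 2,
    max_elem: int = 3,
    separator: str = "",
    min_len: int = 0,
) -> Iterator[str]:
    """Yield every chain of min_elem..max_elem words joined by `separator`."""
    pool = list(words)
    for n in range(min_elem, max_elem + 1):
        for combo in product(pool, repeat=n):
            candidate = separator.join(combo)
            if len(candidate) >= min_len:
                yield candidate
```

Output grows as the wordlist size raised to the element count, so a 1,000-word list chained four deep already yields 10^12 candidates; keeping the base list small matters more than any other setting.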

Wordlist Quality Benchmark

Measure the effectiveness of a generated wordlist against a reference set. Reports hit rate, efficiency, diversity, coverage by length/charset, and estimated crack times.

# Benchmark a wordlist against a known password set
python wfh.py benchmark --wordlist generated.lst --reference rockyou_sample.txt

# Save JSON report
python wfh.py benchmark --wordlist output.lst --reference test_set.txt --json report.json
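
The headline metrics can be expressed as simple set ratios. The definitions below are assumptions made for illustration (hit rate as the share of reference passwords found, efficiency as hits per generated candidate, diversity as the share of unique candidates); WFH's report may define them differently.

```python
from typing import Dict, Iterable


def benchmark(generated: Iterable[str], reference: Iterable[str]) -> Dict[str, float]:
    """Compare a generated wordlist against a reference set of known passwords."""
    gen = list(generated)
    unique = set(gen)
    ref = set(reference)
    hits = unique & ref
    return {
        "hit_rate": len(hits) / len(ref) if ref else 0.0,
        "efficiency": len(hits) / len(gen) if gen else 0.0,
        "diversity": len(unique) / len(gen) if gen else 0.0,
    }
```

A list can score a high hit rate purely by being enormous, which is why efficiency is worth reading alongside it.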

Default Credentials Database

Query the built-in database of 1,329+ factory-default credentials covering 88 vendors and 14 protocols — routers, switches, printers, IP cameras, ICS/SCADA (PLCs, HMIs, RTUs), IoT gateways, and more.

# List all supported vendors
python wfh.py default-creds --list-vendors

# Export credentials for a specific vendor
python wfh.py default-creds --vendor siemens --format combo -o siemens_creds.lst

# Filter by protocol (telnet, ssh, http, snmp, modbus, s7comm, etc.)
python wfh.py default-creds --protocol modbus --format user -o modbus_users.lst

# Search by device category
python wfh.py default-creds --category ics --format combo -o ics_defaults.lst

# Export full database as JSON
python wfh.py default-creds --export-all --format json -o all_defaults.json

Wordlists

| File | Description | Entries |
|------|-------------|---------|
| passwords/wlist_brasil.lst | Brazilian password corpus: cultural word banks, corporate patterns, leet speak, keyboard walks. Company names and CNPJs are public OSINT data. | ~3.88M |
| passwords/default-creds-combo.lst | Default credential user:password combos (routers, printers, ICS/SCADA) | ~3K |
| data/default_credentials.json | Structured default credentials database (1,329 entries, 88 vendors, 14 protocols) | |
| fuzzing/discovery_br.lst | Brazilian web discovery & API fuzzing paths | ~900 |
| usernames/username_br.lst | Brazilian + global username patterns | ~1.6K |
| labs/*.lst | Workshop & training wordlists | |

Details: Wiki — Brazilian Wordlist


Is My Password in This List?

# Linux/macOS
grep -qxF 'YourPassword' passwords/wlist_brasil.lst && echo "FOUND!" || echo "Not found"

# Windows PowerShell
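# -SimpleMatch would treat ^ and $ as literal characters, so use an escaped regex for an exact, case-sensitive line match
Select-String -Path passwords\wlist_brasil.lst -Pattern ('^' + [regex]::Escape('YourPassword') + '$') -CaseSensitive -Quiet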

If found: change it immediately, enable MFA/2FA, use a password manager, and never reuse passwords.

Full guide: Wiki — Password Check


ML Model

WFH includes a lightweight ML model that ranks generated candidates by structural pattern probability. Train it with local data or the SecLists corpus:

python wfh.py train --auto                    # local wordlists only
python wfh.py train --seclists                # SecLists corpus (auto-discover)
python wfh.py train --auto --seclists         # combined (recommended)
python wfh.py train --seclists /path/to/SecLists --seclists-categories password frequency

The model stores only structural patterns — no PII, passwords, or company names.

Details: Wiki — ML Model


New in v2.6 — Additional Generators

Phrase-Initials Password Generator

Generate passwords from the first letter of each word in a phrase, with case mutations, leet substitutions, and hacker-style suffixes.

# Phrase → acrostic + variations
python wfh.py phrase "my secret corporate phrase" -o phrase.lst

# With custom prefixes and suffixes
python wfh.py phrase "my secret corporate phrase" --prefixes _,__ --suffixes @0x90,#0x90 -o phrase.lst
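
The core of the technique is taking the first letter of each word and then applying case variants, prefixes, and suffixes. The sketch below assumes three case variants and leaves out leet substitutions; phrase_initials is a hypothetical name and the actual output set of the subcommand is likely larger.

```python
from typing import List, Sequence


def phrase_initials(
    phrase: str,
    prefixes: Sequence[str] = ("",),
    suffixes: Sequence[str] = ("",),
) -> List[str]:
    """Build acrostic candidates from a phrase, with optional prefixes and suffixes."""
    initials = "".join(word[0] for word in phrase.split())
    bases = [initials.lower(), initials.upper(), initials.capitalize()]
    candidates = [
        prefix + base + suffix
        for base in bases
        for prefix in prefixes
        for suffix in suffixes
    ]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(candidates))
```

For the phrase used above, the bare variants are mscp, MSCP, and Mscp.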

Existing Password Mutation Engine

Generate an exhaustive set of variants from an existing base password.

# Mutate a known password
python wfh.py mutate "Summer2024" -o mutated.lst

# Control leet depth and length range
python wfh.py mutate "password123" --leet-mode aggressive --min-len 10 --max-len 25 -o mutated.lst
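
Leet expansion is a product over the per-character alternatives. The substitution table in this sketch is a small illustrative assumption, not WFH's table, and the function name is hypothetical.

```python
from itertools import product
from typing import Set

# Illustrative substitutions only; real tools typically ship larger tables.
LEET = {"a": "4@", "e": "3", "i": "1", "o": "0", "s": "5$", "t": "7"}


def leet_variants(word: str) -> Set[str]:
    """Return every combination of keeping or substituting each character."""
    options = [ch + LEET.get(ch.lower(), "") for ch in word]
    return {"".join(combo) for combo in product(*options)}
```

The count multiplies with every substitutable character (27 variants for a four-letter word like pass), which is the reason a length range and a leet depth setting are useful for keeping the output bounded.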

Retail / Pharmacy Chain Credential Generator

Generates passwords and usernames following patterns common in retail environments: brand + store-id, system + tax-id, internal login prefixes.

# Both passwords and usernames for a brand
python wfh.py pharma --brand AcmePharma --ids 1200-1210 -o pharma.lst

# Passwords only, with tax ID (CNPJ)
python wfh.py pharma --brand RetailCo --abbrevs RC,RET --cnpj 01234567890123 --mode passwords

# Usernames only, custom domain
python wfh.py pharma --brand BrandX --ids 1000-2000 --domains corp.com.br --mode usernames

Digit-to-Text Wordlist Generator

Converts numbers (up to 12 digits) into their text word representations with full variant generation. Supports EN, PT, BR (with feminine forms), and ES.

# Single number in English (default)
python wfh.py num2text --number 123
# → onetwothree, ONETWOTHREE, OneTwoThree, 0n37w07hr33, one-two-three, ...

# Brazilian Portuguese (includes feminine variants: uma, duas)
python wfh.py num2text --number 12 --lang br
# → umdois, umaduas, Um-Duas, um_duas, ...

# Spanish
python wfh.py num2text --number 123 --lang es
# → unodostres, UNODOSTRES, uno-dos-tres, una-dos-tres, ...

# Batch range, saved to file
python wfh.py num2text --range 0-9999 --lang en -o number_words.lst
python wfh.py num2text --range 2000-2030 --lang pt -o years_pt.lst
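
The example outputs above spell a number digit by digit and then vary case and separators. The sketch below reproduces that for English and Spanish only; the word tables and function name are assumptions, and leet and gendered variants are left out.

```python
from typing import List

DIGIT_WORDS = {
    "en": ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"],
    "es": ["cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho", "nueve"],
}


def digit_variants(number: str, lang: str = "en") -> List[str]:
    """Spell each digit as a word and return joined, cased, and separated forms."""
    words = [DIGIT_WORDS[lang][int(d)] for d in number]
    joined = "".join(words)
    return [
        joined,
        joined.upper(),
        "".join(w.capitalize() for w in words),
        "-".join(words),
        "_".join(words),
    ]
```

The number is taken as a string so that leading zeros, as in 007, are preserved.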

Accepted --lang aliases:

| Code | Also accepts | Language |
|------|--------------|----------|
| en | en-us, en-gb | English (default) |
| pt | pt-pt | European Portuguese |
| br | pt-br | Brazilian Portuguese |
| es | es-es, es-mx, es-la | Spanish |

Global Length Filters

Apply minimum/maximum word length filtering to any subcommand output.

python wfh.py --min-len 8 --max-len 20 charset 8 12 -o filtered.lst
python wfh.py --min-len 10 mutate "admin" -o long_variants.lst

Credits & Inspiration

| Project | Inspiration |
|---------|-------------|
| CUPP | Personal target profiling |
| Crunch | Charset-based generation |
| CeWL | Web scraping for wordlists |
| CeWLeR | Modern Python web scraping (JS/CSS/PDF) |
| routersploit | Default credentials for IoT/routers |
| alterx | DNS/subdomain fuzzing |
| pipal | Statistical analysis |
| SecLists | Curated security lists |
| elpscrk | Permutation-based generation |
| BEWGor | Biographical wordlist generator |
| pnwgen | Phone number generation |
| intelligence-wordlist-generator | Keyword combiner |
| SCaDAPass | ICS/SCADA default credentials |
| pcfg_cracker | PCFG probabilistic grammar (Weir et al.) |
| OMEN | Ordered Markov ENumerator |
| kwprocessor | Keyboard walk generation |
| PACK | Password Analysis and Cracking Kit (rulegen) |
| princeprocessor | PRINCE attack mode |
| MAYA | Wordlist quality benchmarking framework |

Contact

Contributing

Contributions welcome. See CONTRIBUTING.md.

License

MIT License — Copyright (c) 2026 André Henrique (@mrhenrike)


Created by André Henrique (@mrhenrike) · União Geek
suporte@uniaogeek.com.br

Read in Portuguese · Full Documentation (Wiki)

Download files

Source Distribution

wfh_wordlist-2.7.0.tar.gz (319.3 kB)

Built Distribution

wfh_wordlist-2.7.0-py3-none-any.whl (340.0 kB)

File details

Details for the file wfh_wordlist-2.7.0.tar.gz.

File metadata

  • Download URL: wfh_wordlist-2.7.0.tar.gz
  • Upload date:
  • Size: 319.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for wfh_wordlist-2.7.0.tar.gz

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | d2249c36e70f9e9f828b86c5f4a9285006f94ccc3f243b42c3b4b5ee092a21ad |
| MD5 | 2255ff7b5fdbed9cdea2dc22cddc03bf |
| BLAKE2b-256 | dcd0ab5347fc7420ba0f3cc56e06ef4a777c32264fe1962b64edcbab58d102a0 |

File details

Details for the file wfh_wordlist-2.7.0-py3-none-any.whl.

File metadata

  • Download URL: wfh_wordlist-2.7.0-py3-none-any.whl
  • Upload date:
  • Size: 340.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for wfh_wordlist-2.7.0-py3-none-any.whl

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 83a84d8cc55680bcfe9b50f394652b6de6cd1c8c8ee4adeff0021b4f0dd80b46 |
| MD5 | fb94f62e7b0b938eaa8a9028c7f2cded |
| BLAKE2b-256 | b1489325011ae09c5a80eb5fe3f9d9e1cb4434ae15b0a3f8f331b22da6ea95bd |
