Wordlists forged for your target, not for everyone's. Hyper-contextual wordlist generation for offensive security.

WordForge

License: AGPL-3.0 | Python 3.12+


Why WordForge?

Generic wordlists like rockyou, SecLists/discovery, or common.txt are noisy against specific targets. Throwing 200,000 generic passwords at an enterprise login is inefficient and loud.

WordForge generates hyper-contextual wordlists from passive OSINT of your target: their website, GitHub orgs, public docs, employee profiles, and historical content. The result is wordlists that are an order of magnitude more relevant than generic ones — usernames realistic to the company, directory paths matching the actual stack, parameters matching internal naming, and subdomains rooted in real codenames.

Designed as a companion to SubSift: same stack, same philosophy, pipe-friendly.

Features

  • Passive OSINT collection — website crawler, GitHub org metadata, Wayback Machine, DNS, response headers — all async, rate-limited, polite.
  • NER + pattern extraction — spaCy identifies people, organizations, products, locations; heuristics surface internal jargon and codenames.
  • Multi-provider LLM — Ollama (local, free), Anthropic (premium), or OpenAI. Switch at runtime from the UI or CLI.
  • HashCat-style mutations — capitalization, leet, year suffixes, role combinations, configurable rules.
  • Categorized output — usernames, passwords, paths, subdomains, parameters, emails, company name variants.
  • Pipe-friendly: wordforge generate ... | subsift scan.
  • Live dashboard — the web UI streams per-stage progress over SSE (collect → extract → generate → mutate), then renders a tabbed result card with copy / download / pipe-to-SubSift actions.
  • Operator seeds: --seed employees.txt folds LinkedIn-derived names straight into the LLM context and the offline fallback.
  • Personal OPSEC awareness: wordforge person --username … --github … gathers an employee's public footprint (handle enumeration across ~25 platforms, GitHub, supplied URLs, pasted --text), derives the passwords an attacker would guess, and checks them against HaveIBeenPwned, so awareness training can say "295 of these are already in breaches." For authorized engagements.
  • Self-diagnostic: wordforge doctor reports DNS, HTTPS, LLM provider, spaCy model, cache, and DB readiness in one shot.
  • One-command Docker: docker compose up and you're scanning.
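The HashCat-style mutations listed above (capitalization, leet, year suffixes) can be sketched roughly as follows. This is an illustrative sketch, not WordForge's rule engine: the LEET table, the function names, and the year range are all assumptions made for the example.

```python
from itertools import product

# Hypothetical leet map; WordForge's real substitution table may differ.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}


def case_variants(word: str) -> list[str]:
    """Return lowercase, Capitalized, and UPPERCASE forms, de-duplicated in order."""
    return list(dict.fromkeys([word.lower(), word.capitalize(), word.upper()]))


def leet(word: str) -> str:
    """Replace every character that has a leet equivalent; keep the rest as-is."""
    return "".join(LEET.get(ch.lower(), ch) for ch in word)


def mutate(base: str, years: range) -> list[str]:
    """Combine case variants, their leet forms, and optional year suffixes."""
    stems: list[str] = []
    for variant in case_variants(base):
        stems.extend([variant, leet(variant)])
    suffixes = [""] + [str(year) for year in years]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(stem + suffix for stem, suffix in product(stems, suffixes)))


# Example: a company name with two candidate years (illustrative values).
candidates = mutate("acme", range(2023, 2025))
```

Keeping each mutation as a small pure function makes it easy to add or drop a rule without touching the others, which is one plausible reason to structure a configurable rule set this way.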

LLM providers

WordForge supports three LLM providers. Choose the trade-off that fits.

| Provider | Privacy | Cost | Quality | Setup |
|---|---|---|---|---|
| Ollama (default) | 🟢 Local-only | 🟢 Free | 🟡 Good | Run ollama serve; default model llama3.1:8b |
| Anthropic Claude | 🔴 API call | 🟡 Pay-per-token | 🟢 Excellent | Set ANTHROPIC_API_KEY; default model claude-sonnet-4-5 |
| OpenAI | 🔴 API call | 🟡 Pay-per-token | 🟢 Excellent | Set OPENAI_API_KEY; default model gpt-4o-mini |

Set WORDFORGE_LLM_PROVIDER=ollama|anthropic|openai in .env, or switch on the fly with --provider on the CLI, or click the provider icon in the web UI.

Quickstart

Install as a tool from PyPI (easiest — nothing to clone):

uv tool install wordforge      # or: pipx install wordforge
wordforge models download      # one-time: fetch the spaCy NER model
wordforge doctor
wordforge generate example.com

Or run the whole stack with Docker:

git clone https://github.com/Ataraxia-ia-labs/WordForge.git
cd WordForge
cp .env.example .env       # edit if you want a non-default provider
docker compose up --build  # web UI on http://localhost:8001

Or from source with uv (the spaCy NER model installs automatically via uv sync):

uv sync --extra dev
uv run wordforge doctor
uv run wordforge generate example.com --provider ollama

New here? The full Usage Guide walks through install, provider setup, your first wordlist, and feeding the output into ffuf / hydra / hashcat / SubSift.

Usage

CLI

# First time? Check your setup is healthy.
wordforge doctor

# Generate all categories for a target
wordforge generate example.com

# Pipe subdomains directly to SubSift
wordforge generate example.com --format subdomains | \
  subsift scan --wordlist - example.com

# Seed with employees you've already collected (LinkedIn export, etc.)
wordforge generate example.com --seed employees.txt

# Choose provider per run
wordforge generate example.com --provider anthropic

# Export to a ZIP bundle
wordforge generate example.com --format zip --output bundle.zip

# Apply a hashcat rule file to the password candidates
wordforge generate example.com --rules best64.rule

# Run a whole list of targets (one per line); failures don't abort the batch
wordforge generate-batch targets.txt

# Compare two runs (or two snapshots of the same target over time)
wordforge diff out/example.com.old out/example.com --show

# Personal OPSEC awareness (authorized): gather a footprint, show the risk
wordforge person --name "Jane Doe" --username jdoe --github jdoe --company acme.com

# ...with a handout HTML report, or a whole roster at once
wordforge person --username jdoe --report jdoe.html
wordforge person-batch employees.csv      # per-employee reports + index.html

# Browse past generations
wordforge list
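The person command checks derived passwords against HaveIBeenPwned. One common way to do that without sending the password itself is the Pwned Passwords range API, which takes only the first five hex characters of the SHA-1 hash and returns matching suffixes with counts. Whether WordForge uses exactly this endpoint is an assumption; the sketch below covers only the hashing and response parsing and makes no network call. The function names are hypothetical.

```python
import hashlib


def hibp_prefix_suffix(password: str) -> tuple[str, str]:
    """Split the uppercase SHA-1 hex digest into a 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(suffix: str, range_body: str) -> int:
    """Look up a suffix in a range response made of 'SUFFIX:COUNT' lines."""
    for line in range_body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0  # suffix not listed: no known breach for this candidate
```

In use, the prefix would be sent to https://api.pwnedpasswords.com/range/<prefix> and the response body passed to breach_count together with the locally kept suffix, so only the five-character prefix ever leaves the machine.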

Web UI

Open http://localhost:8001, enter a target, pick a provider from the selector, click Forge. Stream results in real time, download per category, or grab the ZIP bundle.

Integration with SubSift

wordforge generate target.com --format subdomains | \
  subsift scan --wordlist - target.com

WordForge detects pipes automatically: when stdout is not a TTY, the banner and logs are suppressed and only data goes to stdout.

Configuration

See .env.example for the complete list. Key variables:

| Variable | Default | Description |
|---|---|---|
| WORDFORGE_PORT | 8001 | Web UI / API port |
| WORDFORGE_LLM_PROVIDER | ollama | ollama, anthropic, openai |
| OLLAMA_HOST | http://localhost:11434 | Ollama endpoint |
| OLLAMA_MODEL | llama3.1:8b | Ollama model |
| ANTHROPIC_MODEL | claude-sonnet-4-5 | Anthropic model |
| OPENAI_MODEL | gpt-4o-mini | OpenAI model |
| WORDFORGE_RATE_LIMIT_PER_HOST | 1.0 | Requests/sec per hostname |
| WORDFORGE_CRAWL_MAX_DEPTH | 2 | Crawler depth |
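To make WORDFORGE_RATE_LIMIT_PER_HOST concrete, here is one way a requests-per-second limit per hostname could be enforced in an async collector. The PerHostLimiter class is hypothetical and only illustrates the idea; WordForge's own limiter may work differently.

```python
import asyncio
import time
from urllib.parse import urlsplit


class PerHostLimiter:
    """Space out requests so each hostname sees at most rate_per_sec requests per second."""

    def __init__(self, rate_per_sec: float = 1.0) -> None:
        self._interval = 1.0 / rate_per_sec
        self._next_slot: dict[str, float] = {}
        self._lock = asyncio.Lock()

    async def wait(self, url: str) -> float:
        """Sleep until this host's next free slot; return the delay that was applied."""
        host = urlsplit(url).hostname or ""
        async with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot.get(host, now))
            # Reserve the following slot for the next caller on the same host.
            self._next_slot[host] = slot + self._interval
        delay = slot - now
        if delay > 0:
            await asyncio.sleep(delay)
        return delay


async def demo() -> tuple[float, float, float]:
    """Two hits on one host are spaced out; a different host is not delayed."""
    limiter = PerHostLimiter(rate_per_sec=10.0)
    first = await limiter.wait("https://example.com/a")
    second = await limiter.wait("https://example.com/b")
    other = await limiter.wait("https://other.example/")
    return first, second, other
```

Slots are reserved under the lock but the sleep happens outside it, so a slow host does not hold up requests to other hosts.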

Architecture

flowchart LR
    A[Target] --> B[Collectors]
    B -->|Website| C[Extractors]
    B -->|GitHub| C
    B -->|Wayback| C
    B -->|DNS| C
    C -->|NER + Patterns| D[LLM Provider]
    D -->|Ollama / Claude / OpenAI| E[Generators]
    E --> F[Mutators]
    F --> G[Exporters]
    G -->|txt / json / zip| H[Wordlists]
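The flow above, combined with the per-stage progress the dashboard streams over SSE, could be modelled as a chain of async stages that emits one event per completed stage. Everything in this sketch is a stand-in: the four stage functions are toy placeholders, and the event names and payload shape are assumptions. Only the Server-Sent Events framing (event and data lines followed by a blank line) follows the standard format.

```python
import asyncio
import json
from collections.abc import AsyncIterator


# Toy stand-ins for the real stages; each takes and returns a list of strings.
async def collect(items: list[str]) -> list[str]:
    return items + ["Acme Rocket Portal"]  # pretend this text was scraped


async def extract(items: list[str]) -> list[str]:
    return [word.lower() for text in items for word in text.split()]


async def generate(items: list[str]) -> list[str]:
    return sorted(set(items))


async def mutate(items: list[str]) -> list[str]:
    return items + [word.capitalize() for word in items]


STAGES = [("collect", collect), ("extract", extract), ("generate", generate), ("mutate", mutate)]


def sse(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def run(target: str) -> AsyncIterator[str]:
    """Run the stages in order, yielding a progress frame after each one."""
    items = [target]
    for name, stage in STAGES:
        items = await stage(items)
        yield sse("stage", {"stage": name, "count": len(items)})
    yield sse("done", {"words": items})


async def collect_frames(target: str) -> list[str]:
    """Helper for trying the generator outside a web framework."""
    return [frame async for frame in run(target)]
```

In a web app, the run generator would typically be handed to a streaming response with the text/event-stream media type.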

Roadmap

Shipped in v0.1.0

  • Async pipeline with 4 collectors (Website BFS+robots, DNS, Wayback CDX, GitHub REST)
  • LLM-driven generators with cached prompts (Ollama / Anthropic / OpenAI)
  • HashCat-style rule engine + case/leet/year/suffix mutators
  • HTMX dashboard with provider selector + recent-runs panel
  • Runtime provider switching from the dashboard
  • Pipe-friendly integration with SubSift
  • wordforge doctor self-diagnostic
  • --seed flag for operator-supplied seed lists (e.g. LinkedIn names)
  • SQLite-backed history (wordforge list)

Shipped in v0.2.0

  • SSE-streamed live progress in the dashboard (per-stage updates)

Shipped (unreleased)

  • PyPI distribution (uv tool install wordforge / pipx) + automated tag releases
  • Multi-target batch mode (wordforge generate-batch targets.txt)
  • Run-diff: compare two run outputs (wordforge diff a/ b/)
  • Hashcat ruleset import (generate --rules best64.rule)

Planned for v0.3

  • Optional API auth (HMAC-signed bearer for /api/generate)
  • Prometheus /metrics endpoint
  • Burp Suite extension (separate repo)
  • Plugin API for custom collectors

Contributing

See CONTRIBUTING.md. Issues and PRs welcome.

Disclaimer

WordForge is for authorized security testing only. Read DISCLAIMER.md before use. Unauthorized scanning may violate computer fraud laws.

License

AGPL-3.0-or-later. If you run a modified version as a network service, you must release your modifications under the same license.

Acknowledgements

Built on the shoulders of: FastAPI, Typer, httpx, trafilatura, spaCy, Ollama, and the broader ProjectDiscovery ecosystem that inspires the pipe-friendly philosophy.
