Skip to main content

A Wordle solver with pluggable strategies and a strategy-comparison benchmark suite.

Project description

CI Python 3.10+ MIT License

wordlesmith

A Wordle solver with pluggable strategies and a benchmark suite for comparing them.

Considers every valid word a possible answer, so it never dead-ends on a real puzzle (entropy averages 4.52 guesses over all 14,855 valid words, and 3.60 on the classic 2,315-answer set). The core is pure standard library.

wordlesmith playing along with a Wordle

Contents

What it does

wordlesmith is a command-line and library Wordle solver. It ships:

  • A Wordle scoring engine that handles duplicate letters correctly, which is where most solvers have subtle bugs.
  • Five strategies behind one interface: positional frequency, entropy, expected remaining size, minimax, and a random control.
  • A benchmark framework that plays every valid word and reports the full guess distribution.
  • The full 14,855-word valid-guess list (the default answer pool, so it never dead-ends on a real puzzle) and the original 2,315-word answer set, packaged with a precomputed opening-guess table so the first move is instant.

The core has no third-party dependencies. Plotting is the only extra.

Install

# From GitHub
pip install "git+https://github.com/adityakmehrotra/wordlesmith"

# For development (tests, lint, plots)
git clone https://github.com/adityakmehrotra/wordlesmith
cd wordlesmith
pip install -e ".[dev,bench]"

Requires Python 3.10+.

Quickstart

Command line

Auto-solve a known word:

wordlesmith solve maven, then solve crane --curated

(maven is a real NYT answer that isn't in the original 2,315-word list, so a solver built only on that list would never find it. The default pool is every valid word, so this just works.)

Play along with a real puzzle: it suggests a guess, you type the colors back (g=green, y=yellow, x=gray):

$ wordlesmith play --strategy entropy
Turn 1 suggestion: TARES   (14855 candidates)
Enter feedback: xgxgx
Turn 2 suggestion: LADEN   (150 candidates)
Enter feedback: ...

Benchmark one strategy, or compare several:

$ wordlesmith benchmark --strategy entropy --sample 300
$ wordlesmith compare --strategies frequency,entropy,minimax --markdown
$ wordlesmith compare --curated --markdown          # the classic 2,315-answer set

Run wordlesmith --help (or wordlesmith <command> --help) for all options, including --curated, --guess-pool all, --jobs for parallel benchmarks, and --answers/--allowed for custom word lists.

Python API

from wordlesmith import get_strategy, simulate, feedback, pattern_to_string

# Score a guess against a target (base-3 pattern; g/y/x string for humans)
print(pattern_to_string(feedback("speed", "abide")))  # -> xxyxy

# Auto-play a word
result = simulate("maven", get_strategy("entropy"))
print(result.turns, result.guesses)  # -> 3 ['tares', 'laden', 'maven']

Benchmark

Lower average is better; max is the worst game; fail% is games not solved within six guesses.

Primary: every valid word (the default)

Each strategy plays all 14,855 valid words, guessing from the words still consistent with the feedback. This is how the solver actually runs, so it never dead-ends on a real puzzle:

strategy pool avg max fail%
random answers 5.061 >6 16.68
frequency answers 4.922 >6 14.57
minimax answers 4.658 >6 11.29
expected-size answers 4.585 >6 10.57
entropy answers 4.523 >6 9.47

The averages are higher and the failure rate is non-trivial (about 9% even for entropy) because the full valid list is packed with near-identical clusters (match/batch/catch/hatch/..., the -ound and -ight families, plus many obscure words) that simply cannot be separated in six guesses. Those hard words are almost never real NYT answers, so for actual daily play the curated number below is the realistic one; this table is the pessimistic "solve literally any valid word" figure.

Guess distribution by strategy

Secondary: the classic 2,315-answer set (--curated)

Restricted to the original Wordle solution set, the problem is easier and the numbers are comparable to published solvers. The all pool (guessing any word for information) gets close to the known optimum of about 3.421:

strategy pool avg max fail%
random answers 4.039 >6 0.82
frequency answers 3.640 >6 0.60
expected-size answers 3.623 >6 0.60
minimax answers 3.677 >6 0.65
entropy answers 3.598 >6 0.48
entropy all 3.465 6 0.00
expected-size all 3.481 5 0.00
minimax all 3.573 6 0.00

A concrete example of what the smart strategies buy you: solving mound on the curated set, the frequency baseline burns turns cycling through lookalikes (slate, crony, bound, found, hound, mound) while entropy picks a splitting guess and finishes in three (raise, mulch, mound).

Methodology: a game is a failure if unsolved in 6 guesses (counted as 7 in the mean). Deterministic strategies are reproducible; random uses a fixed seed. Full results and per-word data are in benchmarks/results/official/; regenerate the primary with python scripts/run_official_benchmark.py. The primary answers-pool run takes about 10 minutes per strategy on 9 cores; the curated all-pool run scores every valid word each turn and takes far longer, which is why it stays on the smaller curated set. Use --sample N for a quick estimate.

How it works

Scoring: Wordle feedback is computed in two passes. Greens are assigned first and each consumes its letter in the target; yellows are then assigned left to right, each consuming a remaining occurrence. A guess letter with no occurrence left is gray. This is why the second E in SPEED is gray against ABIDE, which has only one E.

Filtering: after each guess the solver keeps a word w only if feedback(guess, w) equals the pattern actually observed. This single rule handles every duplicate-letter case correctly, so there is no separate (and bug-prone) tracking of which letters are "in" or "out".

Word lists: by default every valid Wordle word is treated as a possible answer. The original Wordle solution set was only 2,315 words, but the NYT has revised it over time, so a solver built on that list can dead-end on a legitimate answer it never considered (maven, for instance). Using the full valid list avoids that, at the cost of a somewhat higher average since there are more words to tell apart. Pass --curated to fall back to the original 2,315-answer set (faster, and the numbers become comparable to published solvers).

Strategies

name idea good for
frequency Sum of per-position letter frequencies among candidates (the original baseline). A strong, cheap heuristic.
entropy Maximize expected information (Shannon entropy of the feedback-bucket distribution). Best average guess count.
expected-size Minimize the expected number of remaining candidates. Simple, nearly as strong as entropy.
minimax Minimize the largest feedback bucket (worst case). Smallest worst case.
random Guess a random consistent word. A control / lower bound.

The entropy, expected-size, and minimax strategies accept a --guess-pool of answers (guess from remaining candidates) or all (guess from the full allowed list).

See docs/strategies.md for an in-depth explanation of each strategy: the scoring formulas, the bucket-splitting idea the information-theoretic strategies share (with a worked example), the guess-pool trade-off, and how to add your own strategy.

Limitations

  • Pure Python is slow for the all guess pool. Scoring every valid word each turn takes minutes per benchmark, which is why the committed all-pool numbers stay on the curated set. For a single interactive solve/play it's fine (the opening is precomputed).
  • The word list is a snapshot. valid_words.txt is the NYT valid-guess list as of mid-2025. If the NYT adds words later, refresh it and regenerate the opening table.
  • Six-guess failures are expected. Over the full valid list even entropy fails about 9% of games, because clusters like match/batch/catch/hatch or the -ound/-ight families can't be separated in six turns. Those words are rarely real answers, so --curated is the realistic daily-play figure.
  • The strategies are greedy. They optimize the current guess, not the whole game tree, so even the best is a step behind the known optimal decision tree (about 3.421 on the curated set).
  • English five-letter Wordle only. No hard mode and no other word lengths (the engine assumes five letters), though --answers/--allowed accept custom five-letter word lists.

Development

pip install -e ".[dev,bench]"
pytest --cov=wordlesmith      # tests + coverage
ruff check . && ruff format --check .
mypy src/
python -m build && twine check dist/*

Contributions welcome. A natural extension is adding a new strategy: implement Strategy, register it, and it shows up in compare automatically. Please open an issue or PR.

License & contact

Distributed under the MIT License. See LICENSE.txt.

Aditya Mehrotra. Reach me at adi1.mehrotra@gmail.com or on LinkedIn.

(back to top)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wordlesmith-0.1.0.tar.gz (416.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

wordlesmith-0.1.0-py3-none-any.whl (67.2 kB view details)

Uploaded Python 3

File details

Details for the file wordlesmith-0.1.0.tar.gz.

File metadata

  • Download URL: wordlesmith-0.1.0.tar.gz
  • Upload date:
  • Size: 416.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for wordlesmith-0.1.0.tar.gz
Algorithm Hash digest
SHA256 eb127111190a07c094f95b0db62ca553b892419d3255a3faaf31009365668253
MD5 d08058d92a495374e9b18e2b87571007
BLAKE2b-256 4df4f62b8c4c12cc27b31c17e947abc0beeb06b5a829332a805df07f279feb4b

See more details on using hashes here.

File details

Details for the file wordlesmith-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: wordlesmith-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 67.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for wordlesmith-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 49d74f5dba91be0c1e440cbcfe4de1df848892cc67193e348bd7e180089bd7d8
MD5 c51692eb2b2455795c9848c5c6ecd191
BLAKE2b-256 8277d0cd34a909a794555f2727411b0a3e8c4ff4ce6f5b63fdbb17942cd2c07b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page