The Stats behind Wordle

The puzzle game Wordle has been trending for a while and hit the headlines more recently when it was purchased by The New York Times for over 1 million USD. In Wordle, players need to identify a randomly selected 5-letter word in 6 attempts or fewer. After every attempt, players receive hints about the letters and their placement: a green block means the right letter in the right place; an orange block means the right letter in the wrong place; a black block means the letter is wrong.

Simple as the game is, Wordle is well designed in many ways. I don’t intend to delve into those aspects today; instead, I’m more interested in the statistical side of the game.

Letter frequency

Playing Wordle, or any other word puzzle, requires knowledge of letter frequency. In the corpus of 2315 possible Wordle solutions and in the 1995 edition of the Concise Oxford Dictionary, the letter e appears 36 and 57 times more frequently than the letter q, respectively. The plot below shows how frequently each letter features in the solutions.
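As a rough illustration of how such letter counts can be obtained, here is a minimal sketch. The eight-word SAMPLE_WORDS list is a hypothetical stand-in for the real 2315-word solution list, so the numbers it produces are not the ones quoted above.

```python
from collections import Counter

# Hypothetical stand-in for the 2315-word solution list (assumption).
SAMPLE_WORDS = ["crate", "comet", "covet", "chest",
                "cleft", "zebra", "trace", "irate"]


def letter_frequency(words):
    """Count how many times each letter occurs across all words."""
    counts = Counter()
    for word in words:
        counts.update(word)  # adds one count per character
    return counts


freq = letter_frequency(SAMPLE_WORDS)

# Ratio between a common and a rare letter in the sample.
# Letters that never occur (such as q here) have a count of 0.
ratio_e_to_z = freq["e"] / freq["z"]

for letter, count in freq.most_common(5):
    print(letter, count)
```

Swapping in the full solution list would give the e-to-q ratio discussed above, provided q occurs at least once.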

In Wordle, differences in frequency translate into different probability distributions over the hints received. Imagine putting the letter z in the first position (with zebra, for example): the block has a 0.1% probability of turning green, 1.4% of turning orange (right letter in the wrong place) and 98.5% of turning black. Putting the letter r in the second position (with trace, for example) gives the block an 11.5% probability of turning green, 24.6% orange and 63.9% black. Given these distributions, it’s not hard to see that choosing higher-frequency letters does more to help win the game.
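The hint distribution for a single letter at a single position can be sketched as follows. This is one plausible reading of the calculation, not necessarily the exact method behind the percentages above: it looks at one block in isolation and ignores how the game handles repeated letters within a guess. SAMPLE_SOLUTIONS is a hypothetical stand-in for the real solution list.

```python
# Hypothetical stand-in for the 2315-word solution list (assumption).
SAMPLE_SOLUTIONS = ["crate", "comet", "covet", "chest",
                    "cleft", "zebra", "trace", "irate"]


def hint_distribution(letter, position, solutions):
    """Return (p_green, p_orange, p_black) for one letter at one position.

    green:  the solution has the letter at exactly this position
    orange: the solution contains the letter, but not at this position
    black:  the solution does not contain the letter at all
    """
    n = len(solutions)
    green = sum(1 for s in solutions if s[position] == letter)
    orange = sum(1 for s in solutions
                 if s[position] != letter and letter in s)
    black = n - green - orange
    return green / n, orange / n, black / n


# Letter r in the second position (index 1), as in "trace".
print(hint_distribution("r", 1, SAMPLE_SOLUTIONS))
# Letter z in the first position (index 0), as in "zebra".
print(hint_distribution("z", 0, SAMPLE_SOLUTIONS))
```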

Optimal Way Out

Since the set of possible solutions (2315 words) is rather small, a definitive, decision-tree-like optimal solution to the game has been found by computer simulation with search heuristics. This strategy finds strong sweepers that divide the possible solutions into many subgroups of similar size. On average, the strategy finds the answer in 3.42 guesses.
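The idea of a sweeper dividing the candidates into subgroups can be sketched by grouping candidates according to the hint pattern a guess would produce. This is only an illustration of the partitioning step, not the search procedure that found the optimal tree. The function names are made up for this example, and the scoring follows the commonly described Wordle rule for repeated letters.

```python
from collections import Counter, defaultdict


def feedback(guess, solution):
    """Return one character per block: G (green), O (orange), B (black)."""
    result = ["B"] * len(guess)
    unmatched = Counter()  # solution letters not already matched as green
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            result[i] = "G"
        else:
            unmatched[s] += 1
    # Second pass: a non-green letter turns orange only while unmatched
    # copies of it remain in the solution.
    for i, g in enumerate(guess):
        if result[i] == "B" and unmatched[g] > 0:
            result[i] = "O"
            unmatched[g] -= 1
    return "".join(result)


def partition(guess, candidates):
    """Group candidate solutions by the feedback the guess would receive."""
    groups = defaultdict(list)
    for word in candidates:
        groups[feedback(guess, word)].append(word)
    return dict(groups)


# Small hypothetical candidate set, used only for illustration.
REMAINING = ["comet", "covet", "chest", "cleft"]

print(partition("crate", REMAINING))  # all four share one pattern
print(partition("abmho", REMAINING))  # each word gets its own pattern
```

A guess is a stronger sweeper the more groups it produces and the more evenly sized those groups are, since each group is what would remain after seeing that pattern.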

However, casual players are very unlikely to carry out such a decision process. For example, after a first attempt with CRATE (figure below), it’s still hard for a human to determine that COMET, COVET, CHEST and CLEFT are the only remaining options, not to mention that using ABMHO as the identifier is simply out of this world.

Strategy For Casuals

Casual players depend heavily on accumulated information to make guesses. The sheer number of successful blocks (green and orange) could be the key to a successful attempt. In this regard, IRATE comes out on top, with an expected 1.78 successful blocks.
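The expected number of successful blocks for a guess can be estimated by averaging over the candidate solutions. The sketch below assumes the guess has no repeated letters (true of IRATE), in which case the number of green plus orange blocks equals the number of letters the guess shares with the solution. SAMPLE_SOLUTIONS is a hypothetical stand-in for the real list, so the values printed are not the 1.78 quoted above.

```python
# Hypothetical stand-in for the 2315-word solution list (assumption).
SAMPLE_SOLUTIONS = ["comet", "covet", "chest", "cleft"]


def expected_successful_blocks(guess, solutions):
    """Average number of green or orange blocks over all solutions.

    Assumes the guess has no repeated letters, so every guess letter
    that occurs anywhere in the solution yields exactly one
    successful block.
    """
    if len(set(guess)) != len(guess):
        raise ValueError("this shortcut needs a guess without repeated letters")
    total = sum(len(set(guess) & set(s)) for s in solutions)
    return total / len(solutions)


print(expected_successful_blocks("irate", SAMPLE_SOLUTIONS))
print(expected_successful_blocks("crate", SAMPLE_SOLUTIONS))
```

Ranking every allowed guess by this value over the full solution list is one way to look for words like IRATE.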

A good strategy, then, is to put winning out of mind for the first 2 or 3 attempts and try to hit as many successful blocks as possible with sweepers. When planning the first 2 or 3 attempts together, we can use word combinations that cover the top 10 or 15 letters by frequency, which maximizes the expected number of successful blocks. ROUTE plus SLAIN is the top combo, while CURLY, POINT and SHADE make the perfect trio. They return 0.98 and 1.51 expected green blocks respectively.
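Two properties of such combos can be checked with a short sketch: how many distinct letters they cover, and how many positions they are expected to turn green. The green-block calculation here is an assumption about how the figures might be computed: it counts the positions where at least one guess in the combo matches the solution. SAMPLE_SOLUTIONS is again a hypothetical stand-in for the full list.

```python
# Hypothetical stand-in for the 2315-word solution list (assumption).
SAMPLE_SOLUTIONS = ["comet", "covet", "chest", "cleft"]


def letters_covered(guesses):
    """Set of distinct letters used across all guesses in a combo."""
    return set("".join(guesses))


def expected_green_blocks(guesses, solutions):
    """Average number of positions where at least one guess is green."""
    total = 0
    for solution in solutions:
        total += sum(
            1 for i, letter in enumerate(solution)
            if any(guess[i] == letter for guess in guesses)
        )
    return total / len(solutions)


duo = ["route", "slain"]
trio = ["curly", "point", "shade"]

print(len(letters_covered(duo)))   # 10 distinct letters
print(len(letters_covered(trio)))  # 15 distinct letters
print(expected_green_blocks(duo, SAMPLE_SOLUTIONS))
```

Because neither combo repeats a letter, every attempt tests five letters that have not been tried yet.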
