Information Theory and Word Puzzles

Why guessing the last five words of a quote is secretly a math problem.

When you open Quotedle in the morning, you are staring at a small version of one of the oldest problems in mathematics: given limited evidence, how should I update my belief about what is going on? The mathematics for answering that question has a proper name. It is called information theory, and it was invented by Claude Shannon at Bell Labs in the late 1940s while he was working out how to send phone calls through noisy wires. It turns out the same math that decides how many bits a phone call needs also decides what a good first guess in Wordle is worth. Here is why.

The core idea: information is surprise

Shannon’s breakthrough was the realization that “information” has a precise, measurable meaning. Information is the reduction in your uncertainty. If somebody tells you the sun rose this morning, you learn nothing, because you were already certain. If somebody tells you the first word of today’s Quotedle answer is always, you have learned a lot, because that word used to be one of many candidates and now it is pinned down.

The unit of information is the bit. One bit is exactly the amount of information you need to answer a single yes/no question optimally. Two bits tell you which of four equally likely options is correct. Ten bits pick one out of 1,024. The number of bits to pick one item from a set of N equally likely items is log₂(N). In a game with around 2,000 candidate quotes, one bit of information should, on average, cut the space in half — from 2,000 to 1,000 to 500 to 250 and so on. After roughly 11 bits, you are down to one answer.
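That halving arithmetic is easy to check. A quick sketch, using the roughly 2,000-quote pool assumed above:

```python
from math import ceil, log2

n = 2000                      # candidate quotes (rough figure from the text)
bits = log2(n)                # bits needed to pin down one answer
print(round(bits, 2))         # 10.97

# each bit of information halves the space: 2000 -> 1000 -> 500 -> ...
space = n
halvings = 0
while space > 1:
    space = ceil(space / 2)   # one perfect yes/no question per step
    halvings += 1
print(halvings)               # 11
```

Eleven perfect yes/no questions suffice, which matches log₂(2000) ≈ 11 bits.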

Entropy: how hard is the game, before you start?

Entropy is the average number of bits of information the answer contains. If every quote in the set is equally likely to be today’s puzzle, entropy is exactly log₂(N) where N is the size of the set. In practice some quotes are more likely than others (they share common endings like “of the world”) and that reduces the entropy slightly, because your brain can guess those endings before any evidence comes in.
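The general formula, covering the case where candidates are not equally likely, is Shannon's H = −Σ p · log₂ p. A minimal sketch:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in probs if p > 0)

# four equally likely quotes: the full log2(4) = 2 bits
print(entropy([0.25] * 4))              # 2.0

# one guessable common ending dominates: entropy drops below 2 bits
print(entropy([0.7, 0.1, 0.1, 0.1]))
```

The skewed distribution illustrates the point in the text: predictable endings lower the entropy before any tiles are revealed.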

Why does this matter at the table? Because the entropy of the game is the minimum number of bits a perfect player would need to win. If the puzzle has around 11 bits of entropy and you are allowed 6 guesses, each guess must contribute roughly 2 bits of information on average. That is the budget of a Quotedle player. Miss the budget consistently and you will lose the puzzle. Meet it and you will finish in four guesses most days.
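The per-guess budget is a single division; a quick sanity check, again using the roughly 2,000-quote pool as the assumed size:

```python
from math import log2

entropy_bits = log2(2000)      # ~10.97 bits of puzzle entropy
per_guess = entropy_bits / 6   # budget with six allowed guesses
print(round(per_guess, 2))     # 1.83
```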

What a “good” guess really does

A guess is not just an attempt at the answer. It is also a question. When you submit a guess, the feedback you get (green, yellow, gray) is the answer to the question “Which of the remaining quotes are still possible?” Every green and yellow and gray tile eliminates some subset of candidates. The best guess is the one that, averaged over all the ways the feedback could turn out, shrinks the remaining candidate pool the most.

Formally, this is expected information gain. For each possible feedback pattern F, compute the probability p(F) of seeing that pattern and the entropy remaining after you see it. Subtract that conditional entropy from the current entropy. The best guess maximizes the difference. This is exactly the same math used by decision trees in machine learning and by doctors ordering diagnostic tests.
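A minimal sketch of that computation, assuming equally likely candidates, with `feedback_of` standing in for whatever feedback rule the game applies:

```python
from collections import defaultdict
from math import log2

def expected_gain(candidates, feedback_of):
    """Expected information gain of a guess, given a function mapping each
    candidate answer to the feedback pattern the guess would produce.
    Candidates are assumed equally likely (an illustrative simplification)."""
    buckets = defaultdict(int)
    for c in candidates:
        buckets[feedback_of(c)] += 1
    n = len(candidates)
    # H(before) - E[H(after)] = log2(n) - sum over patterns of p(F) * log2(|bucket F|)
    return log2(n) - sum(k / n * log2(k) for k in buckets.values())

# toy demo: 8 equally likely answers, and a "guess" whose feedback
# amounts to the question "is the answer below 4?" -- an even split
print(expected_gain(range(8), lambda c: c < 4))   # 1.0 bit
```

The same bucketing trick scales up to a real solver: score every legal guess with `expected_gain` and play the one with the highest score.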

The best first guess is rarely the one most likely to be the answer. It is the one that gives you the most useful information regardless of whether it is correct.

An example with small numbers

Suppose the puzzle’s hidden five-word ending is one of four equally likely options:

  1. “the end of all things”
  2. “the start of all things”
  3. “a love of small things”
  4. “a fear of lost things”

Entropy here is log₂(4) = 2 bits. Now consider two possible guesses:

  Guess A: “the end of all things”, which is option 1 itself and has a 1-in-4 chance of winning outright. If the answer is option 1, everything comes back green; if it is option 2, the feedback pins it down. But options 3 and 4 produce the identical pattern (only “of” and “things” come back green), so half the time you are left holding a coin flip. Expected information gain: 1.5 bits.

  Guess B: “the end of small things”, which is not a candidate at all and can never win this turn. But each of the four possible answers produces a distinct feedback pattern, so whatever comes back, you know the answer. Expected information gain: the full 2 bits.

Guess B is not a likely winner, but its expected information gain is higher. Over many puzzles, a player using Guess-B-style reasoning will beat a player using Guess-A-style reasoning even though Guess A “could” win on guess one. This is why the best Wordle bots open with CRANE or SALET every time: those words maximize expected information gain across the answer set, not the chance of being correct.
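To put numbers on the comparison, here is a self-contained sketch. Word-level Wordle-style feedback is an assumption about Quotedle's rules, and Guess A and Guess B are taken, for illustration, to be the candidate “the end of all things” and the non-candidate “the end of small things”:

```python
from collections import Counter, defaultdict
from math import log2

def feedback(guess, answer):
    """Word-level Wordle-style feedback (assumed rules): G = right word in
    the right slot, Y = word present elsewhere, X = word absent."""
    tiles = ['X'] * len(guess)
    leftover = Counter(answer)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            tiles[i] = 'G'
            leftover[g] -= 1
    for i, g in enumerate(guess):
        if tiles[i] == 'X' and leftover[g] > 0:
            tiles[i] = 'Y'
            leftover[g] -= 1
    return ''.join(tiles)

def expected_bits(guess, candidates):
    """Expected information gain over equally likely candidates."""
    buckets = defaultdict(int)
    for c in candidates:
        buckets[feedback(guess, c)] += 1
    n = len(candidates)
    return log2(n) - sum(k / n * log2(k) for k in buckets.values())

candidates = [
    ("the", "end", "of", "all", "things"),
    ("the", "start", "of", "all", "things"),
    ("a", "love", "of", "small", "things"),
    ("a", "fear", "of", "lost", "things"),
]
guess_a = candidates[0]                            # a possible answer
guess_b = ("the", "end", "of", "small", "things")  # not a candidate

print(expected_bits(guess_a, candidates))  # 1.5 (options 3 and 4 collide)
print(expected_bits(guess_b, candidates))  # 2.0 (every answer distinguishable)
```

Guess A can win outright a quarter of the time, yet Guess B's guaranteed 2 bits mean you finish on the very next guess no matter what the feedback says.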

Applying this to Quotedle

Quotedle is a word-slot puzzle. The hidden “symbol” at each of the five positions is a word drawn from the bank. The bank usually contains eight to twelve words, of which five are the answer and the rest are distractors. With an eight-word bank, there are 8 × 7 × 6 × 5 × 4 = 8!/3! = 6,720 possible orderings, so the prior entropy of a single puzzle is log₂(6,720) ≈ 12.7 bits if all orderings were equally plausible. Grammar drops that number considerably — the real-world entropy is closer to 6 to 8 bits once a native speaker has rejected nonsense orderings.
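The ordering count is a straightforward permutation, sketched here with Python's standard library (the eight-word bank is the assumption used above):

```python
from math import log2, perm

orderings = perm(8, 5)             # ordered picks of 5 words from an 8-word bank
print(orderings)                   # 6720
print(round(log2(orderings), 1))   # 12.7 bits of prior entropy
```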

The practical consequence: on most puzzles, a focused first guess can carry 3 to 4 bits. That leaves roughly 3 more bits to collect over five remaining guesses, which is why a careful opener usually gives you enough runway to finish in three or four tries. A scattered first guess that comes back with no green or yellow tiles feels bad for a reason that is partly mathematical: even all-gray feedback eliminates some candidates, but if it only rules out orderings you would never have tried anyway, it carries close to zero bits while still spending one of your six attempts.

Why the game feels satisfying

There is a psychological reason this mathematics sits so well in the brain. Humans are hungry for uncertainty reduction. Puzzles that shrink the space of possibilities at a steady rate give the mind something like the dopamine hit of a good conversation: each new piece of information fits somewhere, and the picture becomes clearer. Puzzles that collapse too quickly (“got it first try”) feel thin. Puzzles that never collapse (“no green or yellow after guess four”) feel unfair. The game-design goal is to put the average solver in a place where each guess produces about the same amount of information. That is information-theoretic fairness.

A closing thought, if you like this stuff

The beautiful thing about Quotedle is that you do not need any of this math to enjoy the game. But if you ever wondered why one of your guesses felt brilliant and another felt wasted, the answer probably lives in these ideas. A good guess is just a question with a high expected information gain. A great guess is one where you were also lucky.