Pythonic Explorations of Wordle and 5 Letter English Words

MANIFESTO font created by Tomaz Leskovec, artwork by Kevin Lease in homage to the word puzzle discussed in the blog.


Some of my friends enjoy doing Wordle (https://www.nytimes.com/games/wordle/index.html), a game created by Josh Wardle in which you try to identify a 5 letter word.  This made me think about the lexicon of 5 letter English words and about Wordle strategies.  For example, which words would in general be better initial guesses?  Given a word attempt and the resulting feedback, which word would be the next best attempt?  I did a Google search and found that other people have already published Wordle solvers.  Therefore, this is going to be an exercise for my own edification, with the goals of improving my use of Python and enjoying working through this thought experiment.

With 26 letters and 5 positions, there are theoretically 26 to the 5th power, or 11,881,376, possible five letter combinations.  Of course, most of these are not English words, so starting from a list of English words significantly reduces the possible solution space.
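
A quick sanity check of that arithmetic, as a minimal sketch (the variable names are illustrative only):

```python
# Number of possible five-letter strings over a 26-letter alphabet.
ALPHABET_SIZE = 26
WORD_LENGTH = 5

combinations = ALPHABET_SIZE ** WORD_LENGTH
print(f"{combinations:,}")  # prints 11,881,376
```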

I obtained a list of English words from https://github.com/dwyl/english-words and used the file words_alpha.txt, which contains 370,103 words.

My first step was to winnow the full list of words down to a list of 5 letter words.

with open("words_alpha.txt") as file:
    allwords = file.read().splitlines()

# print every word that is exactly five letters long
for i in allwords:
    if len(i) == 5:
        print(i)


This gave me a list of 15,918 five letter words, which I saved as fiveletterwords.  That is 4.3% of the full list of 370,103 words.  There is a huge difference in size between the nearly 12 million possible 5 letter combinations and the much smaller list of actual five letter English words.
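
For reference, the same filtering and percentage calculation can be sketched with a list comprehension.  This is only an illustration on a tiny made-up word list rather than the real words_alpha.txt, and the function name five_letter_share is hypothetical.

```python
def five_letter_share(words):
    """Return the five-letter words and their share of the full list, in percent."""
    five = [w for w in words if len(w) == 5]
    share = 100 * len(five) / len(words) if words else 0.0
    return five, share


# Tiny made-up sample standing in for the contents of words_alpha.txt.
sample = ["apple", "pear", "grape", "fig", "lemon", "banana", "mango", "kiwi"]
five, share = five_letter_share(sample)
print(five)             # ['apple', 'grape', 'lemon', 'mango']
print(f"{share:.1f}%")  # 50.0%

# The same arithmetic with the counts reported above.
print(f"{100 * 15918 / 370103:.1f}%")  # 4.3%
```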


The next questions I asked were: how many of these 5 letter words contain a given letter of the alphabet, and how many times does each letter occur in total across the 5 letter word set?


import string

fiveletterwords = []

with open("words_alpha.txt") as file:
    allwords = file.read().splitlines()

alphabet_string = string.ascii_lowercase
alphabet_list = list(alphabet_string)


def testletter(letter):
    # total occurrences of the letter across all five letter words
    letter_total_freq = 0
    # number of words that contain the letter at least once
    letter_uniq_word_freq = 0
    for i in fiveletterwords:
        x = i.count(letter)
        letter_total_freq = letter_total_freq + x
        if x > 0:
            letter_uniq_word_freq = letter_uniq_word_freq + 1
    print(letter + " " + str(letter_uniq_word_freq) + " " + str(letter_total_freq))


for i in allwords:
    if len(i) == 5:
        fiveletterwords.append(i)

for letter in alphabet_list:
    testletter(letter)


Output (letter, number of 5 letter words that contain the letter, total number of times the letter occurs in the 5 letter word set):


a 7247 8392

b 1936 2089

c 2588 2744

d 2639 2811

e 6728 7800

f 1115 1238

g 1867 1971

h 2223 2284

i 4767 5067

j 372 376

k 1663 1743

l 3923 4246

m 2361 2494

n 3773 4043

o 4613 5219

p 2148 2299

q 139 139

r 4864 5143

s 5871 6537

t 3866 4189

u 3241 3361

v 853 878

w 1160 1171

x 357 361

y 2476 2521

z 435 474


Output list with just the vowels (counting y):


a 7247 8392

e 6728 7800

i 4767 5067

o 4613 5219

u 3241 3361

y 2476 2521


Top 10 consonants from the above list, in descending order of the number of words that contain the letter:


s 5871 6537

r 4864 5143

l 3923 4246

t 3866 4189

n 3773 4043

d 2639 2811

c 2588 2744

m 2361 2494

h 2223 2284

p 2148 2299
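
The two counts per letter shown above can also be gathered in a single pass over the words with collections.Counter.  The sketch below is only an illustration: it runs on a small made-up word list in place of fiveletterwords, and the function name letter_counts is hypothetical.

```python
from collections import Counter


def letter_counts(words):
    """Return (words_containing, total_occurrences) Counters keyed by letter."""
    words_containing = Counter()
    total_occurrences = Counter()
    for word in words:
        total_occurrences.update(word)      # counts every occurrence
        words_containing.update(set(word))  # counts each letter once per word
    return words_containing, total_occurrences


# Made-up sample standing in for the full five letter word list.
sample = ["sales", "arise", "kioea"]
containing, total = letter_counts(sample)
for letter in sorted(total):
    print(letter, containing[letter], total[letter])
```

In this sample, s appears in 2 words but 3 times in total, because "sales" contains it twice.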


My father suggested to me that a guess need not be an actual word.  For example, someone could guess the all-vowel 5 letter combination AEIOU, which covers most of the highest frequency letters.

I tried this strategy and found that Wordle rejected AEIOU with "not in word list."

So a Wordle entry has to be an actual word.


One possibility for the best initial Wordle choice is a word with the greatest chance of containing one or more letters that appear at any position in the Wordle solution.

To put it another way: take the letters from each English 5 letter word, and ask how many unique 5 letter English words (including the word itself) share at least one letter with it somewhere in the word (position of the match ignored).


fiveletterwords = []

with open("words_alpha.txt") as file:
    allwords = file.read().splitlines()


def testword(word):
    counter = 0
    allmatches = []
    unique_list = []
    list_of_letters = list(word)
    # collect every word that contains any letter of the guess
    for j in list_of_letters:
        for k in fiveletterwords:
            if j in k:
                allmatches.append(k)
    # a word that matched on several letters is only counted once
    for x in allmatches:
        if x not in unique_list:
            unique_list.append(x)
    for x in unique_list:
        counter = counter + 1
    print(word + " " + str(counter))


for i in allwords:
    if len(i) == 5:
        fiveletterwords.append(i)

for i in fiveletterwords:
    testword(i)
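
The list-based uniqueness check above gets slow on roughly 16,000 words.  One possible speed-up, sketched here under the assumption that the word list has no duplicates, is to record once which words contain each letter and then take the union of those sets for the letters of each guess.  The function name coverage_scores and the sample words are made up for illustration.

```python
from collections import defaultdict


def coverage_scores(words):
    """Map each word to the number of words sharing at least one letter with it."""
    # For every letter, remember which words contain it.
    words_with_letter = defaultdict(set)
    for word in words:
        for letter in set(word):
            words_with_letter[letter].add(word)

    scores = {}
    for word in words:
        matched = set()
        for letter in set(word):
            matched |= words_with_letter[letter]
        scores[word] = len(matched)  # includes the word itself
    return scores


# Made-up sample standing in for the full five letter word list.
sample = ["arise", "stoae", "jumpy", "fuzzy", "quote"]
for word, score in coverage_scores(sample).items():
    print(word, score)
```

On this sample, "quote" scores 5 because it shares a letter with every word in the list, while the other four words score 3 each.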



strategy1_all.txt 



The top 25 initial word choices predicted by the first method are listed below with their scores.  Did you know the meaning of kioea?  Me neither.  Apparently it was a Hawaiian bird that is now extinct.  Interesting, but not useful, because it is not a word that Wordle will allow.  Nor will it allow aoife or aueto.  I get down to AEONS on my list before I find a word it will take.  I think the lexicon of words that Wordle accepts must be somewhat like the Scrabble lexicon.  Stoae is the next word from the list that Wordle will take (the plural of stoa, a freestanding colonnade or covered walkway; I looked it up).



kioea 15239

aoife 15227

aueto 15206

aeons 15151

aotes 15130

stoae 15130

arose 15129

oreas 15129

seora 15129

aesir 15123

aries 15123

arise 15123

raise 15123

serai 15123

aloes 15119

alose 15119

osela 15119

solea 15119

ousia 15104

aurei 15055

uraei 15055

hosea 15043

oshea 15043

aisle 15018

elias 15018



A second approach to selecting the best initial Wordle guess is to take each five letter word and, for each of its 5 letters, count how many words in the 5 letter word list have that same letter at the exact same position, with one point given for each match.  The higher the score for a word, the more often its letters exactly match the letters at the same positions in the words of the list.



fiveletterwords = []

with open("words_alpha.txt") as file:
    allwords = file.read().splitlines()


def testword(word):
    wordscore = 0
    list_of_letters = list(word)
    for index1, value1 in enumerate(list_of_letters):
        for k in fiveletterwords:
            # one point for every word with the same letter at the same position
            if k[index1] == value1:
                wordscore = wordscore + 1
    print(word + " " + str(wordscore))


for i in allwords:
    if len(i) == 5:
        fiveletterwords.append(i)

for i in fiveletterwords:
    testword(i)
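
The same positional score can be computed without comparing every pair of words: count once how often each letter occurs at each position, then add up five table lookups per word.  This is a hedged sketch of that idea, not the script used for the results here; the function name positional_scores and the sample words are made up.

```python
from collections import Counter


def positional_scores(words, length=5):
    """Score each word by how many words share its letter at each position."""
    # One Counter per position: how often each letter appears there.
    position_counts = [Counter() for _ in range(length)]
    for word in words:
        for position, letter in enumerate(word):
            position_counts[position][letter] += 1

    return {
        word: sum(position_counts[position][letter]
                  for position, letter in enumerate(word))
        for word in words
    }


# Made-up sample standing in for the full five letter word list.
sample = ["sanes", "sales", "cares", "kioea"]
scores = positional_scores(sample)
for word, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(word, score)
```

As in the nested-loop version, each word also matches itself, so every score includes 5 points from the word's own letters.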


To sort the output:

five_word_dict = {}

# each line of the saved output is "word score"
with open("wordscore_list2.txt") as file:
    fivescores = file.read().splitlines()

for i in fivescores:
    info = i.split(' ', 1)
    five_word_dict[info[0]] = info[1]

# sort by score (compared as an integer), highest first
sort_orders = sorted(five_word_dict.items(), key=lambda x: int(x[1]), reverse=True)

for i in sort_orders:
    print(i[0], i[1])


strategy2_all.txt


Top 25 initial word choices by the second method with their scores:

sanes 11579

sales 11401

sores 11295

cares 11268

bares 11213

sates 11124

tares 11053

pares 11016

sones 10989

seres 10984

canes 10962

mares 10921

banes 10907

dares 10873

sades 10855

soles 10811

sages 10802

sabes 10787

fares 10756

lares 10751

bales 10729

panes 10710

saris 10697

sires 10683

cores 10678



Once an initial guess is made, the results can be used to filter the 5 letter English word set to help make the next guess.

There are four filters to apply to our list of words based on the results of an attempt:

1. Letters present somewhere in the word

2. Letters absent from the word

3. Letters present in the word but excluded from a given position

4. Letters present in the word and fixed at a given position
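
To illustrate where the four filter inputs come from, here is a minimal sketch that turns one guess and its feedback into four lists.  The feedback encoding ('g', 'y', 'x'), the function name feedback_to_filters and the example guess are all assumptions made for this illustration, and the handling of repeated letters is simplified.

```python
def feedback_to_filters(guess, feedback):
    """Translate one guess and its feedback into the four filter lists.

    feedback uses one character per letter (an assumed encoding):
    'g' = right letter, right position; 'y' = in the word, wrong position;
    'x' = not in the word.  Positions are indexed from 0.
    """
    keep_letters, remove_letters = [], []
    remove_at_position, keep_at_position = [], []
    for index, (letter, mark) in enumerate(zip(guess, feedback)):
        if mark == "g":
            keep_at_position.append(f"{letter}{index}")
        elif mark == "y":
            remove_at_position.append(f"{letter}{index}")
        if mark in ("g", "y") and letter not in keep_letters:
            keep_letters.append(letter)
    for letter, mark in zip(guess, feedback):
        # A repeated letter can be marked absent in one spot and present in
        # another; only treat it as absent if it was never marked present.
        if mark == "x" and letter not in keep_letters and letter not in remove_letters:
            remove_letters.append(letter)
    return keep_letters, remove_letters, remove_at_position, keep_at_position


# Hypothetical example: o and u are in the word but in the wrong places.
print(feedback_to_filters("pious", "xxyyx"))
# (['o', 'u'], ['p', 'i', 's'], ['o2', 'u3'], [])
```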


I have created a Python script to filter the word list by these rules:


kl_python_word_filter.py


#kl_python_word_filter.py by Kevin Lease 2022
#licensed under a Creative Commons
# Attribution-NonCommercial-ShareAlike 4.0 International License

fiveletterwords = []
keepwords = []
keepwords2 = []

# letters known to be somewhere in the word
keep_letters = ['o', 'u', 't']
#keep_letters = ['a', 'r', 'l']

# letters known to be absent from the word
remove_letters = ['p', 'i', 's', 'y', 'h']
#remove_letters = ['e', 'o', 'n', 's', 'g', 'v', 'y', 'i', 'b']

# letter + position pairs; positions are indexed from 0
# letters that are in the word but not at this position
remove_at_position = ['o2', 'u3', 'o1', 'u2']
#remove_at_position = ['a0', 'r1', 'a2', 'l0']

# letters that must be at this position
keep_at_position = []

with open("words_alpha.txt") as file:
    allwords = file.read().splitlines()

for i in allwords:
    if len(i) == 5:
        fiveletterwords.append(i)

print("five letter words:" + str(len(fiveletterwords)))


def keepword(word, keep_letters):
    # keep the word only if it contains every letter in keep_letters
    global keepwords
    flag = 1
    for j in keep_letters:
        if j not in word:
            flag = 0
    if flag == 1:
        keepwords.append(word)


def remove_letter_func(letter):
    # drop every word that contains the letter
    global keepwords
    global keepwords2
    for word in keepwords:
        if letter not in word:
            if word not in keepwords2:
                keepwords2.append(word)
    keepwords.clear()
    keepwords = [i for i in keepwords2]
    keepwords2.clear()


def remove_letter_at_position(letter, index):
    # drop every word that has the letter at this position
    global keepwords
    global keepwords2
    keepwords2.clear()
    for word in keepwords:
        if word[int(index)] != letter:
            if word not in keepwords2:
                keepwords2.append(word)
    keepwords.clear()
    keepwords = [i for i in keepwords2]


def keep_letter_at_position(letter, index):
    # keep only the words that have the letter at this position
    global keepwords
    global keepwords2
    keepwords2.clear()
    for word in keepwords:
        if word[int(index)] == letter:
            keepwords2.append(word)
    keepwords.clear()
    keepwords = [i for i in keepwords2]


# keepword() passes every word when keep_letters is empty, so this loop
# always runs; skipping it would leave keepwords empty for the later filters
for i in fiveletterwords:
    keepword(i, keep_letters)

print("words containing keep letters:" + str(len(keepwords)))

if remove_letters:  #check to make sure list not empty
    for letter in remove_letters:
        remove_letter_func(letter)

print("words left after removing absent letters:" + str(len(keepwords)))

if remove_at_position:  #check to make sure list not empty
    for pair in remove_at_position:
        pairlist = list(pair)
        letter = pairlist[0]
        index = pairlist[1]
        #print ("letter:" + letter + " " + "index:" + str(index))
        remove_letter_at_position(letter, index)

print("words after removing letters by position:" + str(len(keepwords)))

if keep_at_position:  #check to make sure list not empty
    for pair in keep_at_position:
        pairlist = list(pair)
        letter = pairlist[0]
        index = pairlist[1]
        keep_letter_at_position(letter, index)

print("words after keeping letters by position:" + str(len(keepwords)))

for i in keepwords:
    print(i)
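
For comparison, the four filters can also be expressed as a single test applied to each word, without global lists.  This is a minimal sketch rather than a replacement for the script above; the function name filter_words and the sample word list are made up, and the position entries reuse the script's letter + index form (for example 'o2').

```python
def filter_words(words, keep_letters=(), remove_letters=(),
                 remove_at_position=(), keep_at_position=()):
    """Return the words that pass all four filters.

    Position entries are a letter followed by a 0-based index, e.g. 'o2'.
    """
    def passes(word):
        return (
            all(letter in word for letter in keep_letters)
            and not any(letter in word for letter in remove_letters)
            and all(word[int(pair[1])] != pair[0] for pair in remove_at_position)
            and all(word[int(pair[1])] == pair[0] for pair in keep_at_position)
        )
    return [word for word in words if passes(word)]


# Made-up sample standing in for the full five letter word list.
sample = ["tough", "quota", "route", "outdo", "about", "tutor"]
result = filter_words(
    sample,
    keep_letters=["o", "u", "t"],
    remove_letters=["p", "i", "s", "y", "h"],
    remove_at_position=["o2", "u3", "o1", "u2"],
)
print(result)  # ['outdo', 'tutor']
```

Because every filter is an all/any test, an empty list simply passes every word, so no special handling is needed for filters that have no entries yet.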




Looking over the list of possible word solutions at earlier stages, I see many words with which I am unfamiliar, which reminds me of how much I don't know and to be humble.

However, if I look at the historical Wordle answers for this month, all of the solutions are words in my personal lexicon.  So, another filter could be based on this principle: a 'familiarity index.'  The approach that springs to mind is to take one or more freely available books from Project Gutenberg, parse the text into words, and keep the 5 letter words as a list.  Theoretically, this could be used to narrow the list of 5 letter words to those more likely to be Wordle solutions.
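
A rough sketch of how such a familiarity filter might look is below.  It is untested against real books: a made-up sentence stands in for the text of a Project Gutenberg file, and the function names familiar_five_letter_words and keep_familiar are hypothetical.

```python
import re


def familiar_five_letter_words(text):
    """Return the set of distinct five-letter words found in a block of text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {word for word in words if len(word) == 5}


def keep_familiar(candidates, text):
    """Keep only the candidate words that also appear in the text."""
    familiar = familiar_five_letter_words(text)
    return [word for word in candidates if word in familiar]


# Made-up sentence standing in for the text of a Project Gutenberg book.
book_text = "The quick brown fox could never quite reach the grapes above."
candidates = ["kioea", "quick", "stoae", "brown", "above", "aesir"]
print(keep_familiar(candidates, book_text))  # ['quick', 'brown', 'above']
```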



Disclaimer #1:

WORDLE is copyright 2022 to the New York Times.  This blog is not affiliated with either Wordle or the New York Times.  This content was created for my own scholarship and not for any profit.

Disclaimer #2:

The author does not make any warranties about the completeness, reliability and accuracy of this information. Any action you take upon the information on this site is strictly at your own risk, and the author will not be held liable for any losses and damages in connection with the use of this information.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
