Our next complete program is an implementation of the Markov chain algorithm, described by Kernighan & Pike in their book The Practice of Programming (Addison-Wesley, 1999).
The program generates pseudo-random text based on what words can follow a sequence of n previous words in a base text. For this implementation, we will assume that n is two.
The first part of the program reads the base text and builds a table that, for each prefix of two words, gives a list of the words that follow that prefix in the text. After building the table, the program uses it to generate random text, wherein each word follows two previous words with the same probability as in the base text. As a result, we have text that is very, but not quite, random. For instance, when applied to this book, the output of the program has pieces like this: “Constructors can also traverse a table constructor, then the parentheses in the following line does the whole file in a field n to store the contents of each function, but to show its only argument. If you want to find the maximum element in an array can return both the maximum value and continues showing the prompt and running the code. The following words are reserved and cannot be used to convert between degrees and radians.”
To use a two-word prefix as a key in tables, we will represent it by the two words concatenated with a space in between:
function prefix (w1, w2)
return w1 .. " " .. w2
end
We use the string NOWORD (a newline)
to initialize the prefix words and
to mark the end of the text.
For instance, for the text
"the more we try the more we do"
the table of following words would be like this:
{ ["\n \n"] = {"the"},
["\n the"] = {"more"},
["the more"] = {"we", "we"},
["more we"] = {"try", "do"},
["we try"] = {"the"},
["try the"] = {"more"},
["we do"] = {"\n"},
}
The program keeps its table in the variable statetab.
To insert a new word in a list in this table,
we use the following function:
function insert (prefix, value)
local list = statetab[prefix]
if list == nil then
statetab[prefix] = {value}
else
list[#list + 1] = value
end
end
It first checks whether that prefix already has a list; if not, it creates a new one with the new value. Otherwise, it inserts the new value at the end of the existing list.
To build the statetab table, we keep two variables,
w1 and w2, with the last two words read.
We read the words using the iterator allwords,
from the section called “Iterators and Closures”,
but we adapted the definition of “word” to include
optional punctuation marks,
such as commas and periods (see Figure 19.1, “Auxiliary definitions for the Markov program”).
For each new word read,
we add it to the list associated with w1–w2
and then update w1 and w2.
After building the table,
the program starts to generate a text with MAXGEN words.
First, it re-initializes variables w1 and w2.
Then, for each prefix,
it chooses a next word randomly from the list of valid next words,
prints this word, and updates w1 and w2.
Figure 19.1, “Auxiliary definitions for the Markov program” and Figure 19.2, “The Markov program” show the complete program.
Figure 19.1. Auxiliary definitions for the Markov program
function allwords ()
local line = io.read() -- current line
local pos = 1 -- current position in the line
return function () -- iterator function
while line do -- repeat while there are lines
local w, e = string.match(line, "(%w+[,;.:]?)()", pos)
if w then -- found a word?
pos = e -- update next position
return w -- return the word
else
line = io.read() -- word not found; try next line
pos = 1 -- restart from first position
end
end
return nil -- no more lines: end of traversal
end
end
function prefix (w1, w2)
return w1 .. " " .. w2
end
local statetab = {}
function insert (prefix, value)
local list = statetab[prefix]
if list == nil then
statetab[prefix] = {value}
else
list[#list + 1] = value
end
end
Figure 19.2. The Markov program
local MAXGEN = 200
local NOWORD = "\n"
-- build table
local w1, w2 = NOWORD, NOWORD
for nextword in allwords() do
insert(prefix(w1, w2), nextword)
w1 = w2; w2 = nextword;
end
insert(prefix(w1, w2), NOWORD)
-- generate text
w1 = NOWORD; w2 = NOWORD -- reinitialize
for i = 1, MAXGEN do
local list = statetab[prefix(w1, w2)]
-- choose a random item from list
local r = math.random(#list)
local nextword = list[r]
if nextword == NOWORD then return end
io.write(nextword, " ")
w1 = w2; w2 = nextword
end
Personal copy of Eric Taylor <jdslkgjf.iapgjflksfg@yandex.com>