I’m not quite ready to start talking about my current game project, but one thing I can say is that it requires the ability to generate names procedurally. Of the several methods I’ve researched, Markov Chains seem to be the most promising. Among other things they can be used to generate random music, text, and individual words. We’ll be concentrating on individual words.

A Markov Chain describes a state machine that can transition to the next state based only on the current state and probabilities. Stated simply, this method works by analyzing input data to build up a table of states and some probabilities. Given a current state, the next state is determined based on the probability of that state as found in the sample data.

Drilling in a little deeper, to use Markov Chains for generating random names we can use a large set of names as input data. Our system will analyze these names to determine the probability of a letter following another letter. We can then start with a random letter and pick the next letter based on those probabilities, and the next, and so on. Using a single letter like this is said to be *first-order*. We can also use two letters, or three or more for *second-order* and *third-order* systems. A higher order will produce random output that more closely resembles the input.

Let’s build a second-order probability table using the names “Domitrovich” and “Dombrowsky” as input.

Starting with *domitrovich* we grab the first two letters *do* and use that for a state. The next letter is *m*. So far we know that when the state is *do*, there is a 100% chance that the next letter is an *m*.

The next two letters are *om*, and the letter following is *i*. So now we know that when the state is *om* there is a 100% chance that the next letter is an *i*.

Following this same process we end up with this table:

State |
Following Letter |
Probability |

do | m | 1.0 |

om | i | 0.5 |

om | b | 0.5 |

mi | t | 1.0 |

it | r | 1.0 |

tr | o | 1.0 |

ro | v | 0.5 |

ro | w | 0.5 |

ov | i | 1.0 |

vi | c | 1.0 |

ic | h | 1.0 |

ch | end_of_word | 1.0 |

mb | r | 1.0 |

br | o | 1.0 |

ow | s | 1.0 |

ws | k | 1.0 |

sk | y | 1.0 |

ky | end_of_word | 1.0 |

Note that the *om* and *ro* states each have two instances with different following letters.

We can now use this table to build a random name. Our word starts with a random 2-letter state. We use that as a key into this table to grab a random following letter based on the probabilities. We add that letter to the end of our word and use the last 2 letters as our new state and repeat the process until we reach the end of the world. For example, starting with *tr*:

State |
Following Letters |
Word |

tr | o | tro |

ro | v, w | trow |

ow | s | trows |

ws | k | trowsk |

sk | y | trowsky |

ky | end_of_word | trowsky |

With two names as input we won’t get much variety, but with a hundred or more we get some very good random output that strongly resembles the input. Here is some example output using a list of 400 Ukrainian surnames as input.

Anovsky Ragoniak Stepkap Stutkovich Tulinovich Vichuna Buchalk Rovichal Pandamar Popolek

We can expand this basic algorithm to improve the output quality:

- Expand the probability table by including word beginnings and make sure the output words always start with one of the word beginnings from the input data
- Identify consonant and vowel patterns in the input data and ensure the generated word follows one of those patterns.
- Allow for minimum and maximum word lengths.

I’m not going to go through all of the code for implementing this, but it’s all included in the downloadable sample project.