Procedural Names

I’m not quite ready to start talking about my current game project, but one thing I can say is that it requires the ability to generate names procedurally. Of the several methods I’ve researched, Markov Chains seem to be the most promising. Among other things they can be used to generate random music, text, and individual words. We’ll be concentrating on individual words.

A Markov Chain describes a state machine that can transition to the next state based only on the current state and probabilities. Stated simply, this method works by analyzing input data to build up a table of states and some probabilities. Given a current state, the next state is determined based on the probability of that state as found in the sample data.

Drilling in a little deeper, to use Markov Chains for generating random names we can use a large set of names as input data. Our system will analyze these names to determine the probability of a letter following another letter. We can then start with a random letter and pick the next letter based on those probabilities, and the next, and so on. Using a single letter like this is said to be first-order. We can also use two letters, or three or more for second-order and third-order systems. A higher order will produce random output that more closely resembles the input.

Let’s build a second-order probability table using the names “Domitrovich” and “Dombrowsky” as input.

Starting with domitrovich we grab the first two letters do and use that for a state. The next letter is m.  So far we know that when the state is do, there is a 100% chance that the next letter is an m.

The next two letters are om, and the letter following is i. So now we know that when the state is om there is a 100% chance that the next letter is an i.

Following this same process we end up with this table:

State Following Letter Probability
do m 1.0
om i 0.5
om b 0.5
mi t 1.0
it r 1.0
tr o 1.0
ro v 0.5
ro w 0.5
ov i 1.0
vi c 1.0
ic h 1.0
ch end_of_word 1.0
mb r 1.0
br o 1.0
ow s 1.0
ws k 1.0
sk y 1.0
ky end_of_word 1.0

 

Note that the om and ro states each have two instances with different following letters.

We can now use this table to build a random name. Our word starts with a random 2-letter state. We use that as a key into this table to grab a random following letter based on the probabilities. We add that letter to the end of our word and use the last 2 letters as our new state and repeat the process until we reach the end of the world. For example, starting with tr:

State Following Letters Word
tr o tro
ro v, w trow
ow s trows
ws k trowsk
sk y trowsky
ky end_of_word trowsky

 

With two names as input we won’t get much variety, but with a hundred or more we get some very good random output that strongly resembles the input.  Here is some example output using a list of 400 Ukrainian surnames as input.

Anovsky
Ragoniak
Stepkap
Stutkovich
Tulinovich
Vichuna
Buchalk
Rovichal
Pandamar
Popolek

 

We can expand this basic algorithm to improve the output quality:

  • Expand the probability table by including word beginnings and make sure the output words always start with one of the word beginnings from the input data
  • Identify consonant and vowel patterns in the input data and ensure the generated word follows one of those patterns.
  • Allow for minimum and maximum word lengths.

 

I’m not going to go through all of the code for implementing this, but it’s all included in the downloadable sample project.