7.4 The Simple Soundslike

The simple soundslike algorithm works roughly as follows:

sl0[0] = lookup0(word[0]);
for (i = 1; i < size; i++)
  sl0[i] = lookup(word[i]);
sl = "";
for (i = 0; i < size; i++)
  if (sl0[i] != 0 && (i == 0 || sl0[i] != sl0[i-1]))
    sl.append(sl0[i]);

Each character of the word is either converted to another character or deleted (a lookup result of 0). The first character uses a separate lookup table, lookup0; all other characters use lookup. When the same soundslike character appears twice in a row, the duplicate is removed.
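As a rough illustration, the pseudocode above can be written in C++ as follows. This is a sketch, not Aspell's implementation: the names `SlTable`, `simple_soundslike`, and `identity_table` are hypothetical, and representing each lookup table as a 256-entry byte array in which 0 means "delete" is an assumption made for the example.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical table type: maps each byte to a soundslike character,
// where 0 means "delete this character".
using SlTable = std::array<char, 256>;

// Simple soundslike: `first` plays the role of lookup0 (first character only),
// `rest` plays the role of lookup (all other characters). Deleted characters
// and immediate repeats in the mapped sequence are skipped.
std::string simple_soundslike(const std::string& word,
                              const SlTable& first, const SlTable& rest) {
  std::string sl0(word.size(), '\0');
  for (std::size_t i = 0; i < word.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(word[i]);
    sl0[i] = (i == 0) ? first[c] : rest[c];
  }
  std::string sl;
  for (std::size_t i = 0; i < sl0.size(); ++i) {
    // Like the pseudocode, compare with the previous *mapped* value,
    // not with the last character actually appended.
    if (sl0[i] != 0 && (i == 0 || sl0[i] != sl0[i - 1]))
      sl.push_back(sl0[i]);
  }
  return sl;
}

// Hypothetical helper for experiments: maps 'a'..'z' to themselves and
// every other byte to 0 (deleted).
SlTable identity_table() {
  SlTable t{};  // value-initialized: every entry is 0
  for (int c = 'a'; c <= 'z'; ++c) t[c] = static_cast<char>(c);
  return t;
}
```

For example, `simple_soundslike("ball", identity_table(), identity_table())` returns "bal". Because the duplicate check looks at the previous mapped value, a deleted character between two identical letters keeps both: "b-b" yields "bb".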

By default, all accents are removed and all vowels are deleted, except that a vowel at the start of the word is converted to a ‘*’. The exact behavior can be customized through the character data file.
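The following sketch builds a pair of lookup tables that mimic this default for plain ASCII letters. It is not derived from Aspell's character data files: accent stripping is not modeled, treating only a, e, i, o, u as vowels is an assumption, and folding consonants to lowercase is a choice made here for illustration. The names `make_first_table`, `make_rest_table`, and `is_vowel` are hypothetical.

```cpp
#include <array>
#include <initializer_list>

// Hypothetical table type: 0 means "delete this character".
using SlTable = std::array<char, 256>;

// Assumed vowel set for this sketch (ASCII only).
bool is_vowel(char c) {
  switch (c) {
    case 'a': case 'e': case 'i': case 'o': case 'u': return true;
    default: return false;
  }
}

// Table for every character after the first (lookup): vowels deleted,
// consonants folded to lowercase, all other bytes deleted.
SlTable make_rest_table() {
  SlTable t{};  // value-initialized: every entry is 0
  for (int c = 'a'; c <= 'z'; ++c) {
    char lower = static_cast<char>(c);
    char upper = static_cast<char>(c - 'a' + 'A');
    char out = is_vowel(lower) ? 0 : lower;
    t[static_cast<unsigned char>(lower)] = out;
    t[static_cast<unsigned char>(upper)] = out;
  }
  return t;
}

// Table for the first character (lookup0): same as above,
// except that a leading vowel becomes '*'.
SlTable make_first_table() {
  SlTable t = make_rest_table();
  for (char v : {'a', 'e', 'i', 'o', 'u'}) {
    t[static_cast<unsigned char>(v)] = '*';
    t[static_cast<unsigned char>(v - 'a' + 'A')] = '*';
  }
  return t;
}
```

With these tables, a vowel maps to ‘*’ only through the first-character table; everywhere else it maps to 0 and is dropped.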

The simple soundslike has the advantage that it is very fast to compute and therefore does not need to be stored with each word. Also, when affix compression is used and the ‘partially-expand’ option is given, the results are identical to those obtained without affix compression.

Of course it is not nearly as powerful as the phonetic soundslike.

