Soundex (Rexx)
From LiteratePrograms
This is an implementation of the SOUNDEX phonetic-matching algorithm. The goal of the algorithm is to encode names with the same pronunciation to the same value, allowing them to be compared without regard to spelling variations. It works best for English surnames — there are other algorithms (e.g., NYSIIS) and variations on SOUNDEX that work better for other names.
While true to the definition of SOUNDEX, this version is not the exact algorithm published so many years ago in Knuth or patented by Russell and Odell. That algorithm was heavily loop-based and didn't perform well in an interpreted language. This version produces the same results much more quickly, relying on some REXX-isms like Space()
and Translate()
.
Algorithm
1. We begin by accepting the first letter of the name as it is. This is the only letter in the input that will appear as a letter in the output.
<<Take first letter>>= Result = Left(Source, 1)
2. Next, we convert each character in the name into a numeral that groups similar-sounding letters together. The groupings are as follows. Note that the original Soundex algorithm doesn't use "0" or "7". We use "0" to mark letters that we will eventually delete but must retain for the moment as separators. Anything which must be deleted but is not a separator is marked with a "7" then immediately removed. We do this using the REXX Translate(Input, ToTable, FromTable, PadChar)
function, which will replace all characters in Input
that are in FromTable
with the corresponding characters in ToTable
, and will replace any characters not in FromTable
with PadChar
.
Numeral | Letters |
---|---|
0 | A, E, I, O, U, Y |
1 | B, F, P, V |
2 | C, G, J, K, Q, S, X, Z |
3 | D, T |
4 | L |
5 | M, N |
6 | R |
7 | H, W |
<<Convert letters to numerals>>= SoundexNum = "01230127022455012623017202" /* ABCDEFGHIJKLMNOPQRSTUVWXYZ */ Source = Translate(Source, SoundexNum, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "7")
Now take out the sevens.
<<Remove all 7s>>= Source = Space(Translate(Source, " ", "7"), 0)
3. We then replace sequences of the same numeral with a single instance followed by the requisite number of zeroes.
<<Make multiple digits single>>= Do I = 1 to Length(Source) Do J = I+1 to Length(Source) While Substr(Source, I, 1) = Substr(Source, J, 1) Source = Left(Source, J-1) || "0" || Substr(Source, J+1) End End
4. Almost done, we now drop the first numeral (i.e., the first letter which we handled at the start) and remove any zeroes in the rest. The Translate(Source, " ", "0")
function will return Source
with any zeroes replaced with spaces, and the Space(..., 0)
function will delete any spaces in that result.
<<Remove zeroes after first position>>= Result = Result || Space(Translate(Substr(Source, 2), " ", "0"), 0)
5. Lastly, we pad the result on the right to six characters in length using zeroes. Unlike previous zeroes, these will survive into the result.
<<Fill on right with zeroes>>= Result = Left(Result, 6, "0")
As a function
With a little code to pull this all together into a nice function, we're done:
<<Soundex Function>>= /* ------------------------------------------------------------------------ */ /* code = Soundex(name) */ /* */ /* Compute and return the SOUNDEX code corresponding to the specified name. */ /* ------------------------------------------------------------------------ */ Soundex: Procedure Source = Arg(1) Take first letter Convert letters to numerals Remove all 7s Make multiple digits single Remove zeroes after first position Fill on right with zeroes Return Result
Main program for testing
The following main program will repeatedly prompt the user to supply a name and compute and display its SOUNDEX equivalent until cancelled by the user.
<<soundex_test.rexx>>= Do Forever Call LineOut , "Enter the name: " Name = LineIn() If Name = "" then Leave Say "SOUNDEX value = " || Soundex(Name) End Exit 0 Soundex Function
Download code |