New York State Identification and Intelligence System (Rexx)
From LiteratePrograms
This is an implementation of the NYSIIS phonetic-matching algorithm. The goal of the algorithm is to encode names with the same pronunciation to the same value, allowing them to be compared without regard to spelling variations. Unlike SOUNDEX, it works well for a variety of European and Hispanic surnames.
The algorithm is taken from Robert L. Taft, "Name Search Techniques", New York State Identification and Intelligence System. It yields an alphabetic key that is padded or truncated to 10 characters.
Algorithm
1. Inspect the first characters of the name and replace certain sequences. Note that the K to C replacement is not performed if the KN to NN replacement is.
From | To |
---|---|
MAC | MCC |
KN | NN |
K | C |
PH, PF | FF |
SCH | SSS |
<<Translate first characters of name>>=
Select
  When Left(Source, 3) = "MAC" then Source = "MCC" || Substr(Source, 4)
  When Left(Source, 3) = "SCH" then Source = "SSS" || Substr(Source, 4)
  When Left(Source, 2) = "KN" then Source = "NN" || Substr(Source, 3)
  When Left(Source, 2) = "PH" | ,
       Left(Source, 2) = "PF" then Source = "FF" || Substr(Source, 3)
  When Left(Source, 1) = "K" then Source = "C" || Substr(Source, 2)
  Otherwise Nop
End
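As a cross-check of the table above, the prefix rules can also be sketched outside the literate program. The Python snippet below is illustrative only and is not part of the Rexx source; the function name `translate_prefix` is hypothetical, and the name is assumed to be upper case already.

```python
def translate_prefix(name: str) -> str:
    """Apply the step 1 prefix replacements to an upper-case name."""
    # Rules are tried in order; KN precedes K so that the K -> C rule
    # is skipped whenever the KN -> NN rule applies.
    rules = [("MAC", "MCC"), ("SCH", "SSS"), ("KN", "NN"),
             ("PH", "FF"), ("PF", "FF"), ("K", "C")]
    for old, new in rules:
        if name.startswith(old):
            return new + name[len(old):]
    return name  # no prefix rule matched


print(translate_prefix("MACDONALD"))  # MCCDONALD
print(translate_prefix("KNIGHT"))     # NNIGHT
print(translate_prefix("KELLY"))      # CELLY
```

Only the first matching rule fires, which mirrors the behaviour of a single Select statement.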
2. Replace certain sequences at the end of the name.
From | To |
---|---|
EE | Y |
IE | Y |
DT, RT, RD, NT, ND | D |
<<Translate last characters of name>>=
Ending = Right(Source, 2)
If Ending = "EE" | Ending = "IE" then ,
  Source = Left(Source, Length(Source)-2) || "Y"
If Ending = "DT" | Ending = "RT" | Ending = "RD" | Ending = "NT" | ,
   Ending = "ND" then ,
  Source = Left(Source, Length(Source)-2) || "D"
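The suffix rules can be checked the same way. This Python sketch is illustrative and not part of the Rexx program; `translate_suffix` is a hypothetical name, and the input is assumed to be upper case.

```python
def translate_suffix(name: str) -> str:
    """Apply the step 2 suffix replacements to an upper-case name."""
    ending = name[-2:]
    if ending in ("EE", "IE"):
        return name[:-2] + "Y"
    if ending in ("DT", "RT", "RD", "NT", "ND"):
        return name[:-2] + "D"
    return name  # no suffix rule matched


print(translate_suffix("LEE"))      # LY
print(translate_suffix("SSSMIDT"))  # SSSMID
print(translate_suffix("BRAND"))    # BRAD
```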
3. Begin accumulating the result by accepting the first letter of the name as it is.
<<Accept first letter>>=
Result = Left(Source, 1)
4. Translate the remaining letters according to a set of rules, processing one letter at a time:
<<Translate remaining characters by rules>>=
Do Cursor = 2 to Length(Source)
  <<Apply rules to one letter group>>
  If Char <> Right(Result, 1) then ,
    Result = Result || Char
End

<<Define constants>>=
Vowels = "AEIOU"

<<Apply rules to one letter group>>=
Chars = Substr(Source, Cursor, 3)
Char = Left(Chars, 1)
Select
a. Replace EV with AF and replace all other A, E, I, O, U with A.
<<Apply rules to one letter group>>=
  When Left(Chars, 2) = "EV" then Char = "AF"
  When Pos(Char, Vowels) > 0 then Char = "A"
b. Replace Q with G, Z with S, and M with N.
<<Apply rules to one letter group>>=
  When Char = "Q" then Char = "G"
  When Char = "Z" then Char = "S"
  When Char = "M" then Char = "N"
c. Replace KN with N and replace all other K with C.
<<Apply rules to one letter group>>=
  When Left(Chars, 2) = "KN" then Char = "N"
  When Char = "K" then Char = "C"
d. Replace SCH with SSS and PH with FF.
<<Apply rules to one letter group>>=
  When Left(Chars, 3) = "SCH" then Char = "SSS"
  When Left(Chars, 2) = "PH" then Char = "FF"
e. Replace H with the previous letter if the previous or next letter is not a vowel. The end of the name counts as a non-vowel.
<<Apply rules to one letter group>>=
  When Char = "H" then Do
    /* Pos tests vowel membership; Find matches whole words only */
    Previous = Substr(Source, Cursor-1, 1)
    If Pos(Previous, Vowels) = 0 | ,
       Pos(Substr(Chars, 2, 1), Vowels) = 0 then ,
      Char = Previous
  End
f. Replace W with the previous letter if it is a vowel.
<<Apply rules to one letter group>>=
  When Char = "W" then ,
    If Pos(Substr(Source, Cursor-1, 1), Vowels) > 0 then ,
      Char = Substr(Source, Cursor-1, 1)
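g. Append the resulting letter to the key if it differs from the last letter.
<<Apply rules to one letter group>>=
  Otherwise Nop
End /* Select */
/* Overwrite the matched letters in place so that a multi-letter      */
/* replacement (AF, SSS, FF) covers the letters it replaces, then     */
/* keep only its first letter for this pass of the loop.              */
Source = Overlay(Char, Source, Cursor)
Char = Left(Char, 1)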
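Rules a through g interact in ways that are easier to see on concrete input. The Python sketch below is an illustrative reading of the prose rules, not a transcription of the Rexx chunks. The name `translate_rest` is hypothetical, the input is assumed to be upper case with steps 1 and 2 already applied, and a multi-letter replacement is assumed to overwrite the same number of source letters.

```python
VOWELS = set("AEIOU")
SINGLE = {"Q": "G", "Z": "S", "M": "N"}


def translate_rest(name: str) -> str:
    """Build the key from an upper-case name using rules a through g."""
    src = list(name)
    key = src[0]                          # step 3: first letter as is
    for i in range(1, len(src)):
        ch, prev = src[i], src[i - 1]
        nxt = src[i + 1] if i + 1 < len(src) else ""
        group = "".join(src[i:i + 3])
        if group.startswith("EV"):
            new = "AF"
        elif ch in VOWELS:
            new = "A"
        elif ch in SINGLE:
            new = SINGLE[ch]
        elif group.startswith("KN"):
            new = "N"
        elif ch == "K":
            new = "C"
        elif group == "SCH":
            new = "SSS"
        elif group.startswith("PH"):
            new = "FF"
        elif ch == "H" and (prev not in VOWELS or nxt not in VOWELS):
            new = prev                    # end of name counts as non-vowel
        elif ch == "W" and prev in VOWELS:
            new = prev
        else:
            new = ch
        src[i:i + len(new)] = new         # overwrite matched letters in place
        if src[i] != key[-1]:             # rule g: skip repeated letters
            key += src[i]
    return key


print(translate_rest("MCCDONALD"))  # MCDANALD
print(translate_rest("NNIGHT"))     # NAGT
print(translate_rest("STEVENS"))    # STAFANS
```

Because each replacement is written back into the working copy of the name, later letters see the translated neighbours; the H in NNIGHT, for instance, becomes a second G and is then dropped as a repeat.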
5. If the last letter is S, remove it.
<<Remove a trailing S>>=
If Right(Result, 1) = "S" then ,
  Result = Left(Result, Length(Result)-1)
6. If the last letters are AY, replace them with Y.
<<Replace trailing AY with Y>>=
If Right(Result, 2) = "AY" then ,
  Result = Left(Result, Length(Result)-2) || "Y"
7. If the last letter is A, remove it.
<<Remove trailing A>>=
If Right(Result, 1) = "A" then ,
  Result = Left(Result, Length(Result)-1)
8. Lastly, truncate the result to 10 characters in length. Because Left pads with blanks, a shorter key is filled out to exactly 10 characters.
<<Truncate to 10 characters>>=
Result = Left(Result, 10)
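Steps 5 through 8 are order-sensitive, so a small worked example may help. This Python sketch is illustrative and not part of the Rexx program; `finish_key` is a hypothetical name. Unlike the Rexx Left built-in, Python slicing only truncates and does not pad short keys with blanks.

```python
def finish_key(key: str) -> str:
    """Apply steps 5 through 8 to an accumulated key."""
    if key.endswith("S"):        # step 5: drop a trailing S
        key = key[:-1]
    if key.endswith("AY"):       # step 6: trailing AY becomes Y
        key = key[:-2] + "Y"
    if key.endswith("A"):        # step 7: drop a trailing A
        key = key[:-1]
    return key[:10]              # step 8: at most 10 characters, no padding


print(finish_key("STAFANS"))  # STAFAN
print(finish_key("MARAY"))    # MARY
print(finish_key("GARCA"))    # GARC
```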
As a function
With a little code to pull this all together into a nice function, we're done:
<<NYSIIS Function>>=
/* ---------------------------------------------------------------------- */
/* code = NYSIIS(name)                                                    */
/*                                                                        */
/* Compute and return the NYSIIS code for the specified name.             */
/* ---------------------------------------------------------------------- */
NYSIIS: Procedure
  /* Upper-case the name: the rules match capital letters only */
  Source = Translate(Strip(Arg(1)))
  <<Define constants>>
  <<Translate first characters of name>>
  <<Translate last characters of name>>
  <<Accept first letter>>
  <<Translate remaining characters by rules>>
  <<Remove a trailing S>>
  <<Replace trailing AY with Y>>
  <<Remove trailing A>>
  <<Truncate to 10 characters>>
  Return Result
Main program for testing
The following main program will repeatedly prompt the user to supply a name and compute and display its NYSIIS equivalent until presented with an empty line.
<<nysiis_test.rexx>>=
Do Forever
  Call LineOut , "Enter the name: "
  Name = LineIn()
  If Name = "" then Leave
  Say "NYSIIS value = " || NYSIIS(Name)
End
Exit 0

<<NYSIIS Function>>