Convert integer to words (QBASIC)

From LiteratePrograms

Other implementations: Java | QBASIC

Here is a function for converting a number into words. It handles UK English, US English and French (including the Belgian and Swiss variants of the tens words); other languages are not yet supported. Please note that my French is pretty rusty, so I don't guarantee that the current algorithm produces valid French in all cases. A fluent French speaker has kindly added some test cases to the program and some helpful observations to the discussion page; further test cases would still be very much appreciated. The issues raised will be dealt with shortly.

The function is split into a "run once" initialisation portion followed by the part which calculates the actual words required to do the job. These portions will be discussed in a little more detail later in the article.

<<definition>>=
FUNCTION Num2Lang$ (aNumber AS LONG, aLang AS STRING)
variable declarations
initialisation call
function body
initialisation block
END FUNCTION

Two types of scope are used for the variables in this function. DIMensioned variables are automatically reinitialised to their default values whenever the function is called. STATIC variables retain their value between calls to the function. They are not strictly necessary in this function but their use makes the function faster by avoiding the need to set up the vocabulary arrays each time the function is called.

<<variable declarations>>=
   STATIC dLang AS STRING, dLog10 AS DOUBLE
   STATIC dUnits() AS STRING, dTens() AS STRING, dPowers() AS STRING
   DIM dBuffer AS STRING, dDigitGroup AS LONG
   DIM dTensGroup AS LONG, dPowersGroup AS LONG
   DIM dRange AS INTEGER

Note that QBASIC doesn't allow for the sizing of a static array at the time of its declaration. This has to be done later.

Basically the vocabulary arrays are loaded with data before use on the first call to the function and then just used on subsequent calls. The only case in which these arrays will need to be reloaded is when the language used is changed.

Finally, if you have misgivings about the use of the GOSUB command in this piece of code, please read on. Its use is discussed at a later point in this article.

<<initialisation call>>=
   IF dLang <> aLang THEN
      GOSUB Num2LangInit
      dLang = aLang
   END IF

After the soup and salad we come to the meat of the function. In the two (and a half) languages covered so far we only really need to deal with three cases: zero to nineteen; twenty to ninety-nine; everything else. Since the specification for this task (the bottles of beer song) implies that we only need to deal with positive whole numbers, and since I have arbitrarily decided that I only want to deal with 32-bit signed integers (the LONG type in BASIC), "everything else" means any whole number between one hundred and 2^31-1 (2147483647). It would be trivial to handle negative numbers and not too much work to handle decimals, but there's no need in this case. However, with an eye to bugs or future expansion, a fourth case has been added to handle numbers outside the range.

<<function body>>=
   SELECT CASE aNumber
first case
second case
third case
other cases
   END SELECT
   Num2Lang = dBuffer
   EXIT FUNCTION
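To see the shape of the four-way split before looking at each case, here is a minimal English-only sketch in Python. It is an illustration rather than a translation of the QBASIC: `num2words`, `UNITS`, `TENS` and `POWERS` are hypothetical names, and the range is chosen by integer comparison instead of the logarithm used in the QBASIC code.

```python
UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
POWERS = {2: " hundred", 3: " thousand", 6: " million", 9: " billion"}


def num2words(n, lang="en-uk"):
    """English-only sketch of the case split described above."""
    if 0 <= n <= 19:                       # first case: table lookup
        return UNITS[n]
    if 20 <= n <= 99:                      # second case: tens word plus unit
        tens, unit = divmod(n, 10)
        return TENS[tens] + ("-" + UNITS[unit] if unit else "")
    if 100 <= n <= 2**31 - 1:              # third case: recurse on each group
        exp = max(e for e in POWERS if 10**e <= n)
        group, rest = divmod(n, 10**exp)
        words = num2words(group, lang) + POWERS[exp]
        if rest:
            words += " "
            if rest < 100 and lang == "en-uk":
                words += "and "
            words += num2words(rest, lang)
        return words
    return str(n)                          # fourth case: out of range


print(num2words(1958))            # one thousand nine hundred and fifty-eight
print(num2words(101, "en-us"))    # one hundred one
```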

The first case contains a highly idiosyncratic set of numbers with little or no pattern and the easiest way to handle it is via the pre-initialised lookup table, dUnits.

<<first case>>=
   CASE 0 TO 19
      dBuffer = dUnits(INT(aNumber))

The second case is pretty straightforward for English but handling French adds three minor complications. Firstly, some ten-words, "septante" for instance, don't exist in Parisian French, although they do in other dialects such as Belgian French. In those cases the base twenty system has to be used starting from the previous existing ten-word. Hence the first IF/ENDIF section in the following code. Secondly, "quatre-vingts" doesn't need an "s" at the end when followed by units and otherwise needs to be treated differently from the "-ante" words. Thirdly, umpty-one values have to be treated specially by adding the word "et" in between the tens and the units. Hence the second IF/ENDIF section and its internal cases.


<<second case>>=
   CASE 20 TO 99
      dTensGroup = INT(aNumber / 10)
      dBuffer = dTens(dTensGroup)
      IF dBuffer = "" THEN
         dTensGroup = dTensGroup - 1
         dBuffer = dTens(dTensGroup)
      END IF
      dDigitGroup = aNumber - dTensGroup * 10
      IF dDigitGroup > 0 THEN
         IF LEFT$(aLang, 2) = "fr" AND RIGHT$(dBuffer, 1) = "s" THEN
            dBuffer = LEFT$(dBuffer, LEN(dBuffer) - 1)
         END IF
         IF LEFT$(aLang, 2) <> "fr" OR dDigitGroup MOD 10 <> 1 THEN
            dBuffer = dBuffer + "-"
         ELSEIF dBuffer = "quatre-vingt" THEN
            dBuffer = dBuffer + "-"
         ELSE
            dBuffer = dBuffer + " et "
         END IF
         dBuffer = dBuffer + Num2Lang(dDigitGroup, aLang)
      END IF
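The three French complications are easier to follow in isolation. The Python sketch below covers only 20 to 99 and is an illustration, not the QBASIC code: `french_tens`, `FR_UNITS`, `FR_TENS` and `BE_TENS` are hypothetical names, and the "quatre-vingt" exception is detected by comparing the word itself rather than the language code.

```python
FR_UNITS = ["zero", "un", "deux", "trois", "quatre", "cinq", "six", "sept",
            "huit", "neuf", "dix", "onze", "douze", "treize", "quatorze",
            "quinze", "seize", "dix-sept", "dix-huit", "dix-neuf"]
# Empty strings mark tens words that do not exist in Parisian French.
FR_TENS = ["", "", "vingt", "trente", "quarante", "cinquante",
           "soixante", "", "quatre-vingts", ""]
BE_TENS = FR_TENS[:7] + ["septante", "octante", "nonante"]


def french_tens(n, tens_words=FR_TENS):
    """Words for 20..99 using the given table of tens words."""
    if not 20 <= n <= 99:
        raise ValueError("expected a number from 20 to 99")
    tens = n // 10
    if tens_words[tens] == "":
        tens -= 1                      # 1: fall back to the previous tens word
    word = tens_words[tens]
    rest = n - tens * 10               # 0..19 after a fallback
    if rest == 0:
        return word
    if word.endswith("s"):
        word = word[:-1]               # 2: "quatre-vingts" loses its "s"
    if rest % 10 != 1 or word == "quatre-vingt":
        sep = "-"
    else:
        sep = " et "                   # 3: "et" before un and onze
    return word + sep + FR_UNITS[rest]


print(french_tens(71))             # soixante et onze
print(french_tens(81))             # quatre-vingt-un
print(french_tens(71, BE_TENS))    # septante et un
```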

The third case has the most complex code. The basic idea is to identify which range the number falls into (thousands, millions, etc.), then use recursive calls to get the text for groups of three digits. That simple picture is clouded a little by the first range, the hundreds, which are treated a little differently in US English from other English variants. It's also complicated by French, which uses the words "cent" and "mille" rather than "un cent" or "un mille" for 100 and 1,000 and has rules on when to use plurals for powers of ten and when not to.

Also note the addition of .4 to the number when calculating the range. This shouldn't have been necessary, but a floating point approximation error leads to the wrong value being calculated for 100 if it isn't present: presumably LOG(100) / LOG(10) comes out fractionally below 2, and INT then rounds it down to 1. The calculation worked for all other values, even without the addition, but them's the breaks.

<<third case>>=
   CASE 100 TO 2147483647
      dRange = INT(LOG(aNumber + .4) / dLog10)
      IF dRange > 3 THEN dRange = INT(dRange / 3) * 3
      dPowersGroup = INT(aNumber / 10 ^ dRange)
      IF LEFT$(aLang, 2) = "fr" AND dPowersGroup = 1 AND dRange < 5 THEN
         dBuffer = ""
      ELSE
         dBuffer = Num2Lang(dPowersGroup, aLang)
      END IF
      dBuffer = LTRIM$(dBuffer + dPowers(dRange))
      dDigitGroup = aNumber - dPowersGroup * 10 ^ dRange
      IF LEFT$(aLang, 2) = "fr" AND (dPowersGroup = 1 OR dDigitGroup > 0) THEN
         IF RIGHT$(dBuffer, 1) = "s" THEN
            dBuffer = LEFT$(dBuffer, LEN(dBuffer) - 1)
         END IF
      END IF
      IF dDigitGroup > 0 THEN
         dBuffer = dBuffer + " "
         IF dDigitGroup < 100 AND aLang = "en-uk" THEN
            dBuffer = dBuffer + "and "
         END IF
         dBuffer = dBuffer + Num2Lang(dDigitGroup, aLang)
      END IF
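The range arithmetic can be tried out on its own. The Python sketch below mirrors the three calculations at the top of the third case (range, leading group, remainder), including the .4 nudge. `split_by_range` and `LOG10` are hypothetical names, and Python's floating point may not fail on exactly the same inputs as QBASIC's does.

```python
import math

LOG10 = math.log(10)  # natural log of 10, used to convert to base 10


def split_by_range(n):
    """Return (range, leading group, remainder) for a whole number n >= 100."""
    # The nudge keeps exact powers of ten from landing just below a whole number.
    rng = int(math.log(n + 0.4) / LOG10)
    if rng > 3:
        rng = (rng // 3) * 3           # 4, 5 -> 3; 7, 8 -> 6; and so on
    group = n // 10**rng
    rest = n - group * 10**rng
    return rng, group, rest


print(split_by_range(100))      # (2, 1, 0)
print(split_by_range(1958))     # (3, 1, 958)
print(split_by_range(200025))   # (3, 200, 25)
```

The leading group and the remainder are both smaller than the original number, which is what guarantees that the recursive calls terminate.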

Finally a default case was added during development to handle cases which hadn't been handled yet. If the code is extended to handle negative or floating point numbers at some time in the future this might come in handy again, so it has been left. At the moment it will catch negative numbers and produce a "reasonable" answer which will at least indicate that there is a problem in the input.

<<other cases>>=
   CASE ELSE
      dBuffer = LTRIM$(STR$(aNumber))

Now we have the initialisation code for the function. It basically loads arrays with the vocabulary required for the current language. It also sets the dLog10 constant. This is required because QBASIC's built-in LOG function deals in natural logarithms and we actually need base 10 logarithms to identify the right powers-of-ten word.

Just a word on the use of GOSUB and a label here. Many people recoil in horror from the GOSUB command nowadays, with some vague fear that it is the GOTO command in disguise and that therefore its use is "unstructured". In fact it has been removed altogether from the latest incarnation of BASIC, VB.NET, and that is a pity. There is no doubt that GOSUB in the wrong hands can be misused badly. However, it has at least one legitimate use: providing structure within a function or subroutine where creating extra functions or subroutines to do so would be overkill. That is how it has been used here. While it could have been replaced altogether in this function, its use makes the code more readable than it would otherwise have been and thus its use is justified.

<<initialisation block>>=
Num2LangInit:
   REDIM dUnits(19), dTens(9), dPowers(9)
   SELECT CASE LEFT$(aLang, 2)
   CASE "fr"
      dUnits(0) = "zero": dUnits(10) = "dix": dTens(0) = "": dPowers(0) = ""
      dUnits(1) = "un": dUnits(11) = "onze": dTens(1) = "": dPowers(1) = ""
      dUnits(2) = "deux": dUnits(12) = "douze": dTens(2) = "vingt": dPowers(2) = " cents"
      dUnits(3) = "trois": dUnits(13) = "treize": dTens(3) = "trente": dPowers(3) = " mille"
      dUnits(4) = "quatre": dUnits(14) = "quatorze": dTens(4) = "quarante": dPowers(4) = ""
      dUnits(5) = "cinq": dUnits(15) = "quinze": dTens(5) = "cinquante": dPowers(5) = ""
      dUnits(6) = "six": dUnits(16) = "seize": dTens(6) = "soixante": dPowers(6) = " millions"
      dUnits(7) = "sept": dUnits(17) = "dix-sept": dTens(7) = "": dPowers(7) = ""
      dUnits(8) = "huit": dUnits(18) = "dix-huit": dTens(8) = "quatre-vingts": dPowers(8) = ""
      dUnits(9) = "neuf": dUnits(19) = "dix-neuf": dTens(9) = "": dPowers(9) = " milliards"
   CASE "en"
      dUnits(0) = "zero": dUnits(10) = "ten": dTens(0) = "": dPowers(0) = ""
      dUnits(1) = "one": dUnits(11) = "eleven": dTens(1) = "": dPowers(1) = ""
      dUnits(2) = "two": dUnits(12) = "twelve": dTens(2) = "twenty": dPowers(2) = " hundred"
      dUnits(3) = "three": dUnits(13) = "thirteen": dTens(3) = "thirty": dPowers(3) = " thousand"
      dUnits(4) = "four": dUnits(14) = "fourteen": dTens(4) = "forty": dPowers(4) = ""
      dUnits(5) = "five": dUnits(15) = "fifteen": dTens(5) = "fifty": dPowers(5) = ""
      dUnits(6) = "six": dUnits(16) = "sixteen": dTens(6) = "sixty": dPowers(6) = " million"
      dUnits(7) = "seven": dUnits(17) = "seventeen": dTens(7) = "seventy": dPowers(7) = ""
      dUnits(8) = "eight": dUnits(18) = "eighteen": dTens(8) = "eighty": dPowers(8) = ""
      dUnits(9) = "nine": dUnits(19) = "nineteen": dTens(9) = "ninety": dPowers(9) = " billion"
   CASE ELSE
      dUnits(0) = "0": dUnits(10) = "0": dTens(0) = "": dPowers(0) = ""
      dUnits(1) = "1": dUnits(11) = "1": dTens(1) = "1": dPowers(1) = ""
      dUnits(2) = "2": dUnits(12) = "2": dTens(2) = "2": dPowers(2) = ""
      dUnits(3) = "3": dUnits(13) = "3": dTens(3) = "3": dPowers(3) = ""
      dUnits(4) = "4": dUnits(14) = "4": dTens(4) = "4": dPowers(4) = ""
      dUnits(5) = "5": dUnits(15) = "5": dTens(5) = "5": dPowers(5) = ""
      dUnits(6) = "6": dUnits(16) = "6": dTens(6) = "6": dPowers(6) = ""
      dUnits(7) = "7": dUnits(17) = "7": dTens(7) = "7": dPowers(7) = ""
      dUnits(8) = "8": dUnits(18) = "8": dTens(8) = "8": dPowers(8) = ""
      dUnits(9) = "9": dUnits(19) = "9": dTens(9) = "9": dPowers(9) = ""
   END SELECT
   SELECT CASE LEFT$(aLang, 5)
   CASE "fr-be"
      dTens(7) = "septante"
      dTens(8) = "octante"
      dTens(9) = "nonante"
   CASE "fr-ch"
      dTens(7) = "septante"
      dTens(8) = "huitante"
      dTens(9) = "nonante"
   CASE ELSE
      REM Do nothing
   END SELECT
   dLog10 = LOG(10)
   RETURN
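The initialisation block works in two layers: a base vocabulary chosen by the first two characters of the language code, then regional overrides chosen by the first five. As a sketch of that layering only, here is the French tens table in Python; `tens_for`, `BASE_FR_TENS` and `OVERRIDES` are hypothetical names.

```python
BASE_FR_TENS = ["", "", "vingt", "trente", "quarante", "cinquante",
                "soixante", "", "quatre-vingts", ""]
OVERRIDES = {
    "fr-be": {7: "septante", 8: "octante", 9: "nonante"},
    "fr-ch": {7: "septante", 8: "huitante", 9: "nonante"},
}


def tens_for(lang):
    """Start from the base French table, then apply any regional overrides."""
    tens = list(BASE_FR_TENS)              # copy, so the base table is untouched
    for index, word in OVERRIDES.get(lang[:5], {}).items():
        tens[index] = word
    return tens


print(tens_for("fr")[8])      # quatre-vingts
print(tens_for("fr-be")[8])   # octante
print(tens_for("fr-ch")[8])   # huitante
```

Keeping the overrides separate means a new regional variant only has to list the words that differ.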

The next piece of the file is a scaffold which you can use to test the Num2Lang function. When there are so many ways that things can go wrong, it's important to automate the testing process so that the same tests are run every time.

The floating point approximation error discussed above demonstrates the need for comprehensive testing. There was no logical error in the code before the "+ .4" was added to it. Nevertheless the function did not return the correct result when the input value was 100, so the cause had to be identified and a workaround created. Comprehensive unit testing will find this sort of error where logic and code writing skills will not.

Note: the test results are now formatted according to the TAP format (see http://en.wikipedia.org/wiki/Test_Anything_Protocol).
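TAP output is simple enough to sketch in a few lines. The following Python fragment is only an illustration of the plan line ("1..N") and the "ok" / "not ok" result lines that the QBASIC scaffold prints; `tap_report` and the sample function `double` are hypothetical names and not part of NUM2LANG.BAS.

```python
def tap_report(cases, func):
    """Run func on each (label, argument, expected) case; return TAP lines."""
    lines = ["1..%d" % len(cases)]             # the plan: how many tests follow
    for number, (label, arg, expected) in enumerate(cases, start=1):
        got = func(arg)
        status = "ok" if got == expected else "not ok"
        line = "%s %d - %s: '%s'" % (status, number, label, got)
        if got != expected:
            line += " (Expected '%s')" % expected
        lines.append(line)
    return lines


def double(n):
    """Stand-in for the function under test."""
    return n * 2


report = tap_report([("double 1", 1, 2), ("double 2", 2, 5)], double)
print("\n".join(report))
# 1..2
# ok 1 - double 1: '2'
# not ok 2 - double 2: '4' (Expected '5')
```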

<<unit tests>>=
DECLARE FUNCTION Num2Lang$ (aNumber AS LONG, aLang AS STRING)
DIM mTestCount AS INTEGER
DIM mStatus AS STRING
DIM mTest AS STRING
DIM mLang AS STRING
DIM mNumber AS LONG
DIM mExpected AS STRING
DIM mGot AS STRING
DIM mDelay AS SINGLE
DIM mTimer AS SINGLE
mTestCount = 0
RESTORE TestCases
DO
   READ mTest
   IF mTest = "" THEN
      EXIT DO
   ELSE
      READ mLang, mNumber, mExpected
      mTestCount = mTestCount + 1
   END IF
LOOP
CLS
PRINT "1.." + LTRIM$(STR$(mTestCount))
mTestCount = 0
RESTORE TestCases
mStatus = ""
DO WHILE INKEY$ <> CHR$(27)
   READ mTest
   IF mTest = "" THEN
      EXIT DO
   ELSE
      mTestCount = mTestCount + 1
      READ mLang, mNumber, mExpected
      mGot = Num2Lang$(mNumber, mLang)
      mStatus = "ok" + STR$(mTestCount)
      IF mExpected <> mGot THEN
         mStatus = "not " + mStatus
      END IF
      mStatus = mStatus + " - " + LEFT$(mLang + ": " + LTRIM$(STR$(mNumber)) + SPACE$(15), 15) + "'" + mGot + "'"
      IF mExpected <> mGot THEN
         mStatus = mStatus + " (Expected '" + mExpected + "')"
      END IF
      PRINT mStatus
      IF mExpected = mGot THEN
         mDelay = .2
      ELSE
         mDelay = 2
      END IF
      mTimer = INT(TIMER * 10)
      DO WHILE mDelay > 0
         IF mTimer <> INT(TIMER * 10) THEN
            mTimer = INT(TIMER * 10)
            mDelay = mDelay - .1
         END IF
      LOOP
   END IF
LOOP
SYSTEM
TestCases:
   DATA "*","en-uk",0,"zero"
   DATA "*","en-uk",1,"one"
   DATA "*","en-uk",9,"nine"
   DATA "*","en-uk",10,"ten"
   DATA "*","en-uk",11,"eleven"
   DATA "*","en-uk",19,"nineteen"
   DATA "*","en-uk",20,"twenty"
   DATA "*","en-uk",21,"twenty-one"
   DATA "*","en-uk",100,"one hundred"
   DATA "*","en-uk",101,"one hundred and one"
   DATA "*","en-us",101,"one hundred one"
   DATA "*","en-uk",1000,"one thousand"
   DATA "*","en-uk",1001,"one thousand and one"
   DATA "*","en-uk",1958,"one thousand nine hundred and fifty-eight"
   DATA "*","fr",10,"dix"
   DATA "*","fr",11,"onze"
   DATA "*","fr",21,"vingt et un"
   DATA "*","fr",22,"vingt-deux"
   DATA "*","fr",29,"vingt-neuf"
   DATA "*","fr",60,"soixante"
   DATA "*","fr",61,"soixante et un"
   DATA "*","fr",62,"soixante-deux"
   DATA "*","fr",70,"soixante-dix"
   DATA "*","fr-be",70,"septante"
   DATA "*","fr-be",71,"septante et un"
   DATA "*","fr",71,"soixante et onze"
   DATA "*","fr",79,"soixante-dix-neuf"
   DATA "*","fr",80,"quatre-vingts"
   DATA "*","fr-be",80,"octante"
   DATA "*","fr-ch",80,"huitante"
   DATA "*","fr",81,"quatre-vingt-un"
   DATA "*","fr-be",81,"octante et un"
   DATA "*","fr",82,"quatre-vingt-deux"
   DATA "*","fr",90,"quatre-vingt-dix"
   DATA "*","fr-be",90,"nonante"
   DATA "*","fr",99,"quatre-vingt-dix-neuf"
   DATA "*","fr",100,"cent"
   DATA "*","fr",101,"cent un"
   DATA "*","fr",900,"neuf cents"
   DATA "*","fr",999,"neuf cent quatre-vingt-dix-neuf"
   DATA "*","fr",1000,"mille"
   DATA "*","fr",1100,"mille cent"
   DATA "*","fr",100000,"cent mille"
   DATA "*","fr",200000,"deux cents mille"
   DATA "*","fr",200025,"deux cents mille vingt-cinq"
   DATA "*","fr",1000000,"un million"
   DATA "*","fr",1000100,"un million cent"
   DATA "*","fr",2000000,"deux millions"
   DATA "*","fr",2003201,"deux million trois mille deux cent un"
   DATA "*","fr",2000000000,"deux milliards"
   DATA "*","fr",2000003201,"deux milliard trois mille deux cent un"
   DATA "*","fr",300,"trois cents"
   DATA "*","fr",301,"trois cent un"
   DATA ""
<<NUM2LANG.BAS>>=
unit tests
definition