by James W. Thomas
Call-A.P.P.L.E. Magazine July 1985 PP15-18
The Apple Numerics Group has been working for several years to implement state-of-the-art numerics on all Apple computers. The result of these efforts is called SANE, for Standard Apple Numerics Environment. It is available for Pascal and assembly programmers on Apple II and III computers, and is the native arithmetic on Macintosh. AppleWorks, MacPascal, MacBASIC, the Lisa Workshop, and several other Macintosh languages and application programs use SANE.
This article, which identifies several of the problems in Applesoft arithmetic, was written 3 years ago by Jim Thomas while a student at the University of California, Berkeley. Jim is now the head of the Apple Numerics Group, which includes Kenton Hanson and Clayton Lewis. We are indebted to Clayton for bringing the article to our attention.
Many people are no longer surprised to find their computers executing:
10 S = 0
20 FOR I = 1 TO 1000
30 S = S + 0.1
40 NEXT
50 PRINT S
and not printing 100. They realize that computer arithmetic is in some ways different from the arithmetic they learned in mathematics classes. The typical computer user will attempt to ignore these discrepancies, proceed as if dealing with ordinary (real) numbers, and then acknowledge that his results may contain some “insignificant” round-off error. The user may know that for some computations, a computer’s arithmetic fails to produce usable results. He may believe this happens only for contrived problems, or only for people with unusually bad luck. If his program fails to obtain results, or if he recognizes his results contain significant errors, he still may not suspect computer arithmetic as the culprit.
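The surprise is not peculiar to Applesoft: one-tenth has no finite binary representation, so the accumulated error appears in any binary floating-point arithmetic. A quick sketch in modern Python (IEEE 754 doubles, not Applesoft’s 40-bit format) shows the same effect:

```python
s = 0.0
for i in range(1000):
    s = s + 0.1        # 0.1 is already rounded; the errors accumulate
print(s)               # close to, but not exactly, 100
print(s == 100)        # False
```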
The expert, the numerical analyst or the scientific programmer, is less naive. In his circles there is active concern about the manner in which computers do arithmetic, evidenced by the 1981 IEEE Proposed Standard for Binary Floating-Point Arithmetic (which became IEEE Standard 754 in April 1985 and is the basis for SANE -Ed.). Partly because of the aforementioned attitudes among naive users, computer designers have exercised great liberty with the nature and quality of their machines’ arithmetics. The resulting peculiarities are many and varied (7). The more irregular the arithmetic, the more difficult becomes the programmer’s job in programming around trouble areas and in verifying program correctness. If portability of programs is desired, the difficulty increases accordingly. (In light of growing software costs, these concerns are more than aesthetic.)
Of course arithmetic irregularities affect the non-expert’s programs too. Since he may not undertake an extensive proof of correctness, he may be spared the task of programming around trouble areas spotted in the analysis. Then, however, he may have results whose accuracy is eroded by the arithmetic to an extent completely unknown. On the other hand, if the arithmetic produces errors which do attract his attention, either in output or in other unintended program behavior, and if he suspects the computer arithmetic as the culprit, then the non-expert may still have enormous difficulty in debugging without acquiring a detailed knowledge of his machine’s particular arithmetic.
In this article, we discuss some of the peculiarities of Apple II Applesoft BASIC, a floating-point arithmetic which is widely used by non-experts. In the section entitled Architecture, we summarize the architecture of the arithmetic; under Irregularities we present and discuss several behavioral irregularities; and under Implications we assess the implications the architecture and irregularities hold for the users.
Architecture

Floating-point numbers are represented in memory using five bytes (40 bits). The first byte is an 8-bit exponent which includes a bias of 128. The high-order bit of the second byte is the sign bit. The remaining 31 bits are used with an implicit high-order 1 bit to make a 32-bit significand. Representation is signed magnitude. Thus
|  8 bits  |  1  |      31 bits      |
|   exp    |  s  | significand bits  |
represents (-1)^s * 2^(exp-128) * .1significandbits (binary). Zero is represented whenever exp is 0, regardless of the significand bits. Valid representations range from 2^-128 (actually 2^-128 is invalid in some cases; then 2^-127) to 2^127 * (1 - 2^-32), or roughly 10^-38 to 10^38. A 32-bit significand can distinguish nine-decimal-digit values.
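The layout can be decoded with a short routine; this is a sketch in modern Python based on the description above, not code from Applesoft itself. The 5-byte value 80 00 00 00 FE used below appears in a monitor dump later in this article, for A = 2^-1 + 2^-24 - 2^-31.

```python
def decode_applesoft(b):
    """Decode a 5-byte Applesoft float."""
    exp = b[0]                              # biased exponent
    if exp == 0:
        return 0.0                          # exp = 0 means zero, rest ignored
    sign = -1.0 if b[1] & 0x80 else 1.0     # high bit of the second byte
    # replace the sign bit with the implicit high-order 1 bit
    mant = ((b[1] | 0x80) << 24) | (b[2] << 16) | (b[3] << 8) | b[4]
    return sign * mant * 2.0 ** (exp - 128 - 32)

print(decode_applesoft([0x80, 0x00, 0x00, 0x00, 0x00]))  # 0.5
print(decode_applesoft([0x80, 0x00, 0x00, 0x00, 0xFE]))  # 2^-1 + 2^-24 - 2^-31
```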
While arithmetic operations are carried out, operands may have an additional guard byte, giving a significand of 40 bits. In addition and subtraction, both operands and the result use guard bytes. In multiplication, the multiplier and the product use guard bytes. In division, only the quotient has extra significance, but just 1 bit. When addition, subtraction, or multiplication is initiated, one operand has a 32-bit significand and the other a 40-bit significand. Although both operands in addition and subtraction use guard bytes, the operand with the 32-bit significand does so only if it has the smaller exponent, shifting its significand right. In formula evaluations involving several operations, the sizes of the significands depend on the order of the operations as well as on the operations themselves. Intermediate operands with 40-bit significands may be rounded to 32 bits and pushed onto a stack, or not, depending on the details of the formula.
Rounding consists of inspecting the high-order bit of the guard byte and, if a 1 is found, incrementing the high-order 32-bit significand. Thus rounding is to the nearest value with ties going away from zero. Numbers are normalized before rounding, except when an addition or subtraction cancels the high-order 32 bits but leaves a non-zero guard byte. Then the result is set to zero. In addition to stack pushes by formula evaluation, memory stores also are preceded by rounding. Divisors are rounded before divisions occur. The guard byte has no sticky bit, so any 1’s shifted out the right of the guard byte are lost.
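The rounding rule can be sketched as follows (a model in Python of the described behavior, not Applesoft’s code; `mant40` is a hypothetical 40-bit significand with the guard byte in its low 8 bits):

```python
def round_to_32(mant40):
    """Round a 40-bit significand to 32 bits: inspect only the high bit
    of the guard byte (there is no sticky bit, so the lower 7 bits are
    simply discarded) -- round to nearest, ties away from zero."""
    mant32 = mant40 >> 8
    if mant40 & 0x80:            # high bit of the guard byte set
        mant32 += 1              # the real routine renormalizes on carry
    return mant32

print(round_to_32((1 << 39) | 0x80))   # exactly halfway: rounds up
print(round_to_32((1 << 39) | 0x7F))   # just below halfway: rounds down
```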
Overflow results in an OVERFLOW ERROR message and termination of the program. Underflows are set to zero.
Irregularities

By irregularities we mean behavior of the arithmetic which the user should not be expected to anticipate. Such behavior may occur intentionally, because of design decisions, or unintentionally, because of bugs. Actually the distinction here is not clear-cut, because design decisions often are made to live with presumably insignificant or tolerable bugs.
Formula-evaluation design flaw.
Intermediate operations may present different results to subsequent operations, depending on the positions of the operations in the formula.
Example (non-commutative addition):
]LIST

10 A = 2^(-2) + 2^(-3) + 2^(-33)
20 A = -A
30 B = 2^(-1) + 2^(-32)
40 C = 2^(-1) + 2^(-2)
50 PRINT A + B * C, B * C + A

]RUN
0             1.16415322E-10
This phenomenon occurs as a direct consequence of the design of the formula-evaluation routine. For A + BC the result of BC stays in a floating-point accumulator, un-rounded and with guard byte. For BC + A the result of BC is rounded and stacked. Hence different numbers are added to A in the two cases. The following is similarly explained.
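The effect can be reproduced with exact rational arithmetic; this is a sketch in Python modeling only the 32-bit rounding of the stacked intermediate. (The un-rounded accumulator path additionally hits the cancellation-to-zero rule described earlier, which is why Applesoft prints 0 for A + B*C.)

```python
from fractions import Fraction as F

def round32(x):
    """Round a positive rational to a 32-bit significand, ties away
    from zero -- a model of the rounding described earlier, not
    Applesoft's actual code."""
    e = 0
    while x >= 1:
        x /= 2
        e += 1
    while x < F(1, 2):
        x *= 2
        e -= 1
    n = int(x * 2**32)
    if x * 2**32 - n >= F(1, 2):
        n += 1
    return F(n, 2**32) * F(2) ** e

A = -(F(1, 4) + F(1, 8) + F(1, 2**33))
B = F(1, 2) + F(1, 2**32)
C = F(1, 2) + F(1, 4)

# B*C + A path: B*C is rounded to 32 bits when pushed onto the stack
print(float(A + round32(B * C)))   # 2^-33, about 1.16415322E-10
# A + B*C path: B*C stays un-rounded; the exact sum is 2^-34, which
# survives only in the guard byte -- Applesoft flushes such a
# fully-cancelled result to zero, hence the printed 0
print(float(A + B * C))            # 2^-34 exactly, in this model
```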
Example (non-commutative multiplication):
]LIST

10 A = 1 - 2^(-31)
20 B = 2^(-33)
30 C = 2^(-1)
40 PRINT C + (A + B) * C - A
50 PRINT C + C * (A + B) - A

]RUN
4.65661287E-10
2.32830644E-10
The remedy here would involve nontrivial design decisions. Two obvious approaches are i) including the guard byte on the stack and ii) rounding operands before operations occur. Alternative i) requires modification of the addition, subtraction, and multiplication routines which now use operands with different size significands.
Comparator-evaluator conspiracy.

In comparing two numbers for =, <, or >, the propagation of carries from a guard byte beyond the low-order byte is not considered. Thus A op B = A op B may be false, since the formula evaluator rounds and pushes one A op B result onto the stack but leaves the other A op B result un-rounded in the floating-point accumulator.
Example (non-reflexive equality):
]LIST

10 A = 2^(-1)
20 B = 2^(-24) - 2^(-33)
30 IF A + B = A + B THEN PRINT "A+B=A+B"
40 IF A + B > A + B THEN PRINT "A+B>A+B"

]RUN
A+B>A+B
Either remedy suggested above in the “Formula-evaluation design flaw” section would correct this flaw. A fix not involving the formula evaluator must compare a rounded number with a non-rounded number as if the latter were rounded. This appears to require some code.
Sign-of-small-quotients bug.

If the exponent of a quotient is -128, then a positive quotient will result regardless of the signs of the divisor and dividend.
Example ((A/2)/A = -.5):
]LIST

10 A = -(2^(-127))
20 PRINT A, (A/2)/A

]RUN
-5.87747176E-39          -.5
In division, the exponent of the divisor is negated and added to the exponent of the dividend. If the result including its bias is zero, then control branches to an underflow routine in order to produce a zero quotient. The zero exponent already indicates zero, so the underflow routine simply sets the sign to “+”. But then the return is back into the division routine, which increments the exponent to account for a shift and carries out the division, so the result is no longer zero. It appears the underflow should not have been signaled, since the result is good except for the sign.
Multiplication bug.

When two consecutive bytes of the multiplier are 0, the accumulated product bits to that point will be shifted one place too far to the right.
Example (1*A <> A):
]A = 2^(-1) + 2^(-24) - 2^(-31)
]B = 1*A
]CALL -151
*803.809

0803- 41 00 80 00 00
0808- 00 FE

*80A.810

080A- 42 00 80 00 00 00
0810- 7F
Note that the low-order byte FE of A becomes 7F in B = 1*A, a shift of one bit to the right. The error can be as large as 127 units in the last place of the binary significand. With A as in the example, A*1 yields A, so the multiplication bug produces non-commutative multiplications. With a slight modification, the example violates A > 1 implies A*B > B (for B > 0). This bug could easily be exterminated by resetting a carry which is cleared by a special routine which handles the zero-byte multiplication.
Attempting a clearer demonstration of the effect of the multiplication bug, we continue with the preceding example.
]PRINT A,B : REM B=1*A
.50000003     .500000015
But hand conversion of the binary representations shows that A and B differ by approximately 30 in the ninth decimal place. The effect of the multiplication bug is confounded by the following.
Binary-to-decimal conversion error.
]A = .500000059
]CALL -151
*803.809

0803- 41 00 80 00 00
0808- 00 FD

]PRINT A
.500000029
The internal binary representation of A is correct to all 32 binary places, so the decimal-to-binary conversion has done its job well. Since 2^(-32) < 2.4*10^(-10), the binary number in the example is nearer to .500000059 than to .500000058 or .500000060. Hence we might expect the conversion decimal-to-binary-to-decimal to be the identity on .500000059. However we obtain an error of 30 in the ninth decimal place! The conversion routines use the floating-point arithmetic and so are subject to the errors therein. Perhaps the multiplication bug, via multiplications in the binary-to-decimal conversion, causes the surprisingly large error above.
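The claim can be checked directly; a sketch in Python using the stored significand from the dump above (restoring the implicit leading 1 gives the 32-bit pattern 0x800000FD):

```python
mant = 0x800000FD        # 32-bit significand, implicit leading 1 restored
value = mant / 2 ** 32   # exponent byte 0x80 means a factor of 2^0
print(value)             # 0.50000005890...
print(f"{value:.9f}")    # 0.500000059 -- a correct binary-to-decimal
                         # conversion returns the original input
```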
Trigonometric function error.
The sine function exhibits extremely poor accuracy near zero.
Example (sin(A)/A is not near 1 when A is small):
]LIST

10 FOR I = 1 TO 12
20 A = 10^(-I)
30 PRINT A, SIN(A)/A
40 NEXT

]RUN
The sine function is identically zero for arguments greater than .5*10^10.
Example (sin(A) = 0 for large A):
]LIST

10 FOR I = 5 TO 12
20 A = 10^I
30 PRINT A, SIN(A)
40 NEXT

]RUN
1E+10
The flaw in the sine routine lies in its argument reduction. The argument is divided by 2pi and F, the fractional part of the quotient, is obtained. As the first step of determining the quadrant, F is subtracted from 1/4 and the difference is the basis of all further calculations. If F is small, 1/4 - F has little or no significance. If the original argument is large, the fractional part of its quotient by 2pi is zero.
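The loss can be modeled by rounding the quotient to a 32-bit significand before taking its fractional part (a hypothetical Python model of the reduction just described, not the actual routine):

```python
from math import pi, floor, log2

def frac_after_32bit(q):
    """Fractional part of q as seen through a 32-bit significand: once
    the integer part fills the significand, the fraction is gone."""
    e = floor(log2(q)) + 1        # bit position of q's leading 1
    scale = 2.0 ** (32 - e)
    return (round(q * scale) / scale) % 1.0

for x in (1e-6, 1e8, 1e10, 1e12):
    f = frac_after_32bit(x / (2 * pi))
    print(x, f, 0.25 - f)
# For tiny x, F is tiny, and in 32-bit arithmetic 1/4 - F would retain
# few of F's bits.  Near x = 1e10 at most one bit of the fraction
# survives; by x = 1e12 it is exactly 0 in this model, and a sine
# built on it is identically zero.
```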
The cosine and tangent functions are obtained through the identities
cos(x) = sin(x + pi/2)
tan(x) = sin(x)/cos(x)
and hence inherit the errors in the sine function.
Example (tan(A)/A < 1):
]LIST

10 FOR I = 1 TO 12
20 A = 10^(-I)
30 PRINT A, TAN(A)/A
40 NEXT

]RUN
Trigonometric identities are not reliable.
Example (SIN(A)^2 + COS(A)^2 <> 1):
]LIST

10 FOR I = 5 TO 12
20 A = 10^I
30 PRINT A, SIN(A)^2 + COS(A)^2
40 NEXT

]RUN
Example (sin(2A) <> 2sin(A)*cos(A)):
]LIST

10 FOR I = 5 TO 12
20 A = 10^I
30 PRINT SIN(2*A), 2 * SIN(A) * COS(A)
40 NEXT

]RUN
Even if the sine’s argument reduction were repaired, the cosine would perform poorly for large arguments x, because pi/2 would have little or no effect in x + pi/2. The simple repair for the trigonometric functions is to replace the module with an implementation of good existing algorithms.
Implications

The examples in the preceding section point to several areas in which the arithmetic’s behavior is other than a programmer would expect. This behavior is caused by subtle workings of the arithmetic routines (intentional or unintentional), the understanding of which would prove a heavy burden for the non-expert programmer.
• The programmer may have less accuracy than 32-bit significands, rounding to nearest, and guard bytes for intermediate results would lead him to believe. The binary-to-decimal conversions and the multiplication bug may produce single-operation errors of 30 in the ninth decimal place. The trigonometric functions may produce results with no significance. The guard byte provides extra accuracy only for rather special formulas which avoid the rounding stack pushes. The guard byte does not provide the significance one would hope for when addition or subtraction cancels the 32 high-order bits, because such results are set to zero.
• The most innocent algebraic manipulations of formulas may alter a program’s results. Virtually every axiom of real numbers can fail (maybe not 0 + x = x). The compilation of a usable set of rules on which a programmer could depend would be very difficult.
• Branch tests may be difficult to control. The sign-of-small-quotients bug may reverse a sign. The formula evaluator may produce a zero for an expression which elsewhere appears to be nonzero. The comparator-evaluator conspiracy may provide incorrect comparisons. The loss of accuracy mentioned above may affect comparisons. Thus the unwary programmer’s tests to separate cases or guard against invalid operands may be sabotaged.
Although the irregularities discussed in this paper appear simple to correct (with the possible exception of the formula-evaluation flaw), Apple Computer Inc. will be reluctant to implement changes. Applesoft BASIC was purchased from Microsoft, and no one at Apple is truly familiar with the details of the code. The code has little modularity, so it is difficult to assess the ramifications of changes. Multitudinous software exists for the current version, and it would be difficult to check that these programs still run.
On the other hand, with the increased interest in arithmetic among the experts and with the increased knowledge of computing among the general population, the number of users who are aware of anomalies in the arithmetic can be expected to increase. More users will pursue the possibility that errors in their results are caused by computer arithmetic.
References

1. Apple II Reference Manual, Apple Computer Inc., 1979.
2. Applesoft BASIC Programming Reference Manual, Apple Computer Inc., 1978.
3. IEEE Subcommittee P754, “A Proposed Standard for Binary Floating-Point Arithmetic,” Draft 8.0, Computer, 14, 3, March 1981, pp. 53-62.
4. Camp, Smay, Triska, Microprocessor Systems Engineering, Matrix Publishers Inc., 1979.
5. Coonen, J.T., “Accurate, Economical Binary-Decimal Conversions,” May 1981.
6. Kahan, W., “Error in Numerical Computation,” Course Notes, 1963.
7. Kahan, W., “Why Do We Need a Floating-Point Arithmetic Standard?”, March 1981.
8. Luebbert, W.F., “What’s Where in the Apple,” MICRO - the 6502 Journal, August 1979.