3.4 Rounding versus truncating

The right shift by itself is a truncation operator: the four least significant bits are simply discarded. This means the values 1.11110000₂ and 1.11111111₂ are both truncated to the same represented value, 1.1111₂. This may look harmless, but it has a very bad effect “in the long run”.
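The truncation described above can be sketched in a few lines of Python. This is only an illustration: it assumes values are stored as raw integers with 8 fractional bits and reduced to 4 fractional bits, and the names `FRAC_SHIFT` and `truncate` are made up for this example.

```python
# Assumed layout: raw integers carrying 8 fractional bits (scale 2**8),
# reduced to 4 fractional bits (scale 2**4) by a right shift of 4.
FRAC_SHIFT = 4

def truncate(raw8: int) -> int:
    """Drop the four least significant bits with a plain right shift."""
    return raw8 >> FRAC_SHIFT

a = 0b1_11110000  # 1.11110000 in binary = 1.9375
b = 0b1_11111111  # 1.11111111 in binary = 1.99609375

# Both inputs collapse to the same 4-fractional-bit value, 1.1111 in binary.
print(truncate(a) / 16, truncate(b) / 16)  # 1.9375 1.9375
```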

If we carry out several multiplications in sequence, or add the products of several terms, the effect of truncation becomes apparent. The computed values become smaller and smaller compared to the actual values. This is because truncation never rounds up: every result is biased toward a smaller value.
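To see the bias accumulate, the following sketch adds up many truncated products and compares the total with the exact sum. It assumes operands with 4 fractional bits stored as raw integers; `FRAC_BITS`, `mul_trunc`, and the chosen operand range are illustrative assumptions, not part of the text above.

```python
FRAC_BITS = 4  # assumed: operands carry 4 fractional bits

def mul_trunc(x: int, y: int) -> int:
    """Multiply two raw fixed-point values, then truncate to 4 fractional bits."""
    return (x * y) >> FRAC_BITS

# Raw operands for 1.0000 .. 1.1111 in binary (1.0 .. 1.9375).
xs = range(0b1_0000, 0b10_0000)

# Exact products keep all 8 fractional bits; truncated products keep only 4.
exact_sum = sum(x * y for x in xs for y in xs) / 2 ** (2 * FRAC_BITS)
trunc_sum = sum(mul_trunc(x, y) for x in xs for y in xs) / 2 ** FRAC_BITS

print(f"exact: {exact_sum}  truncated: {trunc_sum}  error: {trunc_sum - exact_sum}")
```

No single truncated product exceeds its exact value, so the errors cannot cancel and the truncated sum falls below the exact one.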

Let us rethink this problem. 1.11110000₂ should definitely be rounded to 1.1111₂. However, 1.11111111₂ is much closer to 10.0000₂ than to 1.1111₂. As a result, it makes sense to round 1.11111111₂ to 10.0000₂. We can, then, generalize and say that we round a number to a less precise representation based on whether it is closer to the smaller value or the larger value.

There is one problem left. What about 1.11111000₂? It is exactly halfway between 1.11110000₂ and 10.0000₂. What we need to consider in this case is: how many values are rounded to 1.1111₂, and how many values are rounded to 10.0000₂? The two numbers should be the same.

Because 1.11110000₂ is “rounded” to 1.1111₂, the values 1.11110000₂ through 1.11110111₂ are all rounded to 1.1111₂. That makes 8 distinct values. It makes sense, then, to round 1.11111000₂ up to 10.0000₂, so that the values 1.11111000₂ through 1.11111111₂ (also 8 of them) are all rounded to 10.0000₂.
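The counting argument can be checked by brute force. The sketch below assumes the same layout as the example values (8 fractional bits reduced to 4) and writes the rule out explicitly: round up when the dropped bits are 1000₂ or more. The function name `round_to_4_bits` is hypothetical.

```python
def round_to_4_bits(raw8: int) -> int:
    """Round an 8-fractional-bit value to 4 fractional bits, with ties going up."""
    kept = raw8 >> 4         # the bits that survive
    dropped = raw8 & 0b1111  # the four bits being discarded
    return kept + 1 if dropped >= 0b1000 else kept

low = 0b1_1111    # 1.1111 in binary
high = 0b10_0000  # 10.0000 in binary

# All 8-fractional-bit values from 1.11110000 through 1.11111111.
candidates = range(0b1_11110000, 0b10_00000000)

to_low = [v for v in candidates if round_to_4_bits(v) == low]
to_high = [v for v in candidates if round_to_4_bits(v) == high]

print(len(to_low), len(to_high))  # 8 8
```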

Rounding is not difficult: we only need to add 1000₂ to the product xy before the right shift operation. In other words, we want z = rs((xy + 1000₂), 4).
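A minimal Python sketch of this formula follows. It assumes x and y are raw integers with 4 fractional bits each, so their product has 8; `rs` mirrors the notation above, while `mul_round` and `mul_trunc` are names invented for the comparison. Python integers do not overflow, so a fixed-width implementation would also need to make sure the added 1000₂ cannot overflow the intermediate result.

```python
def rs(value: int, bits: int) -> int:
    """Right shift, written as a function to match the rs(...) notation."""
    return value >> bits

def mul_trunc(x: int, y: int) -> int:
    """Truncating multiply: z = rs(xy, 4)."""
    return rs(x * y, 4)

def mul_round(x: int, y: int) -> int:
    """Rounding multiply: z = rs(xy + 1000 (binary), 4)."""
    return rs(x * y + 0b1000, 4)

x = 0b1_1111  # 1.1111 in binary = 1.9375
y = 0b1_0001  # 1.0001 in binary = 1.0625

# Exact product is 10.00001111 in binary = 2.05859375.
print(mul_trunc(x, y) / 16)  # 2.0
print(mul_round(x, y) / 16)  # 2.0625
```

Adding 1000₂ carries into the kept bits exactly when the four dropped bits are 1000₂ or larger, which is the tie-goes-up rule derived above.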