
This is an attempt to explain floating point numbers in a crazy, but hopefully easy to understand way. Lets say that I have a number such as pi. pi is irrational, which means that it doesn't terminate or repeat. So we can't store the whole number (obviosly) so we do the next best thing, store as much as we can and know that the last part of it is going to be off by a little.
To make the discussion a little simpler, I'm going to assume that the computer stores numbers in decimal format (it actually uses binary, but the argument is analogous and techniques are exactly the same). I'm also going to assume that the floating point type has a precision of 10 decimal digits (this doesn't correspond to float or double or long double, but the idea is the same as long as you know the precision of the type you want to use, and replace the 10 here by that number).
Ok, the real value of pi is 3.14159265358..., but we can only store the first ten digits, so the computer sees it as 3.1415926536, notice how the last digit got rounded up. This is the closest number the computer can store to pi. It's a really good approximation, but it's not exactly right in the last digit. Every other digit is right, so I'm going to represent the number like this:
pi = 3.1415926536
I'm using green to say "these digits are accurate and in our safe zone." Red is obviously a "danger zone" since we can't be sure of the value (unless you know the floating point architecture very well, and further know exactly what your input should be and follow very precisely throughout all calculations. Usuallly it's a lot easier for the analysis to just assume this digit is complete garbage).
Ok, we have a floating point number, know where it can be wrong (the red zone) and can happily submit it as the return value as long as the green zone is large enough to satisfy TopCoder's precision requirements. What's so hard about this? Well, most problems don't just want you to return the input value. Instead you have to perform some operations on the number. What's so difficult about that? Well... operations typically change how big the red zone is.
There are many different operations you can perform on floating point numbers. You can do the for basic arithmetic operations +,,*,/ as well as other mathematical operations such as sin,acos,sqrt,pow etc. You can also perform casting with round,ceil,floor, etc. Since there are so many operations I won't talk about all of them, but I'll go over a few and hopefully you'll be able to figure out the rest yourself (perhaps with the exception of trig functions, since understanding them requires knowing the routines).
Ok, so the first operation is addition (subtraction is really similar). Lets say we want to add our value of pi to itself (after all 2pi does have certain geometrical applications). So we do the math:
3.1415926536 +3.1415926536 =6.2831853072 Here I want you to notice two things. First if you look at the addition without regard to what color the digits are, then it is exact. After all, the computer doesn't pay attention to what color the digits are, so why shouldn't it be? Second, if you pay attention to the addition only looking at the green zones, then it is not exact. The last green digit in each number is 3, and 3+3=6, so why is the corresponding digit in the answer 7? The answer is because error propagates. Above I labeled this digit as yellow. Because sometimes this digit will be wrong, and sometimes it won't. What determines if it's wrong? Well, in this case it was wrong because 6+6=12 > 9 and so the 1 was carried. In general addition can only propagate error by the most significant redzone digit causing a carry.
There's good news and bad news with this. The good news is that with each addition, you're sure that the red zone will increase by at most 1 digit. That's not very big, but the bad news is it can add up. The further good news is that it doesn't add up that fast. If a digit gets carried, then it must be a 1. This means that in the worst case it'll take at least 4 more addition operations until the error can get to the next digit. Even better is that if there are two numbers with different red zones, then the increase in error is on the order of the smaller red zone. And finally, if the numbers you are adding are random, then the error on the first number might be negative (that is the stored value is less than the theoretical value) and the error on the other number might be positive (stored is higher than theoretical) so that the errors cancel out.
Multiplication is a bit different, but works along the same way. Lets say we have two want to multiply pi by pi (less obvious why you'd want to do this, maybe you forgot the area formula):
3.1415926536 x3.1415926536 =9.8696044011 Multiplication is a bit harder to analyze. First of all, note that if we ignore color then the result is not exact. This is because the true multiplication would try to add more digits, and there just isn't enough room to store them all. Secondly note that if you were to multiply just the green digits, then the answer would not be the same. In fact lets compare the two numbers:
9.869604397 9.869604401 Notice that they aren't the same in the last three digits. But really, this is because the last digit is the one that is off and it just happened to cascade to the other digits (more on this phenomenon will be discussed in the rounding section). Here it might pay to be a little more exact about the terminology. error is the difference between the stored value and what the value theoretically should be. The red zone just represents the order of magnitude of the error, so although the digits are different in the green zone, that doesn't constitute error, because the difference between the theoretical and stored value is still small enough to fit in the red zone (Or in this case the yellow zone because yellow represents "may be red, may be green").
You have probably seen how to handle error in a high school science class. In these classes the green zone is called "signficant digits." And when we multiply two numbers the result has the same number of significant digits as the input number with fewer significant digits. The green zone works in exactly the same way, the result has the same sized green zone as the smaller of the two numbers being multiplied, with the caveat that the last digit becomes yellow. Notice that in this example the carry isn't limited to 1 as it is in addition, so error will propogate faster with multiplication than with addition. There's also less of the cancelling out effect, since larger errors tend to dominate. This is one reason why repeated multiplication with floating point numbers can tend to be error prone. 