



This is an attempt to explain floating point numbers in a crazy, but hopefully easy to understand way. Let's say that I have a number such as pi. pi is irrational, which means that its decimal expansion doesn't terminate or repeat. So we can't store the whole number (obviously), so we do the next best thing: store as much as we can and know that the last part of it is going to be off by a little.
To make the discussion a little simpler, I'm going to assume that the computer stores numbers in decimal format (it actually uses binary, but the argument is analogous and techniques are exactly the same). I'm also going to assume that the floating point type has a precision of 10 decimal digits (this doesn't correspond to float or double or long double, but the idea is the same as long as you know the precision of the type you want to use, and replace the 10 here by that number).
Ok, the real value of pi is 3.14159265358..., but we can only store the first ten digits, so the computer sees it as 3.1415926536 (notice how the last digit got rounded up). This is the closest number the computer can store to pi. It's a really good approximation, but it's not exactly right in the last digit. Every other digit is right, so I'm going to represent the number like this:
pi = 3.1415926536
I'm using green to say "these digits are accurate and in our safe zone." Red is obviously a "danger zone," since we can't be sure of the value (unless you know the floating point architecture very well, and further know exactly what your input should be and follow it very precisely through all calculations; usually it's a lot easier for the analysis to just assume this digit is complete garbage).
Ok, we have a floating point number, know where it can be wrong (the red zone) and can happily submit it as the return value as long as the green zone is large enough to satisfy TopCoder's precision requirements. What's so hard about this? Well, most problems don't just want you to return the input value. Instead you have to perform some operations on the number. What's so difficult about that? Well... operations typically change how big the red zone is.
There are many different operations you can perform on floating point numbers. You can do the four basic arithmetic operations +, -, *, / as well as other mathematical operations such as sin, acos, sqrt, pow, etc. You can also perform casting with round, ceil, floor, etc. Since there are so many operations I won't talk about all of them, but I'll go over a few and hopefully you'll be able to figure out the rest yourself (perhaps with the exception of trig functions, since understanding them requires knowing the routines).
Ok, so the first operation is addition (subtraction is really similar). Let's say we want to add our value of pi to itself (after all, 2*pi does have certain geometric applications). So we do the math:
3.1415926536
+3.1415926536
=6.2831853072

Here I want you to notice two things. First, if you look at the addition without regard to what color the digits are, then it is exact. After all, the computer doesn't pay attention to what color the digits are, so why shouldn't it be? Second, if you pay attention to the addition only looking at the green zones, then it is not exact. The last green digit in each number is 3, and 3+3=6, so why is the corresponding digit in the answer 7? The answer is that error propagates. Above I labeled this digit as yellow, because sometimes this digit will be wrong, and sometimes it won't. What determines if it's wrong? Well, in this case it was wrong because 6+6=12 > 9 and so the 1 was carried. In general, addition can only propagate error by the most significant red-zone digit causing a carry.
There's good news and bad news with this. The good news is that with each addition, you're sure that the red zone will increase by at most 1 digit. That's not very big, but the bad news is it can add up. The further good news is that it doesn't add up that fast. If a digit gets carried, then it must be a 1. This means that in the worst case it'll take at least 4 more addition operations until the error can get to the next digit. Even better, if two numbers have different red zones, then the increase in error is on the order of the smaller red zone. And finally, if the numbers you are adding are random, then the error on the first number might be negative (that is, the stored value is less than the theoretical value) and the error on the other number might be positive (stored is higher than theoretical), so the errors cancel out.
Multiplication is a bit different, but works much the same way. Let's say we want to multiply pi by pi (it's less obvious why you'd want to do this; maybe you forgot the area formula):
3.1415926536
x3.1415926536
=9.8696044011

Multiplication is a bit harder to analyze. First of all, note that if we ignore color then the result is not exact. This is because the true multiplication would try to add more digits, and there just isn't enough room to store them all. Secondly, note that if you were to multiply just the green digits, then the answer would not be the same. In fact, let's compare the two numbers:
9.869604397
9.869604401

Notice that they aren't the same in the last three digits. But really, this is because the last digit is the one that is off and it just happened to cascade to the other digits (more on this phenomenon will be discussed in the rounding section). Here it might pay to be a little more exact about the terminology: error is the difference between the stored value and what the value theoretically should be. The red zone just represents the order of magnitude of the error, so although the digits differ in the green zone, that doesn't constitute error, because the difference between the theoretical and stored value is still small enough to fit in the red zone (or in this case the yellow zone, because yellow represents "may be red, may be green").
You have probably seen how to handle error in a high school science class. In those classes the green zone is called "significant digits," and when we multiply two numbers the result has the same number of significant digits as the input number with fewer significant digits. The green zone works in exactly the same way: the result has the same sized green zone as the smaller of the two numbers being multiplied, with the caveat that the last digit becomes yellow. Notice that in this example the carry isn't limited to 1 as it is in addition, so error will propagate faster with multiplication than with addition. There's also less of the cancelling-out effect, since larger errors tend to dominate. This is one reason why repeated multiplication with floating point numbers tends to be error prone.

Ok, now before the next operation of rounding, let's talk a bit more about what the green zone/red zone theory means about the philosophy of using floating point numbers. Let's say we do a calculation and mathematically the end result should be exactly 1. That's all fine and dandy, but in the real world we'll end up with a number with a green zone and a red zone. Let's assume that the green zone has 7 digits. So the question is, what actual value can the stored data have? Well, the answer to the question is easy, because we can just put the highest and lowest values in the red zone: 0.9999999000 to 1.0000000999. Namely, the stored number can never be more than 1e-7 away from the value of 1. In fact this value, 1e-7, is quite special. Special enough to give it a name: epsilon (the name comes from mathematicians' practice of calling a small positive number by that Greek letter). Also note that the only thing special about the actual number 1e-7 is that in this case the green zone happens to be 7 digits; if your green zone has a different size, then your epsilon should too (TopCoder usually wants you to have an epsilon of 1e-9). Why is the epsilon important? Because we know from the green zone/red zone arguments the validity of this important fact:
true value - epsilon < stored value < true value + epsilon
This is just a restatement of what the possible stored values are. But really if you think about it, that question isn't the one we usually want to ask. In fact we know the stored value; just check the contents of the variable you're using. The question is, what is the true value? If you fiddle around with the previous inequalities and try to solve them for the true value, then you'll end up with this similar fact:
stored value - epsilon < true value < stored value + epsilon
Now there are two notable things about this equation. The first is that the true value is never more than epsilon away from the stored value. This should be obvious, because the stored value can never be more than a small amount away from the true value. But the second thing is more important. If we know the stored value, then we know a range of values that the true value lies in. We don't know the true value exactly, but we have the set of possible values it could be. This is the whole reason for using epsilon in floating point comparisons: if the ranges of two variables overlap, then the two values could be equal.
Ok, now that that's through, back to the point of rounding. Let's say we want to round the variable x down to the nearest integer less than or equal to x. For a simple first look, assume that x contains the exact value (before you realized that floating point computations were inexact, you assumed this exact same thing, so think of this as your code before learning the magic that is floating point). Well, you say, this is a simple task. This is just truncation, which is what casting to an integer does. (int) x is the desired value! Well, not quite. Why? Look above. If the true value of x is exactly 1, then x could be as low as 0.9999999000. When this number gets truncated it's going to go straight down to 0, be dead wrong, and get you that exact score in the SRM. The cure is plain and simple: just add epsilon to x before you do the cast. This way if x is anywhere near an actual integer (and so we must assume that the true value of x is that integer), then it is guaranteed to get rounded to the right place. On the other hand, if x is not close to any integer, then adding epsilon won't change the result.

Thank you for the great explanation Ryan! So if you round down you should do something like (int)(x+1e-9). If you round up you do (int)(x-1e-9). Is this correct?
How do you know what epsilon will be? Consider my previous example (see other post). In 3^35, how do you know that the answer will be off by 3?
This made me think a bit more. So green zone means the digit is 100% correct, red means there is a 50% chance it's wrong. I guess yellow means there is a 25% chance it's wrong. I wonder if there is a hardware architecture that lets you extract this information quickly? I.e., I want a function like accuracy(digit) that tells me what % accuracy that digit has.

Well, casting something to an int rounds it down, so if you're doing that, you have to guard against rounding a 1.999999999999 that's really a 2-epsilon to 1 instead. So you'd use (int)(x+1e-9). The time you'd either add or subtract is if you wanted to check equality: if you wanted something like "if (x == y)", it's safer to say "if (x+1e-9 > y && x-1e-9 < y)" or "if (abs(x-y) < 1e-9)". And it's even safer to use relative error, like I think misof explained in the article.
The reason 3^35 is off likely has to do with storage space. 3^35 is 50031545098999707: 17 digits in decimal, or 56 digits in binary. The mantissa of a double is 52 binary digits (53 counting the implied 1), which is 3 digits too few. What that means is that even if exponentiation is one operation, and the resulting answer is the absolute closest a double can be to the right answer, the last 3 binary digits are going to be cut off, so the answer will be wrong unless the last 3 binary digits of the real answer are 0 (which they aren't, because powers of 3 aren't multiples of 8). And this is assuming you have one red digit, like the starting value of pi that Ryan uses as an example.
I'm not sure green and yellow are importantly different: even a green digit could potentially be wrong if the answer is close to certain 'boundaries', which is why we look at answers we get like 1.99999999998 for problems and understand that's the same as the correct answer of 2.0 (even though all the 'green digits' are wrong, only the last 2 digits are likely real 'red digits'). I'm not sure if there is a computing architecture which remembers how many significant digits it has, but it sounds possible in some convention or another.

(53 counting the implied 1)
Thanks for putting that bit in. I'd forgotten about the implied 1 and had been trying to figure out how 3^35 could have been only 3 off. Now I know.

Sorry I got that wrong. Should rounding up be (int)(x+1+1e-9)? Rounding to the nearest integer: (int)(x+0.5+1e-9)?

Rounding to nearest is correct (if you want "always round .5 up" semantics).

Are you sure that's correct? What if the number you're trying to round up is exactly 2? Then rounding it up should still give 2, since no rounding is necessary. But (int)(x + 1 + 1e-9) would give 3, because you've added more than 1 to it and then rounded it down.

Hmmm... maybe it should be (int)(x+1-1e-9)
(int)(1.999... + 1 - 1e-9) gives 2. (int)(2.000...1 + 1 - 1e-9) gives 2. Or does it? I am confused!

Yeah, you're right. I guess I should add 9:15 PM to the list of times when I shouldn't post. The correct form of the ceiling operation is:
(int)(x+1-epsilon)


> I'm not sure if there is a computing architecture which remembers how many signficant digits it has, but it sounds possible in some convention or another.
Some CPUs (Intel and compatibles) will signal precision loss, e.g. when subtracting two almost equal floats (where the result is almost complete garbage). But that info is typically not (or not easily) available to high-level programming. There you may implement interval arithmetic if you want to keep track of the actual error.
