JOIN
Get Time
forums   
Search | Watch Thread  |  My Post History  |  My Watches  |  User Settings
View: Flat (newest first)  | Threaded  | Tree
Previous Thread  |  Next Thread
1.3. Choosing Correct Numeric Data Types | Reply
This is a rewrite of this recipe and this rewrite of it.
I couldn't find a simple problem which would require comparing doubles with certain tolerance (which is a must for this recipe) - any ideas?

Problem

Absolute majority of TopCoder problems involve certain calculations. In some cases an otherwise correct solution fails due to minor bugs in their way of processing numbers. How to learn to recognize such cases and avoid these bugs?

Solution

All numeric errors originate from the simple fact that our mind operates with abstract numbers, and our compilers handle machine implementations of numbers. Two main reasons of numeric errors are overflow and floating-point imprecision.

Overflow happens when the result of calculations doesn't fit within the datatype it has. Overflow usually takes place in integer calculations (in doubles this is exceedingly rare), and is cured by using long type instead of common integer type.

Floating-point imprecision is caused by specifics of floating point numbers representation, and happens when doubles are compared directly (without epsilon tolerance). This is cured either by avoiding doubles in calculations or by being extra careful when using and especially comparing them.

Discussion

Here are some very basic tips on how to identify places where tricky numeric calculations can happen:

- examine data types of input and output variables. Evident but still useful - if the method's return is of type long or double, you'll naturally have to use it at some point of calculations.

- learn by heart the minimum and maximum values that can be stored in numeric data types in your language. No need to be super precise, it's enough to remember that absolute value of these limits for signed 32-bit integer is approximately 2*109, and for signed 64-bit nteger (usually referred to as long) - approximately 9*1018. Each time the intermediary results of your calculations are close to the limits, take care about overflow.

- remember to keep trace of calculations you perform during your solution. Some operations require explicit casting to another data type to behave in a way you mean them to. These are multiplication which can overflow, and division which, when applied to integer numbers, produces an integer result instead of intuitive floating-point one.

- note that while C++ has unsigned integer data types, Java has none. Since reference solutions to all problems are written in Java, you are guaranteed to be able to solve each problem without unsigned types. However, in some cases they are handy for storing bitmasks or using them to simplify avoiding overflow.

- don't use 8-bit and 16-bit integers and 32-bit floating-point numbers, unless you're sure you need them. They won't increase processing speed, but can lead to unexpected overflow/precision errors. Use them only if memory consumption is really critical.

Now, let's have a look at some examples of problems in which you have to take care of data types and the ways in which this can be done.

FeudaliasBattle (multiplication overflow + integer division)
This lovely problem gives one two chances to fail due to misuse of data type conversions. Let's first have a look at the correct submission (the one suggested in the editorial) in C++:
#include <vector>
#include <math.h>
 
using namespace std;
 
class FeudaliasBattle {
	public:
	double getMinimumTime(vector <int> baseX, vector <int> baseY, 
			vector <int> siloX, vector <int> siloY, 
			int takeOffTime, int rechargeTime, int missileSpeed)  {
		int i,j,k;
		double silobase[4][2],dx,dy;
		for (i=0; i<2; i++)	//index of silo
		for (j=0; j<2; j++)	//number of recharges till this shot
		for (k=0; k<2; k++)	//index of base
		{	dx = baseX[k]-siloX[i];
			dy = baseY[k]-siloY[i];
			silobase[i+2*j][k] = takeOffTime*(j+1)/60.0 + 
				rechargeTime*j + sqrt(dx*dx+dy*dy)/missileSpeed;
		}
		double ret=1e10;
		for (i=0; i<4; i++)
		for (j=0; j<4; j++)
			if (i!=j)
				ret <?= max(silobase[i][0], silobase[j][1]);
		return ret;
	}
};


Now, note the way the Euclidean distance between points is calculated: though both dx and dy are integer numbers, they can be as large as 106, and dx*dx will overflow 32-bit integer. Trying to retrieve square root of negative number (the result of overflowing integer) results in NaN return and fails the test {{0, 0}, {0, 1}, {1000000, 1000000}, {999999, 1000000}, 30, 10, 1}. The cure is to declare dx and dy as 64-bit integers or doubles, so that dx*dx is within the boundaries of this data type.

Another way to fail this problem is to divide takeOffTime*(j+1) not by 60.0 but by 60. Since both dividend and divisor are integer, the result will be integer as well, so each time takeOffTime*(j+1) is not divisible by 60, the result will be slightly smaller than the correct answer. In this problem, however, this bug didn't affect solutions, since it fails the very first example.

Multifactorial (multiplication overflow)
This is an old problem that required inventing a safe way to detect overflow during multifactorial calculation. The ways of solving it varied from language to language. Here are a few of them.

C# provides a keyword "checked" which allows to throw an exception if an overflow occurred. This way one could simply catch this exception and return "overflow".

public class Multifactorial {
	public string calcMultiFact(int n, int k) {
		long mf = 1, mx = 1000000000000000000L;
		for (; n>0; n-=k)
		{	try { mf = checked(mf*n); }
			catch (System.OverflowException e) 
			{	return "overflow"; }
		}
		if (mf>mx)
			return "overflow";
		return mf.ToString();
	}
}


Java coders could use BigInteger class to avoid technical overflow issues completely. However, they still would have to do the check for logical overflow on each step, since operations on BigInteger are slow, and test case { 2000000000, 1 } would require about 2*109 multiplications.

import java.math.*;
public class Multifactorial {
	public String calcMultiFact(int ni, int ki) {
		BigInteger mf = BigInteger.ONE, mx = new BigInteger("1000000000000000000");
		BigInteger n = new BigInteger(ni+""), k = new BigInteger(ki+"");
		for (; n.compareTo(BigInteger.ZERO)>0; n = n.subtract(k))
		{	mf = mf.multiply(n);
			if (mf.compareTo(mx)>0)
				return "overflow";
		}
		return mf.toString();
	}
}


Finally, a universal approach (demonstrated in Java) required manual check for multiplication overflow. This can be done by checking whether maximum value divided by one of multipliers is less than the other
multiplier (if it is, product does overflow). Alternatively one can use doubles to store numbers larger than 263, but this might result in undesired loss of precision.

public class Multifactorial {
	public String calcMultiFact(int n, int k) {
		long mf = 1, mx = 1000000000000000000L;
		for (; n>0; n-=k)
		{	if (mx/n<mf)
			 	return "overflow";
			mf *= n;
		}
		if (mf>mx)
			return "overflow";
		return mf+"";
	}
}
Re: 1.3. Choosing Correct Numeric Data Types (response to post by Nickolas) | Reply
Discussion continued

FindTriangle (floating-point imprecision)

The evident approach for this problem is to use Heron's formula for calculating triangle's area. However, this is wrong: due to floating-point imprecision this formula returns non-zero area for some triangles with collinear vertices. A much safer way to calculate triangle's area is via cross-product of vectors representing its sides (see recipe "Using Vector Operations" for details). Finally, since coordinates of vertices are integer, the area will be an integer divided by two. To avoid comparing floating-point areas, we can store and compare twice-areas and divide the maximal of them by 2 only when returning it.

Aircraft (floating-point imprecision + integer overflow)

This is a nice example of a problem in which both types of error can occur. See recipe "Working with Lines and Segments" for analyses and solution.

It is often helpful to do as many of your computations as possible in integers, switching to floating-point numbers only when it's really necessary. Thus, if you compare Euclidean distances between integer points, you can work with squared distances (which are integers) and extract square root only at the very end. Similarly, in PerforatedSheet you should calculate both total mass and weighted sum of rectangle center positions (multiplied by 2 again) in 64-bit integer, and postpone the division to the last step.

See Also
Recipe "Working with Numbers" for details on representation of integer and floating-point numbers.
Recipe "Using Rational Arithmetic to Avoid Floating Point Imprecision" for another way to avoid floating-point imprecision.
Re: 1.3. Choosing Correct Numeric Data Types (response to post by Nickolas) | Reply
Some thoughts about numeric types are posted here.
Re: 1.3. Choosing Correct Numeric Data Types (response to post by syg96) | Reply
Borrowed your comment about using proper primitive data types, thanks. We'll probably remove this tip and the one about integer division from that recipe, since they are covered in this one.
RSS