Checking if a number is a power of two | Reply
One can easily check if a number is a power of 2: clear the lowest 1 bit (see above) and check if the result is 0.
There is a trivial optimization to this - you can do it as (x & (x-1)) instead of (x & ~(x & ~(x-1))) (you can prove it by applying DeMorgan's laws, the law of distributivity, and throwing away the (x & ~x) term; you can also simply observe that it works for the same reason the x & ~(x-1) returns the lowest bit set ;-).
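As a minimal Java sketch of the test described above (the x != 0 guard is an addition here, since zero would otherwise pass the check):

```java
public class PowerOfTwoCheck {
    // A number is a power of two iff it has exactly one bit set,
    // so clearing the lowest set bit with x & (x - 1) must leave zero.
    // The x != 0 guard excludes zero, which would otherwise pass.
    static boolean isPowerOfTwo(int x) {
        return x != 0 && (x & (x - 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isPowerOfTwo(64)); // true
        System.out.println(isPowerOfTwo(96)); // false
    }
}
```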

I remember reading about that trick in a magazine article more than 10 years ago. I also think I saw that same trick used in one of tomek's submissions a couple of years back, but I cannot remember which one it was.

Thanks for the great article!
Re: Checking if a number is a power of two (response to post by kyky) | Reply
In two's complement, doesn't x & -x work?
Re: Checking if a number is a power of two (response to post by kyky) | Reply
I also want to add that there is actually a pretty O(log N) solution to the problem of counting the number of bits set (N stands for the number of bits in the number).
It works like the following (let's assume we have a 4-bit number A):
B = A;
B = (B & 0101b) + ((B & 1010b) >> 1);
B = (B & 0011b) + ((B & 1100b) >> 2);
So B will be the number of bits that are set to 1 in A.
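As a hedged sketch, the 4-bit scheme above can be written out in Java (using binary literals in place of the 0101b notation; names are illustrative):

```java
public class ParallelBitCount {
    // 4-bit version of the parallel ("divide and conquer") bit count
    // described above: each step adds neighbouring groups of bits,
    // doubling the group width.
    static int count4(int a) {
        int b = (a & 0b0101) + ((a & 0b1010) >> 1); // two 2-bit partial sums
        b = (b & 0b0011) + ((b & 0b1100) >> 2);     // one 4-bit total
        return b;
    }

    public static void main(String[] args) {
        System.out.println(count4(0b1101)); // 3
    }
}
```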

There are really plenty of tricks like this in a book called "Hacker's Delight" or something like that :)

Anyway, the article is pretty good.
Re: Checking if a number is a power of two (response to post by real_vg) | Reply
Yes, I've used a variation of that bit counting method before when time was critical and bit counting was happening in critical loops.
Re: Checking if a number is a power of two (response to post by real_vg) | Reply
It depends on what you count as an operation... seems like you would have to at least *read in* the number, and that's O(N), where N is the number of bits in a number...
Re: Checking if a number is a power of two (response to post by Larry) | Reply
I'm not sure what you mean, but I think the technique is more about parallelism - "reading in" the number might be O(n), but it's likely with a very low constant. For instance, if "reading in" means pushing a 32-bit value onto the stack, then it's probably a single operation, and passing n ints is passing n*32 bits, but the time taken is n (or n*32/32 ?). Counting bits on a 32-bit integer iteratively might take proportional to 32 operations, but using this parallel addition method, it might take a grand total of about 10 or 12 bitwise operations with no branching.
Re: Checking if a number is a power of two (response to post by real_vg) | Reply
Wouldn't something like num[x & mask] + num[x >> half_num_bits] be faster?
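A minimal Java sketch of that lookup-table idea, assuming a 16-bit table (the names num and bitCount are illustrative, not from the thread):

```java
public class TableBitCount {
    // Precompute the bit count of every 16-bit value once; a 32-bit
    // number's bits are then counted with two table lookups and an add.
    static final int[] num = new int[1 << 16];
    static {
        // num[j] = count of j's low bits plus its lowest bit
        for (int j = 1; j < num.length; j++)
            num[j] = num[j >> 1] + (j & 1);
    }

    static int bitCount(int x) {
        // mask = 0xFFFF, half_num_bits = 16 in the notation above;
        // >>> avoids sign extension on negative inputs
        return num[x & 0xFFFF] + num[x >>> 16];
    }

    public static void main(String[] args) {
        System.out.println(bitCount(0xF0F0F0F0)); // 16
    }
}
```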
Re: Checking if a number is a power of two (response to post by Cosmin.ro) | Reply
I agree. This is the fastest according to this

Also I think that method can be made faster if you use a union. Then you don't need to do (x >> half_num_bits)

bmerry, thank you for the great article! In particular, I am very happy to find out about the __builtin_ctz builtin.
Re: Checking if a number is a power of two (response to post by Cosmin.ro) | Reply
I am not sure what you mean here; could you explain?
Re: Checking if a number is a power of two (response to post by real_vg) | Reply
I guess Cosmin.ro assumes the number of bits set in x is cached in num[x] (for low values of x).
Re: Checking if a number is a power of two (response to post by dimkadimon) | Reply
Well, it highly depends on the architecture, as methods using precomputed tables are likely to perform slower than parallel computation on systems with slow memory access (which is not too unlikely). I am not sure, but it may happen that at least Precomp_16 will fail to perform well on the i686 architecture under real conditions. The thing is that the tests were artificial, if I understood correctly, i.e. the routine was repeatedly called in a loop on some sample data, without performing any other operations, especially ones that require memory access. Since you need to store 64k of precomputed data, which is not accessed sequentially, it is very likely to cause cache misses in the real world (when doing other things besides just counting bits). It of course highly depends on the processor's internal cache size, but that size is certainly comparable to 64k. Resolving a cache miss is not quick, so Precomp_16 is likely to behave much worse in some situations.
The worst thing is actually that it has nondeterministic execution time, i.e. the execution time depends on essentially random conditions, and we get unexpected behaviour. That is not good.
Re: Checking if a number is a power of two (response to post by Kawigi) | Reply
Some replies to several people's comments:

In two's complement, doesn't x & -x work?

Not quite. Negation in 2's complement is by inverting and adding 1. So it will tell you if x is one less than a power of 2.

I also want to add that there is actually a pretty O(log N) solution to the problem of counting the number of bits set (N stands for the number of bits in the number).

This is the exercise I referred to under "Cute tricks". I did some experiments of my own and they're consistent with those at the link dimkadimon posted (thanks for the link). These sexy-looking routines seem to have a high constant factor and are more for showing off - particularly those that use a mod tend to get blown out of the water. GCC seems to use an 8-bit table; the link indicates that a 16-bit table is faster although depending on the application it might cause more L1 misses.

but using this parallel addition method, it might take a grand total of about 10 or 12 bitwise operations with no branching

It actually ends up being quite a lot of instructions (for 32-bit). You've got two masks, a shift and an addition per iteration, plus you have to have some copies because x86 instructions are in-place.
Re: Checking if a number is a power of two (response to post by real_vg) | Reply
Although I agree that Precomp_16 may not be optimal for all machines and all applications, I still think it has the best average running time. In other words, given a random machine and a random application you are probably best off using it. Memory access is usually faster (especially if it's only 64K) than most other operations.
Re: Checking if a number is a power of two (response to post by bmerry) | Reply
Not quite. Negation in 2's complement is by inverting and adding 1.

Isn't adding 1 after inverting the same as subtracting one before inverting? Just to make sure, I wrote this program, which prints nothing:

public static void main(String[] args)
{
	int i=0;
	do
	{
		int lastbit1 = i & -i;
		int lastbit2 = i & ~(i-1);
		if (lastbit1 != lastbit2)
			System.out.println("Inconsistent for " + i);
		i++;
	}
	while (i != 0);
}


Edit: And about the number of operations for counting bits in parallel, you are right that I underestimated the number - a basic unoptimized version has 19 operators (and definitely some copying around underneath, but so does iteratively counting bits). It's definitely slower than a lookup table if it needs to be done frequently enough to even matter, but my point is that it's still faster than the iterative method (at least in the average case), even if you only count bits that are set and don't check bits that aren't. My feeble attempt at benchmarking (I call it feeble because the profiler is being absolutely unhelpful) indicates that this:
int bitcount = 0;
while (i != 0)
{
	bitcount++;
	i ^= i&-i;
}
return bitcount;

takes about 4 times longer than this:
bitcounts = (i & 0x55555555) + ((i >> 1) & 0x55555555);
bitcounts = (bitcounts & 0x33333333) + ((bitcounts >> 2) & 0x33333333);
bitcounts = (bitcounts & 0x0F0F0F0F) + ((bitcounts >> 4) & 0x0F0F0F0F);
bitcounts = (bitcounts & 0x00FF00FF) + ((bitcounts >> 8) & 0x00FF00FF);
bitcounts = (bitcounts & 0x0000FFFF) + (bitcounts >> 16);
return bitcounts;

which takes about 10% longer than this:
if (nums == null)
{
	nums = new int[256];
	for (int j=1; j<256; j++)
		nums[j] = nums[j^(j&-j)]+1;
}
return nums[i&255]+nums[(i>>8)&255]+nums[(i>>16)&255]+nums[(i>>24)&255];

which takes about 55% longer than a 16-bit version of the last function when called about 200 million times (hopefully with a minimal amount of overhead, especially since the functions seem to be getting inlined).

Edit again: This is, of course, if all you're doing is a ton of bit counting - aside from the cache space that might be hogged by a large bitcount cache in a more practical program, there's the issue that you might not be counting bits 200 million times in a normal, otherwise efficient program :-)
Re: Checking if a number is a power of two (response to post by dimkadimon) | Reply
"Memory access is usually faster (especially if its only 64K) than most other operations."

Are you kidding? Memory access is usually much slower than other operations, unless you have division and more complex operations in mind. Memory speed is much lower than CPU speed, and it has been so for years. Also, 64k is not really such a small amount: e.g. Intel Celeron processors were often made with only 128k of L2 cache, so 64k is half the cache. If the bit counting is combined with some other memory-hungry operation, it will really introduce a slowdown because of cache misses (note that it can also increase the number of cache misses in the other parts of the code).

Though 8-bit precomputation probably will outperform parallel computation on any modern x86 system, as its precomputed table is small enough to fit into the cache. I think GCC uses 8-bit precomputation because it is likely to perform much more stably, as it is not likely to produce many cache misses.