



Name: Representing sets with bitfields
Problem: There are many ways to represent a subset of a set, but most of them are slow. This recipe shows how subsets from a domain of up to 64 elements can be very efficiently represented and manipulated as integers.
Solution: Consider a subset of the set {0, 1, ..., N  1}. This can be represented by an Nbit binary number, where bit i (representing 2^{i}) is 1 if element i is present, and 0 if element i is absent. Most modern languages support integers of up to 64 bits, allowing subsets of 64element sets to be encoded (some languages also allow for arbitrary numbers of bits). In code examples, it is assumed that a 32bit integer is being used.
Many set operations can be achieved by bitwise and integer arithmetic:
Set union: A  B
Set intersection: A & B
Set subtraction: A & ~B
Test if set is nonempty: A != 0
Test whether element x is in A: (A & (1 << x)) != 0
The set containing {0, ..., x  1}: (1 << x)  1
Put x in A (works even if x is already in A): A = 1 << x;
Remove x from A (works even if x is not in A): A &= ~(1 << x);
Set negation: ((1 << N)  1) & ~A
Test whether A has exactly one element: (A > 0) && (A & (A  1)) == 0
Remove the smallest element from A: A = A & (A  1)
Remove all but the smallest element from A: A = A & ~(A  1)
Count the number of elements in A: // GCC:
count = __builtin_popcount(A);
// Java:
count = Integer.bitCount(A);
// Portable:
count = (A & 0x55555555) + ((A >> 1) & 0x55555555);
count = (count & 0x33333333) + ((count >> 2) & 0x33333333);
count = (count & 0x0F0F0F0F) + ((count >> 4) & 0x0F0F0F0F);
count = (count & 0x00FF00FF) + ((count >> 8) & 0x00FF00FF);
count = (count & 0x0000FFFF) + (count >> 16);
Get the index of the smallest element in A: // GCC:
low = __builtin_ctz(A);
// Java:
low = Integer.numberOfTrailingZeros(A);
// Portable:
low = 0;
lowbit = A & ~(A  1);
if ((tmp & 0xAAAAAAAA) != 0) low += 1;
if ((tmp & 0xCCCCCCCC) != 0) low += 2;
if ((tmp & 0xF0F0F0F0) != 0) low += 4;
if ((tmp & 0xFF00FF00) != 0) low += 8;
if ((tmp & 0xFFFF0000) != 0) low += 16;
Discussion:
Most of the optimizations that go into TopCoder contests are highlevel; that is, they affect the algorithm rather than the implementation. However, one of the most useful and effective lowlevel optimizations is using the bits of an integer to represent a set. Not only does it produce an orderofmagnitude improvement in both speed and size, it can often simplify code at the same time.
Most of the set operations are straightforward, but the operations on the smallest element are more interesting. Suppose we wish to find the lowest set bit of A (which is known to be nonzero). If we subtract 1 from A then this bit is cleared, but all the other one bits in A remain set. Thus, A & ~(A  1) consists of only the lowest set bit of A. You might sometimes see this written as A & A, because when using twoscomplement arithmetic they're equivalent.
If we want the index of the highest or lowest bit (rather than the corresponding set representation), the obvious approach is simply to loop through the bits (upwards or downwards) until we find one that is set. At first glance this seems slow, since it does not take advantage of the bitpacking at all. However, if all 2^{N} subsets of the Nelement domain are equally likely, then the loop will take only two iterations on average, and this is actually the fastest method.
The 386 introduced CPU instructions for bit scanning: BSF (bit scan forward) and BSR (bit scan reverse). GCC exposes these instructions through the builtin functions __builtin_ctz (count trailing zeros) and __builtin_clz (count leading zeros). These are the most convenient way to find bit indices for C++ programmers in TopCoder. Be warned though: the return value is undefined for an argument of zero.
One can easily check if a number is a power of 2: clear the lowest 1 bit (see above) and check if the result is 0. However, sometimes it is necessary to know how many bits are set, and this is more difficult. GCC has a function called __builtin_popcount which does precisely this. However, unlike __builtin_ctz, it does not translate into a hardware instruction (at least on x86). Instead, it uses a tablebased method, but it is nevertheless quite efficient and also extremely convenient. The languageneutral alternative listed above is not as efficient in practice, but quite elegant: in the first step it replaces each pair of a bits with a 2bit value indicating the count of those bits; in the second step it replaces each 4bit group with the count for that group; and so on, until the entire 32bit value is replaced by a count of bits.
There are a few mistakes that are very easy to make when performing bit manipulations. Watch out for them in your code:
 When executing a shift instruction a << b, the x86 architecture uses only the bottom 5 bits of b (6 for 64bit integers). This means that shifting left (or right) by 32 does nothing, rather than clearing all the bits. This behavior is also specified by the Java and C# language standards; C++ says that shifting by at least the size of the value gives an undefined result.
 A related point to the above is to be careful to use the appropriate type for constants to avoid overflow. For example, if you're using a Java long to represent a set, then you should write 1L << x rather than 1 << x.
 In most languages, the & and  operators have lower precedence than comparison operators. That means that x & 3 == 1 is interpreted as x & (3 == 1), which is probably not what you want.
 Where possible, use unsigned values, particularly if you want to use the top bit. C++ leaves the effect of rightshifting a negative value undefined, although in the environment used by TopCoder it has the same behavior as in Java and C# (namely, replicating the sign bit). Java only has signed types, but the Javaspecific operator >>> will shift in zeros rather than replicating the sign bit.
The code examples above are for 32bit integers. In most cases, the only change that needs to be made for 64bit integers is to ensure that constants are forced to be the appropriate size (e.g., by writing 1 as 1L in Java or 1ULL in C++). The GCC builtin functions also require a ll suffix when using 64bit integers (to indicate that they take a long long). For the portable alternatives to builtin functions, the patterns need to be extended to an extra iteration and 64bit constants.
(continued in reply  it exceeds the posting size limit) 

(part 2, continued from first post)
A simple example of these techniques is ImportsList. The basic observation required is that if A needs access to B, and B needs access to C, then there is no need for A to import C since it will automatically get access via B. Thus, the modules that A must import are those it needs access to, minus those it will get access to through its dependencies.
public class ImportsList
{
public int[] importsCount(String[] requires)
{
int N = requires.length;
long[] reqMasks = new long[N];
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
if (requires[i].charAt(j) == 'Y')
reqMasks[i] = 1 << j;
int[] ans = new int[N];
for (int i = 0; i < N; i++)
{
long impMask = reqMasks[i];
for (int j = 0; j < N; j++)
if ((reqMasks[i] & (1 << j)) != 0)
impMask &= ~reqMasks[j];
ans[i] = Long.bitCount(impMask);
}
return ans;
}
}
#include <vector>
#include <string>
using namespace std;
class ImportsList
{
public:
vector<int> importsCount(vector<string> requires)
{
int N = requires.size();
vector<long long> reqMasks(N, 0);
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
if (requires[i][j] == 'Y')
reqMasks[i] = 1LL << j;
vector<int> ans(N);
for (int i = 0; i < N; i++)
{
long long impMask = reqMasks[i];
for (int j = 0; j < N; j++)
if ((reqMasks[i] & (1LL << j)) != 0)
impMask &= ~reqMasks[j];
ans[i] = __builtin_popcountll(impMask);
}
return ans;
}
};
Another advantage of encoding a set into an integer is that it becomes trivial to index an array with a set. TheCardLineDivTwo has a fairly common theme of counting the possible way to order some objects. We can solve it recursively, by keeping track of which cards are still available and which card was used last. Since the result of the recurse function depends only on its parameters (and the cards, which are fixed for one test case), we can cache the result of each call; and since the set of available cards is encoded as an integer, we can use the value to directly index a cache. Here we use the exclusiveor (^) operator to toggle an element in the set (in this case we know whether the element was previously present, so it is not the only way to do it).
import java.util.*;
public class TheCardLineDivTwo
{
private static final int MOD = 1234567891;
private String[] cards;
private int[][] cache;
private boolean red(int card)
{
char suit = cards[card].charAt(1);
return suit == 'H'  suit == 'D';
}
/** Determine whether card1 and card2 can be adjacent in line */
private boolean compatible(int card1, int card2)
{
char rank1 = cards[card1].charAt(0);
char rank2 = cards[card2].charAt(0);
return rank1 == rank2  red(card1) == red(card2);
}
/**
* Count the number of ways to finish off the line using the cards in
* the set valid, given that the previously placed card was last.
*/
private int recurse(int last, int valid)
{
if (cache[last][valid] != 1)
{
// Already solved this subproblem, use the cached value
return cache[last][valid];
}
if (valid == 0)
{
// All cards placed, no further choices to make.
return cache[last][valid] = 1;
}
long ans = 0;
// Try all possibilities for the last card
for (int i = 0; i < cards.length; i++)
if ((valid & (1 << i)) != 0
&& compatible(last, i))
{
ans = (ans + recurse(i, valid ^ (1 << i))) % MOD;
}
return cache[last][valid] = (int) ans;
}
public int count(String[] cards)
{
this.cards = cards;
int N = cards.length;
cache = new int[N][1 << N];
for (int i = 0; i < N; i++)
for (int j = 0; j < (1 << N); j++)
cache[i][j] = 1; // mark as unknown
long ans = 0;
int all = (1 << N)  1;
for (int i = 0; i < N; i++)
ans = (ans + recurse(i, all ^ (1 << i))) % MOD;
return (int) ans;
}
}
#include <vector>
#include <string>
#include <cstddef>
using namespace std;
class TheCardLineDivTwo
{
static const int MOD = 1234567891;
vector<string> cards;
vector<vector<int> > cache;
bool red(int card)
{
char suit = cards[card][1];
return suit == 'H'  suit == 'D';
}
/** Determine whether card1 and card2 can be adjacent in line */
bool compatible(int card1, int card2)
{
char rank1 = cards[card1][0];
char rank2 = cards[card2][0];
return rank1 == rank2  red(card1) == red(card2);
}
/**
* Count the number of ways to finish off the line using the cards in
* the set valid, given that the previously placed card was last.
*/
int recurse(int last, int valid)
{
if (cache[last][valid] != 1)
{
// Already solved this subproblem, use the cached value
return cache[last][valid];
}
if (valid == 0)
{
// All cards placed, no further choices to make.
return cache[last][valid] = 1;
}
long long ans = 0;
// Try all possibilities for the last card
for (size_t i = 0; i < cards.size(); i++)
if ((valid & (1 << i)) != 0
&& compatible(last, i))
{
ans = (ans + recurse(i, valid ^ (1 << i))) % MOD;
}
return cache[last][valid] = (int) ans;
}
public:
int count(vector<string> cards)
{
this>cards = cards;
int N = cards.size();
cache = vector<vector<int> >(N, vector<int>(1 << N, 1));
long long ans = 0;
int all = (1 << N)  1;
for (int i = 0; i < N; i++)
ans = (ans + recurse(i, all ^ (1 << i))) % MOD;
return (int) ans;
}
};


Once again, great recipe in correct format. Are there builtin bitcounting functions for C#/VB.NET/Python? I can't find any. 

There aren't methods C#/VB.NET. :( Since they use MSIL as an intermediate language before translating to assembly, it's up to the JIT compiler to try to resolve the *best* method it can find. I've had to do the bit manipulation by hand whenever I want to use any of that functionality. Not sure about Python. 

For "Count the number of elements in A": The last line
count = (count & 0x0000FFFF) + (count >> 16);
should be
count = (count & 0x0000FFFF) + ((count >> 16) & 0x0000FFFF);
because if count is int, then the sign bit will be copied, i.e. 1 >> 16 == 1. You won't need to change if you assume either:
1) Input is unsigned int (but then this is not portable). 2) Input is between 0 and 0x7FFFFFFF (but then this is 31 bit). 

Ah yes, pesky Java with its signedonly numbers. It would probably also work if you started with a 32bit value in a Java long, but for simplicity I agree it should just change to the line you gave. 

For python:
http://wiki.python.org/moin/BitwiseOperators
integers have the following methods also:
>>> for i in dir(x):
... print i
...
__abs__
__add__
__and__
__class__
__cmp__
__coerce__
__delattr__
__div__
__divmod__
__doc__
__float__
__floordiv__
__format__
__getattribute__
__getnewargs__
__hash__
__hex__
__index__
__init__
__int__
__invert__
__long__
__lshift__
__mod__
__mul__
__neg__
__new__
__nonzero__
__oct__
__or__
__pos__
__pow__
__radd__
__rand__
__rdiv__
__rdivmod__
__reduce__
__reduce_ex__
__repr__
__rfloordiv__
__rlshift__
__rmod__
__rmul__
__ror__
__rpow__
__rrshift__
__rshift__
__rsub__
__rtruediv__
__rxor__
__setattr__
__sizeof__
__str__
__sub__
__subclasshook__
__truediv__
__trunc__
__xor__
bit_length
conjugate
denominator
imag
numerator
real
>>>
bit_length is fairly useful. 

Sir, Please do comment the codes, it will be lot easier for us to understand. 

The first post is showing as empty for me, even in the edit history. I'm using the old TC skin... is anyone seeing the same thing? 

I am using the old TC skin, but I can still see the first post... 
