JOIN
 Select a Forum     Round Tables New Member Discussions News Discussions Algorithm Matches Marathon Matches NASA Tournament Lab TopCoder Cookbook High School Matches Sponsor Discussions Development Forums Design Forums Search | Watch Thread  |  My Post History  |  My Watches  |  User Settings View: Flat (newest first)  | Threaded  | Tree Previous Thread  |  Next Thread Forums TopCoder Cookbook Algorithm Competitions - New Recipes Representing sets with bitfields
 Representing sets with bitfields | Reply Name: Representing sets with bitfieldsProblem: There are many ways to represent a subset of a set, but most of them are slow. This recipe shows how subsets from a domain of up to 64 elements can be very efficiently represented and manipulated as integers.Solution: Consider a subset of the set {0, 1, ..., N - 1}. This can be represented by an N-bit binary number, where bit i (representing 2i) is 1 if element i is present, and 0 if element i is absent. Most modern languages support integers of up to 64 bits, allowing subsets of 64-element sets to be encoded (some languages also allow for arbitrary numbers of bits). In code examples, it is assumed that a 32-bit integer is being used.Many set operations can be achieved by bitwise and integer arithmetic:Set union: ```A | B ```Set intersection: ```A & B ```Set subtraction: ```A & ~B ```Test if set is non-empty: ```A != 0 ```Test whether element x is in A: ```(A & (1 << x)) != 0 ```The set containing {0, ..., x - 1}: ```(1 << x) - 1 ```Put x in A (works even if x is already in A): ```A |= 1 << x; ```Remove x from A (works even if x is not in A): ```A &= ~(1 << x); ```Set negation: ```((1 << N) - 1) & ~A ```Test whether A has exactly one element: ```(A > 0) && (A & (A - 1)) == 0 ```Remove the smallest element from A: ```A = A & (A - 1) ```Remove all but the smallest element from A: ```A = A & ~(A - 1) ```Count the number of elements in A: ```// GCC: count = __builtin_popcount(A); // Java: count = Integer.bitCount(A); // Portable: count = (A & 0x55555555) + ((A >> 1) & 0x55555555); count = (count & 0x33333333) + ((count >> 2) & 0x33333333); count = (count & 0x0F0F0F0F) + ((count >> 4) & 0x0F0F0F0F); count = (count & 0x00FF00FF) + ((count >> 8) & 0x00FF00FF); count = (count & 0x0000FFFF) + (count >> 16); ```Get the index of the smallest element in A: ```// GCC: low = __builtin_ctz(A); // Java: low = Integer.numberOfTrailingZeros(A); // Portable: low = 0; lowbit = A & ~(A - 1); if ((tmp & 0xAAAAAAAA) != 0) low += 1; if ((tmp & 0xCCCCCCCC) != 0) low += 2; if ((tmp & 0xF0F0F0F0) != 0) low += 4; if ((tmp & 0xFF00FF00) != 0) low += 8; if ((tmp & 0xFFFF0000) != 0) low += 16; ```Discussion:Most of the optimizations that go into TopCoder contests are high-level; that is, they affect the algorithm rather than the implementation. However, one of the most useful and effective low-level optimizations is using the bits of an integer to represent a set. Not only does it produce an order-of-magnitude improvement in both speed and size, it can often simplify code at the same time.Most of the set operations are straight-forward, but the operations on the smallest element are more interesting. Suppose we wish to find the lowest set bit of A (which is known to be non-zero). If we subtract 1 from A then this bit is cleared, but all the other one bits in A remain set. Thus, A & ~(A - 1) consists of only the lowest set bit of A. You might sometimes see this written as A & -A, because when using twos-complement arithmetic they're equivalent.If we want the index of the highest or lowest bit (rather than the corresponding set representation), the obvious approach is simply to loop through the bits (upwards or downwards) until we find one that is set. At first glance this seems slow, since it does not take advantage of the bit-packing at all. However, if all 2N subsets of the N-element domain are equally likely, then the loop will take only two iterations on average, and this is actually the fastest method.The 386 introduced CPU instructions for bit scanning: BSF (bit scan forward) and BSR (bit scan reverse). GCC exposes these instructions through the built-in functions __builtin_ctz (count trailing zeros) and __builtin_clz (count leading zeros). These are the most convenient way to find bit indices for C++ programmers in TopCoder. Be warned though: the return value is undefined for an argument of zero.One can easily check if a number is a power of 2: clear the lowest 1 bit (see above) and check if the result is 0. However, sometimes it is necessary to know how many bits are set, and this is more difficult. GCC has a function called __builtin_popcount which does precisely this. However, unlike __builtin_ctz, it does not translate into a hardware instruction (at least on x86). Instead, it uses a table-based method, but it is nevertheless quite efficient and also extremely convenient. The language-neutral alternative listed above is not as efficient in practice, but quite elegant: in the first step it replaces each pair of a bits with a 2-bit value indicating the count of those bits; in the second step it replaces each 4-bit group with the count for that group; and so on, until the entire 32-bit value is replaced by a count of bits.There are a few mistakes that are very easy to make when performing bit manipulations. Watch out for them in your code:When executing a shift instruction a << b, the x86 architecture uses only the bottom 5 bits of b (6 for 64-bit integers). This means that shifting left (or right) by 32 does nothing, rather than clearing all the bits. This behavior is also specified by the Java and C# language standards; C++ says that shifting by at least the size of the value gives an undefined result.A related point to the above is to be careful to use the appropriate type for constants to avoid overflow. For example, if you're using a Java long to represent a set, then you should write 1L << x rather than 1 << x.In most languages, the & and | operators have lower precedence than comparison operators. That means that x & 3 == 1 is interpreted as x & (3 == 1), which is probably not what you want.Where possible, use unsigned values, particularly if you want to use the top bit. C++ leaves the effect of right-shifting a negative value undefined, although in the environment used by TopCoder it has the same behavior as in Java and C# (namely, replicating the sign bit). Java only has signed types, but the Java-specific operator >>> will shift in zeros rather than replicating the sign bit.The code examples above are for 32-bit integers. In most cases, the only change that needs to be made for 64-bit integers is to ensure that constants are forced to be the appropriate size (e.g., by writing 1 as 1L in Java or 1ULL in C++). The GCC builtin functions also require a ll suffix when using 64-bit integers (to indicate that they take a long long). For the portable alternatives to builtin functions, the patterns need to be extended to an extra iteration and 64-bit constants.(continued in reply - it exceeds the posting size limit)
 Re: Representing sets with bitfields (response to post by bmerry) | Reply (part 2, continued from first post)A simple example of these techniques is ImportsList. The basic observation required is that if A needs access to B, and B needs access to C, then there is no need for A to import C since it will automatically get access via B. Thus, the modules that A must import are those it needs access to, minus those it will get access to through its dependencies.```public class ImportsList { public int[] importsCount(String[] requires) { int N = requires.length; long[] reqMasks = new long[N]; for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) if (requires[i].charAt(j) == 'Y') reqMasks[i] |= 1 << j;   int[] ans = new int[N]; for (int i = 0; i < N; i++) { long impMask = reqMasks[i]; for (int j = 0; j < N; j++) if ((reqMasks[i] & (1 << j)) != 0) impMask &= ~reqMasks[j]; ans[i] = Long.bitCount(impMask); } return ans; } } ``````#include #include   using namespace std;   class ImportsList { public: vector importsCount(vector requires) { int N = requires.size(); vector reqMasks(N, 0); for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) if (requires[i][j] == 'Y') reqMasks[i] |= 1LL << j;   vector ans(N); for (int i = 0; i < N; i++) { long long impMask = reqMasks[i]; for (int j = 0; j < N; j++) if ((reqMasks[i] & (1LL << j)) != 0) impMask &= ~reqMasks[j]; ans[i] = __builtin_popcountll(impMask); } return ans; } }; ```Another advantage of encoding a set into an integer is that it becomes trivial to index an array with a set. TheCardLineDivTwo has a fairly common theme of counting the possible way to order some objects. We can solve it recursively, by keeping track of which cards are still available and which card was used last. Since the result of the recurse function depends only on its parameters (and the cards, which are fixed for one test case), we can cache the result of each call; and since the set of available cards is encoded as an integer, we can use the value to directly index a cache. Here we use the exclusive-or (^) operator to toggle an element in the set (in this case we know whether the element was previously present, so it is not the only way to do it).```import java.util.*;   public class TheCardLineDivTwo { private static final int MOD = 1234567891;   private String[] cards; private int[][] cache;   private boolean red(int card) { char suit = cards[card].charAt(1); return suit == 'H' || suit == 'D'; }   /** Determine whether card1 and card2 can be adjacent in line */ private boolean compatible(int card1, int card2) { char rank1 = cards[card1].charAt(0); char rank2 = cards[card2].charAt(0); return rank1 == rank2 || red(card1) == red(card2); }   /** * Count the number of ways to finish off the line using the cards in * the set valid, given that the previously placed card was last. */ private int recurse(int last, int valid) { if (cache[last][valid] != -1) { // Already solved this subproblem, use the cached value return cache[last][valid]; } if (valid == 0) { // All cards placed, no further choices to make. return cache[last][valid] = 1; } long ans = 0; // Try all possibilities for the last card for (int i = 0; i < cards.length; i++) if ((valid & (1 << i)) != 0 && compatible(last, i)) { ans = (ans + recurse(i, valid ^ (1 << i))) % MOD; } return cache[last][valid] = (int) ans; }   public int count(String[] cards) { this.cards = cards; int N = cards.length; cache = new int[N][1 << N]; for (int i = 0; i < N; i++) for (int j = 0; j < (1 << N); j++) cache[i][j] = -1; // mark as unknown   long ans = 0; int all = (1 << N) - 1; for (int i = 0; i < N; i++) ans = (ans + recurse(i, all ^ (1 << i))) % MOD; return (int) ans; } } ``````#include #include #include   using namespace std;   class TheCardLineDivTwo { static const int MOD = 1234567891;   vector cards; vector > cache;   bool red(int card) { char suit = cards[card][1]; return suit == 'H' || suit == 'D'; }   /** Determine whether card1 and card2 can be adjacent in line */ bool compatible(int card1, int card2) { char rank1 = cards[card1][0]; char rank2 = cards[card2][0]; return rank1 == rank2 || red(card1) == red(card2); }   /** * Count the number of ways to finish off the line using the cards in * the set valid, given that the previously placed card was last. */ int recurse(int last, int valid) { if (cache[last][valid] != -1) { // Already solved this subproblem, use the cached value return cache[last][valid]; } if (valid == 0) { // All cards placed, no further choices to make. return cache[last][valid] = 1; } long long ans = 0; // Try all possibilities for the last card for (size_t i = 0; i < cards.size(); i++) if ((valid & (1 << i)) != 0 && compatible(last, i)) { ans = (ans + recurse(i, valid ^ (1 << i))) % MOD; } return cache[last][valid] = (int) ans; }   public: int count(vector cards) { this->cards = cards; int N = cards.size(); cache = vector >(N, vector(1 << N, -1));   long long ans = 0; int all = (1 << N) - 1; for (int i = 0; i < N; i++) ans = (ans + recurse(i, all ^ (1 << i))) % MOD; return (int) ans; } }; ```
 Re: Representing sets with bitfields (response to post by bmerry) | Reply Once again, great recipe in correct format. Are there built-in bit-counting functions for C#/VB.NET/Python? I can't find any.
 Re: Representing sets with bitfields (response to post by Nickolas) | Reply There aren't methods C#/VB.NET. :( Since they use MSIL as an intermediate language before translating to assembly, it's up to the JIT compiler to try to resolve the *best* method it can find. I've had to do the bit manipulation by hand whenever I want to use any of that functionality. Not sure about Python.
 Re: Representing sets with bitfields (response to post by bmerry) | Reply For "Count the number of elements in A":The last linecount = (count & 0x0000FFFF) + (count >> 16);should becount = (count & 0x0000FFFF) + ((count >> 16) & 0x0000FFFF);because if count is int, then the sign bit will be copied, i.e.-1 >> 16 == -1. You won't need to change if you assume either:1) Input is unsigned int (but then this is not portable).2) Input is between 0 and 0x7FFFFFFF (but then this is 31 bit).
 Re: Representing sets with bitfields (response to post by Sparrow) | Reply Ah yes, pesky Java with its signed-only numbers. It would probably also work if you started with a 32-bit value in a Java long, but for simplicity I agree it should just change to the line you gave.
 Re: Representing sets with bitfields (response to post by Nickolas) | Reply For python:http://wiki.python.org/moin/BitwiseOperatorsintegers have the following methods also:```>>> for i in dir(x): ... print i ... __abs__ __add__ __and__ __class__ __cmp__ __coerce__ __delattr__ __div__ __divmod__ __doc__ __float__ __floordiv__ __format__ __getattribute__ __getnewargs__ __hash__ __hex__ __index__ __init__ __int__ __invert__ __long__ __lshift__ __mod__ __mul__ __neg__ __new__ __nonzero__ __oct__ __or__ __pos__ __pow__ __radd__ __rand__ __rdiv__ __rdivmod__ __reduce__ __reduce_ex__ __repr__ __rfloordiv__ __rlshift__ __rmod__ __rmul__ __ror__ __rpow__ __rrshift__ __rshift__ __rsub__ __rtruediv__ __rxor__ __setattr__ __sizeof__ __str__ __sub__ __subclasshook__ __truediv__ __trunc__ __xor__ bit_length conjugate denominator imag numerator real >>> ```bit_length is fairly useful.
 Re: Representing sets with bitfields (response to post by bmerry) | Reply Sir, Please do comment the codes, it will be lot easier for us to understand.
 Re: Representing sets with bitfields (response to post by bmerry) | Reply xdsdsa
 Re: Representing sets with bitfields (response to post by bmerry) | Reply The first post is showing as empty for me, even in the edit history. I'm using the old TC skin... is anyone seeing the same thing?
 Re: Representing sets with bitfields (response to post by d000hg) | Reply I am using the old TC skin, but I can still see the first post...
 Forums TopCoder Cookbook Algorithm Competitions - New Recipes Representing sets with bitfields Previous Thread  |  Next Thread