||My approach had 3 parts:
1. count colors in c-1 guesses (ideally this would be nearly skipped)
2. find a totally wrong guess
3. keep a set of all possible combinations of a subset of each position color pair. generate guesses isolated to the subset (simulate to find best avg guess) and filter out the ones which are not consistant. if this set gets below a certain size add another position color pair subset.
||What I've managed to intuit is that, more often than not, NP complete problems tend to have solutions which can be completely different if the inputs are modified by a constant number of 'elements' (like a constant size graph attachment, or something). This is the case in Minimal Vertex Covers, the Knapsack Problem, Integer programming, K-coloring a graph, the traveling salesman problem, probably most of the others as well. I don't think that the converse is true.
Also, it's not true that you need to iterate over the entire, or most of the state space for NP complete problems: there are some pretty interesting new results which drastically reduce the number of states you need to examine in integer programming, not to mention the classes branch and bound. It's pretty cool.
||My approach was very similar to gsais's.
As a first step, I used up to (K-1) guesses to find the color counts, but at every step, I used the previous information and random swaps to extract more information (so if the first guess 00..00 shows that there are 2 0s in the code, my second guess was a random permutation of 2 0s and L-2 1s, and so on).
In the second stage, I generated a few random permutations of the correct colors, and using single swaps, I tried to get a code, which is as consistent with the previous answers as possible. My consistency metric was the same as gsais's, Sum(Abs(Expected matched - Actual matches)).
This algorithm works the best, if at each step a large number of random permutations are examined, so after TheFaxman told that inline assembly is allowed I rewrote my optimize-by-swaps code to use SSE2, giving it about six-sevenfold speed increase. So my final code examines 3.7e10 / (L^4 * log2(K)^2) combinations at each call to nextGuess(), that ensures that runtime will remain around 12-14 seconds even in worst case (according to local test, that means a score of 1.6).
||Several coders have mentioned random swaps as the way to find compatible guess.
My code does a bit more. It repeatedly removes 2-8 worst pegs, then adds them back in the best places. In each step if there are several possibilities with the same change in error, I pick one randomly.
Similar algorithm was used in the first stage when I don't know color counts yet for finding compatible set of counts.
||I have a totally different approach which works pretty well. It goes like this:
Find the right combination of colors in k-1 guesses using (0,0,0,0...) (1,1,1,1,...) etc.
Create some arbitrary guess using these combinations.
For each step, swap only 2 spots that have different numbered pegs. There are 5 cases possible for the resulting score change:
Let us say we previously guessed X in position A and Y in position B (X,A) and (Y,B), and the swap creates a new guess with (X,B) and (Y,A) with everything else remaining constant.
Score +2: (X,B) and (Y,A) are correct configurations.
Score -2: (X,A) and (Y,B) are correct configurations.
Score +1: The two pegs are contradicting in their new configuation - ie. if (X,B) is correct, (Y,A) is incorrect and vice versa. We note this by creating "contradiction lists" for each configuration in which we can look up the list of all the conflicting configurations that are connected to any configuration.
Score -1: The two pegs are contradicting in their old configuation - ie. if (X,A) is correct, (Y,B) is incorrect and vice versa. These are also added to the conflicting configuration list.
Score unchanged: (X,A) and (Y,B) are correlated, and (X,B) and (Y,A) are corellated - ie. if (XA) is correct then (Y,B) is correct and if (XA) is wrong then (Y,B) is wrong. The same is true with (X,B) and (Y,A). This is not entirely intuitive, but it is true. We create "correlation lists" for each configuration in the same way as above.
Now, when you do a swap, and you get a result of +-2, then you can call those configurations correct. In addition, you can go through the list of contradictions and correlations for those configurations and mark them thusly. And for each of those, you can mark the configurations in their lists, and so on. We keep a set of "correct" and "wrong" configurations, not swapping any correct configurations and not swapping into two wrong configurations. For the +-1 and unchanged score cases, we can check against our list of known wrong configurations and imply results in many cases.
Now, my submission had a tiny-but-important bug in it, as I did not prevent swapping into two wrong configurations (not a very helpful swap!), so my score sucks. It was 24.3. But with that error fixed, I went from never getting the k=20, l=100 case to getting it in an average of 1400 or so guesses with randomly seeded tests. Also, speed is not even relatively an issue as running of this algorithm with the largest cases takes about 1 second or so. I figure this is a decent overall result but I have no clue what my score would have been. The whole scoring thing for these matches is still a bit obtuse to me...
Determine the colors present in the code.
If all the possible colors were in the code, start with Phase 2a. Otherwise, jump to Phase 2b.
This phase is a variant of a simple brute force search where you change each peg one at a time until results increases by one, then you move to the next peg. Instead of doing things 1 peg at a time, I did them 2 at a time. A little trickier than 1 peg at a time since there were some special cases that had to be handled, but still manageable.
After you've found the color for a peg, update the number of pegs left in the code of that color. Once you find a color that doesn't occur in the rest of the code, move to Phase 2b.
Similar to what other people did with guessing codes that are consistent with what has been guessed so far. Rather than guessing the whole code, I guess blocks at a time, starting at the leftmost part of the code that hasn't been determined yet. Once a "block" has been determined, move on to the next block. When randomly guessing a code, I would randomly pick results pegs to keep from the last "best" guess, and then guess the remaining pegs using the probability distribution of the colors of the remaining pegs. (The last "best" guess is the one which I guessed that had the largest value of results.)
||I was new to Topcoder and such kind of problems, hence I probably did not make much use of the 20 seconds time alloted.
I tried to find a regular efficient simple direct solution, and and this resulted in generation of guesses with hardly any computation in almost no time.
Here's the thing I did:
Stage 1) Use k-1 unicolor guesses to find the number of pegs of each color. I tried a lot to think of other ways instead of doing this way, but I couldnt find a way to make them simple enough to be coded!
Stage 2) Finding Background color: If there is any color with no pegs, skip this step, and and use that as the background color. Otherwise, do a linear search on the rarest color to find where it's pegs are located, and then it can serve as the background color for the remaining spots.
Stage 3) Now that we have a background color, we could replace any pegs with this color, and the reduction in the number of exact matches would correspond directly to the correct pegs we have displaced. So using this, I did a binary search for each color's pegs -- basically replacing the pegs I was interested in by the background color, and thus finding how many pegs of a color are located there, and then changing the region of interest in a binary search fashion. Repeating this for each color, I was able to find the actual code.
Using this approach I scored around 65. Then I made an improvement in stage 2, by using a modified binary search to find the locations of the rarest color thereby reducing a possible whole pass of k guessses.
This got me to score ~75.
I think this is a decent method --- if the time taken to generate the guesses is also taken into account, this would compare fairly well I feel.
However, I felt this approach was not doing good l is approximately same or lower than k.
||It looks like skipping stage 1 is what allowed me to take the lead with large margin.
I've added this stage to my final submission, and the score dropped to around 113.
||Yes, I guess when K >= L, "wasting" K-1 guesses for color frequencies is not a good idea.
||Let len=l (size of array)
1) Perform (k-1) guesses to compute freq of colors.
2) Set the whole array elements to the most frequent color.
3) Perform (len) operations, each one is only changing an elem to the next most frequent color. This will determine exactly all elems that belongs to one of the two most frequent colors.
4) During step 3, get the number of elems with most frequent color in the first half, and using this info, perform (k-2) guesses to determine the frequency of each color in each half.
5) Iterate over all elements in the first half, changing two elems at one time to the most frequent color in their candidate list. By The most frequent color, I mean the color that has the largest frequency in the un-solved positions of the first half.
6) Make the same for second half.
||I did not get the method of guessing based on previous guesses. Can you please explain what is:
Sum(Abs(Expected matched - Actual matches)), and how the new guess is selected?
||This is provably true! With K-1 guess it is almost always possible
to split the code in two equally sized parts and know the color distribution in both parts. That's how my solution basically works.
It continues to split each part in two until each part has less than
about 1Million permutions and bute forces these. Additionally there
are a few special cases for small lenghts. This gave me ~110 points so
it can't be too bad.