JOIN
 Select a Forum     Round Tables New Member Discussions News Discussions Algorithm Matches Marathon Matches NASA Tournament Lab TopCoder Cookbook High School Matches Sponsor Discussions Development Forums Design Forums Search | Watch Thread  |  My Post History  |  My Watches  |  User Settings View: Flat  | Threaded  | Tree Previous Thread  |  Next Thread Forums Marathon Matches Marathon Match 2 Post your approach
 Post your approach | Reply Hmmm, it appears that the new Intel Marathon has distracted people from the usual discussion of approaches. I'll get it started.I started with the relatively straightforward approach of first determining the number of times each color occurred in the code, and then working through to find the correct color for each position. Then I tried a few approaches to optimizing that basic idea.I also tried combining this approach with the "brute force" method for small subsets of the problem.I wanted to try a genetic algorithm approach, but I decided that it would be better to start with something that was guaranteed to take less than 3000 guesses. Unfortunately there were enough possibilities there that I never got back to trying a GA. Did anyone else try that?
 Re: Post your approach (response to post by MightyByte) | Reply Let C be the number of colors and P be number of pegs.Major idea:Phase 1:Find the number of occurrence of all colors with at most C-1 queries.Phase 2:Pick a reference color. I need to know all the location of occurence of this color. If there is a color with zero occurence (This happens with probability roughly 70%), I pick this color as the reference color at no cost. Otherwise, pick the two most frequent colors, the exact location of these two color can be obtained with at most P-1 queries. Either of them can be reference color.Phase 3:Enumerate all possible code for the first few positions until the number of possible code exceeds a predefined capacity. For the rest of the positions, fill them all with the reference color. Pick a random code from the enumeration as query.Then filter out those impossible code and generate more code by appending all possible colors to the unknown position as long as the total number of code is still within the capacity. Repeat querying until only one possible code is left and this is the final answer.Minor improvement:1. Instead of picking a random code in phase 3, pick a code with color frequency which matches the already known color frequency.2. Switch to a completely exhaustive-serach-approach at the beginning if pow(C,P)<2.5e8
 Re: Post your approach (response to post by lyc1977) | Reply ok here's mine let K= k, L=l (from init)step 1 - do k passes (i was being sloppy)with 0000...,11111..., ..., kkkkk... (k=K-1)to find out how many of each color there are.step 2- randomly generate all the rest of my guesses with permutationsof000112222555... (assuming there were 3 0's based on first k guesses, 2 1's, 0 3s, 0 4's 3 5's...);step 3for guessnum=k until 3000?let p[K,L] = 1/K be an array of double representing the probability that the lth color is kestimate current p matrixrepeat 100 times or until done{ let sum1(x) = sum of p[i,x] where i in [0,K)} divide each p[i,x] by sum1(x) let sum2(n) = sum of p[guess[guessnum][i],i] where i in [0,L] the expected value is results[guessnum][0] force the sum of p[guess[guessnum][i],i] to the expected value (mult by something) done if Abs(sum- expected)<1e-8}generate a vector based on highest p for each slotcheck to see if consistant with previous guesses if so then we are finished - return it...otherwise return the permutation generated in step 2there was a lot of room for optimization but I didn't get what i did working until last night.I was pleased to get scores approaching 2
 Re: Post your approach (response to post by MightyByte) | Reply My approach was simply to generate guesses compatible with results of previous guesses.The problem is that I didn't find guarantied way to find this compatible guess in time. So I was looking for guess with as small error as possible.The error I've used = sum(guesses, 10000*abs(guess.a - current.a)+abs(guess.b - current.b)), where b is number of perfectly matched pegs, and a is number of total matched colors (results[0]+results[1] from nextGuess arguments).Now, I generate each guess candidate by first generating compatible color counts (first part of error sum), then generating most compatible order.In both of those steps I first quickly make some reasonable initial state, then optimize it by some number of randomized changes, each taking into account possible reduction of error. To speed up things a lot I've maintained arrays of error changes for various modification of the state, and look up for the best moves in this array. I have found it's much more efficient to have this array maintained, than to recalculate errors after each move.I've managed to generate up to 50000000/(L*L*L) candidates per each guess, using reasonable runtime - up to 15 seconds in worst case.If any candidate has perfect match (zero error) I return the guess immediately.Another part of my solution is the code for getting as much as possible information from guesses analytically. The code estimates possible minimum and maximum values for color counts, possible assignment bits of colors in each place, etc. It allowed me to determine exact numbers of each color in reasonably few guesses - about K.Initially, I've tried to issue first K-1 unicolor guesses to determine counts, but it turns out it wasn't necessary. This approach gets no information about ordering from the first guesses, and I've got ~100 score from it (my second submission).
 Re: Post your approach (response to post by venco) | Reply The problem is that I didn't find guarantied way to find this compatible guess in time. So I was looking for guess with as small error as possible.You would have won a much bigger prize if you had: http://tw.arxiv.org/abs/cs.CC/0512049
 Re: Post your approach (response to post by lars2520) | Reply I was almost sure it's NP-complete. :)But I'm not theory expert, so I've formulated it in this way. :)
 Re: Post your approach (response to post by venco) | Reply In the worst case, My method takes (k/2 * L) + k - 1 guesses.that is, k - 1 initial guesses;for each position, issue a guess with two candidate colors,one as background color.for example, X Y X Xafter this, we can know whether this position is color X or color Yor neither.This brings me 85 scores. I didn't make full use of the 20 secs.in fact, for most cases it finishes in 1 sec...If i make use of previous guesses by introducing a formula like yours,the result may turn out better.
 Re: Post your approach (response to post by lars2520) | Reply I find it intriguing how often I've thought "this is probably NP-complete" and it turned out to be true while I did absolutely no analysis. Are there any easy to regocnize common properties of NP-complete problems I might be recognizing without knowing?
 Re: Post your approach (response to post by dskloet) | Reply My guess is any problem that looks like it would take some exhaustive search to arrive at an optimal answer?
 Re: Post your approach (response to post by dskloet) | Reply What I've managed to intuit is that, more often than not, NP complete problems tend to have solutions which can be completely different if the inputs are modified by a constant number of 'elements' (like a constant size graph attachment, or something). This is the case in Minimal Vertex Covers, the Knapsack Problem, Integer programming, K-coloring a graph, the traveling salesman problem, probably most of the others as well. I don't think that the converse is true.Also, it's not true that you need to iterate over the entire, or most of the state space for NP complete problems: there are some pretty interesting new results which drastically reduce the number of states you need to examine in integer programming, not to mention the classes branch and bound. It's pretty cool.
 Re: Post your approach (response to post by dskloet) | Reply A few seconds with google will usually tell you if a problem is NP-complete. With this problem, about ten seconds.
 Re: Post your approach (response to post by Rustyoldman) | Reply Ofcourse, but that wasn't the kind property I was looking for.
 Re: Post your approach (response to post by lars2520) | Reply It looks like from the paper you mentioned i took the "static" route.http://tw.arxiv.org/abs/cs.CC/0512049btw - when is the testing for MM2 going to run?
 Re: Post your approach (response to post by inrm) | Reply We're holding off on testing while we do some verification of a few things, testing will most likely start tomorrow morning.
 Re: Post your approach (response to post by TheFaxman) | Reply thanks!
 Re: Post your approach (response to post by msuchopper) | Reply I was new to Topcoder and such kind of problems, hence I probably did not make much use of the 20 seconds time alloted.I tried to find a regular efficient simple direct solution, and and this resulted in generation of guesses with hardly any computation in almost no time.Here's the thing I did:Stage 1) Use k-1 unicolor guesses to find the number of pegs of each color. I tried a lot to think of other ways instead of doing this way, but I couldnt find a way to make them simple enough to be coded!Stage 2) Finding Background color: If there is any color with no pegs, skip this step, and and use that as the background color. Otherwise, do a linear search on the rarest color to find where it's pegs are located, and then it can serve as the background color for the remaining spots.Stage 3) Now that we have a background color, we could replace any pegs with this color, and the reduction in the number of exact matches would correspond directly to the correct pegs we have displaced. So using this, I did a binary search for each color's pegs -- basically replacing the pegs I was interested in by the background color, and thus finding how many pegs of a color are located there, and then changing the region of interest in a binary search fashion. Repeating this for each color, I was able to find the actual code.Using this approach I scored around 65. Then I made an improvement in stage 2, by using a modified binary search to find the locations of the rarest color thereby reducing a possible whole pass of k guessses.This got me to score ~75. I think this is a decent method --- if the time taken to generate the guesses is also taken into account, this would compare fairly well I feel.However, I felt this approach was not doing good l is approximately same or lower than k.
 Re: Post your approach (response to post by venco) | Reply Initially, I've tried to issue first K-1 unicolor guesses to determine counts, but it turns out it wasn't necessary. This approach gets no information about ordering from the first guesses, and I've got ~100 score from it (my second submission).I did not participate in this match, but I thought of the first phase of finding the counts of colors:First try k-1 guesses with 2 colors in each guess (half of one color, half of other, randomy sorted). let`s call the numbers of pegs in each color A,B,C,D. Then (for 6 pegs) You have:guess: ababba, result:1 miss,1 hit, equation: C+D=6-2 guess: ccacaa, result:1 miss,0 hit, equation: B+D=6-1etc..and last equation: A+B+C+D=6Assuming that no peg color constitutes more than a half of the key, this will give the number of pegs and some information about placement in k guesses. But I guess the further part of telling which peg goes where was the difficult one :)
 Re: Post your approach (response to post by MightyByte) | Reply I had tried a genetic algorithm but it took far too long to converge. So instead I used a two phase approach:In the first phase (same as in lyc1977's solution) I used k-1 guesses to determine the number of pegs of each color.In the second phase I searched for permutations of the pegs that were as consistent as possible with the previous observed responses. I measured the "consistency" of a potential guess as the sum, over all previous guesses, of squared differences between the actual number of exact matches for the guess and the number of matches that my potential guess would generate.To do the search, I started with my prior guess. Then I did a brute-force iteration over all pairs of pegs, looking for swaps that improved the consistency. If there was enough time, I also did a brute-force iteration over all 3-cycles of pegs, again looking for permutations that improved the consistency.If a search ever failed to find any improvements over the prior guess (which actually didn't happen very often), I guessed a random permutation of the pegs.
 Re: Post your approach (response to post by MightyByte) | Reply My approach seems to be quite different from what other people have posted, and because I hacked a lot of it together on very little sleep, it probably would probably be quite difficult to reverse engineer. Here it is in plain english for any who are interested.Phase 1: Count the number of pegs of each color with at most k-1 solid color queriesPhase 2: Split the code in half and find the number of pegs of each color in each half. This took at most k-1 queries with half color i and half color i+1 which combined with the total count for each color yields a system of linear equations with a unique solution. Repeat this recursively. Exactly how many queries this takes varies considerably, but O(l*log(l)) is an upper bound.Phase 3: When the size of the blocks being split up in phase 2 gets small (around 10) then switch over to a brute force method of making the next query consistent with all previous queries. The count of each color in the block was known from phase 2, so I only had at most d! instead of kd possibilities to sift through (d being the length of the small blocks)
 Re: Post your approach (response to post by MightyByte) | Reply My approach was very similar to what has been already posted:* Phase 1 Calculate frequency of each color (K-1 guesses at most)* Phase 2 Generate random permutation (1 guess)* Phase 3Search for a code that minimizes:cost = sum over all guesses{ abs(expected hits - code hits) }(in other words, search for the code "more consistent" with all the previous guesses)The search was performed in the following way:a) start with the best guess so far (the one with the most hits)b) randomly mutate the guess by swapping two pegs (of different color)c) calculate the delta of the cost; improvements (ie. negative deltas) are always kept while degradations are randomly kept (I used a simulated annealing probability schedule)d) repeat from step b until the temperature reaches zeroBTW, I also thought about adding the brute-force algorithm for the smaller cases since it scored way higher (more than 3 on average on my tests) but I didn't wanted to take (again) the risk of submitting in the final hours (my entries have already "suffered" enough because of that). Anyway, I keep wondering how big an improvement I did despise... lyc1977: do you have an estimate on your improvement for brute-forcing the smaller cases?
 Re: Post your approach (response to post by gsais) | Reply I thought about brute-force solution for small cases, but I've found that my generic algorithm almost always finds compatible guess for lengths up to 20, and larger cases are not brute-forceable anyway.
 Re: Post your approach (response to post by MightyByte) | Reply My approach had 3 parts:1. count colors in c-1 guesses (ideally this would be nearly skipped)2. find a totally wrong guess3. keep a set of all possible combinations of a subset of each position color pair. generate guesses isolated to the subset (simulate to find best avg guess) and filter out the ones which are not consistant. if this set gets below a certain size add another position color pair subset.
 Re: Post your approach (response to post by MightyByte) | Reply My approach was very similar to gsais's. As a first step, I used up to (K-1) guesses to find the color counts, but at every step, I used the previous information and random swaps to extract more information (so if the first guess 00..00 shows that there are 2 0s in the code, my second guess was a random permutation of 2 0s and L-2 1s, and so on).In the second stage, I generated a few random permutations of the correct colors, and using single swaps, I tried to get a code, which is as consistent with the previous answers as possible. My consistency metric was the same as gsais's, Sum(Abs(Expected matched - Actual matches)).This algorithm works the best, if at each step a large number of random permutations are examined, so after TheFaxman told that inline assembly is allowed I rewrote my optimize-by-swaps code to use SSE2, giving it about six-sevenfold speed increase. So my final code examines 3.7e10 / (L^4 * log2(K)^2) combinations at each call to nextGuess(), that ensures that runtime will remain around 12-14 seconds even in worst case (according to local test, that means a score of 1.6).
 Re: Post your approach (response to post by vdave) | Reply Several coders have mentioned random swaps as the way to find compatible guess.My code does a bit more. It repeatedly removes 2-8 worst pegs, then adds them back in the best places. In each step if there are several possibilities with the same change in error, I pick one randomly.Similar algorithm was used in the first stage when I don't know color counts yet for finding compatible set of counts.
 Re: Post your approach (response to post by vdave) | Reply It looks like skipping stage 1 is what allowed me to take the lead with large margin.I've added this stage to my final submission, and the score dropped to around 113.
 Re: Post your approach (response to post by venco) | Reply Yes, I guess when K >= L, "wasting" K-1 guesses for color frequencies is not a good idea.
 Re: Post your approach (response to post by vdave) | Reply This is provably true! With K-1 guess it is almost always possibleto split the code in two equally sized parts and know the color distribution in both parts. That's how my solution basically works.It continues to split each part in two until each part has less thanabout 1Million permutions and bute forces these. Additionally thereare a few special cases for small lenghts. This gave me ~110 points soit can't be too bad.
 Re: Post your approach (response to post by cae) | Reply One more thing:http://www.mathworks.com/contest/mastermind.cgi/home.html
 Re: Post your approach (response to post by vdave) | Reply Let len=l (size of array)1) Perform (k-1) guesses to compute freq of colors.2) Set the whole array elements to the most frequent color.3) Perform (len) operations, each one is only changing an elem to the next most frequent color. This will determine exactly all elems that belongs to one of the two most frequent colors.4) During step 3, get the number of elems with most frequent color in the first half, and using this info, perform (k-2) guesses to determine the frequency of each color in each half.5) Iterate over all elements in the first half, changing two elems at one time to the most frequent color in their candidate list. By The most frequent color, I mean the color that has the largest frequency in the un-solved positions of the first half.6) Make the same for second half.
 Re: Post your approach (response to post by vdave) | Reply I did not get the method of guessing based on previous guesses. Can you please explain what is:Sum(Abs(Expected matched - Actual matches)), and how the new guess is selected?
 Re: Post your approach (response to post by aminallam) | Reply I did not get the method of guessing based on previous guesses. Can you please explain what is:Sum(Abs(Expected matched - Actual matches)), and how the new guess is selected?The method is very simple (in theory): always return a guess that is completely consistent with all the previous feedbacks. It goes like this:a) A code is somehow choosed and postulated to be the secret keyb) The postulated code is used as the key to score each of the previous guesses and the resulting scores are compared against the codemaker's feedbacks for those previous guessesc) If any of the comparisions fails (ie. we calculated a score of [3,5] but the codemaker's feedback was [4,5] for that guess) then it's said that the postulated code is "inconsistent" with the feedbacks (thus it can't possibly be the secret key) - another code must be choosed (repeat from step a)d) Otherwise, if all the comparisions are successful, then it's said that the postulated code is "consistent" with the feedbacks (thus it can possibly be the secret key) - the postulated code is returned as the next guessThere are two important observations to make about this method:1) All of the returned guesses have a greater than zero probability of being the secret key2) The total number of consistent codes always decreases with each guess made - therefore the probability of finding the secret key always increases with each successive guessIt's because of these two features that the method is tipically called "stepwise optimal strategy" and it has been already found to be near-optimal on average (of course, optimal in this sense means relative to other algorithms - not to the ultimate optimal of always finding the answer in the first guess using shamanic powers). Strategies that are not stepwise optimal include "human" approaches (ie. slowly changing the guess to arrive at the answer by reasoning), and all the approaches that use a "background" color (ie. a color which is known to not occur in the answer or else that it's known where it's located within the answer).As good and simple as the stepwise strategy is, it has a very important complication: altough finding a consistent code for small problems is rather easy, for larger problems is extremely hard. In fact, it has been already proved that just deciding if there exists a consistent code (a subset of the problem of finding it) is a NP-complete problem; see lbackstrom's post above. Even for not so large problems (~ K=10 L=15) you can't actually hope to find a consistent code by simply generating random codes, you really need to use some heuristic to guide the search. However, even with good heuristics, for slightly larger problems it's almost impossible to find a consistent code (unless you already have enough hints), and for the very large problems the only thing that becomes easy is understanding the true meaning of "finding the needle in a haystack". This is where the "almost" stepwise strategies come, and they come in basically two flavors:* Divide-and-conquer: use a background color or an analytical method to divide the problem in smaller problems and apply a stepwise approach for the smallest problems; see the approaches of lyc1977 (background color; discarding the impossible combinations is a way to find the consistent codes) and OvejaCarnivora.* Least inconsistent code: instead of finding a completely consistent code with all the previous feedbacks, find the code that is the least inconsistent (within a subset of the search space). Essentially, a cost formula is used that measures how much inconsistent a code is, and any optimizing algorithm is used to minimize the cost. venco used the following cost formula (note: hits = feedback[0] = pegs with correct color and correct position; hints = feedback[1] = pegs with correct color but incorrect position):` cost = sum { abs(hits - hits') + 10000 * abs((hits + hints) - (hits' + hints')) }`The 10000 factor is used to give more weight in the first moves to the total number of correct colors (hits + hints) than to the number of correct positioned colors (later, once the color frequencies are discovered the term becomes automatically zero). Now, since both vdave and I started the search already knowing how many pegs of each color were in the answer, the number of hints was completely irrelevant (because hits + hints = L if the code is a permutation of the right colors), thus the cost formula reduces to:` cost = sum { abs(hits - hits') }`Furthermore, knowing the frequency of each color greatly limited the search space and also simplified the operation of finding the closest neighbours of a code (ie. codes with an approximate cost) to swaps. But these advantages didn't come without a price: as venco himself has already stated, his 14+ points advantage over vdave (in the tests) are very likely due to his algorithm skipping the color's frequencies discovery phase - yet he doesn't say that achieving that was far away from being an easy task at all (I can bet I wasn't the only one that unsuccessfully attempted that, so I'm looking forward to read exactly how he did it in the match editorial - right now I'm clueless edit: ok, now I understand how he accomplished it: it's thanks to the brilliant 10000 * diff(hits + hints) term in the cost function). On the other hand, vdave's "smart" discovery phase is rather intuitive (after one is told about it, of course...), easy to understand and implement, and it's also likely what allowed him to take the second spot in the tests.It's interesting to notice that the first and second places used a stepwise strategy consistently during the whole game (ie. a wrong move was never knowingly played); what it's more interesting to notice is that the third place (flagitious) and probably the fourth place too (cae) used the almost exact opposite strategy: playing knowingly wrong positions throughout most of the game! I believe that each of these two strategies have strengths and weaknesses in different areas so none can be considered better "overall" than the other. In particular, I believe that flagitious's approach has a bigger potential to be much more scalable than any stepwise approach, but I don't want to speculate too much before the final results are published and we can see the results...Edit - Fixed venco's cost formula
 Re: Post your approach (response to post by gsais) | Reply I've got to say, this was a great summary post of what the leaders did (I haven't had a ton of time to skim through this thread). Nicely done.
 Re: Post your approach (response to post by gsais) | Reply venco used the following cost formula (note: hits = pegs with correct color and correct position; hints = pegs with correct color but incorrect position): cost = sum { 10000 * abs(hits - hits') + abs(hints - hints') }The 10000 factor is used to give more weight to hits than to hints. Actually it's the other way around. I give more weight to hints to find correct color counts as quick as possible: cost = sum { 10000 * abs(hints - hints') + abs(hits - hits') }After all counts are found (it usually happens slightly later than after K guesses) I start to use correct number of different colors, so the first part of cost always yield zero, and algorithm concentrates on ordering.
 Re: Post your approach (response to post by venco) | Reply venco: Thanks for clarifying that! Now I can start to understand how you managed to skip the single-color guesses discovery phase.
 Re: Post your approach (response to post by gsais) | Reply hits = feedback[0] = pegs with correct color and correct position; hints = feedback[1] = pegs with correct color but incorrect positionHere is one more difference. I've used total number of matching colors as value hints in cost formula, no matter if it's they are on correct place or not.It can be calculated as results[0]+results[1] from nextGuess argument.This way hints is plain measure of how good is your guess in terms of choosing various colors of pegs, and hits is the measure of order quality.
 Re: Post your approach (response to post by venco) | Reply Yes, I've understood that part and already edited my post so that the cost formula would read "hits + hints". I've previously misread your approach description and that's why I thought that the 10000 factor was giving more weight to hits (perhaps because I tried that in my entry -giving more weight to hits- before I gave up and switched to discovering the color frequencies the "easy" way...)By the way, I think your idea of giving more weight to hits+hints is brilliant (sure, I wish I'd had come up with it myself :-)
 Re: Post your approach (response to post by gsais) | Reply Thank you very much Mr gsais for this perfect explanation.
 Re: Post your approach (response to post by gsais) | Reply Actually, my taking the second spot is rather due to some inline assembly (SSE2) code and not only the "smart" discovery phase. SSE2 allowed the checking of about 6-7 times more random candidates than a C-only version. My best standard C code scored around 112-114.
 Re: Post your approach (response to post by MightyByte) | Reply Phase 1:Determine the colors present in the code. If all the possible colors were in the code, start with Phase 2a. Otherwise, jump to Phase 2b.Phase 2a:This phase is a variant of a simple brute force search where you change each peg one at a time until results[0] increases by one, then you move to the next peg. Instead of doing things 1 peg at a time, I did them 2 at a time. A little trickier than 1 peg at a time since there were some special cases that had to be handled, but still manageable. After you've found the color for a peg, update the number of pegs left in the code of that color. Once you find a color that doesn't occur in the rest of the code, move to Phase 2b. Phase 2b:Similar to what other people did with guessing codes that are consistent with what has been guessed so far. Rather than guessing the whole code, I guess blocks at a time, starting at the leftmost part of the code that hasn't been determined yet. Once a "block" has been determined, move on to the next block. When randomly guessing a code, I would randomly pick results[0] pegs to keep from the last "best" guess, and then guess the remaining pegs using the probability distribution of the colors of the remaining pegs. (The last "best" guess is the one which I guessed that had the largest value of results[0].)
 Forums Marathon Matches Marathon Match 2 Post your approach Previous Thread  |  Next Thread