



My solution, worth 187291.42 provisional points (14th place so far), is very basic. It is a combination of Simulated Annealing over Simulation. I run 1001000 simulations of what can be hidden in the bottles. Then I want to answer fast questions of the sort: "Given bottles in range X1X2, is there any poison in them?" To answer this, there is a common data structure, which I call array prefix sum. Then, it was more or less clear that on each round it is better to use all the available test strips, though I did not ever prove this. It was also straightforward to split bottles into groups, and put them into a priority queue. Otherwise, there would be a little use of the prefix sum structure. So, initially we have a simgle group of all the bottles, then we split them into smaller subgroups an put subgroups into the queue, and so on. Each time we want to take some test, we pop a group from the queue. Then, for each round, I run SA, which uses my precalculated simulations for estimating score. The state for SA is "how much expected poison there should be in each tested group of bottles?", and this is calculated for each remaining round. Whenever we need to take tests, both in SA estimation, and in real testing, we greedily take items from the queue until their sum expected poison will not reach the threshold. Put the remainder subgroup back into the queue. Once we know the "best" possible amounts of poison in a testing group, we do a test. Then, rerun the simulations, based on acquired new knowledge: make nonpoisoned intervals empty, and put at least 1 poisoned bottle into poisoned intervals. Split the remaining poison randomly between poisoned and untested intervals.
Improvements: Priority queue items have cost function = (expected poison amount in a group / number of bottles in the group). Use aggressive mode, once all the poison is detected in some groups. In essence, we may mark some bottle groups (ranges) clean without testing them. For this to be true, we must be sure that a) all poison is detected, b) we were testing a part of poisoned region and c) we found a poison in the subregion by test. Then, the remainder is clean. Symmetrically, if we know that a) all poison is detected, b) we were testing a part of poisoned region and c) we did not find a poison, then the remainder is poisoned. Performance: stopped using STL containers. Some gain in points.
Doubts: It is clear that we do not need to predict the whole strategy till the last round, if say we are currently on the first round. We just need to know how to deal now. I tried to fix the current round's poison amount for each group, and run independent SAs for each of my simulations only on the next rounds, but this worsened score (locally at least). Any ideas why?
P.S. I was on a vacation most of the match, thus submitted my first attempt after wleite submitted his last one. 

Similar approach, while I tried to calculate the probability analytically during the first loop (each bottle only tested up to once). This covers a lot of cases (essentially most cases when testStrips<numPoison). 1. Given the first n tests, the probability distribution of the number of remaining poisoned bottles in the bottles not tested in the first n tests. 2. Given the number of remaining poisoned bottles after n tests, the probability distribution of the number of poisoned bottles in the test n+1. 3. Given the number of remaining poisoned bottles up to test n+1, the probability of getting the tests after that. The product of these gives you the distribution of the number of poisoned bottles in the test.
In terms of thresholds, I wanted to maximize the information I get from the tests. Each test strip can be used up to testRounds times, so the potential output is the number of total negative results it gets. The maximum information is log(testRounds+1), and it happens when the cases have the same probability. So when there are i rounds remaining, it is better to keep the probability of positive results around 1/(i+1). That's the threshold I used.
From the second loop, I saved expected number of poisoned bottles for each test and used that information to prioritize bottles to test. Positive tests with larger bottles have the priority. Once some subtest is positive, the remaining part is likely not containing the poisoned bottles and tested in the next round if possible.
I could have fine tuned the parameters, but it was very easy to overfit and I just used the values described above.> 

The maximum information is log(testRounds+1), and it happens when the cases have the same probability. So when there are i rounds remaining, it is better to keep the probability of positive results around 1/(i+1). That's the threshold I used.
How do you estimate this "same probability" in advance, if you cannot know how many test strips there will be left after some rounds are run? Also, why do you use 1/(i+1), that is, { 1/n, 1/(n1), 1/(n2), ..., 1 } instead of this theoretical same probability for each round? 

We don't know for sure, but if the first test have positive probability of 1/(testRounds+1), the probability of that test strip to be remained for the second test is testRounds/(testRounds+1). Then the probability of positive result for the second test is 1/testRounds, so the probability of passing test 1 and failing test 2 is testRounds/(testRounds+1) * 1/testRounds = 1/(testRounds + 1). For each test strip, failing at test [1, 2, ..., testRounds, never] have the same probability of 1/(testRounds+1).
Of course, this doesn't take correlation between the results of test strips into account, and the goal is not maximizing information but identifying definitely safe bottles, but they seem to work to some extent. I am running the local system test suggested in another thread, and the results seem to be just a bit below yours. 

+1 to testRounds/(testRounds+1) being the probability which maximizes the information gain (when testRounds are remaining).
I used this for the most part of the contest (i.e. choosing items for each strip such that the probability of having zero poisoned bottles in the set is approx. testRounds/(testRounds+1)). I did this by computing (approximately) the number of combinations that adding each new item to a strip was "splitting" (i.e. how many remain on the positive, respectively negative side). I think this got me up to ~182K on the provisional data set. The main issue with it is that what's left at the end doesn't necessarily minimize the number of "uncertain" bottles, e.g. for K=2, it can tell you that it's one of the combinations (1,2), or (3,4)  this is just 1 bit left of additional information needed to differentiate between them. But, even so, I would still need to consider all the 4 bottles as potentially poisoned, which is quite bad scorewise.
So during the final weekend I threw away my solution so far and wrote a new one  essentially a DPbased solution which tries to maximize the expected number of saved bottles (not their square, although that would be closer to the real score, because I either didn't figure out how to compute it properly, or because what I did implement was both worse and slower). I'll try to describe it in a followup post (when I have more time to write it). 

I had special case for poisoned=1 You can find that bottle just in one turn using ceil(log2(n)) strips. Each strip corresponds to 1 bit, so the first strip contains all bottles having the first bit set (1, 3, 5, ...), second strip contains all bottles having the second bit set (2, 3, 6, 7, ...), etc. Then the poisoned bottle corresponds to the binary number formed by positive/negative test of each strip. Example for 51 bottles ("result: tested bottles"): 0: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 0: 2 3 6 7 10 11 14 15 18 19 22 23 26 27 30 31 34 35 38 39 42 43 46 47 50 1: 4 5 6 7 12 13 14 15 20 21 22 23 28 29 30 31 36 37 38 39 44 45 46 47 0: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31 40 41 42 43 44 45 46 47 1: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 48 49 50 0: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
The answer is bottle 20 (010100 binary).
If I have fewer strips than needed I group the bottles evenly and then the result of a test is group number that contains poisoned bottle. Then I repeat while test strips / rounds remain. 

huge testsets were needed to see consistent results, but timelimit is too high, so i decided to stay fast and deterministic for most of my work. this way i could use 500k+ testcases.
as others mentioned, most important observation is to to "burn" equal amount of strips for each round (worth a very good early score) the threat of burning a strip depends on the bottle/poison ratio, but not in a trivial way. with lots of testing on huge sets i managed to set up static multipliers with couple of setups (eg. singlerun or not) with a couple of finetunes on edge cases.
reshuffle: when to continue the remainders and when to recheck the already "bad" marked blocks. i have battled this problem all along, but was unable to handle it properly (even with dynamic optimization). generally it just reshuffled with very few bottles/poison remain, and usually randomized the whole set again (i know its loosing information, but i was unable to utilize that anyway), except when there is only one round left with just a few strips (in that case blocks were "reversed"  which generally mean starting with the "bigger" ones). there is a special case though which was well handled: when each block has a single poison. in that case one part of each block can be left empty (as the tested parts of the probe will indicate its status, distributing strips evenly (based on block sizes)
last week i tried to add dynamic optimization with limited success. without serious tests/time, it was hard to tell what helped or not (most of the time i have used 110k, with strict timelimit) dynamic optimization was based on simulations and round by round local decisions. setting up few thousand test cases and simulate the whole process ahead (with the existing static optimization). ternary search was used to locate the strip size with the best final score average (altough i'm pretty sure many times it may pick some local max). last days i tried to dynamically optimize the reshuffle cases and finetune number of testcases vs. time/out, but was failed with both.
the problem had some interesting depth, but the actual setup did hurt my motivation. tedious testing, too many "uninteresting" testcases, not knowing how good my solution actually works compared to others (50 testcase is way too low, the final order of 230+ place can be totally different). and the uncertainty of the system test (5k test is still not enough). which is especially problematic for a round which will decide who gets into the TCO finals. 

My solution is doing passes over the whole set of bottles, where in each pass disjoint sets are queried (modulo corner cases when the number of strips is much larger than the number of poisoned bottles). To estimate the value of a state, I use a DP with the following dimensions: * number of bottles in poisoned sets, * number of poisoned sets in this pass (a pass can last multiple rounds), * number of bottles not touched in this pass, * number of strips left, * number of rounds left. As the state space is very large, I discretize the first and third dimensions by using exponentially increasing values (with base between 1.03 and 1.12) and using linear interpolation in between.
To make testing more stable, for each test I did 500 runs where in each run I remap the bottles names. Since preprocessing (DP computation) is the same for each of the 500 runs, I could solve 500 instantiations of same test in almost the same time as a single one. I used 3k tests (each run 500 times) for local testing. 

Provisional score: 193495 Provisional rank: 5
I also used DP. My state has the following parameters:  number of rounds left  number of strips left  number of poisoned bottles (this can be lower than the initially given value)  number of bottles considered
There are at most 10 x 20 x 20 x 1000 states (this is because the number of poisoned bottles parameter is always at most the number of strips lower than its initial value).
For each such state I compute the maximum expected number of saved bottles, if I chose the best move among the following:
1) I add s bottles to each strip (s * num_strips <= number of bottles left) and the remaining ones are left out (each strip tests a disjoint subset of s bottles).
I compute the probability of remaining with 0, 1, 2, ..., strips after such a move (the best way I could find was to use inclusionexclusion for this). For each outcome I have two options:
1.1) I identified some nonpoisoned bottles and I continue with all the bottles from strips which tested positive + the bottles which remained outside tests; the number of poisoned bottles stays the same, but the number of considered bottles may decrease (those from at least one strip which tested negative); also I have one less round available
1.2) I throw away all the bottles from strips which tested positive (i.e. I continue only with the bottles which were left outside tests). This allows me to continue with a smaller number of poisoned bottles
About 1.1 + 1.2: I forgot to try to "throw away" only part of the bottles which tested positive, e.g. if I get 5 positive strips, I could throw away 0, 1, 2, ..., 5 groups of bottles; my solution now only considered throwing away either 0, or all 5; if you know you have 11 poisoned bottles, then by throwing away 0 bottles, you're still left with 11 to find; by throwing away 5 groups of bottles you still need to find 115=6 poisoned bottles, at the expense of losing the thrown away bottles; but maybe the right combination was, e.g., to throw away 2 groups of bottles and continue with the remaining ones, with 112=9 poisoned bottles left to discover.
I had a TODO to try this but I forgot to try it :) it's trivial to implement it in my DP setting  maybe I'l try it to see if it makes any difference.
The interesting part is that the optimal s increases as N increases (for some fixed values of the other parameters), so computing this could be done with O(N) amortized evaluations (for some fixed values for the number of rounds left, number of strips left, and number of poisoned bottles).
2) I go "all in", i.e. all the bottles are used in at least one strip. I only considered the case when each bottle is used in either q or q+1 strips (with a fixed probability of randomly choosing either one of "num_strips choose q" subsets of strips, or "num_strips choose (q+1)" subsets of strips for each bottle).
For this I tried various "average numbers of strips per bottle". The interesting part is that given my restriction (x% randomly choosing a qsubset, and (100x)% choosing a (q+1)subset), the probability of obtaining a given number of positive strips is a polynomial in x (with the number of strips, number of bottles and number of poisoned bottles being fixed); so for each q (up to q = 8, I think) I computed and stored these polynomials. Then I could test multiple values of x at the cost of evaluating a degreeK polynomial (where K=number of poisoned bottles).
By the way, I only used this "all in" approach when "number of poisoned bottles < number of strips".
I essentially precompute the whole DP table at the beginning. The slowest cases are those with large values for the number of bottles, strips and rounds. A large number of poisoned bottles is not that relevant because I avoid the "all in" case for them (preprocessing takes 10 seconds for 10000 bottles, 10 rounds, 20 tests, and 200 poisoned bottles). Actually, it turns out the slowest case is when the number of poisoned bottles is ~910 (and all the other parameters are maximum)  it takes me 40 seconds for everything in this case. But making decisions after that is really fast.
During the actual "game" I keep the bottles split into disjoint groups  for each group I also keep an upper bound for the number of poisoned bottles in it. Then, at each round, I consider multiple possibilities:  choose the group with the smallest number of poisoned bottles and assume everything else is thrown  choose the 2 groups with the smallest numbers of poisoned bottles (the upper bound is summed, restricted to not exceeding the known number of poisoned bottles) and assume everything else is thrown  etc.
I choose the combination which maximizes the expected number of saved bottles (this is fast because all the DP states are already precomputed at this stage).
After every round of testing I also perform some computations in order to identify groups of bottles which are certainly nonpoisoned or certainly poisoned. I do this by computing minimum set covers for the strips which tested positive so far (you can infer sufficient information sometimes to fully exclude some bottles, or mark some bottles as poisoned with 100% certainty). Excluding bottles is obviously good. Knowing that some bottles are poisoned with 100% certainty also helps, because I can exclude them for testing in future rounds (so I remain with essentially fewer bottles, but also a smaller number of poisoned bottles to discover  especially reducing the number of poisoned bottles to discover is helpful, because it makes my testing behavior "bolder" somehow, since the risk of losing strips is now lower).
Something else I've thought of trying was to speculate at the end, i.e. potentially "save" more bottles at the end of the testing rounds. I tried to formulate this as an expected score maximization problem :) If I don't speculatively save any bottles, I have a known guaranteed score. If I speculatively save x more bottles, I know what the probability of getting a zero score becomes, but the actual score if I don't get zero is larger, so maybe the risk is worth it :) In the end I didn't do it, because I did end up getting zeroes on some tests and I didn't feel comfortable with this.
However, I have some tests where I genuinely get a score of zero  these are tests with large number of bottles, large number of poisoned bottles, 1 round, and a small number of strips (56). In these cases all the strips I use test positive, so I don't get any information from them. I discovered such tests too late, but speculatively saving bottles would have been good for them (anything is better than a guaranteed zero score). Still, the additional score I could get from doing this didn't seem important enough to do it and potentially introduce new bugs (it was during the last hours of the contest, when I could make only 1 more submit). 

So this has been alluded to elsewhere, but if you can make (rounds+1)^(strips) >= [set of possible outcomes] then its at least theoretically possible to identify possible outcomes uniquely. So I have two bits of my solution:
(A) Can we group the bottles into groups of size b so that (rounds+1)^(strips) >= ((bottles/b) choose poisons) [in fact, if we've grouped into groups it needs to be sum from i = 1 to poisons of (bottles/b) choose i]
If so calculate the expected score [I initially assumed perfection, but then ran a load of tests and stuck in a rough estimate. It turns out that this code tends to on average narrow the search down to poisons + 1 groups , in the case where poisons > 1]
(B) Do a DP, similarly to one described above with state space being:
Rounds left Strips left Bottles which we know are ok [compressed to a grid] Bottles which we have tested and might be poisoned [compressed to a grid]
My forward move is either to use all strips to sample an equal number k of bottles (which I found with a ternary search) or to convert the remaining problem to a problem of type (A) above if possible.
I didn't consider resampling bottles which are possibly poisoned in part B but I think empirically this is almost never the optimal thing to do, as the code much more frequently will flip into a type (A) solution.
Having considered the (A) 'solve cleanly' and (B) DP values, I then choose the best one. Running the dp forwards is easy.
How do I solve (A)? I run a search which does a few things:
1  Given existing results of tests, exclude bottles which are definitely ok. 2  For all other bottles, calculate the 'mask' of the set of bad tests which they were and weren't in. 3  Then calculate all possible mask combinations of size <= poison ; these then allow me to plan forwards well. What I mean by this is that if you've failed 3 tests so far and two poisons, then some bottles will have failed test 1 (so get mask 100) and others will have failed tests 2 and 3 (so get mask 011) so a combination of 100 and 011 would work. But so would a combo of 110 and 011. All bottles uniquely map to masks and so this allows a fairly compact representation of the set of possible solutions which is also useful for performing calculations on, as you can sum over all groups of acceptable masks, the product of the number of bottles in each group of masks. So for example, if you are looking at all representations involving a 011 bottle and a 110 bottle, and there are 5 bottles with masks 011 and 7 with 110 masks, then you have 35 possible answers using that mask combination.
4  Find a set of bottles to query which approximately splits the set of possible answers to the question of which bottles are poisoned into sets of size ratio 1 : roundsLeft ; the mask combinations above make the calculation of such a probability for any group of bottles very fast. 5  Try putting these sets together to get a set of queries for s strips. Now calculate for all possible outcomes (in terms of strips coming back with yes and no ) the number of possible answers to the bottles question. Ideally for each of these you'd like 'number of possible answers to the bottles question' to be approximately the same as 'rounds left to the power of strips left' so that the forward search doesn't die. So I evaluate the possible set of queries using a heuristic which is sum over all strip outcomes of P(strips outcome occurring)*(number of bottleanswers remaining  strips outcome occurred) / (strips left usable ^ rounds left) 6  Calculating 5 is quite hard. If you have a subset of queries however, you can easily calculate how many possible bottle answers are consistent with that subset of queries all coming back saying 'no poison' , using your mask combinations from step 3. Inclusion exclusion then lets you turn this into what you want for 5. 7  Now we have a set of strips. Use it, and repeat. 8  If its the last step, then instead of evaluating strip outcomes based on remaining possible answers, I instead calculate the size of the set of bottles which could be possibly poisoned,so my utility looks like: P(strips outcome occurring)*(Number of bottles which could be poisoned in that case)
A few asides:
1  P(strips outcome occurring) is not proportional with the number of bottleanswers remaining consistent with that outcome because once you break the bottles into chunks, an answer involving poison1 bottle groups is not equally probable to an answer with poison bottle groups. So I adjust for this. 2  In the DP I initially was lazy, and said that P(sampling k bottles and being safe)=(1poison/bottles)^k . I then flipped to using the correct formula in terms of (bottlesk choose poison ) / (bottles choose poison.) This gave me ZERO improvement which I found very surprising. 3  I found my code was worst at finding solutions for cases with one round where the bound I described in part A would suggest the problem was soluble. I could have tried to find optimal solutions for these cases and hard coded them but didn't due to lack of time. 4  Obviously poisons = 1 is a special case, here you break into (rounds+1)^(strips) groups and then in each round k you get strip i to look at bottles with digit k in ith position in base rounds+1 5  As I mentioned in the other thread, my debug output references an out of bounds segment of a static array sometimes. It never broke my code locally or in 15k tests on an EC2 box, but its worth 0.6 points per hundred tests if it does break. 6  I read a bit about group testing and basically found it totally unhelpful for solving the problem but relatively fun. I think its amazing how it becomes so much harder to identify two poisoned bottles than one. (As in one you get to the informational theoretic bound, and two you are nowhere near.) 

However, I have some tests where I genuinely get a score of zero  these are tests with large number of bottles, large number of poisoned bottles, 1 round, and a small number of strips (56). In these cases all the strips I use test positive, so I don't get any information from them. I discovered such tests too late, but speculatively saving bottles would have been good for them (anything is better than a guaranteed zero score). Still, the additional score I could get from doing this didn't seem important enough to do it and potentially introduce new bugs (it was during the last hours of the contest, when I could make only 1 more submit).
Actually, if you all tests in the first round are positive, you still do get some information from it ;) (at least, if there were bottles not being included in your tests). Bottles not included in any test are less likely to be poisoned.
You can still make a "guess" and not return the full set of bottles. A good such guess would be one maximizing the expected score. 

I will make a longer post soon.
Main points: 1) For B > S (dense cases), use DP algorithm with dp[ n ][ s ][ r ] = number of expected free bottles if there are n bottles remaining, s strips and r rounds. Use ternary search when computing the transitions to get a lighting fast algorithm. The algorithm is similar to what others described.
2) For B <= S (sparse cases), we want to isolate each poisoned to exactly one interval, and then test each interval independently in parallel. We want to use r rounds for a 'sieve' phase which gets us exactly B poisoned intervals, and the remaining rounds we want to use to isolate the poisoned bottle. We decide how much to allocate to each phase by running 1000 simulations for each possible r. Also, for some cases with B close to S, the DP algorithm is actually better. We decide between the two with simulations. Finally, for the sieving phase, we reuse the DP algorithm to tell us how many intervals to test at each step. Finally, suppose we use some rounds and we lost some strips and we got B  1 poisoned intervals instead of B. Sometimes, we might just want to join all the poisoned intervals together and repeat the sieving phase. Sometimes, we want to go on with B  1 intervals, since this means that each interval can have at most 2 bottles. We decide for the right version with simulations.
3) Main failure: Literally 10 minutes after the contest ended, I realized the (1 + r) ^ s trick :) Instead, in the second phase I repeatedly use the (2 ^ s) trick. However, the fact that I simulate a bunch of different options for picking r helps alleviate this.
Overall, my algorithm was pretty fast, with most seeds taking less than a second. 

I tried similar dynamic programming. Since function is convex, instead of ternary search you can just take the best for previous state and keep incrementing it until score increases for very fast algorithm.
The problem there was that after you have set of suspected bottles some of them have higher probabilities of being poisoned than other and accuracy is bad. To tackle this I had an algorithm to generate thousands of random samples based on information so far and it gave more accurate results.
For the tests I was choosing bottles with the least probability.
I will have very bad score though, because I didn't handle sparse cases at all and didn't use any tricks with 1 bottle. For some reason I assumed that you can't use same bottle in same rounds twice. 

nika, somehow I also incorrectly assumed query sets for a given round should be disjoint. I don't have the numbers here, but I tested with the ranges proposed by marek.cygan and got *much* lower scores... 

Then you got a really huge score on the provisional tests considering you used only disjoint queries (I thought using smart combinations of queries, where each bottle is "encoded" in multiple queries, was key to obtaining high scores). What was your approach? 
