Forums → Marathon Matches → TCO Announcement Fun Round → Post your solution
Post your solution | Reply

I wonder what you TopCoders used in this match? I use a quite simple model; the solution is 173 lines short.

1. Calculate the expected mean and standard deviation for each given day's data. Each day of course has its own model (mean and std dev). At first I computed the expected mean by simply averaging the fish amounts over each position. The standard deviation was just (sum of squared differences to the mean for each trap position) divided by (total fish - 1). Here "-1" is the standard normalizing term for small samples.
2. For each river width position, calculate the "expected" amount of fish if there were no rocks at all, using the cumulative distribution function of the Gaussian distribution. In C there is the erfc function in <math.h>, the error function needed for the CDF.
3. Subtract the actual amount of fish at a width position from the expected amount. Call it delta. The higher the delta, the more fish we can get from this location. Or am I wrong?
4. Sort all river width positions by delta. Take the best (1 + width * width / log(width) / 500) of them.

Later improvement: use a truncated normal distribution instead of the plain normal distribution. I'm not sure I succeeded at this; my mean values are still sometimes significantly wrong. This is especially true for small fish amounts per day.

Provisional score: 484533.01
Provisional rank: 7

P.S. How many local tests did you use? I started with my usual 100 tests, but since the solution was pretty fast and the results highly volatile, I switched to 1K at a later stage.
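A minimal sketch of the four steps above, as I read them (this is my reconstruction, not the actual 173-line solution; `counts[i]` is assumed to be the summed observed fish at river position `i` for one day):

```cpp
#include <cmath>
#include <vector>
#include <utility>
#include <numeric>
#include <algorithm>

// Gaussian CDF expressed through erfc, as mentioned in the post.
double normalCdf(double x, double mu, double sigma) {
    return 0.5 * std::erfc(-(x - mu) / (sigma * std::sqrt(2.0)));
}

// Pick trap positions where the observed counts fall furthest below
// the rock-free expectation (biggest delta = most promising trap).
std::vector<int> pickTraps(const std::vector<double>& counts) {
    int w = counts.size();
    double total = std::accumulate(counts.begin(), counts.end(), 0.0);

    // Step 1: count-weighted mean and sample standard deviation,
    // with the "-1" small-sample normalization from the post.
    double mu = 0.0;
    for (int i = 0; i < w; ++i) mu += i * counts[i];
    mu /= total;
    double var = 0.0;
    for (int i = 0; i < w; ++i) var += counts[i] * (i - mu) * (i - mu);
    var /= (total - 1.0);
    double sigma = std::sqrt(var);

    // Steps 2-3: expected fish per position if there were no rocks,
    // taken as the CDF mass of the unit-wide bin around each position.
    std::vector<std::pair<double, int>> delta(w);
    for (int i = 0; i < w; ++i) {
        double expected = total * (normalCdf(i + 0.5, mu, sigma) -
                                   normalCdf(i - 0.5, mu, sigma));
        delta[i] = { expected - counts[i], i };
    }

    // Step 4: sort by delta descending, keep the top k positions.
    std::sort(delta.rbegin(), delta.rend());
    int k = 1 + (int)(w * (double)w / std::log((double)w) / 500.0);
    std::vector<int> traps;
    for (int i = 0; i < k && i < w; ++i) traps.push_back(delta[i].second);
    return traps;
}
```

For example, for counts {1, 5, 0, 5, 1} the "dip" at the middle position has by far the largest delta, so with width 5 (where k evaluates to 1) the single chosen trap is position 2.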
Re: Post your solution (response to post by gorbunov) | Reply

I used 200k tests locally ;), because my solutions were trivial and I recreated the tester in C++, using inputs generated by the supplied visualizer and saved to disk. Testing 8 solutions in parallel took around 20 minutes on a 2013 MacBook Air.

My first submission, three days before the end (when I noticed there was another marathon ;) ), which placed me in 5th place, was: sum all observations for each place and put a single cage in the place with the most fish (taking the leftmost one if there are multiple possibilities). I really did not expect much from it, and it took me by surprise.

Then I ran simulations for widths 20 and 25, where I repeatedly generated 3 days, but without rocks. I tried all 2^20 resp. 2^25 possibilities of placing cages. It turned out that in such a case (no rocks) the best is to put cages in the middle (and it did not make much difference how many cages one used, nor whether they were next to each other). So when rocks are introduced, I assumed they do not make much difference. Only rocks that completely block a place concerned me, so I never put a cage in a place where there was no observation of at least one fish.

I tried a lot of variations (on 200k tests, where the score was computed against the optimal solution: a single cage in the place where the most fish landed after three days):
- do the same as my first submission, but use the position closest to the middle if there are multiple possibilities - the score rose;
- try to put a cage (two/three/four/five) closest to the middle if there was at least one observed fish - worse results than my first submission;
- try to put a cage (two/three/four/five) closest to the middle if there were at least X observed fish, where X is some percentage of the most fish per place (summed over all observations), preferring the place with the higher number of fish if the distance to the middle is equal.

In the end the best was to use a single cage closest to the middle with X being 83 percent of the max - the interesting thing is that all higher/lower integer percentages were monotonically worse.

I also tried to determine which of the above solutions gives the best answer based on observed parameters, but could not find anything useful. I tried neural networks and decision trees as classifiers, but did not have much time to tune them.

My final solution uses one IF that I found by randomly trying splits of the 200k tests on a random parameter; it applies to 7.5% of the 200k tests and increased my local score by 0.5% ;)

My final submission (in the edit)
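The winning variation above can be sketched roughly as follows (my own reconstruction under the stated rules, not the actual submission, which also contains the extra IF; `summed[i]` is assumed to be the fish count at place `i` summed over all observations):

```cpp
#include <cmath>
#include <vector>
#include <algorithm>

// Single-cage rule: among places whose summed observed fish reach at
// least x (a fraction, e.g. 0.83) of the per-place maximum, take the
// one closest to the middle of the river, breaking distance ties in
// favor of the higher fish count.
int pickCage(const std::vector<int>& summed, double x = 0.83) {
    int w = summed.size();
    int maxFish = *std::max_element(summed.begin(), summed.end());
    double middle = (w - 1) / 2.0;
    int best = -1;
    for (int i = 0; i < w; ++i) {
        if (summed[i] < x * maxFish) continue;  // below the X% threshold
        if (best < 0) { best = i; continue; }
        double d  = std::abs(i - middle);
        double db = std::abs(best - middle);
        if (d < db || (d == db && summed[i] > summed[best])) best = i;
    }
    return best;
}
```

For instance, with summed counts {10, 80, 100, 85, 10} only positions 2 and 3 clear the 83% threshold, and position 2 is closer to the middle; with {0, 90, 0, 100, 0} positions 1 and 3 are equally far from the middle, so the tie goes to position 3 with more fish.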