I am so sorry I have not posted my approach for a long time although I won the match.

I used Random Forest. However, training data was not enough in this problem, so I generated the data on my own.

1) I generated 200 pairs of location of rocks and fish as the tester did on run-time.

2) The location of rocks was used for generating expectation of fallen fish. We knew fish's distribution at the top. We also knew how fish moved after hitting a rock, 50% left and 50% right. By using them, I applied DP to calculate the expectation at the bottom. The expectation was used as ground truth.

3) The location of fallen fish was used for generating the following features.

- X, the location

- W, the height of river

- L, the length of river

- The number of all fish

Furthermore, for each neighbor Y (X-3 <= Y <= X+3),

- The total number of fallen fish at Y

- The total ratio of fallen fish at Y

- The difference between the total number of fallen fish at Y and at X

4) Trained the random forest for 50 seconds by 2) and 3). The number of trees was between 20 and 150.

5) Predicted the number of fishes for all locations. I set up one fish trap at the highest expectation location.]]>

1 NobuMiu 347824.92 1 479071.99

2 kurenai3110 347341.35 3 467872.11

3 ibra 347019.80 2 478950.48

4 TrePe 339223.70 4 459089.07

5 kishore_g84 331056.42 7 448069.72

6 soneoed 329519.06 9 443460.01

7 Stonefeang 329265.88 10 443031.19

8 simanman 328278.30 5 451516.68

9 u_seem_surprsd 328074.52 8 443702.92

10 mcw1142 312756.05 12 437812.36

11 gorbunov 309757.69 11 441617.95

12 wleite 308805.29 16 419308.44

13 cing3000 307390.07 6 448939.16

14 tashikani 297112.52 20 413476.90

15 daveor 296941.79 17 419299.81

16 myam 294652.93 19 414736.12

17 ecv 290570.03 14 432217.20

18 xyz600 290195.83 18 416814.71

19 yowa 287123.52 13 435279.01

20 tygrysek 284495.22 15 424137.68

21 EvbCFfp1XB 270701.27 21 408390.45

22 neetsdkasu 261961.44 22 399779.70

23 bakanaouji 246449.82 23 371031.89

24 Cypi 243245.85 26 334015.93

25 kossi_tg 239092.66 25 347050.90

26 wsobolewski 225754.14 24 348325.00

27 2PaVeL 192579.69 27 306967.48

28 id 128139.45 28 206641.68

]]>

Your overall score has nothing to do with that. The overall score depends on the raw scores that have been found by other contestants very much. This is pretty obvious statement, isn't? But what I'm talking about is that if you want to submit the "best" in terms of trap effectiveness solution that you have in mind, you are thinking about creating 200 additional accounts that are going to produce together the best answer on each test case. Otherwise your "best" solution won't work. Otherwise you have to rely on what other contestants managed to achieve.

So you keep in mind that everyone is maximizing his own probability to return the optimal solution, which leads to the situation where some tests are not covered with high raw score. It makes sense to step aside from the optimal solution and return multiple traps instead of one. By doing this you generally produce a bit worse answers, but the gain you get from "bad" tests is worth of it. I would say, that the number of traps you return is solely the game between you and other contestants.

Don't even try to find me in the scoreboard :)]]>

My first submission three days before end (when I noticed there is another marathon ;) ), which placed me in the 5th place was:

- sum all observations for each place and put single cage in the place with the most fish (take the leftmost one if there are multiple possibilities)

I really did not expect much from it and it took me by surprise.

Then I did simulations for widths 20 and 25, where I repeatedly generated 3 days, but without rocks. I tried all the 2^20 resp 2^25 possibilities of placing cages. It turned out the best in such case (no rocks) is to put cages in the middle (and it did not make much difference how many cages one used nor if they were next to each other).

So when rocks are introduced I assumed they do not make much difference. Only rocks that completely block the place concerned me - so I never put a cage in place when there was no observation of at least one fish there.

I tried a lot of variations (on 200k tests, where score was computed against optimal solution - single cage in the place where the most fish landed after three days):

- do the same as my first submission, but use the one closest to middle if there are multiple possibilities - score raised

- try to put a cage (two/three/four/five) closest to the middle if there was at least one observed fish - worse results than my first submission

- try to put a cage (two/three/four/five) closest to the middle if there was at least X observed fish, where X is some percentage of the most fish per place (summed for all observations), preferring place with higher number of fish if the distance to middle is equal.

In the end the best was to use single cage closest to the middle with X being 83 percent of max - the interesting thing is that all bigger/lower integer percentages were monotonically worse.

I also tried to determine which of the above solutions gives the best answer based on observed parameters. But could not find anything useful. I tried neural networks and decision trees as classificators, but did not have much time to tune them.

My final solution uses one IF I found by randomly trying splits of 200k tests on random parameter and which applies to 7.5% of 200k tests and increased my local score by 0.5% ;)

My final submission (in the edit)]]>

I use quite simple model, the solution is 173 lines short.

1. Calculate expected mean and standard deviation for each given day's data. Each day has of course its own model (mean and std dev). First I used just averaging values of fish amounts over each position to calculate the expected mean. Standard deviation was just (sum of squared differences to mean for each trap position) divided by (total fish - 1). Here "-1" was the standard normalizing term for small samples.

2. For each river width position, calculate the "expected" amount of fish if there were no rocks at all. Use cumulative distribution function for Gaussian distribution. In C there is erfc function in <math.h>, the error function needed for CDF.

3. Subtract the actual amount of fish at a width position from the expected. Call it delta. The higher the delta, the more fish we can get from this location. Or am I wrong?

4. Sort all river width positions by the delta. Take the best (1 + width * width / log(width) / 500) of them.

Later improvement:

Use Truncated Normal distribution instead of just Normal distribution. Not sure I succeeded at this. My mean values are still wrong, sometimes significantly. This is very true for small fish amounts per day.

Provisional score: 484533.01

Provisional rank: 7

P.S. How many local tests did you use? I started with my usual 100 tests, but as long as solution was pretty fast and results highly volatile, switched to 1K at the later stage.]]>