Post your solution
I wonder what you Topcoders used in this match?

I used quite a simple model; the solution is only 173 lines long.

1. Calculate the expected mean and standard deviation for each given day's data. Each day of course has its own model (mean and std dev). At first I computed the expected mean by simply averaging the fish counts over all positions. The standard deviation was (the sum of squared differences from the mean over all trap positions) divided by (total fish - 1); the "-1" here is the standard normalizing term (Bessel's correction) for small samples.
2. For each river width position, calculate the "expected" amount of fish if there were no rocks at all, using the cumulative distribution function of the Gaussian distribution. In C there is the erfc function in <math.h>, the complementary error function needed for the CDF.
3. Subtract the actual amount of fish at a width position from the expected amount. Call it delta. The higher the delta, the more fish we can still get from this location. Or am I wrong?
4. Sort all river width positions by delta and take the best (1 + width * width / log(width) / 500) of them.

Later improvement:
Use a Truncated Normal distribution instead of a plain Normal distribution. I am not sure I succeeded at this: my mean values are still off, sometimes significantly. This is especially true for days with small fish counts.

Provisional score: 484533.01
Provisional rank: 7

P.S. How many local tests did you use? I started with my usual 100 tests, but since the solution was pretty fast and the results were highly volatile, I switched to 1K at a later stage.
Re: Post your solution (response to post by gorbunov)
I used 200k tests locally ;), because my solutions were trivial and I recreated the tester in C++, using inputs generated by the supplied visualizer and saved to disk. Testing 8 solutions in parallel took around 20 minutes on a 2013 MacBook Air.

My first submission, made three days before the end (when I noticed there was another marathon ;) ), which placed me in 5th place, was:
- sum all observations for each place and put a single cage in the place with the most fish (taking the leftmost one if there are multiple candidates)
I really did not expect much from it and it took me by surprise.

Then I ran simulations for widths 20 and 25, where I repeatedly generated 3 days, but without rocks. I tried all 2^20 (resp. 2^25) possibilities of placing cages. It turned out that in this case (no rocks) the best is to put the cages in the middle (and it did not make much difference how many cages were used, or whether they were next to each other).
So when rocks are introduced, I assumed they do not make much difference. Only rocks that completely block a place concerned me, so I never put a cage in a place where not even one fish was ever observed.

I tried a lot of variations (on 200k tests, where the score was computed against the optimal solution: a single cage in the place where the most fish landed after three days):
- do the same as my first submission, but use the place closest to the middle if there are multiple candidates - the score improved
- try to put a cage (two/three/four/five) closest to the middle if there was at least one observed fish - worse results than my first submission
- try to put a cage (two/three/four/five) closest to the middle if there was at least X observed fish, where X is some percentage of the most fish per place (summed for all observations), preferring place with higher number of fish if the distance to middle is equal.

In the end the best was a single cage closest to the middle, with X being 83 percent of the max - the interesting thing is that all higher and lower integer percentages were monotonically worse.

I also tried to determine which of the above solutions gives the best answer based on the observed parameters, but could not find anything useful. I tried neural networks and decision trees as classifiers, but did not have much time to tune them.

My final solution uses one IF that I found by randomly trying splits of the 200k tests on random parameters; it applies to 7.5% of the 200k tests and increased my local score by 0.5% ;)

My final submission (in the edit)
Re: Post your solution (response to post by gorbunov)
The funny thing about this problem is that the best solution should depend on other contestants' solutions. If you want to maximize the main metric of trap effectiveness, you should place only one trap: the one with the maximum probability of catching a random fish distributed as described in the statement. That one is easy to find if you know the map, so using the observation data we can try to reconstruct the map as closely as possible. But the real issue isn't there...

Your overall score has little to do with that. The overall score depends heavily on the raw scores achieved by other contestants. This is a pretty obvious statement, isn't it? But my point is this: if you want to submit the solution that is "best" purely in terms of trap effectiveness, you would need to create 200 additional accounts that together produce the best answer on every test case. Otherwise your "best" solution won't work; you have to rely on what the other contestants manage to achieve.

So keep in mind that everyone is maximizing their own probability of returning the optimal solution, which leads to a situation where some tests are not covered by any high raw score. It then makes sense to step away from the optimal solution and return multiple traps instead of one. By doing this you generally produce slightly worse answers, but the gain on those "bad" tests is worth it. I would say the number of traps you return is purely a game between you and the other contestants.

Don't even try to find me in the scoreboard :)
Re: Post your solution (response to post by gorbunov)
Hi there,

I am sorry it took me so long to post my approach, even though I won the match.

I used a Random Forest. However, there was not enough training data in this problem, so I generated the data on my own.

1) I generated 200 pairs of rock locations and fish locations, the same way the tester does at run time.
2) The rock locations were used to generate the expected number of fallen fish. We knew the fish distribution at the top, and we knew how a fish moves after hitting a rock: 50% left and 50% right. Using these, I applied DP to calculate the expectation at the bottom. This expectation was used as the ground truth.
3) The locations of the fallen fish were used to generate the following features:
- X, the location
- W, the height of river
- L, the length of river
- The number of all fish
Furthermore, for each neighbor Y (X-3 <= Y <= X+3),
- The total number of fallen fish at Y
- The total ratio of fallen fish at Y
- The difference between the total number of fallen fish at Y and at X
4) Trained the random forest for 50 seconds on the data from 2) and 3). The number of trees ended up between 20 and 150.
5) Predicted the expected number of fish for all locations and set up one fish trap at the location with the highest prediction.