||That's an interesting approach to finding the polygon. That's an area I probably should have concentrated on more, as I just use a pattern I thought up the first day. I fail to find the polygon at all in 22/30000 cases.
I wrote a one-off tool to optimize the parameter values. I first wrote a general multiple linear regression routine, but it was minimizing delta^2 while I wanted to minimize delta.
||My basic algorithm is the same as jdmetz's.
I was actually thinking this is what most people in the top 20 were doing, so I am surprised to see so many convex-hull based algorithms.
Of course my basic implementation of this only scored ~2475, so I'll briefly describe some optimizations.
1) When binary searching 'beyond' a side I would aim towards a probable vertex (the intersection of neighboring sides when viewed as lines).
2) After enough checking it becomes possible to estimate the maximum-possible convex polygon, by summing the maximum possible 'cap' outside of each side that hasn't been examined. So I modified my algorithm to return (maximum_possible + current_estimate) / 2
3) In my last submission I made a modification that would decrease the number of steps allowed in binary searches as M got smaller IF the error term described in 2) was large (>0.01).
Overall my average difference is about 0.022 over 8000 test-cases.
That is an average score of 4.978. Personally I think there is a case in the 500 systests that my solution is very lucky on, which gives me an extra ~0.5-1 points. But we will see...
||First: try "random" points until two "in" points have been found. (not completely random: the first point is always (5.0,5.0) and the next few are also hand-crafted; also, once one point has been found, look closer to that point instead of anywhere in the square: take ((x,y)+(5,5))/2 instead of (x,y)).
Next: try to minimize the "uncharted area".
From known "ins" and "outs" we can construct two polygons: "known inside" and "known outside". Between them is a ring of uncharted land.
Select about 100 random points from that small ring (this is a bit tricky and relies on triangulating the uncharted land). For each of these investigate the change of unknown area if the test should return in or out. Select the random point that maximizes min("in" growth if "inside", "out" shrinking if "outside"). I assume that replacing min(x,y) with xy/(x+y) might have been slightly better, but that was a post-mortem idea.
Finally return a value between known insode area and known outside area -- not the arithmetic mean but rather an extrapolation under the assumption that the uncharted area has the same proportion in:out as the full square (i.e. return all*in/(in+(all-out)) )
This contest, I had to fight a lot with awful bugs (sign errors, off by one as in ">0" vs. ">=0", getting the orientation right, mixing up entries when producing the inverse of a 2x2 det 1 matrix, etc.).
This was partly caused by my ambition to avoid rounding errors and
testing only grid points of a 65536 x 65536 grid (actually I intended to go up to (1<<20) x (1<<20)), which allowed me to make a lot of 32bit vs. 64bit errors as well. In fact, my last submission (deadline minus 4 hours) was the *first* to get everything straight under the new paradigm I tried to implement since saturday. If I had had the framework running earlier, I would have had time to implement all the stuff I had almost ready, e.g.:
This was so frustrating - normally I rather suffer from running out of ideas mid-contest...
- an "opening book" for up to 10 moves (depening on situation) based on analysis of 100000 test cases I had prepared.
- trying to fit 5,6,7,...,20 lines into the uncharted area and thus determine a good approximation of the "true" polygon
EDIT: After reading the full thread it looks like I was essentially on the venco trail and had the gsais idea of an opening library.
||As gsais I had little time (mostly after hours so I'm exhausted), and didn't finish my approach, mainly by the thousands of bugs I entered in my code. In the end I've submitted a solution similar to venco's (keep inside and outside points and choose a point that reduces the unknown area) but my choice of the point was not adecuate. After exhausting the querys I produced 500 convex polygons that fitted in the unknown area and returned the mean area. I expected this to behave much better than it finally did, so, when there was a large number of trials, I simply returned the area of one of them but constructed in a more restrictive way.
The polygons were constructed in the following way: 1) pick one main outside point, 2) look for the nearest point in the perimeter of the inside polygon and pick a point in the segment joining them 3) compute two maximal lines through it (ie if the point is in the real polygon the line segment through it will be within the two), 4) look forward and backward for other main outside points reducing the angles between the lines. 5) pick randomly one of the resulting lines and save it. Look for a main outside point not covered in the previous steps and repeat the process with it. Finally construct the proposed polygon using the picked lines and (if needed to complete it) the sides of the square.
I never finished my main idea, which was to construct a convex polygon as before in every trial, pick the edge the polygon constructed with more variability and choose the point which reduces more the variability of that edge then start again, I wonder how it would have worked, but it was too complicated.
||Seems like nothing new:
1) As gsais said - generate polygons and choose a point which is included only in 50% of polygons
2) As venco choose a point with maximizes the new discovered region (min of in- and out-regions).
3) Return average of two regions
I wonder why no one ever talks about how much time did it take to him for creating solutions and how their solution evolved during time - at least I'm always interested in that (or maybe I should start a new topic?).
Anyway, I'll start with that :
Almost right away, I thought about computing the "outer-region", so my first solution used random point in "unknown-region" and updated the convex hull and outer one. Since I didn't want to spent too much time, I went for java with java.awt.geom. This took 3-4 hours for 2430 score.
Then I replaced random sampling with maximizing (d1 * d2) where d1 and d2 are distances to inner and outer-regions (2448) and then replacing it with (2) for 2475 - about 6-7 hours.
The last thing was to include (1). While it decreased the score in general (maybe I have some problems with generating polygons), but it got rid off most cases where I had bad score (<4.5). for a total work of about 9 hours.
In the end the worst thing was the efficiency (especially java.awt.geom). For (1) I generated only 20 polygons at a step, and usually 6-7 steps at max. And for the (2) with random sampling I used at most 60 samples depending on the time left - quite often I didn't used all of "isInside calls", because I ran out of time.
I really thought that there will be a much more java users.
And for gsais: how did you "mutate" your polygons to match new constraints?
In fact, my last submission (deadline minus 4 hours) was the *first* to get everything straight under the new paradigm I tried to implement since saturday
At least you have achieved that, my only submission is still full of bugs, it hanges from time to time ( going round and round the square without finding one point of intersection with another line for example), or cleaning completely the inside polygon leaving only a segment as the convex hull ...
||Somewhat similar to venco.
My initial outer polygon consists of the four corners. To start, I find a point in the polygon, starting at a random (+/-0.2) point near the middle, then near the middle of the half squares...
Next I use binary search between that point and the four nearest outer points to form an initial inner polygon. Only the convex hull of the inner points was remembered after that.
Outer points hidden behind others were removed. I added extra points between the remaining outside points, which were the furthest position that the polygon could be. This gave a star shaped outer polygon, except where limited to the edge of the board.
I chose the next test point as from the centroids of triangles formed between consecutive pairs of inner points, and various points on the outer star. The point that maximised the potential gain in size of the inner polygon was selected.
The return was the squared mean root of the areas of the inner and outer star polygons, except in exceptional cases where emergency values were used.
A lot of the maths was new to me. And I had many bugs - eg my PointInTriangle() function that returned true if the triangle was a straight line! The problem seemed like it belonged in algorithm match, eg if the goal was to be accurate to 1e-9 in <2 seconds, I think we could have done it. Scary, but fun in the end.
One last thing, I wrote a little SVG library to help visualise the state of my solution. It was invaluable.
||My approach is an adaptive triangulation, inspired by adaptive finite element algorithms. Considering that I did not take into account the convexity, I think it works quite well.
The algorithm starts with a hand-picked initial triangulation which is a very simple one. Then in each step:
* Each triangle is assigned an error, depending on the area and on how many of its vertices are inside the polygon.
* Then these error are distributed among its edges, depending on the edge length.
* Edges with biggest errors are marked for refinement so that the marked edges constitute some fixed percentage (I chose 20%) of the total error.
* Marked edges are refined: The edge is divided in half and so are its neighboring triangles.
* For the newly created vertices, inquire if they inside the polygon or not. If a new vertex is on the boundary of (0,10)x(0,10) then automatically it is outside the polygon (to save the question for other 'important' vertices).
After the loop, the area is calculated by appropriately dividing the 'boundary' triangles.
I tried to implement an approach like venco used, but my coding efficiency did not allow me to finish in time. In this, the errors on edges would depend on if a part of the edge can 'see' the convex hull of the interior points through the net of exterior points.
||I started with the most obvious solution to me: a grid of 10x10 points (~1650 points) and continued with the small optimization of extending the grid to sqrt(M) x sqrt(M) (~2150 points).
Because I started with a grid, I tried to fit all the solutions on a grid. I thought about going 'analogue' but I didn't have the time.
The next improvement was to explore the space going down a quad tree of increasing depth. I also tried to extrapolate unknown cells from the existing ones, implementing simple grid rules like "all the points between 2 inside points are also inside". This simple approach was quickly discovering the boundary cells and divided them recursively. ~2430 points.
After some optimizations (like trying to extend horizontal and vertical edges by doing a binary search on a parallel line, a panic routine for find very small polygons or a special bfs search made also for small polygons that searched first nearby the existing inside points) that got me to ~2445, I felt that this is a dead end and moved on.
Moved on to the grid convex-hull approach, pretty much described in my previous post.
I started from an known inside point, found 4 edge-points by bs to the left, up, down and top and then for each edge of the convex hull of the inside points (the convex hull is updated automatically on each query) I tried to expand its midpoint along the normal using of course, binary search.
This scored less than my previous quad-tree approach, until I implemented the computation of outside regions determined by an outside point and the inside convex hull. ~2460p.
But the midpoint-expansion left a lot of queries left, so I spent then by selecting unknown points, first near the convex hull vertices and next at random near the hull edges. ~2475p
The last improvement was to select at each step the point that minimized the number of unknown points left, just like venco, Psyho or hagman. I used the same function to maximize: min("in" growth if "inside", "out" shrinking if "outside"). That got me to 2484~points but also made my solution very slow because at first there are a lot of unknown points (especially on large grids) and I compute the function described above for every one, at each step. That made testing extremely slow and left little time for improvements.
||I wrote an svg library too, but I didn't find a viewer that I was very happy with. In the end I used rsvg-view which comes with librsvg. What svg viewer did you use? Here's one of mine: example.svg showing inside (green) and outside (red) points, the true polygon (black), hull sides through time (green), and line segment's I'm binsearching along (gray).
||That is an average score of 4.978. Personally I think there is a case in the 500 systests that my solution is very lucky on, which gives me an extra ~0.5-1 points. But we will see...
What do you mean? If your average is 4.978, then the most you can benefit from a lucky case is 0.022.
||how did you "mutate" your polygons to match new constraints?
I mutate the polygons by moving their vertices (but the "source" polygon already satisfies the constraints). Basic pseudo-code:
while (population.size < 1000)
src = pick a random polygon in population // src already satisfies the constraints
tgt = src.clone()
for each vertex v in tgt do // in random order
repeat up to k times
move vertex v to a random near location
if p still satisfies the constraints then break loop
if tgt != src then add tgt to population
The problem with this technique is that each mutation reduces the real randomness of the total population, which is vital for estimating the probability of a point being inside or not.
||Perhaps this mutation operator combined with a genetic-algorithm-like crossover function to "breed" two polygon representations might have helped maintain population diversity.
||I used firefox. Your svg didn't work in firefox until I changed the xmlns attribute to xmlns="http://www.w3.org/2000/svg".
Here is the final svg produced from my solution on the visualiser startup test case. Inner is green, outer (tested points only) is red, star is yellow, black is closest point between inner and star (not used in final solution), blue dot is a reference point inside the inner polygon.
||Oh, I meant 0.5-1 pts in my provisional score, not in my average. Sorry for the confusion.