||Is it authorized?
||Thanks for asking. Probably this will be allowed but please don't share your approach yet.
||Discussing your algorithms is allowed.
Note for those who will share their ideas: what you tried that didn't work is usually even more interesting than the approach itself!
I used a fairly straightforward two-step solution:
1) Train a Unet based model for each city separately, on RGB images only
2) Transform the predicted mask into lines
1) Mask identification
- I transformed the ground truth lines into target masks by simply drawing the lines with a constant width (10 pixels). I suspect there was a lot of room for improvement here by using better data, since not all roads were the same width
- I found that models trained on single cities were better than one big model trained on all cities
- For training I cut images of size (650, 650) and resized to (336, 336) for network input, with very simple augmentations (vertical / horizontal flips)
- small batch size = 8, due to memory constraints
- I added SpatialDropout2D (I'm using Keras) with a small rate (0.1) after each Conv layer. This worked better than BatchNormalization for me
- I used Adam with a very small learning rate (0.00001) and dice_coef as the loss function. I couldn't make it work with binary_crossentropy (this is expected, as the percentage of positive pixels was very small). Also, once the model had trained, I took it from there and increased the learning rate to 0.0001 (instead of lowering it, as is usual), and it somehow improved the model. I've read a bit recently about cyclical learning rates (a kind of simulated annealing for NNs), and it's not the first time I've seen this work quite well. One must have patience though, since it's not obvious it will work (on the first bump the loss degrades significantly, only to improve further later)
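For readers who haven't used it, here is a minimal NumPy sketch of a smoothed dice coefficient of the kind described above. It is not the author's Keras code: the function names and the flattening are my own assumptions, and the default `smooth=1.0` is simply the value mentioned elsewhere in this post.

```python
import numpy as np


def dice_coef(y_true, y_pred, smooth=1.0):
    """Smoothed dice coefficient between two masks with values in [0, 1].

    `smooth` keeps the ratio defined when both masks are empty.
    """
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (
        np.sum(y_true) + np.sum(y_pred) + smooth
    )


def dice_loss(y_true, y_pred, smooth=1.0):
    """Loss to minimise: 0 for a perfect prediction."""
    return 1.0 - dice_coef(y_true, y_pred, smooth)


# A 336x336 target with one horizontal, 10-pixel-wide "road".
target = np.zeros((336, 336))
target[160:170, :] = 1.0

perfect = dice_coef(target, target)                  # 1.0
all_background = dice_coef(target, np.zeros_like(target))

# An empty patch with a single false-positive pixel already halves the score.
empty = np.zeros((336, 336))
one_pixel = empty.copy()
one_pixel[0, 0] = 1.0
single_false_positive = dice_coef(empty, one_pixel)  # 0.5
```

The last case shows why patches without any road are so harsh with a small smoothing value: one wrong pixel moves the coefficient from 1.0 to 0.5.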
2) Mask to lines
- I used probabilistic Hough transform to identify the segments in the mask
- starting from the longest ones, I kept only the ones that didn't overlap too much with the ones already kept
- merged the lines that were close enough and had similar angle
- naively moved some endpoints by a few pixels to improve the coverage of the mask (the keep-the-longest-first heuristic quite often made me keep diagonal lines on a straight-line mask)
- merged points that were very close
- connected the lines that were very close (endpoint to endpoint, or point-to-line distance)
- created intersections for the lines that cross (I totally ignored the cases where some roads pass above others)
- removed the "roads" that were not connected to a border. The mask often contained some small patches of false detections. As all roads are usually connected to something, this worked very well, although it cost some points in cases where there was no data for some parts of the image
- removed the very small patches and merged the very close points again - this didn't actually improve the score much, but it produced more visually pleasing lines
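The "not connected to a border" filter from the list above could look roughly like this. It is only a sketch under my own assumptions: segments are pairs of `(x, y)` tuples whose endpoints have already been merged (so touching segments share identical tuples), and `margin` is a hypothetical tolerance in pixels, not a value from the post.

```python
from collections import defaultdict


def keep_border_connected(segments, width, height, margin=2):
    """Drop every connected group of segments that never reaches the border.

    segments: list of ((x1, y1), (x2, y2)) with already-snapped endpoints.
    """
    adjacency = defaultdict(set)
    for a, b in segments:
        adjacency[a].add(b)
        adjacency[b].add(a)

    def near_border(point):
        x, y = point
        return (
            x <= margin
            or y <= margin
            or x >= width - 1 - margin
            or y >= height - 1 - margin
        )

    seen = set()
    kept_points = set()
    for start in list(adjacency):
        if start in seen:
            continue
        # Depth-first walk over one connected component.
        component = []
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        if any(near_border(p) for p in component):
            kept_points.update(component)

    # Both endpoints of a segment are in the same component,
    # so checking one of them is enough.
    return [seg for seg in segments if seg[0] in kept_points]


roads = [
    ((0, 100), (200, 100)),    # starts on the left border
    ((200, 100), (300, 300)),  # attached to the first segment
    ((500, 500), (520, 510)),  # isolated false detection
]
filtered = keep_border_connected(roads, width=650, height=650)
```

As the post notes, a filter like this will also throw away real roads when part of the image has no data, because those roads can end in the middle of the tile.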
Tried but didn't work:
- FC-DenseNet and other architectures - no idea why. But given the difficulty I had with the learning rate, I suspect I just did something wrong here
- 8 band data - this actually seemed to work, but ... not enough time to run the models. I used 8 band + some indexes on IR / red
- bigger Dropout rates - I couldn't make it work, but I think this should work if applied once the network has started to converge.
- training only on patches with data - there were quite a lot of patches with no roads at all, and the loss penalized the network very heavily for predicting even a single pixel on them (given that I used a 'smoothed' version of the dice coefficient, with a small 'smoothing' value = 1). It may make sense to use a bigger smoothing factor
- training with a loss function of combined dice coefficient and binary crossentropy. I also tried to fine tune a network trained on dice only and it showed some promise
- refiner network - this is the idea from this paper (http://yann.lecun.com/exdb/publis/pdf/alvarez-eccv-12.pdf). I took the mask predicted by the base network as input and it showed very good promise, but unfortunately I had no time to make it fully work. It learns quite quickly that roads should be connected and that small patches are irrelevant, and it fixes those points. Very handy. I suspect it should also work very well as a prediction merger: as input it could take mask predictions from different networks (each prediction mask = 1 channel of the input). I will definitely try that (if somebody has done it, I would be very interested in the results)
- I didn't do much to better manage the borders. And borders were extremely important in this competition.
- I found it difficult to score the intermediary steps accurately - it's always handy to have a full pipeline to optimize. I ended up doing this:
- the mask quality - take lines from ground truth and keep only those that have a coverage ratio with the mask > threshold. Calculate the full score
- line finding quality - run the algo on the mask generated from ground truth data
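The "mask quality" check above (keep only the ground truth lines that the predicted mask covers well enough, then score those) can be sketched like this. The sampling scheme, the function names and the `threshold=0.8` default are my assumptions; the post does not say how the coverage ratio was computed.

```python
import numpy as np


def coverage_ratio(mask, p0, p1, step=1.0):
    """Fraction of points sampled along p0 -> p1 that land on set mask pixels.

    mask is indexed as mask[y, x]; p0 and p1 are (x, y) pairs.
    """
    (x0, y0), (x1, y1) = p0, p1
    length = float(np.hypot(x1 - x0, y1 - y0))
    n = max(int(length / step) + 1, 2)
    xs = np.rint(np.linspace(x0, x1, n)).astype(int)
    ys = np.rint(np.linspace(y0, y1, n)).astype(int)
    # Clamp so that lines ending exactly on the image edge stay in bounds.
    xs = np.clip(xs, 0, mask.shape[1] - 1)
    ys = np.clip(ys, 0, mask.shape[0] - 1)
    return float(np.mean(mask[ys, xs] > 0))


def lines_supported_by_mask(mask, lines, threshold=0.8):
    """Ground truth lines whose coverage by the mask exceeds the threshold."""
    return [line for line in lines if coverage_ratio(mask, *line) > threshold]


# Predicted mask with one horizontal road, 10 pixels wide.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[45:55, :] = 1

truth = [
    ((0, 50), (99, 50)),  # found by the mask
    ((50, 0), (50, 99)),  # vertical road the mask missed
]
found = lines_supported_by_mask(mask, truth)
```

Scoring `found` with the competition metric then gives an upper bound on what a perfect line finder could extract from that mask, which separates the two steps of the pipeline.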
I found this competition really interesting - it's worth noting that most of the recent TC commercial marathons were very high quality, in terms of both data and topic. Thank you guys!
It was also visually satisfying to watch the roads the solutions can detect !
Looking at my submission now, I notice that my solution fails miserably on Paris images which are not really in Paris (the ones that look like roads hidden by trees). Unfortunately, the validation set I used for local testing contained very few of those, so I couldn't understand the big difference between my local score and the leaderboard. This is quite ironic, since I live in ... the Paris suburbs :)
@walrus71 - can we also share some images?
||Thanks for the description! I'm glad that you found the contest interesting and of high quality.
Yes, you can share images.
||Instead of buffering the linestring with a constant value, I tried to use the number of lanes (from the geojson file) to adjust for the width of the road, so one lane is buffered by 5, two lanes by 10, and so on. It seemed like a good idea, given that residential areas have narrower roads (fewer lanes) than highways, but it produced overlaps on highways, making it difficult to separate them at the dividers (which I found particularly challenging in this dataset), and gave a similar result in validation. So in the end I had to use the constant buffer method.
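To make the two target-mask variants concrete, here is a small NumPy sketch that draws a segment with a width derived from the lane count. It is an illustration only: I treat "5 per lane" as the total drawn width in pixels (the post may have meant a buffer radius), and all function names are made up for the example.

```python
import numpy as np


def rasterize_segment(shape, p0, p1, width):
    """Boolean mask of all pixels within width / 2 of the segment p0 -> p1."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    denom = dx * dx + dy * dy
    if denom == 0:
        t = np.zeros_like(xs)
    else:
        # Projection of every pixel onto the segment, clamped to its ends.
        t = np.clip(((xs - x0) * dx + (ys - y0) * dy) / denom, 0.0, 1.0)
    dist = np.hypot(xs - (x0 + t * dx), ys - (y0 + t * dy))
    return dist <= width / 2.0


def road_width(lanes, pixels_per_lane=5):
    """Assumed rule: total width grows linearly with the lane count."""
    return pixels_per_lane * max(int(lanes), 1)


def rasterize_roads(shape, roads):
    """roads: iterable of (p0, p1, lanes) with points given as (x, y)."""
    mask = np.zeros(shape, dtype=bool)
    for p0, p1, lanes in roads:
        mask |= rasterize_segment(shape, p0, p1, road_width(lanes))
    return mask


one_lane = rasterize_roads((50, 50), [((0, 10), (49, 10), 1)])
two_lanes = rasterize_roads((50, 50), [((0, 10), (49, 10), 2)])
```

Replacing `road_width(lanes)` with a constant gives the constant-width variant. With two wide, nearly parallel carriageways the drawn bands can merge into one blob, which is the overlap problem described above.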
I used unet, single model mostly.
I am trying to compare the results from my model with OpenStreetMap data for my city.
Could someone tell me the specific algorithm used in this competition to convert MUL to MUL-PanSharpen (e.g. Brovey, weighted Brovey)? I have PAN and MUL files, and I need to sharpen the multispectral bands.
||My overall approach was very similar to Mloody2000's:
* Train separate models per city. Originally I found this performed better than one big model, but in the last couple of days I only saw a very small improvement
* Input: random 256x256 sections of the 1300x1300 images (9 channels, 8 from MUL-PanSharpen and 1 from PAN), with random 0/90/180/270 degree rotation
* Output: draw lines with constant width, 128x128 (half the input resolution)
* Model: U-net like structure, 128 channels at top layer and 512 at bottom
* Learning rate: originally I just used 0.0001 with Adam. I got a huge improvement in the last few days when I started at 0.0001 but decreased to 0.00002 after several epochs, then decreased further after more epochs
* Loss: softmax cross-entropy
* Validation set: I used 80 of 2780 images as validation, and saved the model that had best loss over many 256x256 sections from the validation set
* Inference: used the full 1300x1300 resolution
* Ensemble: for each city, I trained four models, and then averaged the segmentation outputs (I also tried AdaBoost but it was worse)
* First apply a Gaussian blur with sigma=1 over the segmentation output
* Then threshold to convert this to a binary mask (determine binary threshold by optimizing on validation set)
* Apply morphological thinning (skimage.morphology.thin) so that we get single-pixel-width lines
* At this point we basically have a graph, where every set pixel is a vertex, and there are edges between adjacent set pixels
* But running Douglas-Peucker will reduce the number of vertices without changing the road network
* One optimization that helped: before morphological thinning, pad the image with several copies of its border on all sides; this way roads are more likely to connect to the border after thinning
* Remove small connected components
* Remove short segments that have a dead-end (i.e., at least one endpoint of the edge has only one incident edge)
* If a road is close to the border, then connect it to the border
* If a road is close to another road, then connect it; I implemented this by trying to extend dead-end segments by a certain amount and see if they intersect another segment
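For anyone unfamiliar with the Douglas-Peucker step mentioned in the list above, here is a plain-Python sketch. It assumes the thinned mask has already been traced into ordered chains of `(x, y)` pixels between junctions or endpoints; the `epsilon` tolerance is a hypothetical parameter, since the post does not give the value used.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a -> b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify an ordered chain of points, always keeping both endpoints.

    Recursive version: fine for short chains, but very long chains
    could hit Python's recursion limit.
    """
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord first -> last.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears only once


# An L-shaped pixel chain collapses to its three corner points.
chain = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
simplified = douglas_peucker(chain, epsilon=0.5)
```

Because the endpoints of each chain are never removed, junctions stay where they are and only the intermediate pixels along each road are dropped.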
||Congrats to the winners.
Wondering if any of the top 4 is willing to share their approach. There is a pattern in their scores: they must have done something to gain such a large margin (mainly in the last 1-2 days of submissions). The rest of us (including myself) would have used pretty standard models.