Restrictions issue | Reply
It seems to me that the K80*1w restriction is too narrow to fit in. In the modern world of AI, with the Volta architecture on the march (even the provided benchmark runs on 4xV100), using that old card as the limit is like using an abacus instead of an IBM PC.

For example, if one wants to use anything other than the default pretrained models, which must be trained on ImageNet first, they can't, because even something as simple as se-resnext-50 consumes 2 GPU-weeks.

Is there any chance that the restrictions will be loosened, or kept only for test time, before the deadline?
Re: Restrictions issue (response to post by nizhib) | Reply
This is a valid concern and we are considering options for how to address it. We will be back with an answer soon.
Re: Restrictions issue (response to post by walrus71) | Reply
If you decide to do it, please do it ASAP, as it will change our training strategy. Thanks.
Re: Restrictions issue (response to post by nizhib) | Reply
After discussing this with the stakeholders of the contest, we are considering switching to a more powerful architecture, unless contestants object to this move on the grounds that they have already invested too much effort in optimizing for the original setup. So if you have an opinion on this question, please let us know by replying to this post whether you are for or against changing to stronger test machines.

One more question: in case you agree to the change, which architectures do you prefer, and why? The options we are considering are g3.8xlarge, g3.16xlarge and p2.8xlarge. Of course price is an important factor in the decision, but you can also weigh in with your opinions.
Re: Restrictions issue (response to post by walrus71) | Reply
I would vote for p2.8xlarge
Thanks
Re: Restrictions issue (response to post by walrus71) | Reply
I would vote for g3.8xlarge
Re: Restrictions issue (response to post by walrus71) | Reply
p2.8xlarge
Re: Restrictions issue (response to post by walrus71) | Reply
What about p3.2xlarge? It's really ~7 times faster than p2.xlarge.
Re: Restrictions issue (response to post by cannab) | Reply
I would like to add a point regarding the storage device.
The storage device attached in the recent progress prize offer doesn't even meet the 5400 rpm standard, I guess; it's too slow.

So, depending on the requirements, a good SSD option should be provided; it would not add substantial cost compared to the GPU options.

Regarding GPU options, given that there would be an upper limit on the cost/budget, the choice should be flexible within that budget. That would save the host money, because not everybody will fully use a top-level cluster.
Re: Restrictions issue (response to post by cannab) | Reply
Sorry for the delay on this. We are still trying to secure the budget, but here is the latest update. We will definitely be able to support g3.8xlarge, and we might be able to support g3.16xlarge; we should know for sure tomorrow. It is unlikely that we will be able to support p2.8xlarge or SSD.
Re: Restrictions issue (response to post by kbowerma) | Reply
AWESOME. Thanks for the reply!
Re: Restrictions issue (response to post by kbowerma) | Reply
Will the g3.16xlarge be supported for the final testing?
Re: Restrictions issue (response to post by kbowerma) | Reply
The g3.8xlarge instance, which uses a single M60 GPU, is still too weak (4.85*2 TFLOPS single-precision; correct me if I'm wrong).
The client used a much more powerful machine (4*P100 = 4*10.6 TFLOPS single-precision, about 4 times faster than the g3.8xlarge instance) to train the baseline solution, and it scored only 722k.
Since the correctness of training scripts is verified only for the very top contestants, using much more powerful machines for that verification would not add much cost.
It may not be reasonable to ask for this at this stage, but I still request that the time restriction apply only to the test phase; otherwise it will sacrifice accuracy.
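For anyone who wants to check the arithmetic in this post, here is a minimal Python sketch. It uses only the figures quoted above (4.85 TFLOPS times 2 for the g3.8xlarge, 10.6 TFLOPS times 4 for the client's machine); these are the poster's numbers, not verified hardware specifications, and the function and variable names are made up for illustration. Peak TFLOPS is also only a rough proxy for real training speed.

```python
# Rough throughput comparison using the figures quoted in the post above.
# All numbers are the poster's estimates, not verified specifications.

M60_TFLOPS = 4.85    # single precision, per GPU (poster's figure)
M60_COUNT = 2        # the "4.85*2" in the post
P100_TFLOPS = 10.6   # single precision, per GPU (poster's figure)
P100_COUNT = 4       # the client's 4*P100 machine


def total_tflops(per_gpu: float, count: int) -> float:
    """Peak single-precision throughput of a machine, in TFLOPS."""
    return per_gpu * count


def speed_ratio(fast: float, slow: float) -> float:
    """How many times more peak throughput the fast machine has."""
    return fast / slow


g3_total = total_tflops(M60_TFLOPS, M60_COUNT)
client_total = total_tflops(P100_TFLOPS, P100_COUNT)
ratio = speed_ratio(client_total, g3_total)

print(f"g3.8xlarge (as quoted): {g3_total:.1f} TFLOPS")
print(f"4 x P100 (as quoted):   {client_total:.1f} TFLOPS")
print(f"ratio:                  {ratio:.2f}x")
```

With the quoted figures the ratio comes out slightly above 4, which matches the "4 times faster" estimate in the post.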
Re: Restrictions issue (response to post by gonzalb) | Reply
Yes, the g3.16xlarge will be the largest size we can support.


Thanks for your patience.