
All I did was apply a subset of the methods described in the paper by H. Li (the first one listed in the Hints section of the problem statement). My solution consists of phone 2-gram and 3-gram modeling (with only one (English) phone recognizer), a Gaussian mixture model (GMM) with shifted-delta cepstrum features, and fusion of the scores with a Gaussian linear backend.
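For anyone curious what shifted-delta cepstra look like in practice, here is a minimal sketch. It assumes the common N-d-P-k parameterization: for each frame t, k delta blocks delta_i = c(t + iP + d) - c(t + iP - d) are stacked into one feature vector. The function name and defaults are my own illustration, not code from my solution.

```python
import numpy as np

def sdc(cep, d=1, P=3, k=7):
    """Shifted-delta cepstra (sketch).

    cep : (T, n_cep) array of base cepstral frames.
    For each valid frame, stack k shifted delta blocks
    delta_i = c(t + i*P + d) - c(t + i*P - d), i = 0..k-1.
    Returns an array of shape (T - (k-1)*P - 2*d, k * n_cep);
    frames near the edges are dropped rather than padded.
    """
    T, n = cep.shape
    out_T = T - (k - 1) * P - 2 * d
    feats = np.empty((out_T, k * n))
    for t in range(out_T):
        # index shifted by d so that t + i*P - d maps to t + i*P >= 0
        blocks = [cep[t + i * P + 2 * d] - cep[t + i * P] for i in range(k)]
        feats[t] = np.concatenate(blocks)
    return feats
```

With the usual 7-1-3-7 setup this turns 7 base cepstra per frame into a 49-dimensional vector, which is what the GMMs are trained on.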
The scores from the individual methods were 3,073,200 (2-gram modeling), 2,593,520 (3-gram modeling), 3,070,920 (GMM with 128 Gaussians), and 3,144,160 (GMM with 1024 Gaussians). The fusion resulted in 3,490,720. Then I added an extra shift to the final scores to decrease/increase the scores of languages that received more/fewer than 70 wins.
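A Gaussian linear backend for fusion can be sketched roughly as follows: stack the per-system scores for each utterance into a vector, fit one mean per language with a shared covariance, and score new vectors by log-likelihood (the quadratic term cancels across languages, so the decision is linear). This is a generic minimal version under my own assumptions, not my exact implementation.

```python
import numpy as np

def gaussian_backend(train_scores, train_labels):
    """Fit a Gaussian linear backend (sketch).

    train_scores : (n_utts, n_systems) per-system score vectors.
    train_labels : (n_utts,) language labels.
    Returns per-language means and the inverse of the shared
    within-class covariance (with a small ridge for stability).
    """
    classes = np.unique(train_labels)
    means = np.array([train_scores[train_labels == c].mean(axis=0)
                      for c in classes])
    centered = np.concatenate([train_scores[train_labels == c] - means[i]
                               for i, c in enumerate(classes)])
    cov = np.cov(centered.T) + 1e-6 * np.eye(train_scores.shape[1])
    return means, np.linalg.inv(cov)

def fuse(x, means, inv_cov):
    """Per-language fused score for one score vector x.

    This is log N(x; m, S) up to a constant shared by all languages;
    the x' S^-1 x term drops out, leaving a function linear in x.
    """
    return np.array([m @ inv_cov @ x - 0.5 * m @ inv_cov @ m
                     for m in means])
```

The win-based shift mentioned above is then just a per-language constant added to these fused scores.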
I had a difficult decision to make during the last weekend. Before the weekend, I only had the results without the 1024-Gaussian GMM, and I could not reach 3,500,000. I was not sure whether adding a GMM with more Gaussians would increase my score sufficiently. On Saturday, I trained the GMM with 1024 Gaussians, but I still needed to score it on all the test and training data to compute the fusion. The only chance to get the results in time was to rent many Amazon instances and pay a lot of money for them, with no guarantee it would really increase the score. But I did it: I rented twenty c4.8xlarge machines, and after 3 hours I got the results. It paid off :).
This was the first match in which I used external libraries intensively; much of my time was spent searching the web for the best tools and making them work. CMUSphinx and Armadillo in particular proved very useful. Also, this was finally the first ML marathon match in which I did not use a random forest! :D