The results are in...
And congratulations go to CatalinT!

http://community.topcoder.com/longcontest/stats/?module=ViewSystemTestResults&rd=16555&pm=13978&cr=40107582&sr=1

Seeing the progress, I was convinced someone would hit 100%.

I am very interested to see what approaches the top competitors took. It was really frustrating that whenever I tried to "follow the book", I got lower scores than with my initial "weird", unconventional idea.
Re: The results are in... (response to post by WojciechMigda)
Congratulations to all winners!

I used a deep convolutional neural network on spectrograms of the audio recordings. Without data augmentation it scored about 31.8m. The augmentation was done by two methods: randomly cropping 9-second intervals of the speech, and linearly transforming the frequency bins of the spectrograms (known as vocal tract length perturbation, described by Jaitly and Hinton in https://www.cs.toronto.edu/~hinton/absps/perturb.pdf). This allowed me to reach 33.9m, and applying some ensembling helped to get a slightly higher score; the final score was around 34m. Of course, this is a very brief description of my approach. I will probably post a more detailed description.
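For illustration, here is a minimal sketch of those two augmentations. librosa, the file name, and all parameter values are assumptions, and the frequency warp below is a purely linear simplification of the paper's piecewise-linear VTLP.

```python
# Sketch of the two augmentations: random 9 s crops and a linear
# VTLP-style frequency warp. All parameter values are illustrative.
import numpy as np
import librosa

def random_crop(y, sr, seconds=9.0):
    """Cut a random `seconds`-long interval out of waveform y."""
    n = int(seconds * sr)
    if len(y) <= n:
        return y
    start = np.random.randint(0, len(y) - n)
    return y[start:start + n]

def vtlp_warp(spec, alpha):
    """Rescale the frequency axis of a magnitude spectrogram by factor
    `alpha` (drawn from, e.g., U(0.9, 1.1)), interpolating between bins."""
    n_bins = spec.shape[0]
    src = np.arange(n_bins)
    warped = np.clip(src / alpha, 0, n_bins - 1)  # where each output bin samples from
    out = np.empty_like(spec)
    for t in range(spec.shape[1]):
        out[:, t] = np.interp(warped, src, spec[:, t])
    return out

y, sr = librosa.load("recording.mp3", sr=16000)   # placeholder file
spec = np.abs(librosa.stft(random_crop(y, sr), n_fft=512, hop_length=160))
spec_aug = vtlp_warp(spec, alpha=np.random.uniform(0.9, 1.1))
```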

Looking forward to hearing from others ;)
Re: The results are in... (response to post by Harhro94)
I did the following:

1. MFCC transformation of every sample into a time series of 13-dimensional vectors, where the lowest-index MFCC was replaced with the cepstral energy;
2. KMeans clustering (N=3, which worked best) on each time series, reducing each sample to a 3x13 feature vector. I ordered the cluster centers by their cepstral energy; even if this ordering seems strange, it worked quite well.
3. Then I did two classifications (see the sketch after this list). First I applied Extra Trees from scikit-learn with 100 estimators. This alone worked quite well, but I noticed that the top-ranked languages were often tied, so even if the ground-truth language was at the top it would not necessarily be picked as #1. Alternatively, I applied KNN (N=5 worked best) with preprocessing done using StandardScaler and PCA with whitening. This alone gave better results than Extra Trees, but I added another step, mixing the two classifiers by summing their probability arrays (at a 5:1 Extra Trees to KNN ratio) and then selecting the top-3 languages. This brought me to my highest score.
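A minimal sketch of that pipeline, assuming librosa for the MFCC step (with coefficient 0 standing in for the cepstral energy) and default values for anything the list above does not specify:

```python
# Steps 1-3: MFCC -> KMeans centers -> ExtraTrees + KNN blend.
# N=3 clusters, 100 trees, K=5, and the 5:1 blend are from the post;
# the sample rate and n_init are assumptions.
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def sample_features(path):
    """One 39-dim vector per recording: the 3x13 KMeans cluster centers
    of its MFCC frames, ordered by the energy (index 0) column."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # frames x 13
    centers = KMeans(n_clusters=3, n_init=10).fit(mfcc).cluster_centers_
    return centers[np.argsort(centers[:, 0])].ravel()

def predict_top3(X_train, y_train, X_test):
    """Blend ExtraTrees and scaled/whitened-PCA KNN probabilities at a
    5:1 ratio and return the indices of the top-3 languages per sample."""
    et = ExtraTreesClassifier(n_estimators=100).fit(X_train, y_train)
    knn = make_pipeline(StandardScaler(), PCA(whiten=True),
                        KNeighborsClassifier(n_neighbors=5))
    knn.fit(X_train, y_train)
    proba = 5 * et.predict_proba(X_test) + knn.predict_proba(X_test)
    return np.argsort(proba, axis=1)[:, :-4:-1]
```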

I tried different VQ/codebook approaches, but none of them worked better than the above. I tried merging in deltas and delta-deltas, as well as SDC vectors, and again, none of that improved my score. I also tried cutting off the cepstral coefficients corresponding to the silence between the actual utterances, but surprisingly, this only lowered the scores.

This GitHub repo contains a major part of my code: https://github.com/WojciechMigda/TCO-SpokenLanguages2
Re: The results are in... (response to post by WojciechMigda)
All I did was a subset of the methods described in the paper by H. Li (listed first in the Hints section of the problem statement). My solution consists of phone 2-gram and 3-gram modeling (with only a single (English) phone recognizer), a Gaussian mixture model (GMM) with shifted-delta-cepstrum features, and fusion of the scores with a Gaussian linear back-end.
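For reference, this is a minimal sketch of shifted-delta-cepstrum stacking; the post gives no parameters, so the common N-d-P-k = 7-1-3-7 configuration from the literature is assumed.

```python
# Shifted delta cepstra: stack k delta blocks, each computed with
# spread d and spaced P frames apart. N-d-P-k = 7-1-3-7 is assumed;
# the post does not state its configuration.
import numpy as np

def sdc(cep, d=1, P=3, k=7):
    """cep: (frames, N) cepstral matrix -> (frames, N*k) SDC features.
    Block i holds the delta c[t + i*P + d] - c[t + i*P - d]."""
    T = cep.shape[0]
    pad = np.pad(cep, ((d, d + (k - 1) * P), (0, 0)), mode="edge")
    blocks = [pad[2 * d + i * P:2 * d + i * P + T] - pad[i * P:i * P + T]
              for i in range(k)]
    return np.hstack(blocks)

# e.g. feats = sdc(mfcc_frames[:, :7])  # keep N=7 cepstra per frame
```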

The scores of the individual methods were 3073200 (2-gram modeling), 2593520 (3-gram modeling), 3070920 (GMM with 128 Gaussians), and 3144160 (GMM with 1024 Gaussians). The fusion resulted in 3490720. Then I added an extra shift to the final scores to decrease/increase the scores of languages that received more/fewer than 70 wins.
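A Gaussian linear back-end (class-conditional Gaussians with a shared covariance matrix) is equivalent to linear discriminant analysis, so as a minimal sketch of the fusion step one could write the following; scikit-learn's LDA and the input layout are assumptions, not necessarily what was used here.

```python
# Fusing per-system scores with a Gaussian linear back-end.
# Shared-covariance Gaussian classes are exactly what LDA fits, so
# sklearn's LinearDiscriminantAnalysis stands in for it here.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fuse(train_scores, y_train, test_scores):
    """train_scores / test_scores: lists of (n_utterances, n_languages)
    score matrices, one per system (2-gram, 3-gram, GMM-128, GMM-1024).
    Returns fused language posteriors for the test utterances."""
    backend = LinearDiscriminantAnalysis().fit(np.hstack(train_scores), y_train)
    return backend.predict_proba(np.hstack(test_scores))
```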

I had a difficult decision to make during the last weekend. Before the weekend, I only had the results without the GMM with 1024 Gaussians and could not reach 3500000. I was not sure whether adding a GMM with more Gaussians would increase my score sufficiently. On Saturday, I trained the GMM with 1024 Gaussians, but I still needed to score it on both the test and training data to calculate the fusion. The only chance to get the results in time was to rent many Amazon instances and pay a lot of money for it, with no guarantee it would really increase the score. But I did it - I rented twenty c4.8xlarge machines, and after 3 hours I got the results, and it paid off :).

This was the first match in which I used external libraries very intensively; much time was consumed by searching the web for the best tools and making them work. CMUSphinx and Armadillo in particular proved very useful. Also, this was finally the first ML marathon match in which I did not use a random forest! :D
Re: The results are in... (response to post by Harhro94)
Here is a more detailed description of my approach: http://yerevann.github.io/2015/10/11/spoken-language-identification-with-deep-convolutional-networks/