Classifiers used successfully in an earlier, simpler challenge on very similar data:

  • random forest (3000 trees, winner)
  • glmnet
  • bagged MARS
  • gbm
  • ensemble of the above

Evaluation

The test the leaderboard is using is apparently a combined ROC AUC score, as described here. Obviously, it would be useful to replicate this test so we can run hyper-parameter searches, test new features, etc. Notes on trying to do this can be found in Mimicing leaderboard test.
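
Our working interpretation of "combined" is that the (prediction, label) pairs from every subject are pooled and a single ROC AUC is computed over the lot; the sketch below assumes that reading. `predictions` and `labels` are hypothetical dicts of numpy arrays keyed by subject, not names from the competition code:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def combined_auc(predictions, labels):
    """Pool every subject's scores and compute one ROC AUC over all of them."""
    subjects = sorted(predictions)
    scores = np.concatenate([predictions[s] for s in subjects])
    targets = np.concatenate([labels[s] for s in subjects])
    return roc_auc_score(targets, scores)
```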

Cross-validation

Cross-validation is done on each subject separately, and the predictions gathered over every cross-validation iteration are then used to calculate the overall AUC estimate. The split is designed to put one preictal hour in the test set, along with however many interictal hours are needed to keep the class proportions the same as in the full data.
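
A minimal sketch of that split, assuming each segment is tagged with an hour identifier and hour IDs are unique across the two classes; `hours` and `is_preictal` are hypothetical arrays, not names from our code:

```python
import numpy as np

def preictal_hour_splits(hours, is_preictal, rng=np.random):
    """Yield (train_idx, test_idx), one split per preictal hour.

    Each test set holds one whole preictal hour plus enough randomly
    chosen interictal hours to keep the preictal/interictal proportion
    the same as in the full data."""
    hours = np.asarray(hours)
    is_preictal = np.asarray(is_preictal, dtype=bool)
    preictal_hours = np.unique(hours[is_preictal])
    interictal_hours = np.unique(hours[~is_preictal])
    # interictal hours held out per preictal hour, preserving the ratio
    n_inter = max(1, len(interictal_hours) // len(preictal_hours))
    for p in preictal_hours:
        held_out = rng.choice(interictal_hours, size=n_inter, replace=False)
        test = (hours == p) | np.isin(hours, held_out)
        yield np.where(~test)[0], np.where(test)[0]
```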

After doing this and running Patient_2 repeatedly (it is a problematic subject), we were getting a variance of 0.042 with a mean of 0.484. Occasionally a cross-validation iteration will yield an AUC of zero, which appears to be what is dragging down the average. This is still in progress; we haven't got to the bottom of it yet.

We have now updated the AUC calculation so that it is computed over the total set of (prediction, class label) pairs gathered across all folds of the cross-validation, rather than averaging per-fold AUCs. Running this ten times to estimate its variance showed that the resulting value is more robust (a sketch of this pooled scoring follows the results below). Before, results were:

```python
{'Patient_1': {'mean': 0.90813884297520664, 'var': 0.013745008314854026},
 'Dog_1': {'mean': 0.54685950413223139, 'var': 0.038157735127382006},
 'Dog_4': {'mean': 0.72254545454545449, 'var': 0.030683125874733472},
 'Dog_2': {'mean': 0.92059939571669791, 'var': 0.0045562255999286871},
 'Patient_2': {'mean': 0.58685950413223142, 'var': 0.064672829724745584},
 'Dog_3': {'mean': 0.73352066115702474, 'var': 0.039981610818933137},
 'Dog_5': {'mean': 0.75555922865013769, 'var': 0.070148136329485675}}
```

After, results were:

```
For Dog_1 mean trainscore was 1.0 with sigma 0.0
For Dog_1 mean testscore was 0.48037665289256193 with sigma 0.0056620825439263
For Dog_2 mean trainscore was 1.0 with sigma 0.0
For Dog_2 mean testscore was 0.9906499681857257 with sigma 2.6412869690587782e-05
For Dog_3 mean trainscore was 1.0 with sigma 0.0
For Dog_3 mean testscore was 0.8944935950413223 with sigma 0.0021300748202393964
For Dog_4 mean trainscore was 1.0 with sigma 1.232595164407831e-33
For Dog_4 mean testscore was 0.808270039761927 with sigma 0.00238186462918859
For Dog_5 mean trainscore was 1.0 with sigma 0.0
For Dog_5 mean testscore was 0.9809316804407715 with sigma 2.7949289741897036e-05
For Patient_1 mean trainscore was 1.0 with sigma 0.0
For Patient_1 mean testscore was 0.8716895443660835 with sigma 0.0016590596476532074
For Patient_2 mean trainscore was 1.0 with sigma 0.0
For Patient_2 mean testscore was 0.4587190082644629 with sigma 0.004212024451881701
```

This is a lot better.
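
For reference, a sketch of the updated scoring: gather the (prediction, label) pairs over every fold, compute one AUC over the pooled set, and repeat the whole cross-validation to estimate the spread. `model`, `X`, `y` and `splits` are hypothetical stand-ins:

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import roc_auc_score

def pooled_cv_auc(model, X, y, splits, n_repeats=10):
    """Run the CV n_repeats times; each run scores one AUC over the
    (prediction, label) pairs pooled across all folds."""
    aucs = []
    for _ in range(n_repeats):
        scores, targets = [], []
        for train, test in splits():
            clf = clone(model).fit(X[train], y[train])
            scores.append(clf.predict_proba(X[test])[:, 1])
            targets.append(y[test])
        # one AUC over the pooled pairs, not a mean of per-fold AUCs
        aucs.append(roc_auc_score(np.concatenate(targets),
                                  np.concatenate(scores)))
    return np.mean(aucs), np.std(aucs)
```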

Attempts

While trying to replicate the test described above, we accidentally improved our score on the leaderboard. So we tried running some simple feature selection and training a random forest with many more estimators to improve our score further. There were massive problems with generalisation to the test data they are using. Notes on this can be found in Random forest submission one.

After this, we created a global classifier using mean values of scaled features as global features. Using only 3 features, we were able to get 0.56, as described in Mean composite classifier.
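
A very rough sketch of how we read the "mean composite" idea: scale each feature, collapse it to a single global value per segment, and fit one classifier over all subjects at once. The shapes, names, and classifier choice here are illustrative guesses rather than the actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def global_features(feature_arrays):
    """feature_arrays: list of (n_segments, n_channels) arrays, one per
    feature. Returns (n_segments, n_features) of scaled channel means."""
    cols = []
    for f in feature_arrays:
        scaled = StandardScaler().fit_transform(f)  # scale each channel
        cols.append(scaled.mean(axis=1))            # collapse channels
    return np.column_stack(cols)

# Hypothetical usage: stack all subjects and fit one global classifier.
# X = np.vstack([global_features(feats[s]) for s in subjects])
# clf = LogisticRegression().fit(X, y)
```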

Note that an early attempt, around 20 days ago, is what got us our current place on the leaderboard. It used the following features:

```python
['raw_feat_var_', 'raw_feat_pib_', 'raw_feat_corrcoef_', 'raw_feat_cov_', 'raw_feat_xcorr_']
```

Support Vector Machines

A recent forum post recommended not using random forests. Taking that advice, we tried Support Vector Machines:

```
=====Training Dog_1 Model=====
predicted AUC score for Dog_1: 0.55
##Writing Model: probablygood_model_for_Dog_1_using__v2_feats.model##
=====Training Dog_2 Model=====
predicted AUC score for Dog_2: 0.98
##Writing Model: probablygood_model_for_Dog_2_using__v2_feats.model##
=====Training Dog_3 Model=====
predicted AUC score for Dog_3: 0.89
##Writing Model: probablygood_model_for_Dog_3_using__v2_feats.model##
=====Training Dog_4 Model=====
predicted AUC score for Dog_4: 0.81
##Writing Model: probablygood_model_for_Dog_4_using__v2_feats.model##
=====Training Dog_5 Model=====
predicted AUC score for Dog_5: 0.97
##Writing Model: probablygood_model_for_Dog_5_using__v2_feats.model##
=====Training Patient_1 Model=====
predicted AUC score for Patient_1: 0.91
##Writing Model: probablygood_model_for_Patient_1_using__v2_feats.model##
=====Training Patient_2 Model=====
predicted AUC score for Patient_2: 0.58
##Writing Model: probablygood_model_for_Patient_2_using__v2_feats.model##
predicted AUC score over all subjects: 0.88
```

This moved us up to 0.756 on the leaderboard.
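
For the record, a per-subject SVM run along the lines of the above might look like the following sketch; `load_subject` is a hypothetical loader, and the scaling and probability settings are our assumptions rather than confirmed choices:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

subjects = ['Dog_1', 'Dog_2', 'Dog_3', 'Dog_4',
            'Dog_5', 'Patient_1', 'Patient_2']
for subject in subjects:
    X, y = load_subject(subject)  # hypothetical loader
    # scaling matters for SVMs; probability=True gives Platt-scaled
    # scores that can be ranked for the AUC
    clf = make_pipeline(StandardScaler(), SVC(probability=True))
    clf.fit(X, y)
```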