Submission history
Suspected a priori that MVAR would be a very useful feature, and found exceptionally high separation on iso-plots. Expected ICA to outperform CSP, and both to outperform RAW. Found that CSP outperformed ICA on the public leaderboard (though very narrowly)!
mvar_raw
Public AUROC: 0.68743
Predicted AUROC: 0.76961
Expected values:
Dog_1 Dog_2 Dog_3 Dog_4 Dog_5 Patient_1 Patient_2 Overall
0.450020833333 0.777798564477 0.769036265432 0.753514200268 0.881382632633 0.814868731309 0.360648148148 0.769611833552
mvar_ica
Public AUROC: 0.74684
Predicted AUROC: 0.83023
0.476815972222 0.953311776156 0.809512731481 0.779338422456 0.929014014014 0.869317201518 0.425092592593 0.830230271653
mvar_csp
Public AUROC: 0.74837
Predicted AUROC: 0.81972
0.426253472222 0.919725826918 0.825064236111 0.776440593067 0.930520520521 0.853518907563 0.432222222222 0.819721454993
emvar_ica
Public AUROC: 0.74074
Predicted AUROC: 0.85650
0.557541666667 0.957400035901 0.871861111111 0.750087083459 0.965962962963 0.895380813144 0.595833333333 0.856497773628
emvar_csp
Public AUROC: 0.75006
Predicted AUROC: 0.84330
0.532333333333 0.954033433758 0.830027777778 0.740407731803 0.941833333333 0.877332219437 0.6125 0.843300878237
emvar_cspdr
Public AUROC: 0.71194
Predicted AUROC: 0.84762
0.607055555556 0.926507223807 0.852361111111 0.683154132689 0.96625 0.846177944862 0.552222222222 0.847620757582
emvar_cspdr,ica
Public AUROC: 0.70129
Predicted AUROC: 0.84603
0.591930555556 0.947151957074 0.822472222222 0.673227121715 0.97087037037 0.882466583124 0.514444444444 0.84602997907
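The MVAR feature sets above all start from a multivariate autoregressive model fitted to each segment; measures such as GPDC (generalised partial directed coherence) are then computed from the fitted coefficient matrices. As a reminder of what is being fitted, here is a minimal ordinary-least-squares sketch. `fit_mvar` is a hypothetical helper, not the project's code, and the real pipeline may use a different estimator, model order or normalisation.

```python
import numpy as np


def fit_mvar(x: np.ndarray, order: int) -> np.ndarray:
    """Ordinary least-squares fit of an MVAR model of the given order.

    x has shape (n_channels, n_samples) and is assumed to be zero-mean.
    Returns A with shape (order, n_channels, n_channels) such that
    x[:, t] is approximated by sum_k A[k] @ x[:, t - k - 1].
    """
    n_ch, n_samp = x.shape
    # Stack lagged copies: rows k*n_ch .. (k+1)*n_ch - 1 hold lag k + 1.
    lagged = np.vstack([x[:, order - k - 1 : n_samp - k - 1] for k in range(order)])
    target = x[:, order:]
    # Solve target ~= W @ lagged for W in the least-squares sense.
    w, *_ = np.linalg.lstsq(lagged.T, target.T, rcond=None)
    return w.T.reshape(n_ch, order, n_ch).transpose(1, 0, 2)


# Simulate a 2-channel VAR(1) process in which channel 1 drives channel 0.
rng = np.random.default_rng(0)
a_true = np.array([[0.5, 0.2], [0.0, 0.3]])
x = np.zeros((2, 5000))
for t in range(1, x.shape[1]):
    x[:, t] = a_true @ x[:, t - 1] + rng.standard_normal(2)

a_hat = fit_mvar(x, order=1)
print(np.round(a_hat[0], 2))
```

The off-diagonal coefficient `a_hat[0][0, 1]` is the kind of directed, cross-channel term that connectivity measures like GPDC are built from.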
Thought that combining the features used in our previous best classifier with those used in our current best submission would probably work well. The following features were used:
"FEATURES": ["cln,csp,dwn_feat_lmom-3_",
"cln,ica,dwn_feat_xcorr-ypeak_",
"cln,csp,dwn_feat_pib_ratioBB_",
"cln,ica,dwn_feat_mvar-GPDC_",
"cln,ica,dwn_feat_PSDlogfcorrcoef_",
"cln,ica,dwn_feat_pwling1_",
"raw_feat_corrcoef_",
"raw_feat_cov_",
"raw_feat_pib_",
"raw_feat_var_",
"raw_feat_xcorr_"],
And the predicted performance was:
predicted AUC score for Dog_1: 0.53
predicted AUC score for Dog_2: 0.95
predicted AUC score for Dog_3: 0.82
predicted AUC score for Dog_4: 0.77
predicted AUC score for Dog_5: 0.94
predicted AUC score for Patient_1: 0.77
predicted AUC score for Patient_2: 0.45
predicted AUC score over all subjects: 0.83
Then, submitted and got 0.76012.
Only slightly worse. The model probably relies on the features from the current best submission, and these raw features aren't adding anything useful.
Thought this was what I was doing, but looking at the code I wasn't actually including any feature selection:
I'm now using a simple variance threshold followed by filtering on F-scores. With both, the predicted AUC was 0.86, but I had been fiddling with the cross-validation code, so that figure won't map onto other results. Full AUC results can be found here.
Submitted and got 0.77171, moving up the leaderboard 5 places.
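The variance-threshold-then-F-score filtering described above can be sketched with scikit-learn's `VarianceThreshold` and `SelectKBest(f_classif)` placed in front of an SVC. This is a minimal sketch, not the project's actual settings: the synthetic data, the zero variance threshold and `k=10` are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for one subject's feature matrix (assumption), plus a
# constant column that the variance threshold should drop.
X, y = make_classification(n_samples=300, n_features=50, n_informative=5, random_state=0)
X = np.hstack([X, np.ones((X.shape[0], 1))])

model = Pipeline([
    # Drop features whose variance on the training data is zero (constant columns).
    ("variance", VarianceThreshold(threshold=0.0)),
    # Keep the k features with the highest ANOVA F-score against the labels.
    ("fscore", SelectKBest(score_func=f_classif, k=10)),
    ("svc", SVC()),
])

# Both selectors are refit inside every CV training fold.
auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
print(f"mean CV AUC: {auc.mean():.3f}")

model.fit(X, y)
n_after_variance = model.named_steps["variance"].get_support().sum()
n_after_fscore = model.named_steps["fscore"].get_support().sum()
```

Putting the selectors inside the `Pipeline` (rather than selecting on the full training set first) keeps the held-out fold out of the selection step, so the CV estimate is not inflated by it.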
Now running with just the variance threshold to see what its contribution is on its own.
Also got a predicted AUC of 0.86. Submitted and got 0.77171, exactly the same, which doesn't make a lot of sense. Going to check that it actually took out the F-score selector.
Reran it with the VarianceThreshold actually enabled and got the following results:
probablygood.gavin 0.589681818182 0.98621623775 0.890074380165 0.772748201247 0.979468319559 0.87131210293 0.537975206612 0.863925818882
Submitted and scored 0.77171 again.
Looking through the commit logs to see if I actually enabled the variance threshold in the last submission. It looks like I didn't, unless it was enabled between commits.
Enabled F-score SelectKBest with its defaults, which are pretty aggressive: it removes everything but the 10 best features. Training scores were the following:
f1selectionpg_gavin 0.596227272727 0.955673747151 0.622037190083 0.700814745942 0.740404958678 0.874980003656 0.59132231405 0.779228904792
Switched from KBest to Percentile (10%):
f1selectionpg_gavin 0.594309917355 0.993204113356 0.875190082645 0.807148622483 0.947581267218 0.913800047991 0.537231404959 0.865991069517
So this should in theory perform better on the leaderboard. Submitted and got 0.76693.
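The difference between the two selectors is how many features survive: `SelectKBest` keeps a fixed count (10 by default), while `SelectPercentile` keeps a fixed fraction (10% by default), so on a wide feature matrix the percentile rule is much less aggressive. A small sketch on synthetic data; the 400-feature matrix is an assumption, not the real feature set.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, SelectPercentile

# 400 synthetic features (assumption), standing in for a large combined feature set.
X, y = make_classification(n_samples=200, n_features=400, n_informative=20, random_state=0)

# Both selectors default to the ANOVA F-score (f_classif).
kbest = SelectKBest().fit(X, y)            # default k=10: keeps 10 features
percentile = SelectPercentile().fit(X, y)  # default percentile=10: keeps 10% of features

n_kbest = int(kbest.get_support().sum())
n_percentile = int(percentile.get_support().sum())
print(n_kbest, n_percentile)  # 10 40
```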
Then decided to try random forest feature selection. Now implemented in settings.
Used 1000 estimators and got the following training results:
forestselection_gavin 0.607252066116 0.991547999016 0.885080578512 0.769613255618 0.978782369146 0.910679190091 0.588863636364 0.86575473285
Submitted and got 0.78329, moving up the leaderboard 9 positions. It appears the improved performance on Patient_2 and Dog_1 makes a difference.
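The notebook only says that RF feature selection is implemented in settings. One common way to do this in scikit-learn is `SelectFromModel` wrapped around a `RandomForestClassifier`; the sketch below assumes that approach, with the library's default "mean importance" threshold, so it may differ from the project's implementation. `forest_selection_model` is a hypothetical helper name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def forest_selection_model(n_estimators: int = 1000, random_state: int = 0) -> Pipeline:
    """SVC preceded by random-forest feature selection (hypothetical helper)."""
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    # With no explicit threshold, SelectFromModel keeps features whose
    # impurity-based importance is at least the mean importance.
    selector = SelectFromModel(forest)
    return Pipeline([("forest", selector), ("svc", SVC())])


X, y = make_classification(n_samples=300, n_features=50, n_informative=5, random_state=0)
# Fewer trees than the 1000 used above, just to keep the example quick.
model = forest_selection_model(n_estimators=200).fit(X, y)
n_kept = int(model.named_steps["forest"].get_support().sum())
print(f"kept {n_kept} of {X.shape[1]} features")
```

Unlike the F-score filters, the forest scores features jointly and can pick up non-linear and interaction effects, which may be part of why it behaved differently on the harder subjects.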
Investigating which modtyp is best for the single-channel time-domain statistics. Intuitively, CSP should be best, but we need to check this against the leaderboard because CSP takes knowledge of all features into account.
singlech_timestats_raw
Public AUROC: 0.69832
Predicted AUROC: 0.73461
Dog_1 Dog_2 Dog_3 Dog_4 Dog_5 Patient_1 Patient_2 Overall
0.479159722222 0.779531588137 0.734937885802 0.687944059975 0.781799299299 0.74687284334 0.551018518519 0.734610190333
singlech_timestats_ica
Public AUROC: 0.68967
Predicted AUROC: 0.71428
0.494840277778 0.770932191196 0.677896219136 0.678842378697 0.772084584585 0.666515700483 0.555925925926 0.714281427218
singlech_timestats_csp
Public AUROC: 0.70466
Predicted AUROC: 0.72564
0.506454861111 0.781471208263 0.72861246142 0.665038026046 0.803454704705 0.578890614217 0.578240740741 0.725640056988
CV predicted raw > csp > ica; on the leaderboard we found csp > raw > ica. The margins between them are narrow, though.
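For reference, textbook two-class CSP finds spatial filters by solving a generalised eigenproblem on the average covariance matrices of the two classes, so every CSP component mixes all channels and depends on the class labels. That label dependence is one reason CV scores for CSP features can be optimistic if the filters are fitted on the full training set before splitting. A minimal NumPy/SciPy sketch under those textbook assumptions; the project's `csp` modtyp may differ in normalisation or in how many components it keeps, and `csp_filters` is a hypothetical name.

```python
import numpy as np
from scipy.linalg import eigh


def csp_filters(x_a: np.ndarray, x_b: np.ndarray) -> np.ndarray:
    """Two-class CSP spatial filters.

    x_a, x_b have shape (n_trials, n_channels, n_samples). Returns W with one
    filter per column, ordered from the filter that gives class a the largest
    share of variance to the one that gives class b the largest share.
    """
    def mean_cov(x: np.ndarray) -> np.ndarray:
        # np.cov treats rows as variables, i.e. channels.
        return np.mean([np.cov(trial) for trial in x], axis=0)

    c_a, c_b = mean_cov(x_a), mean_cov(x_b)
    # Generalised eigenproblem c_a w = lam (c_a + c_b) w; lam in [0, 1] is the
    # fraction of the filtered variance that comes from class a.
    eigvals, eigvecs = eigh(c_a, c_a + c_b)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order]


# Toy data: class a is loud on channel 0, class b on channel 2.
rng = np.random.default_rng(0)
x_a = rng.standard_normal((20, 3, 500))
x_b = rng.standard_normal((20, 3, 500))
x_a[:, 0] *= 3.0
x_b[:, 2] *= 3.0

w = csp_filters(x_a, x_b)
# Components for one trial would be w.T @ trial.
```

On the toy data the first filter concentrates on channel 0 and the last on channel 2, as expected from where each class carries its extra variance.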
Ran the batch train-and-predict script on every single feature and sorted the list by overall predicted ROC AUC.
A lot of the top features are MVAR flavours, so I used only the best one overall and no others.
NB: these are 20-times-CV predictions, not 10.
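The batch run amounts to scoring each feature set on its own with cross-validated ROC AUC and then sorting. A minimal sketch of that idea, assuming an SVC per feature set and stratified 20-fold CV; `rank_feature_sets` and the synthetic "informative"/"noise" feature sets are hypothetical, and the real script additionally reports per-subject scores and an Overall column.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC


def rank_feature_sets(feature_sets: dict, y: np.ndarray, n_splits: int = 20) -> list:
    """Return (name, mean CV ROC AUC) pairs, best first."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = {
        name: cross_val_score(SVC(), X, y, cv=cv, scoring="roc_auc").mean()
        for name, X in feature_sets.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# shuffle=False puts the 5 informative columns first; the rest are pure noise.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, n_redundant=0, shuffle=False, random_state=0
)
ranking = rank_feature_sets({"noise": X[:, 10:], "informative": X[:, :5]}, y)
for name, auc in ranking:
    print(f"{name:12s} {auc:.3f}")
```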
SVC_ica_mvar-arf
Public AUROC: 0.68460
Predicted AUROC: 0.8427
Dog_1 Dog_2 Dog_3 Dog_4 Dog_5 Patient_1 Patient_2 Overall
0.4651 0.9571 0.8439 0.7830 0.9539 0.7216 0.3756 0.8427
SVC_csp_coher_logf
Public AUROC: 0.75269
Predicted AUROC: 0.8301
0.5468 0.9579 0.8264 0.7656 0.9296 0.8225 0.5712 0.8301
SVC_ica_phase-high_gamma-sync
Public AUROC: 0.68427
Predicted AUROC: 0.8247
0.6218 0.9789 0.7523 0.7990 0.9082 0.8884 0.5332 0.8247
SVC_ica_pib_ratioBB
Public AUROC: 0.77110
Predicted AUROC: 0.8012
0.6928 0.8832 0.7556 0.7968 0.8065 0.8332 0.5598 0.8012
According to CV, MVAR-ARF is supposed to be better than GPDC, but it is not on the public leaderboard (see the top of this page). Not sure how we should pick the best of the MVARs.
ica_phase-high_gamma-sync does reasonably well with CV/public = 0.8247/0.68427
csp_coher_logf does surprisingly well, with CV/public = 0.8301/0.75269
ica_pib_ratioBB does incredibly well on the public leaderboard, with CV/public = 0.8012/0.77110. This is basically as good as the current best submission, which is Gavin's probablygood with automatic dropping of the worst elements.
At the moment, the public score seems to be most correlated with the Patient_2 score. This might be because Patient_2 is the worst-performing subject. We probably overestimate the number of Patient_2 preictals, so we might be able to improve the overall score with a better Patient_2 prior. Open to suggestions on why the worst subject would be linked to overall performance, and to other suggestions about the relationship between prediction and public leaderboard.
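One way to make the Patient_2 observation checkable is to correlate each subject's CV AUROC with the public AUROC across the four single-feature SVC submissions above (numbers copied from this page). With only four points this is weak evidence either way. On these rows Patient_2 has the largest positive correlation (r roughly 0.68), Dog_1 comes next (roughly 0.54), and the Overall CV score is negatively correlated with the public score (roughly -0.70).

```python
import numpy as np

subjects = ["Dog_1", "Dog_2", "Dog_3", "Dog_4", "Dog_5", "Patient_1", "Patient_2", "Overall"]
# Per-subject CV AUROC for the four single-feature SVC submissions above.
cv_scores = np.array([
    [0.4651, 0.9571, 0.8439, 0.7830, 0.9539, 0.7216, 0.3756, 0.8427],  # ica_mvar-arf
    [0.5468, 0.9579, 0.8264, 0.7656, 0.9296, 0.8225, 0.5712, 0.8301],  # csp_coher_logf
    [0.6218, 0.9789, 0.7523, 0.7990, 0.9082, 0.8884, 0.5332, 0.8247],  # ica_phase-high_gamma-sync
    [0.6928, 0.8832, 0.7556, 0.7968, 0.8065, 0.8332, 0.5598, 0.8012],  # ica_pib_ratioBB
])
public = np.array([0.68460, 0.75269, 0.68427, 0.77110])

# Pearson correlation between each column of CV scores and the public scores.
correlations = {
    name: float(np.corrcoef(column, public)[0, 1])
    for name, column in zip(subjects, cv_scores.T)
}
for name, r in correlations.items():
    print(f"{name:10s} r = {r:+.2f}")
```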
This submission is made up of the features with (approximately) the highest cross-validation score for each subject, and includes the four above that were best overall.
Predictions use 10-fold cross-validation.
Also tested this with and without pseudodata; it did worse without pseudodata.
bestbysubj
Public AUROC: 0.74468
Predicted AUROC: 0.84500
Expected values:
Dog_1 Dog_2 Dog_3 Dog_4 Dog_5 Patient_1 Patient_2 Overall
0.474360020661 0.970092652542 0.852643652433 0.778758206978 0.972751098206 0.741838942507 0.475785123967 0.845005605453
bestbysubject, no pseudo data
Public AUROC: 0.72625
Predicted AUROC: 0.83970
0.460887152778 0.961280343093 0.841602430556 0.765463976091 0.971615365365 0.761653151599 0.491018518519 0.839706036032
"cln,ica,dwn_feat_mvar-ARF_",
"cln,csp,dwn_feat_coher_logf_",
"cln,ica,dwn_feat_phase-high_gamma-sync_",
"cln,ica,dwn_feat_pib_ratioBB_",
"cln,raw,dwn_feat_PSDlogfcorrcoef_",
"cln,csp,dwn_feat_psd_logf_",
"cln,raw,dwn_feat_spearman_",
"cln,csp,dwn_feat_corrcoefeig_",
"cln,ica,dwn_feat_ampcorrcoef-theta-eig_",
"cln,raw,dwn_feat_pwling4_"
Does worse than SVC_ica_pib_ratioBB and SVC_csp_coher_logf, despite including these. There seems to be some kind of penalty for including extra (noisier) elements in the classification. Should submit again with Gavin's F-score and gaussian thresholding.
Does better than SVC_ica_phase-high_gamma-sync alone, despite having worse Dog_1 and Patient_2 predicted scores, which goes against my suggestion that Patient_2 is weighted more heavily than it should be in the leaderboard scores.
Since Patient_2 is doing badly and I cleaned and downsampled it, I wanted to check that it isn't worse just because of the cleaning.
Could have just looked at CV scores, but we had spare submission slots.
I selected the features from Individual features that are supposed to be good overall, but only two of those are available dirty (there are no dirty phase, ampcorr or mvar features processed).
Some of these came out worse after cleaning. It seems that whilst cleaning out the line noise in Patient_1 improved its scores, cleaning out non-existent line noise in Patient_2 (for the sake of "having a consistent model and not switching on subject names") has made it worse.
I think it might be best if I re-do all of Patient_2 without line noise removal... :(
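The notebook doesn't say how the line noise was removed. A common approach is a narrow zero-phase notch filter at the mains frequency; the sketch below assumes 60 Hz mains, Q = 30 and a 400 Hz sampling rate for the demo, none of which come from this page, and `remove_line_noise` is a hypothetical helper. It also illustrates the concern above: a notch removes whatever power sits near 60 Hz, so on a subject without line noise it can only take away genuine signal in that band.

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch


def remove_line_noise(x: np.ndarray, fs: float, line_freq: float = 60.0, q: float = 30.0) -> np.ndarray:
    """Zero-phase IIR notch at line_freq, applied along the last (time) axis."""
    b, a = iirnotch(line_freq, q, fs=fs)
    return filtfilt(b, a, x, axis=-1)


def rms(v: np.ndarray) -> float:
    return float(np.sqrt(np.mean(v ** 2)))


fs = 400.0  # assumed sampling rate for the example
t = np.arange(int(10 * fs)) / fs
hum = np.sin(2 * np.pi * 60.0 * t)     # pure mains interference
rhythm = np.sin(2 * np.pi * 10.0 * t)  # stand-in for genuine 10 Hz activity
mid = slice(int(2 * fs), int(8 * fs))  # ignore filter edge transients

hum_left = rms(remove_line_noise(hum, fs)[mid]) / rms(hum[mid])
rhythm_left = rms(remove_line_noise(rhythm, fs)[mid]) / rms(rhythm[mid])
print(f"60 Hz kept: {hum_left:.3f}, 10 Hz kept: {rhythm_left:.3f}")
```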
SVC_csp_coher_logf CLEAN
Public AUROC: 0.75269
Predicted AUROC: 0.8301
0.5468 0.9579 0.8264 0.7656 0.9296 0.8225 0.5712 0.8301
SVC_dirtycsp_coher_logf DIRTY
Public AUROC: 0.75400
Predicted AUROC: 0.8304
0.5468 0.9579 0.8264 0.7656 0.9296 0.8280 0.6394 0.8304
SVC_ica_pib_ratioBB CLEAN
Public AUROC: 0.77110
Predicted AUROC: 0.8012
0.6928 0.8832 0.7556 0.7968 0.8065 0.8332 0.5598 0.8012
SVC_dirtyica_pib_ratioBB DIRTY
Public AUROC: 0.77161
Predicted AUROC: 0.8007
0.6928 0.8832 0.7556 0.7968 0.8065 0.7338 0.5886 0.8007
SVC_csp_psd_logf CLEAN
Public AUROC: 0.74494
Predicted AUROC: 0.8182
0.6226 0.8966 0.8246 0.7577 0.9060 0.8481 0.5824 0.8182
SVC_dirtycsp_psd_logf DIRTY
Public AUROC: 0.74458
Predicted AUROC: 0.8189
0.6226 0.8966 0.8246 0.7577 0.9060 0.7947 0.7046 0.8189
SVC_raw_pwling4 CLEAN
Public AUROC: 0.65123
Predicted AUROC: 0.6756
0.4787 0.7702 0.6441 0.6748 0.7051 0.7326 0.7392 0.6756
SVC_dirtyraw_pwling4 DIRTY
Public AUROC: 0.64573
Predicted AUROC: 0.6656
0.4787 0.7702 0.6441 0.6748 0.7051 0.3036 0.7128 0.6656
Thought that, with feature selection, combining features that perform well on each subject made sense. Took the settings Scott used above for the same thing and added RF feature selection. Training score:
forestselection_bestbysubj 0.630185950413 0.993507726101 0.863694214876 0.763790561504 0.970876033058 0.904247509027 0.63541322314 0.868102807597
Submitted and got 0.76471, no improvement.
Took features reported to perform well on the subjects we find difficult and added them to the probablygood features, again with RF feature selection. Training scores were:
bbsubj_pg.json
0.618776859504 0.991768830696 0.885159090909 0.775496269191 0.979101928375 0.906237145208 0.592644628099 0.868513930942
Submitted and score was 0.77768, not an improvement.