The preconfigured files, including the dataset, can be downloaded here. Extract the dataset to the ./DataFiles directory. To start training, run $ ./ELF PAKDDCup2010 t; to predict the test set, run $ ./ELF PAKDDCup2010 p. The filenames of the training set and the test set are configured in ./DataFiles/settings.txt.
A short technical report can be found here.
The solution described in the report finished in 7th place. See: PAKDDCup2010-Results
| model | notes | training time (60k samples) | prediction time (20k samples) | cross-validation AUC | cross-validation RMSE | cross-validation classification error | leaderboard AUC |
|---|---|---|---|---|---|---|---|
| LR - linear regression | Retraining, ProbablisticNormalization=yes, λ=0.00358632 | 4200 s | 1 s | 0.651516 | 0.854269 | 26.044% | 0.6250 |
| NN - neural network | Retraining, ProbablisticNormalization=no, 143 epochs, stochastic gradient descent, Net: 10n, η=3e-5, λ=8e-2 | 3900 s | 2 s | 0.650801 | 0.854371 | 26.088% | 0.6267 |
| KRR - kernel ridge regression | Retraining (12CV), ProbablisticNormalization=yes, Gaussian kernel, sigma=11.6269, λ=2.95755e-05 | 7255 s | 5968 s | 0.659685 | 0.851328 | 25.962% | 0.6236 |
| KRR+GBDT - linear ensemble | Retraining (12CV), ProbablisticNormalization=yes, KRR: Gaussian kernel, sigma=10.1, λ=4.4e-05, GBDT: 1500 epochs, subspaceSize=100, maxLeafs=100, learnrate=0.005, optSplit=yes, calcGlobalMean=yes | 268354 s | 6446 s | 0.660936 | 0.850941 | 25.95% | 0.6249 |
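
For readers who want to compute the three evaluation metrics in the table on their own predictions, here is a minimal sketch in plain Python. It is not taken from ELF, and it may differ from how ELF computes these values internally. The function names, the 0/1 label encoding, and the 0.5 decision threshold are assumptions made for illustration only.

```python
import math


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels: 1 for the positive class, 0 for the negative class (assumed encoding).
    Tied scores receive their average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current run of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def rmse(targets, predictions):
    """Root mean squared error between targets and real-valued predictions."""
    squared = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    return math.sqrt(squared / len(targets))


def classification_error(labels, scores, threshold=0.5):
    """Fraction of samples misclassified when scores are cut at `threshold`.

    The threshold of 0.5 is an assumption that fits 0/1 labels.
    """
    wrong = sum(1 for y, s in zip(labels, scores) if (s >= threshold) != (y == 1))
    return wrong / len(labels)


# Toy data, not from the PAKDD Cup 2010 dataset.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))                   # 0.75
print(rmse(labels, scores))
print(classification_error(labels, scores))  # 0.25
```

AUC depends only on the ordering of the scores, while RMSE and classification error depend on their scale and on the label encoding, so the absolute values from this sketch are only comparable to the table if the same encoding and threshold are used.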