The whole setup, including the dataset, can be downloaded here. The training set has 60000 samples and the test set 10000. There are 784 features and 10 target classes. Internal validation uses 4-fold cross validation, which means that 4 models are trained in parallel. The prediction type is Retraining. Times are measured on machine (1.). For the MNIST data it is better to use the same normalization for all 784 input dimensions (option `enableGlobalMeanStdEstimate=1` in the `Master.dsc` file). The outcomes of each base learner are listed below.
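To illustrate the normalization remark, here is a minimal NumPy sketch contrasting per-feature normalization with a single global mean and standard deviation. It assumes that `enableGlobalMeanStdEstimate=1` estimates one mean and one standard deviation over all inputs; the function names and the toy data are hypothetical and not part of the framework. One plausible reason the global variant suits MNIST is that many border pixels are constant, so their per-feature standard deviation is zero.

```python
import numpy as np

def normalize_per_feature(X, eps=1e-8):
    """Scale every input dimension with its own mean and std (hypothetical helper)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # eps guards against division by zero for constant features
    return (X - mean) / (std + eps)

def normalize_global(X):
    """Scale all input dimensions with one shared mean and std.

    Assumption: this mirrors what enableGlobalMeanStdEstimate=1 does.
    """
    return (X - X.mean()) / X.std()

# Toy stand-in for MNIST pixels: 100 samples, 784 features in [0, 255]
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(100, 784)).astype(np.float64)
X[:, 0] = 0.0  # a constant "border pixel", common in MNIST

X_global = normalize_global(X)
X_feature = normalize_per_feature(X)
```

With the global variant the constant pixel keeps one fixed value relative to the rest of the image, whereas the per-feature variant has to rely on `eps` to avoid dividing by zero.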

| model | notes | training time, 60k samples [s] | prediction time, 10k samples [s] | cross validation RMSE | cross validation classification error | test RMSE | test classification error |
|---|---|---|---|---|---|---|---|
| LR - linear regression | 15 search epochs, λ=0.0168948 | 42 | 0 | 0.394749 | 14.78% | 0.391005 | 13.84% |
| PR - polynomial regression | 15 search epochs, polyOrder=2, λ=0.00708574 | 199 | 1 | 0.369988 | 12.0417% | 0.365245 | 11.08% |
| GBDT - gradient boosted decision tree | 400 epochs, featureSubspaceSize=20, maxTreeLeafes=50, η=0.1, optSplitPoint=no | 6874 | 6 | 0.202749 | 3.29833% | 0.199084 | 3.21% |
| KNN - k-nearest neighbors | 15 search epochs, distance=pearson, k=4 | 890 | 170 | 0.126198 | 2.27% | 0.12802 | 2.31% |
| NN - neural network | 50 epochs, stochastic gradient descent, Net: 784-1500-1200-10, η=0.001, λ=1e-4 | 81899 | 6 | 0.116539 | 1.425% | 0.108591 | 1.38% |
| NN - neural network | 100 epochs, stochastic gradient descent, Net: 784-2000-1500-10, η=0.001, λ=1e-4 | 252200 | 8 | 0.1118 | 1.40167% | 0.105383 | 1.37% |
| KRR - kernel ridge regression | 15 search epochs, gauss kernel, sigma=26.9048, λ=6.62789e-08 | 37228 (2.) | 1094 (2.) | 0.146826 | 1.30667% | 0.142177 | 1.22% |
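
The two error columns can be related with a small sketch. It assumes that every class is treated as a separate regression target, that RMSE is taken over all target entries, and that the predicted class is the one with the largest output. The source does not state the target encoding, so the +1/-1 values below are placeholders, and all function names are hypothetical.

```python
import numpy as np

def encode_targets(labels, n_classes, pos=1.0, neg=-1.0):
    """One target column per class; pos/neg values are an assumption."""
    T = np.full((labels.shape[0], n_classes), neg)
    T[np.arange(labels.shape[0]), labels] = pos
    return T

def rmse(pred, target):
    """Root mean squared error over all samples and all class outputs."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def classification_error(pred, labels):
    """Fraction of samples whose largest output is not the true class."""
    return float(np.mean(np.argmax(pred, axis=1) != labels))

# Tiny example: 4 samples, 3 classes, the last sample is misclassified
labels = np.array([0, 2, 1, 2])
targets = encode_targets(labels, n_classes=3)
pred = targets.copy()
pred[3] = [1.0, -1.0, -1.0]  # predicts class 0 instead of class 2

err_rmse = rmse(pred, targets)
err_class = classification_error(pred, labels)
```

Because RMSE measures how far the outputs are from the targets while the classification error only looks at the argmax, the two columns need not rank models identically, as the KNN and KRR rows show.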