MNIST dataset

The whole setup, including the dataset, can be downloaded here. The training set has 60000 samples and the test set has 10000. There are 784 features and 10 target classes. Internal validation uses 4-fold cross validation, which means that 4 models are trained in parallel. Prediction type is Retraining. Times are measured on machine (1.). For the MNIST data it is better to apply one common normalization across all 784 input dimensions (option enableGlobalMeanStdEstimate=1 in the Master.dsc file). The results of each base learner are listed below.
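To make the normalization option above more concrete, here is a minimal Python sketch. It is not the framework's own code. It assumes that enableGlobalMeanStdEstimate=1 means a single mean and a single standard deviation are estimated over all input values, instead of one pair per feature. All function names and the toy data are hypothetical.

```python
import math


def normalize_global(rows):
    """Scale every value with ONE mean and ONE std estimated over all features.

    Assumption: this mirrors what enableGlobalMeanStdEstimate=1 does.
    """
    values = [v for row in rows for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    std = std if std > 0 else 1.0  # guard against constant data
    return [[(v - mean) / std for v in row] for row in rows]


def normalize_per_feature(rows):
    """Scale each feature (column) with its own mean and std, for comparison."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [
        [(v - m) / (s if s > 0 else 1.0) for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical toy data: 3 samples with 3 "pixels".
# Pixel 0 is always 0 (like a border pixel), pixel 1 barely changes,
# pixel 2 covers the full intensity range.
toy = [[0, 1, 0], [0, 1, 255], [0, 2, 128]]

global_scaled = normalize_global(toy)
feature_scaled = normalize_per_feature(toy)

# Spread of the nearly constant pixel 1 under both schemes.
print("global:     ", global_scaled[2][1] - global_scaled[0][1])
print("per feature:", feature_scaled[2][1] - feature_scaled[0][1])
```

One plausible reason the global estimate works better on MNIST is that many pixels are constant or nearly constant. Per-feature scaling stretches their tiny variations to the same range as informative pixels, while the global estimate keeps all pixels on the original relative scale.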




| model | notes | training time, 60k samples [s] | prediction time, 10k samples [s] | cross-validation RMSE | cross-validation classification error | test RMSE | test classification error |
|---|---|---|---|---|---|---|---|
| LR - linear regression | 15 search epochs, λ=0.0168948 | 42 | 0 | 0.394749 | 14.78% | 0.391005 | 13.84% |
| PR - polynomial regression | 15 search epochs, polyOrder=2, λ=0.00708574 | 199 | 1 | 0.369988 | 12.0417% | 0.365245 | 11.08% |
| GBDT - gradient boosted decision tree | 400 epochs, featureSubspaceSize=20, maxTreeLeafes=50, η=0.1, optSplitPoint=no | 6874 | 6 | 0.202749 | 3.29833% | 0.199084 | 3.21% |
| KNN - k-nearest neighbors | 15 search epochs, distance=pearson, k=4 | 890 | 170 | 0.126198 | 2.27% | 0.12802 | 2.31% |
| NN - neural network | 50 epochs, stochastic gradient descent, Net: 784-1500-1200-10, η=0.001, λ=1e-4 | 81899 | 6 | 0.116539 | 1.425% | 0.108591 | 1.38% |
| NN - neural network | 100 epochs, stochastic gradient descent, Net: 784-2000-1500-10, η=0.001, λ=1e-4 | 252200 | 8 | 0.1118 | 1.40167% | 0.105383 | 1.37% |
| KRR - kernel ridge regression | 15 search epochs, gauss kernel, sigma=26.9048, λ=6.62789e-08 | 37228 (2.) | 1094 (2.) | 0.146826 | 1.30667% | 0.142177 | 1.22% |
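The two error measures reported above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the framework's implementation: it assumes each model outputs one real-valued score per class, that targets are encoded as +1 for the true class and -1 for all others, and that RMSE is taken over all class outputs. The function names and the toy predictions are hypothetical.

```python
import math


def one_of_n_targets(label, n_classes, pos=1.0, neg=-1.0):
    """Assumed target encoding: +1 for the true class, -1 for every other class."""
    return [pos if c == label else neg for c in range(n_classes)]


def rmse(predictions, labels):
    """Root mean squared error over all samples and all class outputs."""
    squared_sum = 0.0
    count = 0
    for scores, label in zip(predictions, labels):
        targets = one_of_n_targets(label, len(scores))
        for s, t in zip(scores, targets):
            squared_sum += (s - t) ** 2
            count += 1
    return math.sqrt(squared_sum / count)


def classification_error(predictions, labels):
    """Fraction of samples whose highest-scoring class is not the true label."""
    wrong = 0
    for scores, label in zip(predictions, labels):
        predicted = max(range(len(scores)), key=lambda c: scores[c])
        if predicted != label:
            wrong += 1
    return wrong / len(labels)


# Hypothetical toy predictions: 2 samples, 3 classes, both with true label 0.
toy_predictions = [[0.9, -0.8, -1.0], [-0.2, 0.1, -0.9]]
toy_labels = [0, 0]

print("RMSE:", rmse(toy_predictions, toy_labels))
print("classification error:", classification_error(toy_predictions, toy_labels))
```

Read this way, a test classification error of 13.84% on the 10000 test samples corresponds to 1384 misclassified digits, and 1.22% corresponds to 122.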