MNIST dataset

The whole setup, including the dataset, can be downloaded here. The training set has 60000 samples and the test set has 10000. There are 784 features and 10 target classes. Internal validation uses 4-fold cross-validation, which means that 4 models are trained in parallel. The prediction type is Retraining. Times are measured on machine (1.). For the MNIST data it is better to apply the same normalization to all 784 input dimensions (option enableGlobalMeanStdEstimate=1 in the Master.dsc file). The outcomes of each base learner are listed below.

model	notes	training time (60k samples)	prediction time (10k samples)	cross-validation RMSE	cross-validation classification error	test RMSE	test classification error
LR - linear regression	15 search epochs, λ=0.0168948	42[s]	0[s]	0.394749	14.78%	0.391005	13.84%
PR - polynomial regression	15 search epochs, polyOrder=2, λ=0.00708574	199[s]	1[s]	0.369988	12.0417%	0.365245	11.08%
GBDT - gradient boosted decision tree	400 epochs, featureSubspaceSize=20, maxTreeLeafes=50, η=0.1, optSplitPoint=no	6874[s]	6[s]	0.202749	3.29833%	0.199084	3.21%
KNN - k-nearest neighbors	15 search epochs, distance=pearson, k=4	890[s]	170[s]	0.126198	2.27%	0.12802	2.31%
NN - neural network	50 epochs, stochastic gradient descent, Net: 784-1500-1200-10, η=0.001, λ=1e-4	81899[s]	6[s]	0.116539	1.425%	0.108591	1.38%
NN - neural network	100 epochs, stochastic gradient descent, Net: 784-2000-1500-10, η=0.001, λ=1e-4	252200[s]	8[s]	0.1118	1.40167%	0.105383	1.37%
KRR - kernel ridge regression	15 search epochs, gauss kernel, sigma=26.9048, λ=6.62789e-08	37228[s] (2.)	1094[s] (2.)	0.146826	1.30667%	0.142177	1.22%
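
The global normalization option and the two error measures in the table can be illustrated with a short sketch. This is a NumPy illustration written for this page, not the framework's own code: the function names are hypothetical, and the +1/-1 target encoding used for the RMSE is an assumption, since the encoding is not stated here. Per-feature normalization divides each pixel by its own standard deviation, which is problematic for pixels that are almost always zero, as many border pixels are in MNIST. A single mean and standard deviation shared by all 784 dimensions avoids this.

```python
import numpy as np

def standardize_per_feature(X, eps=1e-8):
    """Normalize every input dimension with its own mean and std.
    Nearly constant pixels get a tiny std, so eps guards the division."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)

def standardize_global(X):
    """Normalize all input dimensions with one shared mean and std
    (the idea behind enableGlobalMeanStdEstimate=1, as assumed here)."""
    return (X - X.mean()) / X.std()

def rmse(pred, target):
    """Root mean squared error over all samples and all outputs."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def classification_error(pred, labels):
    """Fraction of samples whose highest-scoring output is not the label."""
    return float(np.mean(pred.argmax(axis=1) != labels))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for image data: 100 samples, 784 pixel values in [0, 255].
    X = rng.integers(0, 256, size=(100, 784)).astype(float)
    X[:, :28] = 0.0  # pretend the first image row is always black

    Xg = standardize_global(X)
    print("global mean/std:", round(float(Xg.mean()), 6), round(float(Xg.std()), 6))

    # Assumed target encoding: +1 for the true class, -1 for the other 9.
    labels = rng.integers(0, 10, size=100)
    targets = -np.ones((100, 10))
    targets[np.arange(100), labels] = 1.0
    pred = targets + rng.normal(scale=0.3, size=targets.shape)
    print("RMSE:", rmse(pred, targets))
    print("classification error:", classification_error(pred, labels))
```

With the global variant, the always-black pixels all map to the same constant value instead of being divided by a near-zero standard deviation.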