Blending Netflix predictions

This is the online material accompanying our KDD 2010 research paper [pdf].

The Netflix Prize is a collaborative filtering (CF) problem. Combining different kinds of CF algorithms leads to improved accuracy. The combination is a regression task, because each individual algorithm delivers a real-valued prediction. We investigate different blending algorithms with the ELF framework. Blending is a synonym for combining predictions, with the goal of improved accuracy in terms of RMSE as the error measure. The dataset is generated from probe set predictions; the Netflix Prize probe set contains 1408395 ratings. We split this set randomly into two sets: a train set and a test set.

Download

The files are in CSV format; within each line, the values are separated by commas. The last column is the target value (a rating from 1 to 5). The order of the features follows the table below. The number of training samples is 704197, the number of test samples 704198. Train and test set are random subsets of the Netflix Prize probe set. The number of features is 19. A minimal loading sketch follows the file list below.
train.csv.bz2(33.2MB) trainUsers.csv.bz2(1.9MB) trainMovies.csv.bz2(1.1MB) trainRatings.csv.bz2(203kB)
test.csv.bz2(33.2MB) testUsers.csv.bz2(1.9MB) testMovies.csv.bz2(1.1MB) testRatings.csv.bz2(203kB)
testQual.csv.bz2(121MB) qualUsers.csv.bz2(7.4MB) qualMovies.csv.bz2(78kB) qualRatings.csv.bz2(800kB). These are the corresponding predictions for the qualifying set (another test set).
corresponding predictor names (with probe RMSE values)
Indices into the Netflix Prize probe.txt file (indices start at 1): probeTrainIndex.csv.bz2, probeTestIndex.csv.bz2
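
A minimal loading sketch in Python, assuming numpy is available and the .bz2 archives have been decompressed locally (file names as in the list above):

```python
# Load the blend features and targets from the CSV files described above.
import numpy as np

def load_csv(path):
    """Load a comma-separated file; the last column is the rating (1..5)."""
    data = np.loadtxt(path, delimiter=",")
    return data[:, :-1], data[:, -1]    # 19 predictor columns, 1 target column

X_train, y_train = load_csv("train.csv")   # 704197 samples x 19 features
X_test, y_test = load_csv("test.csv")      # 704198 samples x 19 features

def rmse(pred, target):
    """Root mean squared error, the error measure used throughout."""
    return np.sqrt(np.mean((pred - target) ** 2))
```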
Algorithm templates

This is the set of predictions used for the combination. They were selected from a large pool of predictions via forward selection. We tried to pick algorithms of different types.

nr | name | RMSE | description
1 | AFM-1 | 0.9362 | An asymmetric factor model, where the user is expressed via the rated movies. 200 features were used, learning rate η=1e-3, regularization λ=1e-3; all η are multiplied by 0.95 from epoch 30 on; 120 epochs were trained in total.
2 | AFM-2 | 0.9231 | An asymmetric factor model, where the user is expressed via the rated movies. 2000 features were used, learning rate η=1e-3, regularization λ=2e-3; 23 epochs were trained in total. Based on residuals from KNN-5.
3 | AFM-3 | 0.9340 | An asymmetric factor model, where the user is expressed via the rated movies. 40 features were used, learning rate η=1e-4, regularization λ=1e-3; 96 epochs were trained in total.
4 | AFM-4 | 0.9391 | An asymmetric factor model, where the user is expressed via the rated movies. 900 features were used, learning rate η=1e-3, regularization λ=1e-2; 43 epochs were trained in total.
5 | GE-1 | 0.9079 | Global Effects model. Based on residuals from KNN-1; hence this result is the third algorithm in a chain of training on residuals.
6 | GE-2 | 0.9710 | Global Effects model, trained on raw ratings.
7 | GE-3 | 0.9443 | Global Effects model, trained on residuals from KNN-4.
8 | GE-4 | 0.9209 | Global Effects model with time integration, trained on residuals from AFM-2.
9 | KNN-1 | 0.9110 | An item-item k-nearest-neighbors model; the similarity measure is the Pearson correlation. We use k=24 neighbors; this model is based on residuals from AFM-1.
10 | KNN-2 | 0.8904 | An item-item k-nearest-neighbors model; the similarity measure is the set correlation, explained in [42]. We use k=122 neighbors; this model is based on residuals from a chain of algorithms: RBM-KNN-GE (with time).
11 | KNN-3 | 0.8970 | An item-item k-nearest-neighbors model; the similarity measure is the Pearson correlation. We use k=55 neighbors; this model is based on residuals of an RBM with 150 hidden units.
12 | KNN-4 | 0.9463 | An item-item k-nearest-neighbors model; the similarity measure is the Pearson correlation. We use k=21 neighbors; this model is based on residuals from GE-2.
13 | RBM-1 | 0.9493 | Restricted Boltzmann Machine with discrete input units. The number of hidden units is 10. Learning rate η=0.002 and regularization λ=0.0002.
14 | RBM-2 | 0.9123 | Restricted Boltzmann Machine with discrete input units. The number of hidden units is 250. Learning rate η=0.002 and regularization λ=0.0004.
15 | SVD-1 | 0.9074 | A rating-matrix factorization with 300 features, trained with stochastic gradient descent for 158 epochs, learning rate η=8e-4, regularization λ=0.01, based on item-mean-centered data.
16 | SVD-2 | 0.9172 | A rating-matrix factorization with 20 features, trained with stochastic gradient descent for 158 epochs, learning rate η=0.002, regularization λ=0.02, based on item-mean-centered data.
17 | SVD-3 | 0.9033 | A rating-matrix factorization with 1000 features and adaptive user features (item-item similarity correction). 158 epochs, learning rate η=0.001, regularization λ=0.015.
18 | SVD-4 | 0.8871 | An extended rating-matrix factorization with 150 features. Learning rates and regularizations are tuned automatically to minimize the RMSE on the probe set; thus each trained parameter has an individual value.
19 | lg(support) | - | The number of ratings given by the user, known as support; lg() is the natural logarithm.

Blending

A quick how-to for starting the blending process.
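
The tables below report three prediction types: Retraining (roughly, one model refit on all training data after the cross-validated parameter search), CrossFoldMean (the mean over the k cross-validation models), and Bagging with an out-of-bag (oob) error estimate. A minimal sketch of the bagging/oob scheme, assuming any regressor exposing fit/predict; the make_model factory is a hypothetical placeholder, not part of the ELF interface:

```python
# Bagging with an out-of-bag (oob) RMSE estimate, as reported in the
# "Bagging, size=N" rows below.
import numpy as np

def bagging_oob(make_model, X, y, n_bags=32, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    oob_sum = np.zeros(n)    # accumulated oob predictions per sample
    oob_cnt = np.zeros(n)    # how many bags left each sample out
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)        # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), idx)   # samples not drawn into this bag
        m = make_model()
        m.fit(X[idx], y[idx])
        models.append(m)
        oob_sum[oob] += m.predict(X[oob])
        oob_cnt[oob] += 1
    seen = oob_cnt > 0
    oob_rmse = np.sqrt(np.mean((oob_sum[seen] / oob_cnt[seen] - y[seen]) ** 2))
    return models, oob_rmse   # final prediction: mean over all bagged models
```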

Linear Regression

Linear regression is a linear combination of the predictions. We extend the features with a constant 1 input (addConstantInput=1 in the Master.dsc file).

model | prediction type | notes | training time | prediction time | cross-validation RMSE | test RMSE
LR - linear regression | Retraining, 8-CV | 15 search epochs, λ=4.41454e-06 | 10[s] | 0[s] | 0.875522 | 0.875258
LR - linear regression | CrossFoldMean, 8-CV | 15 search epochs, λ=4.41454e-06 | 10[s] | 0[s] | 0.875522 | 0.875257
LR - linear regression | Bagging, size=128 | 15 search epochs, λ=4.83746e-05 | 182[s] | 4[s] | 0.875521 (oob) | 0.875258
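
As a sketch of what such a blend computes: ridge-regularized least squares over the 19 predictors plus the constant column, here in closed form. This assumes the arrays and rmse helper from the loading sketch in the Download section; whether ELF scales λ internally (e.g. by the sample count) is not specified here, so the exact λ values above may not transfer verbatim:

```python
# Ridge-regularized linear blend with an appended constant-1 input,
# solved in closed form: w = (X^T X + lambda*I)^-1 X^T y.
# (For simplicity the constant column is regularized too.)
import numpy as np

def fit_linear_blend(X, y, lam):
    Xc = np.hstack([X, np.ones((len(X), 1))])    # addConstantInput=1
    A = Xc.T @ Xc + lam * np.eye(Xc.shape[1])
    return np.linalg.solve(A, Xc.T @ y)

def predict_linear_blend(w, X):
    Xc = np.hstack([X, np.ones((len(X), 1))])
    return Xc @ w

w = fit_linear_blend(X_train, y_train, lam=4.41454e-06)
print(rmse(predict_linear_blend(w, X_test), y_test))  # compare with the test RMSE above
```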

Polynomial Regression

Polynomial regression is a linear combination of an extended feature space. The extension is generated by a polynomial series. We extend the features with a constant 1 input (addConstantInput=1 in the Master.dsc file).

model | prediction type | notes | training time | prediction time | cross-validation RMSE | test RMSE
PR - polynomial regression | Retraining, 8-CV | 15 search epochs, λ=0.000488289, polyOrder=1, crossInteractions=yes | 361[s] | 3[s] | 0.874246 | 0.873983
PR - polynomial regression | CrossFoldMean, 8-CV | 15 search epochs, λ=0.000488289, polyOrder=1, crossInteractions=yes | 352[s] | 21[s] | 0.874246 | 0.873976
PR - polynomial regression | Bagging, size=128 | 15 search epochs, λ=0.000298225, polyOrder=1, crossInteractions=yes | 6687[s] | 334[s] | 0.874241 (oob) | 0.87398
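
The expansion itself is easy to sketch. Assuming crossInteractions adds all pairwise products x_i·x_j (an assumption about ELF's exact expansion), the blend then proceeds exactly as in the linear case, reusing fit_linear_blend from above:

```python
# Polynomial feature expansion: powers of each input up to polyOrder, plus
# (optionally) all pairwise cross-interaction products x_i * x_j.
import numpy as np
from itertools import combinations

def poly_expand(X, poly_order=1, cross_interactions=True):
    cols = [X ** p for p in range(1, poly_order + 1)]
    if cross_interactions:
        cols.extend(X[:, [i]] * X[:, [j]]
                    for i, j in combinations(range(X.shape[1]), 2))
    return np.hstack(cols)

Xp_train = poly_expand(X_train, poly_order=1, cross_interactions=True)
Xp_test = poly_expand(X_test, poly_order=1, cross_interactions=True)
w = fit_linear_blend(Xp_train, y_train, lam=0.000488289)
print(rmse(predict_linear_blend(w, Xp_test), y_test))
```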

Gradient Boosted Decision Tree

GBDT, see here.

model | prediction type | notes | training time | prediction time | cross-validation RMSE | test RMSE
GBDT - gradient boosted decision tree | Retraining, 8-CV | 119 epochs, featureSubspaceSize=3, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 364[s] | 8[s] | 0.875157 | 0.875043
GBDT - gradient boosted decision tree | CrossFoldMean, 8-CV | 156 epochs, featureSubspaceSize=3, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 469[s] | 120[s] | 0.875115 | 0.874231
GBDT - gradient boosted decision tree | Bagging, size=128 | 287 epochs, featureSubspaceSize=3, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 8554[s] | 7089[s] | 0.874215 (oob) | 0.873927
GBDT - gradient boosted decision tree | Bagging, size=128 | 254 epochs, featureSubspaceSize=4, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 8816[s] | 6331[s] | 0.874184 (oob) | 0.873916
GBDT - gradient boosted decision tree | Bagging, size=128 | 229 epochs, featureSubspaceSize=5, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 9081[s] | 5814[s] | 0.874165 (oob) | 0.87389
GBDT - gradient boosted decision tree | Bagging, size=128 | 215 epochs, featureSubspaceSize=6, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 9534[s] | 5481[s] | 0.87415 (oob) | 0.873888
GBDT - gradient boosted decision tree | Bagging, size=128 | 210 epochs, featureSubspaceSize=7, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 10468[s] | 5442[s] | 0.874142 (oob) | 0.873871
GBDT - gradient boosted decision tree | Bagging, size=128 | 201 epochs, featureSubspaceSize=8, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 11029[s] | 5221[s] | 0.874148 (oob) | 0.873869
GBDT - gradient boosted decision tree | Bagging, size=128 | 191 epochs, featureSubspaceSize=9, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 11667[s] | 5004[s] | 0.874134 (oob) | 0.873865
GBDT - gradient boosted decision tree | Bagging, size=128 | 179 epochs, featureSubspaceSize=10, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 11811[s] | 4690[s] | 0.874128 (oob) | 0.873869
GBDT - gradient boosted decision tree | Bagging, size=128 | 182 epochs, featureSubspaceSize=11, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 13095[s] | 4820[s] | 0.874128 (oob) | 0.873864
GBDT - gradient boosted decision tree | Bagging, size=128 | 155 epochs, featureSubspaceSize=20, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 18138[s] | 4255[s] | 0.874114 (oob) | 0.873859
GBDT - gradient boosted decision tree | Bagging, size=128 | 99 epochs, featureSubspaceSize=20, maxTreeLeafes=200, η=0.1, optSplitPoint=no | 12400[s] | 3142[s] | 0.874153 (oob) | 0.873899
GBDT - gradient boosted decision tree | Bagging, size=128 | 255 epochs, featureSubspaceSize=20, maxTreeLeafes=50, η=0.1, optSplitPoint=no | 26235[s] | 5075[s] | 0.874103 (oob) | 0.873842
GBDT - gradient boosted decision tree | Bagging, size=128 | 393 epochs, featureSubspaceSize=20, maxTreeLeafes=30, η=0.1, optSplitPoint=no | 35390[s] | 5928[s] | 0.874112 (oob) | 0.873846
GBDT - gradient boosted decision tree | Bagging, size=32 | 41 epochs, featureSubspaceSize=20, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 8944[s] | 409[s] | 0.874924 (oob) | 0.8746
GBDT - gradient boosted decision tree | Bagging, size=32 | 40 epochs, featureSubspaceSize=10, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 4713[s] | 406[s] | 0.874819 (oob) | 0.874522
GBDT - gradient boosted decision tree | Bagging, size=32 | 41 epochs, featureSubspaceSize=8, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 4014[s] | 425[s] | 0.874841 (oob) | 0.874504
GBDT - gradient boosted decision tree | Bagging, size=32 | 42 epochs, featureSubspaceSize=7, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 3711[s] | 442[s] | 0.874767 (oob) | 0.874481
GBDT - gradient boosted decision tree | Bagging, size=32 | 41 epochs, featureSubspaceSize=6, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 3028[s] | 420[s] | 0.874814 (oob) | 0.874429
GBDT - gradient boosted decision tree | Bagging, size=32 | 43 epochs, featureSubspaceSize=5, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 2806[s] | 454[s] | 0.874736 (oob) | 0.874432
GBDT - gradient boosted decision tree | Bagging, size=32 | 44 epochs, featureSubspaceSize=4, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 2432[s] | 475[s] | 0.874775 (oob) | 0.874405
GBDT - gradient boosted decision tree | Bagging, size=32 | 45 epochs, featureSubspaceSize=3, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 2010[s] | 481[s] | 0.874774 (oob) | 0.874381
GBDT - gradient boosted decision tree | Bagging, size=32 | 48 epochs, featureSubspaceSize=2, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 1525[s] | 525[s] | 0.874784 (oob) | 0.874377
GBDT - gradient boosted decision tree | Bagging, size=32 | 64 epochs, featureSubspaceSize=1, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 1741[s] | 874[s] | 0.874838 (oob) | 0.874427
GBDT - gradient boosted decision tree | Bagging, size=32 | 58 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.1, optSplitPoint=yes | 2249[s] | 612[s] | 0.874783 (oob) | 0.87437
GBDT - gradient boosted decision tree | Bagging, size=32 | 63 epochs, featureSubspaceSize=2, maxTreeLeafes=200, η=0.1, optSplitPoint=yes | 2620[s] | 611[s] | 0.874767 (oob) | 0.874399
GBDT - gradient boosted decision tree | Bagging, size=32 | 94 epochs, featureSubspaceSize=2, maxTreeLeafes=100, η=0.1, optSplitPoint=yes | 5009[s] | 979[s] | 0.874934 (oob) | 0.874546
GBDT - gradient boosted decision tree | Bagging, size=32 | 118 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.05, optSplitPoint=yes | 4362[s] | 1361[s] | 0.87467 (oob) | 0.874352
GBDT - gradient boosted decision tree | Bagging, size=32 | 196 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.03, optSplitPoint=yes | 6978[s] | 2206[s] | 0.874624 (oob) | 0.87433
GBDT - gradient boosted decision tree | Bagging, size=32 | 305 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 11788[s] | 3540[s] | 0.874593 (oob) | 0.874309
GBDT - gradient boosted decision tree | Bagging, size=32 | 605 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.01, optSplitPoint=yes | 22811[s] | 6849[s] | 0.8746 (oob) | 0.874322
GBDT - gradient boosted decision tree | Bagging, size=32 | 1171 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.005, optSplitPoint=yes | 40364[s] | 12545[s] | 0.874597 (oob) | 0.874326
GBDT - gradient boosted decision tree | Bagging, size=16 | 274 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 3951[s] | 1400[s] | 0.874936 (oob) | 0.874381
GBDT - gradient boosted decision tree | Bagging, size=64 | 315 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 25219[s] | 7350[s] | 0.874554 (oob) | 0.874293
GBDT - gradient boosted decision tree | Bagging, size=128 | 319 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 52805[s] | 15070[s] | 0.874517 (oob) | 0.874288
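
The ELF parameters in the table map naturally onto common GBDT implementations: featureSubspaceSize is the number of candidate features per split, maxTreeLeafes the per-tree leaf limit, η the shrinkage, and epochs the number of boosting stages. A hedged sketch with scikit-learn's GradientBoostingRegressor, as an assumed stand-in for ELF's own C++ implementation (optSplitPoint has no direct equivalent there):

```python
# One GBDT member of a bagged ensemble, with the parameter mapping:
#   epochs -> n_estimators, featureSubspaceSize -> max_features,
#   maxTreeLeafes -> max_leaf_nodes, eta -> learning_rate.
from sklearn.ensemble import GradientBoostingRegressor

def make_gbdt():
    return GradientBoostingRegressor(
        n_estimators=255,    # "epochs": number of boosting stages
        max_features=20,     # featureSubspaceSize
        max_leaf_nodes=50,   # maxTreeLeafes
        learning_rate=0.1,   # eta
    )

# Plug into the bagging/oob sketch from the Blending section:
# models, oob_rmse = bagging_oob(make_gbdt, X_train, y_train, n_bags=128)
```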

Neural Networks

Neural networks are non-linear function approximators, see here.

model | prediction type | notes | training time | prediction time | cross-validation RMSE | test RMSE
NN - neural network | Retraining, 8-CV | 1005 epochs, Net: 19-30-1, η=5e-4, η-=5e-7, λ=0 | 5413[s] | 1[s] | 0.873633 | 0.873365 (qual 0.866345, 3[s])
NN - neural network | CrossFoldMean, 8-CV | 1005 epochs, Net: 19-30-1, η=5e-4, η-=5e-7, λ=0 | 3163[s] | 5[s] | 0.873633 | 0.873316 (qual 0.866316, 18[s])
NN - neural network | CrossFoldMean, 8-CV | 996 epochs, Net: 19-50-1, η=5e-4, η-=5e-7, λ=0 | 15162(core2)[s] | 11(core2)[s] | 0.873617 | 0.87325
NN - neural network | CrossFoldMean, 8-CV | 996 epochs, Net: 19-100-1, η=5e-4, η-=5e-7, λ=0 | 26129(core2)[s] | 16(core2)[s] | 0.873523 | 0.873255
NN - neural network | CrossFoldMean, 8-CV | 996 epochs, Net: 19-200-1, η=5e-4, η-=5e-7, λ=0 | 51858(core2)[s] | 32(core2)[s] | 0.873602 | 0.87326
NN - neural network | Bagging, size=32 | 1009 epochs, Net: 19-30-1, η=5e-4, η-=5e-7, λ=0 | 15568[s] | 18[s] | 0.87347 | 0.873191 (qual 0.866215, 74[s])
NN - neural network | Bagging, size=128 | 1010 epochs, Net: 19-30-1, η=5e-4, η-=5e-7, λ=0 | 62419[s] | 73[s] | 0.873436 | 0.873185
NN - neural network | Bagging, size=128 | 980 epochs, Net: 19-70-1, η=5e-4, η-=5e-7, λ=0 | 121049[s] | 126[s] | 0.87342 | 0.873163 (qual 0.866169, 524[s])
NN - neural network | Bagging, size=128 | 793 epochs, Net: 19-150-1, η=5e-4, η-=5e-7, λ=0 | 236993[s] | 245[s] | 0.873473 | 0.873169 (qual 0.866204, 981[s])
NN - neural network | Bagging, size=128 | 961 epochs, Net: 19-30-50-1, η=5e-4, η-=5e-7, λ=0 | 164416[s] | 155[s] | 0.873487 | 0.873174 (qual 0.866261, 622[s])
NN - neural network | Bagging, size=128 | 908 epochs, Net: 19-50-30-1, η=5e-4, η-=5e-7, λ=0 | 175132[s] | 167[s] | 0.873455 | 0.87318 (qual 0.866253, 703[s])
NN - neural network | Bagging, size=128 | 771 epochs, Net: 19-70-50-1, η=5e-4, η-=5e-7, λ=0 | 292926[s] | 240[s] | 0.873474 | 0.873217 (qual 0.866236, 984[s])
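
The Net: 19-30-1 notation denotes layer sizes; η- is the amount subtracted from the learning rate after each epoch. A minimal single-hidden-layer sketch in plain numpy, assuming sigmoid hidden units, a linear output, and per-sample SGD. ELF's actual training loop is optimized C++; this version is purely illustrative and would be slow on 704197 samples:

```python
# Single-hidden-layer net (e.g. Net: 19-30-1) trained with plain SGD on
# squared error; eta_minus is the per-epoch learning rate decrement.
import numpy as np

def train_nn(X, y, hidden=30, epochs=1005, eta=5e-4, eta_minus=5e-7, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, hidden);      b2 = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):                        # stochastic updates
            h = 1.0 / (1.0 + np.exp(-(X[i] @ W1 + b1)))     # sigmoid hidden layer
            err = (h @ W2 + b2) - y[i]                      # d(0.5*err^2)/d(output)
            gh = err * W2 * h * (1 - h)                     # backprop through sigmoid
            W2 -= eta * err * h;               b2 -= eta * err
            W1 -= eta * np.outer(X[i], gh);    b1 -= eta * gh
        eta -= eta_minus                                    # learning rate decay (eta-)
    return W1, b1, W2, b2

def predict_nn(params, X):
    W1, b1, W2, b2 = params
    h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
    return h @ W2 + b2
```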

Combination of NN+PR+GBDT

We use the out-of-bag estimate as the error measure for ensemble accuracy. Here are the dsc templates.

order | model | prediction type | notes | training time | prediction time | cross-validation RMSE | blend RMSE | test RMSE
1 | NN - neural network | Bagging, size=128 | 870 epochs, Net: 19-100-1, η=0.0005, η-=5e-7, λ=0, scale=3.6, offset=3.0 | 159973[s] | [s] | 0.87345 (oob) | 0.873445 |
2 | GBDT - gradient boosted decision tree | Bagging, size=128 | 226 epochs, featureSubspaceSize=20, maxTreeLeafes=50, η=0.1, optSplitPoint=no | 23834[s] | [s] | 0.874111 (oob) | 0.873387 |
3 | GBDT - gradient boosted decision tree | Bagging, size=128 | 267 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 29168[s] | [s] | 0.874603 (oob) | 0.873384 |
4 | PR - polynomial regression | Bagging, size=128 | 15 search epochs, λ=2.4e-06, polyOrder=1, crossInteractions=yes | 6674[s] | [s] | 0.874358 (oob) | 0.87336 |
5 | PR - polynomial regression | Bagging, size=128 | 15 search epochs, λ=0.054, polyOrder=3, crossInteractions=no | 1076[s] | [s] | 0.895951 (oob) | 0.873351 |
6 | NN - neural network | Bagging, size=128 | 998 epochs, Net: 19-100-1, η=0.0005, η-=5e-7, λ=0, scale=2.0, offset=3.0 | 169560[s] | [s] | 0.87345 (oob) | 0.873296 |
7 | NN - neural network | Bagging, size=128 | 952 epochs, Net: 19-50-30-1, η=0.0005, η-=5e-7, λ=0, scale=2.0, offset=3.0 | 179280[s] | 16200[s] | 0.873449 (oob) | 0.873227 | blend: 0.87297
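
Each row adds one bagged model to the ensemble; the blend RMSE column tracks the combined accuracy as members are added. A sketch of such a second-level combination, assuming the members' out-of-bag predictions are blended with the ridge regression from the Linear Regression section (the linked dsc templates define the actual setup):

```python
# Second-level combination: stack each member's predictions into a new feature
# matrix and blend linearly, reusing fit_linear_blend / predict_linear_blend.
import numpy as np

def combine_members(member_oob_preds, member_test_preds, y_train, lam=1e-6):
    """Each *_preds argument is a list of 1-D prediction arrays, one per member."""
    P_train = np.column_stack(member_oob_preds)   # oob predictions as features
    P_test = np.column_stack(member_test_preds)
    w = fit_linear_blend(P_train, y_train, lam)
    return predict_linear_blend(w, P_test)
```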