This is the online material accompanying the KDD 2010 research paper [pdf].
The Netflix Prize is a collaborative filtering (CF) problem. Combining different kinds of CF algorithms leads to improved accuracy. The combination is a regression task, because each individual algorithm delivers a real-valued prediction. We investigate different blending algorithms with ELF (the Ensemble Learning Framework). Blending is a synonym for combining predictions; accuracy is measured by RMSE. The dataset is generated from probe-set predictions; the Netflix Prize probe set contains 1408395 ratings. We split this set randomly into two sets: a train set and a test set.
The files are in CSV format; the values on each line are comma-separated. The last column is the target value (a rating from 1 to 5). The order of the features follows the table below. The number of training samples is 704197, the number of test samples is 704198. Train and test set are random subsets of the Netflix Prize probe set. The number of features is 19. A loading sketch follows the file list below.
train.csv.bz2(33.2MB) trainUsers.csv.bz2(1.9MB) trainMovies.csv.bz2(1.1MB) trainRatings.csv.bz2(203kB)
test.csv.bz2(33.2MB) testUsers.csv.bz2(1.9MB) testMovies.csv.bz2(1.1MB) testRatings.csv.bz2(203kB)
The corresponding predictions for the qualifying set (another test set): testQual.csv.bz2(121MB) qualUsers.csv.bz2(7.4MB) qualMovies.csv.bz2(78kB) qualRatings.csv.bz2(800kB)
The corresponding predictor names (with probe RMSE values)
Indices into the Netflix Prize probe.txt file (indices start at 1): probeTrainIndex.csv.bz2, probeTestIndex.csv.bz2
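A minimal loading sketch in Python, assuming only the file layout described above (19 comma-separated predictor values per line, target rating last); the local file name is an assumption:

```python
import bz2
import numpy as np

# Each line: 19 predictor values followed by the target rating (1..5),
# comma-separated, as described above.
def load_blend_csv(path):
    with bz2.open(path, mode="rt") as f:
        data = np.loadtxt(f, delimiter=",")
    return data[:, :-1], data[:, -1]  # X (19 features), y (rating)

# Assumed local path; point it at the downloaded archive.
X_train, y_train = load_blend_csv("train.csv.bz2")
print(X_train.shape)  # expected: (704197, 19)
```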
Algorithm templates
This is the set of predictions used for the combination. They were selected from a large pool of predictions via forward selection (a sketch of the procedure follows the table below). We tried to pick algorithms of different types.
nr | name | RMSE | description |
---|---|---|---|
1 | AFM-1 | 0.9362 | An asymmetric factor model, where the user is expressed via the rated movies. 200 features were used, learning rate η = 1e-3, regularization λ = 1e-3; all η are multiplied by 0.95 starting at epoch 30; 120 epochs were trained in total. |
2 | AFM-2 | 0.9231 | An asymmetric factor model, where the user is expressed via the rated movies. 2000 features were used, learning rate η = 1e-3, regularization λ = 2e-3, 23 epochs were trained in total. Based on residuals from KNN-5. |
3 | AFM-3 | 0.9340 | An asymmetric factor model, where the user is expressed via the rated movies. 40 features were used, learning rate η = 1e-4, regularization λ = 1e-3, 96 epochs were trained in total. |
4 | AFM-4 | 0.9391 | An asymmetric factor model, where the user is expressed via the rated movies. 900 features were used, learning rate η = 1e-3, regularization λ = 1e-2, 43 epochs were trained in total. |
5 | GE-1 | 0.9079 | Global Effects model, based on residuals from KNN-1; hence this result is the third algorithm in a chain of training on residuals. |
6 | GE-2 | 0.9710 | Global Effects model, trained on raw ratings. |
7 | GE-3 | 0.9443 | Global Effects model, trained on residuals from KNN-4. |
8 | GE-4 | 0.9209 | Global Effects model with time integration, trained on residuals from AFM-2. |
9 | KNN-1 | 0.9110 | An item-item k-nearest-neighbors model; the similarity measure is the Pearson correlation. We use k = 24 neighbors; this model is based on residuals from AFM-1. |
10 | KNN-2 | 0.8904 | An item-item k-nearest-neighbors model; the similarity measure is the set correlation, explained in [42]. We use k = 122 neighbors; this model is based on residuals from a chain of algorithms: RBM-KNN-GE (with time). |
11 | KNN-3 | 0.8970 | An item-item k-nearest-neighbors model; the similarity measure is the Pearson correlation. We use k = 55 neighbors; this model is based on residuals of an RBM with 150 hidden units. |
12 | KNN-4 | 0.9463 | An item-item k-nearest-neighbors model; the similarity measure is the Pearson correlation. We use k = 21 neighbors; this model is based on residuals from GE-2. |
13 | RBM-1 | 0.9493 | Restricted Boltzmann Machine with discrete input units. The number of hidden units is 10. Learning rate η = 0.002, regularization λ = 0.0002. |
14 | RBM-2 | 0.9123 | Restricted Boltzmann Machine with discrete input units. The number of hidden units is 250. Learning rate η = 0.002, regularization λ = 0.0004. |
15 | SVD-1 | 0.9074 | A rating-matrix factorization with 300 features, trained with stochastic gradient descent, 158 epochs, learning rate η = 8e-4, regularization λ = 0.01, based on item-mean-centered data. |
16 | SVD-2 | 0.9172 | A rating-matrix factorization with 20 features, trained with stochastic gradient descent, 158 epochs, learning rate η = 0.002, regularization λ = 0.02, based on item-mean-centered data. |
17 | SVD-3 | 0.9033 | A rating-matrix factorization with 1000 features and adaptive user features (item-item similarity correction). 158 epochs, learning rate η = 0.001, regularization λ = 0.015. |
18 | SVD-4 | 0.8871 | An extended rating-matrix factorization with 150 features. Learning rates and regularizations are tuned automatically to minimize the RMSE on the probe set; thus each trained parameter has individual values. |
19 | lg(support) | - | The number of ratings given by the user, known as the support; lg() denotes the natural logarithm. |
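Forward selection greedily adds, at each step, the predictor that most reduces the RMSE of a blend. A minimal sketch of the idea; the stopping rule and the use of a plain least-squares blend as the selection criterion are assumptions of this sketch:

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def blend_rmse(X, y, cols):
    # RMSE of a least-squares linear blend restricted to the given columns,
    # with a constant 1 input appended.
    A = np.column_stack([X[:, cols], np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return rmse(y, A @ w)

def forward_selection(X, y, max_predictors):
    # Greedily add the predictor that lowers the blend RMSE the most.
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_predictors:
        best = min(remaining, key=lambda j: blend_rmse(X, y, selected + [j]))
        if selected and blend_rmse(X, y, selected + [best]) >= blend_rmse(X, y, selected):
            break  # no further improvement
        selected.append(best)
        remaining.remove(best)
    return selected
```

In our setting the candidate pool was much larger than the 19 columns kept here; the selection runs on the full pool.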
Linear regression is a linear combination of the predictions. We extend the features with a constant 1 input (addConstantInput=1 in the Master.dsc file). A ridge-regression sketch follows the results table below.
model | prediction type | notes | training time | prediction time | cross validation RMSE | test RMSE |
---|---|---|---|---|---|---|
LR - linear regression | Retraining, 8-CV | 15 search epochs, λ=4.41454e-06 | 10[s] | 0[s] | 0.875522 | 0.875258 |
LR - linear regression | CrossFoldMean, 8-CV | 15 search epochs, λ=4.41454e-06 | 10[s] | 0[s] | 0.875522 | 0.875257 |
LR - linear regression | Bagging, size=128 | 15 search epochs, λ=4.83746e-05 | 182[s] | 4[s] | 0.875521 (oob) | 0.875258 |
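A minimal sketch of the linear blend as closed-form ridge regression with the constant 1 input appended; the λ value is taken from the table above, but the exact λ scaling convention inside ELF is an assumption:

```python
import numpy as np

def ridge_blend(X, y, lam):
    # Append the constant 1 input (addConstantInput=1), then solve the
    # regularized normal equations (A'A + λI) w = A'y.
    A = np.hstack([X, np.ones((len(X), 1))])
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
    return w

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, (1000, 19))       # stand-in for the 19 blend inputs
y = rng.uniform(1, 5, 1000)             # stand-in ratings
w = ridge_blend(X, y, lam=4.41454e-06)  # λ from the Retraining row above
pred = np.hstack([X, np.ones((len(X), 1))]) @ w
```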
Polynomial regression is a linear combination of an extended feature space; the extension is generated by a polynomial series. We extend the features with a constant 1 input (addConstantInput=1 in the Master.dsc file). A sketch of the expansion follows the table below.
model | prediction type | notes | training time | prediction time | cross validation RMSE | test RMSE |
---|---|---|---|---|---|---|
PR - polynomial regression | Retraining, 8-CV | 15 search epochs, λ=0.000488289, polyOrder=1, crossInteractions=yes | 361[s] | 3[s] | 0.874246 | 0.873983 |
PR - polynomial regression | CrossFoldMean, 8-CV | 15 search epochs, λ=0.000488289, polyOrder=1, crossInteractions=yes | 352[s] | 21[s] | 0.874246 | 0.873976 |
PR - polynomial regression | Bagging, size=128 | 15 search epochs, λ=0.000298225, polyOrder=1, crossInteractions=yes | 6687[s] | 334[s] | 0.874241 (oob) | 0.87398 |
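A sketch of the feature expansion. With polyOrder=1 and crossInteractions=yes the expanded space consists of the raw inputs, all pairwise products, and the constant 1 input; the exact expansion ELF generates is an assumption here. The expanded matrix then goes through the same ridge solve as in the linear case.

```python
import numpy as np
from itertools import combinations

def poly_expand(X, poly_order=1, cross_interactions=True):
    # Powers x_i^p for p = 1..polyOrder ...
    cols = [X ** p for p in range(1, poly_order + 1)]
    # ... plus pairwise products x_i * x_j if crossInteractions=yes ...
    if cross_interactions:
        cols += [X[:, [i]] * X[:, [j]]
                 for i, j in combinations(range(X.shape[1]), 2)]
    # ... plus the constant 1 input (addConstantInput=1).
    cols.append(np.ones((len(X), 1)))
    return np.hstack(cols)

# 19 inputs -> 19 raw + C(19,2) = 171 interactions + 1 constant = 191 columns
```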
GBDT (gradient boosted decision trees), see here. A scikit-learn sketch follows the table below.
model | prediction type | notes | training time | prediction time | cross validation RMSE | test RMSE |
---|---|---|---|---|---|---|
GBDT - gradient boosted decision tree | Retraining, 8-CV | 119 epochs, featureSubspaceSize=3, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 364[s] | 8[s] | 0.875157 | 0.875043 |
GBDT - gradient boosted decision tree | CrossFoldMean, 8-CV | 156 epochs, featureSubspaceSize=3, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 469[s] | 120[s] | 0.875115 | 0.874231 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 287 epochs, featureSubspaceSize=3, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 8554[s] | 7089[s] | 0.874215 (oob) | 0.873927 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 254 epochs, featureSubspaceSize=4, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 8816[s] | 6331[s] | 0.874184 (oob) | 0.873916 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 229 epochs, featureSubspaceSize=5, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 9081[s] | 5814[s] | 0.874165 (oob) | 0.87389 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 215 epochs, featureSubspaceSize=6, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 9534[s] | 5481[s] | 0.87415 (oob) | 0.873888 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 210 epochs, featureSubspaceSize=7, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 10468[s] | 5442[s] | 0.874142 (oob) | 0.873871 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 201 epochs, featureSubspaceSize=8, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 11029[s] | 5221[s] | 0.874148 (oob) | 0.873869 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 191 epochs, featureSubspaceSize=9, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 11667[s] | 5004[s] | 0.874134 (oob) | 0.873865 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 179 epochs, featureSubspaceSize=10, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 11811[s] | 4690[s] | 0.874128 (oob) | 0.873869 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 182 epochs, featureSubspaceSize=11, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 13095[s] | 4820[s] | 0.874128 (oob) | 0.873864 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 155 epochs, featureSubspaceSize=20, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 18138[s] | 4255[s] | 0.874114 (oob) | 0.873859 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 99 epochs, featureSubspaceSize=20, maxTreeLeafes=200, η=0.1, optSplitPoint=no | 12400[s] | 3142[s] | 0.874153 (oob) | 0.873899 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 255 epochs, featureSubspaceSize=20, maxTreeLeafes=50, η=0.1, optSplitPoint=no | 26235[s] | 5075[s] | 0.874103 (oob) | 0.873842 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 393 epochs, featureSubspaceSize=20, maxTreeLeafes=30, η=0.1, optSplitPoint=no | 35390[s] | 5928[s] | 0.874112 (oob) | 0.873846 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 41 epochs, featureSubspaceSize=20, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 8944[s] | 409[s] | 0.874924 (oob) | 0.8746 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 40 epochs, featureSubspaceSize=10, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 4713[s] | 406[s] | 0.874819 (oob) | 0.874522 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 41 epochs, featureSubspaceSize=8, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 4014[s] | 425[s] | 0.874841 (oob) | 0.874504 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 42 epochs, featureSubspaceSize=7, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 3711[s] | 442[s] | 0.874767 (oob) | 0.874481 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 41 epochs, featureSubspaceSize=6, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 3028[s] | 420[s] | 0.874814 (oob) | 0.874429 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 43 epochs, featureSubspaceSize=5, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 2806[s] | 454[s] | 0.874736 (oob) | 0.874432 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 44 epochs, featureSubspaceSize=4, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 2432[s] | 475[s] | 0.874775 (oob) | 0.874405 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 45 epochs, featureSubspaceSize=3, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 2010[s] | 481[s] | 0.874774 (oob) | 0.874381 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 48 epochs, featureSubspaceSize=2, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 1525[s] | 525[s] | 0.874784 (oob) | 0.874377 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 64 epochs, featureSubspaceSize=1, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 1741[s] | 874[s] | 0.874838 (oob) | 0.874427 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 58 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.1, optSplitPoint=yes | 2249[s] | 612[s] | 0.874783 (oob) | 0.87437 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 63 epochs, featureSubspaceSize=2, maxTreeLeafes=200, η=0.1, optSplitPoint=yes | 2620[s] | 611[s] | 0.874767 (oob) | 0.874399 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 94 epochs, featureSubspaceSize=2, maxTreeLeafes=100, η=0.1, optSplitPoint=yes | 5009[s] | 979[s] | 0.874934 (oob) | 0.874546 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 118 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.05, optSplitPoint=yes | 4362[s] | 1361[s] | 0.87467 (oob) | 0.874352 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 196 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.03, optSplitPoint=yes | 6978[s] | 2206[s] | 0.874624 (oob) | 0.87433 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 305 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 11788[s] | 3540[s] | 0.874593 (oob) | 0.874309 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 605 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.01, optSplitPoint=yes | 22811[s] | 6849[s] | 0.8746 (oob) | 0.874322 |
GBDT - gradient boosted decision tree | Bagging, size=32 | 1171 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.005, optSplitPoint=yes | 40364[s] | 12545[s] | 0.874597 (oob) | 0.874326 |
GBDT - gradient boosted decision tree | Bagging, size=16 | 274 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 3951[s] | 1400[s] | 0.874936 (oob) | 0.874381 |
GBDT - gradient boosted decision tree | Bagging, size=64 | 315 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 25219[s] | 7350[s] | 0.874554 (oob) | 0.874293 |
GBDT - gradient boosted decision tree | Bagging, size=128 | 319 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 52805[s] | 15070[s] | 0.874517 (oob) | 0.874288 |
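The rows above trade off boosting epochs against the feature subspace size, the leaf count, and the shrinkage η. A sketch of one bagged-GBDT row re-expressed in scikit-learn terms; the parameter mapping (max_features for featureSubspaceSize, max_leaf_nodes for maxTreeLeafes, learning_rate for η) is an assumption, ELF's own implementation differs in detail (e.g. optSplitPoint), and the estimator keyword requires scikit-learn >= 1.2:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, (2000, 19))  # stand-in for the 19 blend inputs
y = rng.uniform(1, 5, 2000)        # stand-in ratings

# Mirrors the row: 305 epochs, featureSubspaceSize=2, maxTreeLeafes=300,
# η=0.02, bagging size 32. Expect a long fit time, as in the table.
gbdt = GradientBoostingRegressor(n_estimators=305, learning_rate=0.02,
                                 max_leaf_nodes=300, max_features=2)
bag = BaggingRegressor(estimator=gbdt, n_estimators=32, oob_score=True)
bag.fit(X, y)
print(bag.oob_score_)  # caution: R² on out-of-bag samples, not RMSE
```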
Neural networks are non-linear function approximators, see here. A minimal MLP sketch follows the table below.
model | prediction type | notes | training time | prediction time | cross validation RMSE | test RMSE |
---|---|---|---|---|---|---|
NN - neural network | Retraining, 8-CV | 1005 epochs, Net: 19-30-1, η=5e-4, η-=5e-7, λ=0 | 5413[s] | 1[s] | 0.873633 | 0.873365 (qual 0.866345 3[s]) |
NN - neural network | CrossFoldMean, 8-CV | 1005 epochs, Net: 19-30-1, η=5e-4, η-=5e-7, λ=0 | 3163[s] | 5[s] | 0.873633 | 0.873316 (qual 0.866316 18[s]) |
NN - neural network | CrossFoldMean, 8-CV | 996 epochs, Net: 19-50-1, η=5e-4, η-=5e-7, λ=0 | 15162(core2)[s] | 11(core2)[s] | 0.873617 | 0.87325 |
NN - neural network | CrossFoldMean, 8-CV | 996 epochs, Net: 19-100-1, η=5e-4, η-=5e-7, λ=0 | 26129(core2)[s] | 16(core2)[s] | 0.873523 | 0.873255 |
NN - neural network | CrossFoldMean, 8-CV | 996 epochs, Net: 19-200-1, η=5e-4, η-=5e-7, λ=0 | 51858(core2)[s] | 32(core2)[s] | 0.873602 | 0.87326 |
NN - neural network | Bagging, size=32 | 1009 epochs, Net: 19-30-1, η=5e-4, η-=5e-7, λ=0 | 15568[s] | 18[s] | 0.87347 | 0.873191 (qual 0.866215 74[s]) |
NN - neural network | Bagging, size=128 | 1010 epochs, Net: 19-30-1, η=5e-4, η-=5e-7, λ=0 | 62419[s] | 73[s] | 0.873436 | 0.873185 |
NN - neural network | Bagging, size=128 | 980 epochs, Net: 19-70-1, η=5e-4, η-=5e-7, λ=0 | 121049[s] | 126[s] | 0.87342 | 0.873163 (qual 0.866169 524[s]) |
NN - neural network | Bagging, size=128 | 793 epochs, Net: 19-150-1, η=5e-4, η-=5e-7, λ=0 | 236993[s] | 245[s] | 0.873473 | 0.873169 (qual 0.866204 981[s]) |
NN - neural network | Bagging, size=128 | 961 epochs, Net: 19-30-50-1, η=5e-4, η-=5e-7, λ=0 | 164416[s] | 155[s] | 0.873487 | 0.873174 (qual 0.866261 622[s]) |
NN - neural network | Bagging, size=128 | 908 epochs, Net: 19-50-30-1, η=5e-4, η-=5e-7, λ=0 | 175132[s] | 167[s] | 0.873455 | 0.87318 (qual 0.866253 703[s]) |
NN - neural network | Bagging, size=128 | 771 epochs, Net: 19-70-50-1, η=5e-4, η-=5e-7, λ=0 | 292926[s] | 240[s] | 0.873474 | 0.873217 (qual 0.866236 984[s]) |
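The nets are small MLPs: 19-30-1 means 19 inputs, one hidden layer with 30 units, and one output, trained by SGD with learning rate η that is decreased by η- per epoch. A sketch with scikit-learn's MLPRegressor; the tanh activation and the decay schedule ('invscaling' standing in for the linear η- decrease) are assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, (5000, 19))  # stand-in for the 19 blend inputs
y = rng.uniform(1, 5, 5000)        # stand-in ratings

# Sketch of the 19-30-1 net with η=5e-4 and no regularization (λ=0).
net = MLPRegressor(hidden_layer_sizes=(30,), activation="tanh",
                   solver="sgd", learning_rate_init=5e-4,
                   learning_rate="invscaling", alpha=0.0, max_iter=1000)
net.fit(X, y)
rmse = np.sqrt(np.mean((net.predict(X) - y) ** 2))
```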
We use the out-of-bag (OOB) estimate as the error measure for ensemble accuracy; a sketch of the estimate follows the table below. Here are the dsc templates. The models below are added one by one to an ensemble of blends; the blend RMSE column reports the accuracy of the combined ensemble after each addition.
order | model | prediction type | notes | training time | prediction time | cross validation RMSE | blend RMSE | test RMSE |
---|---|---|---|---|---|---|---|---|
1 | NN - neural network | Bagging, size=128 | 870 epochs, Net: 19-100-1, η=0.0005, η-=5e-7, λ=0, scale=3.6, offset=3.0 | 159973[s] | [s] | 0.87345 (oob) | 0.873445 | |
2 | GBDT - gradient boosted decision tree | Bagging, size=128 | 226 epochs, featureSubspaceSize=20, maxTreeLeafes=50, η=0.1, optSplitPoint=no | 23834[s] | [s] | 0.874111 (oob) | 0.873387 | |
3 | GBDT - gradient boosted decision tree | Bagging, size=128 | 267 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 29168[s] | [s] | 0.874603 (oob) | 0.873384 | |
4 | PR - polynomial regression | Bagging, size=128 | 15 search epochs, λ=2.4e-06, polyOrder=1, crossInteractions=yes | 6674[s] | [s] | 0.874358 (oob) | 0.87336 | |
5 | PR - polynomial regression | Bagging, size=128 | 15 search epochs, λ=0.054, polyOrder=3, crossInteractions=no | 1076[s] | [s] | 0.895951 (oob) | 0.873351 | |
6 | NN - neural network | Bagging, size=128 | 998 epochs, Net: 19-100-1, η=0.0005, η-=5e-7, λ=0, scale=2.0, offset=3.0 | 169560[s] | [s] | 0.87345 (oob) | 0.873296 | |
7 | NN - neural network | Bagging, size=128 | 952 epochs, Net: 19-50-30-1, η=0.0005, η-=5e-7, λ=0, scale=2.0, offset=3.0 | 179280[s] | 16200[s] | 0.873449 (oob) | 0.873227 | blend: 0.87297 |
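Each "(oob)" value above comes from an out-of-bag estimate: every bagging round trains on a bootstrap sample, and each data point is evaluated only by the bags that did not draw it. A generic sketch; fit and predict are hypothetical placeholders for any of the blenders above, and ELF's exact averaging is assumed:

```python
import numpy as np

def bagged_oob_rmse(X, y, fit, predict, n_bags=128, seed=0):
    # fit(X, y) -> model; predict(model, X) -> predictions.
    rng = np.random.default_rng(seed)
    n = len(X)
    oob_sum, oob_cnt = np.zeros(n), np.zeros(n)
    for _ in range(n_bags):
        idx = rng.integers(0, n, n)            # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), idx)  # rows this bag never saw
        model = fit(X[idx], y[idx])
        oob_sum[oob] += predict(model, X[oob])
        oob_cnt[oob] += 1
    seen = oob_cnt > 0
    oob_pred = oob_sum[seen] / oob_cnt[seen]   # average over the bags that held the row out
    return np.sqrt(np.mean((oob_pred - y[seen]) ** 2))
```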