Blending Netflix predictions

This is the online material accompanying our KDD 2010 research paper [pdf].

The Netflix Prize is a collaborative filtering (CF) problem. Combining different kinds of CF algorithms leads to improved accuracy. The combination is a regression task, because each individual algorithm delivers a real-valued prediction. We investigate different blending algorithms with the ELF framework. Blending is a synonym for combining predictions, with the goal of improved accuracy in terms of RMSE as the error measure. The dataset is generated from probe set predictions; the Netflix Prize probe set contains 1408395 ratings. We split this set randomly into two sets: a train set and a test set.

Download

The files are in CSV format; within each line, the values are separated by commas. The last column is the target value (a rating from 1 to 5). The order of the features follows the table below. The number of training samples is 704197, the number of test samples 704198. Train and test set are random subsets of the Netflix Prize probe set. The number of features is 19. A minimal loading sketch follows the file list below.
train.csv.bz2(33.2MB) trainUsers.csv.bz2(1.9MB) trainMovies.csv.bz2(1.1MB) trainRatings.csv.bz2(203kB)
test.csv.bz2(33.2MB) testUsers.csv.bz2(1.9MB) testMovies.csv.bz2(1.1MB) testRatings.csv.bz2(203kB)
testQual.csv.bz2(121MB) qualUsers.csv.bz2(7.4MB) qualMovies.csv.bz2(78kB) qualRatings.csv.bz2(800kB). These are the corresponding predictions for the qualifying set (another test set).
corresponding predictor names (with probe RMSE values)
Indices into the Netflix Prize probe.txt file (indices start at 1): probeTrainIndex.csv.bz2, probeTestIndex.csv.bz2
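
A minimal loading sketch in Python, assuming numpy is available and the .bz2 archives have been decompressed locally (file names as in the list above):

```python
# Load the blend features and targets from the CSV files described above.
import numpy as np

def load_csv(path):
    """Load a comma-separated file; the last column is the rating (1..5)."""
    data = np.loadtxt(path, delimiter=",")
    return data[:, :-1], data[:, -1]    # 19 predictor columns, 1 target column

X_train, y_train = load_csv("train.csv")   # 704197 samples x 19 features
X_test, y_test = load_csv("test.csv")      # 704198 samples x 19 features

def rmse(pred, target):
    """Root mean squared error, the error measure used throughout."""
    return np.sqrt(np.mean((pred - target) ** 2))
```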
Algorithm templates

This is the set of predictions used for the combination. They were selected from a large pool of predictions via forward selection. We tried to pick algorithms of different types.

nr | name | RMSE | description
1 | AFM-1 | 0.9362 | An asymmetric factor model, where the user is expressed via the rated movies. 200 features were used, learning rate η=1e-3, regularization λ=1e-3; all η are multiplied by 0.95 from epoch 30 on; 120 epochs were trained in total.
2 | AFM-2 | 0.9231 | An asymmetric factor model, where the user is expressed via the rated movies. 2000 features were used, learning rate η=1e-3, regularization λ=2e-3; 23 epochs were trained in total. Based on residuals from KNN-5.
3 | AFM-3 | 0.9340 | An asymmetric factor model, where the user is expressed via the rated movies. 40 features were used, learning rate η=1e-4, regularization λ=1e-3; 96 epochs were trained in total.
4 | AFM-4 | 0.9391 | An asymmetric factor model, where the user is expressed via the rated movies. 900 features were used, learning rate η=1e-3, regularization λ=1e-2; 43 epochs were trained in total.
5 | GE-1 | 0.9079 | Global Effects model. Based on residuals from KNN-1; hence this result is the third algorithm in a chain of training on residuals.
6 | GE-2 | 0.9710 | Global Effects model, trained on raw ratings.
7 | GE-3 | 0.9443 | Global Effects model, trained on residuals from KNN-4.
8 | GE-4 | 0.9209 | Global Effects model with time integration, trained on residuals from AFM-2.
9 | KNN-1 | 0.9110 | An item-item k-nearest-neighbors model; the similarity measure is the Pearson correlation. We use k=24 neighbors; this model is based on residuals from AFM-1.
10 | KNN-2 | 0.8904 | An item-item k-nearest-neighbors model; the similarity measure is the set correlation, explained in [42]. We use k=122 neighbors; this model is based on residuals from a chain of algorithms: RBM-KNN-GE (with time).
11 | KNN-3 | 0.8970 | An item-item k-nearest-neighbors model; the similarity measure is the Pearson correlation. We use k=55 neighbors; this model is based on residuals of an RBM with 150 hidden units.
12 | KNN-4 | 0.9463 | An item-item k-nearest-neighbors model; the similarity measure is the Pearson correlation. We use k=21 neighbors; this model is based on residuals from GE-2.
13 | RBM-1 | 0.9493 | Restricted Boltzmann Machine with discrete input units. The number of hidden units is 10. Learning rate η=0.002 and regularization λ=0.0002.
14 | RBM-2 | 0.9123 | Restricted Boltzmann Machine with discrete input units. The number of hidden units is 250. Learning rate η=0.002 and regularization λ=0.0004.
15 | SVD-1 | 0.9074 | A rating-matrix factorization with 300 features, trained with stochastic gradient descent for 158 epochs, learning rate η=8e-4, regularization λ=0.01, based on item-mean-centered data.
16 | SVD-2 | 0.9172 | A rating-matrix factorization with 20 features, trained with stochastic gradient descent for 158 epochs, learning rate η=0.002, regularization λ=0.02, based on item-mean-centered data.
17 | SVD-3 | 0.9033 | A rating-matrix factorization with 1000 features and adaptive user features (item-item similarity correction). 158 epochs, learning rate η=0.001, regularization λ=0.015.
18 | SVD-4 | 0.8871 | An extended rating-matrix factorization with 150 features. Learning rates and regularizations are tuned automatically to minimize the RMSE on the probe set; thus each trained parameter has an individual value.
19 | lg(support) | - | The number of ratings given by the user, known as support; lg() is the natural logarithm.

Blending

A quick how-to for starting the blending process.
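
The tables below report three prediction types: Retraining (roughly, one model refit on all training data after the cross-validated parameter search), CrossFoldMean (the mean over the k cross-validation models), and Bagging with an out-of-bag (oob) error estimate. A minimal sketch of the bagging/oob scheme, assuming any regressor exposing fit/predict; the make_model factory is a hypothetical placeholder, not part of the ELF interface:

```python
# Bagging with an out-of-bag (oob) RMSE estimate, as reported in the
# "Bagging, size=N" rows below.
import numpy as np

def bagging_oob(make_model, X, y, n_bags=32, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    oob_sum = np.zeros(n)    # accumulated oob predictions per sample
    oob_cnt = np.zeros(n)    # how many bags left each sample out
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)        # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), idx)   # samples not drawn into this bag
        m = make_model()
        m.fit(X[idx], y[idx])
        models.append(m)
        oob_sum[oob] += m.predict(X[oob])
        oob_cnt[oob] += 1
    seen = oob_cnt > 0
    oob_rmse = np.sqrt(np.mean((oob_sum[seen] / oob_cnt[seen] - y[seen]) ** 2))
    return models, oob_rmse   # final prediction: mean over all bagged models
```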

Linear Regression

Linear regression is a linear combination of the predictions. We extend the features with a constant 1 input (addConstantInput=1 in the Master.dsc file).

model | prediction type | notes | training time | prediction time | cross-validation RMSE | test RMSE
LR - linear regression | Retraining, 8-CV | 15 search epochs, λ=4.41454e-06 | 10[s] | 0[s] | 0.875522 | 0.875258
LR - linear regression | CrossFoldMean, 8-CV | 15 search epochs, λ=4.41454e-06 | 10[s] | 0[s] | 0.875522 | 0.875257
LR - linear regression | Bagging, size=128 | 15 search epochs, λ=4.83746e-05 | 182[s] | 4[s] | 0.875521 (oob) | 0.875258
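
As a sketch of what such a blend computes: ridge-regularized least squares over the 19 predictors plus the constant column, here in closed form. This assumes the arrays and rmse helper from the loading sketch in the Download section; whether ELF scales λ internally (e.g. by the sample count) is not specified here, so the exact λ values above may not transfer verbatim:

```python
# Ridge-regularized linear blend with an appended constant-1 input,
# solved in closed form: w = (X^T X + lambda*I)^-1 X^T y.
# (For simplicity the constant column is regularized too.)
import numpy as np

def fit_linear_blend(X, y, lam):
    Xc = np.hstack([X, np.ones((len(X), 1))])    # addConstantInput=1
    A = Xc.T @ Xc + lam * np.eye(Xc.shape[1])
    return np.linalg.solve(A, Xc.T @ y)

def predict_linear_blend(w, X):
    Xc = np.hstack([X, np.ones((len(X), 1))])
    return Xc @ w

w = fit_linear_blend(X_train, y_train, lam=4.41454e-06)
print(rmse(predict_linear_blend(w, X_test), y_test))  # compare with the test RMSE above
```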

Polynomial Regression

Polynomial regression is a linear combination of an extended feature space. The extension is generated by a polynomial series. We extend the features with a constant 1 input (addConstantInput=1 in the Master.dsc file).

model | prediction type | notes | training time | prediction time | cross-validation RMSE | test RMSE
PR - polynomial regression | Retraining, 8-CV | 15 search epochs, λ=0.000488289, polyOrder=1, crossInteractions=yes | 361[s] | 3[s] | 0.874246 | 0.873983
PR - polynomial regression | CrossFoldMean, 8-CV | 15 search epochs, λ=0.000488289, polyOrder=1, crossInteractions=yes | 352[s] | 21[s] | 0.874246 | 0.873976
PR - polynomial regression | Bagging, size=128 | 15 search epochs, λ=0.000298225, polyOrder=1, crossInteractions=yes | 6687[s] | 334[s] | 0.874241 (oob) | 0.87398
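
The expansion itself is easy to sketch. Assuming crossInteractions adds all pairwise products x_i·x_j (an assumption about ELF's exact expansion), the blend then proceeds exactly as in the linear case, reusing fit_linear_blend from above:

```python
# Polynomial feature expansion: powers of each input up to polyOrder, plus
# (optionally) all pairwise cross-interaction products x_i * x_j.
import numpy as np
from itertools import combinations

def poly_expand(X, poly_order=1, cross_interactions=True):
    cols = [X ** p for p in range(1, poly_order + 1)]
    if cross_interactions:
        cols.extend(X[:, [i]] * X[:, [j]]
                    for i, j in combinations(range(X.shape[1]), 2))
    return np.hstack(cols)

Xp_train = poly_expand(X_train, poly_order=1, cross_interactions=True)
Xp_test = poly_expand(X_test, poly_order=1, cross_interactions=True)
w = fit_linear_blend(Xp_train, y_train, lam=0.000488289)
print(rmse(predict_linear_blend(w, Xp_test), y_test))
```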

Gradient Boosted Decision Tree

GBDT, see here.

model | prediction type | notes | training time | prediction time | cross-validation RMSE | test RMSE
GBDT - gradient boosted decision tree | Retraining, 8-CV | 119 epochs, featureSubspaceSize=3, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 364[s] | 8[s] | 0.875157 | 0.875043
GBDT - gradient boosted decision tree | CrossFoldMean, 8-CV | 156 epochs, featureSubspaceSize=3, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 469[s] | 120[s] | 0.875115 | 0.874231
GBDT - gradient boosted decision tree | Bagging, size=128 | 287 epochs, featureSubspaceSize=3, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 8554[s] | 7089[s] | 0.874215 (oob) | 0.873927
GBDT - gradient boosted decision tree | Bagging, size=128 | 254 epochs, featureSubspaceSize=4, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 8816[s] | 6331[s] | 0.874184 (oob) | 0.873916
GBDT - gradient boosted decision tree | Bagging, size=128 | 229 epochs, featureSubspaceSize=5, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 9081[s] | 5814[s] | 0.874165 (oob) | 0.87389
GBDT - gradient boosted decision tree | Bagging, size=128 | 215 epochs, featureSubspaceSize=6, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 9534[s] | 5481[s] | 0.87415 (oob) | 0.873888
GBDT - gradient boosted decision tree | Bagging, size=128 | 210 epochs, featureSubspaceSize=7, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 10468[s] | 5442[s] | 0.874142 (oob) | 0.873871
GBDT - gradient boosted decision tree | Bagging, size=128 | 201 epochs, featureSubspaceSize=8, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 11029[s] | 5221[s] | 0.874148 (oob) | 0.873869
GBDT - gradient boosted decision tree | Bagging, size=128 | 191 epochs, featureSubspaceSize=9, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 11667[s] | 5004[s] | 0.874134 (oob) | 0.873865
GBDT - gradient boosted decision tree | Bagging, size=128 | 179 epochs, featureSubspaceSize=10, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 11811[s] | 4690[s] | 0.874128 (oob) | 0.873869
GBDT - gradient boosted decision tree | Bagging, size=128 | 182 epochs, featureSubspaceSize=11, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 13095[s] | 4820[s] | 0.874128 (oob) | 0.873864
GBDT - gradient boosted decision tree | Bagging, size=128 | 155 epochs, featureSubspaceSize=20, maxTreeLeafes=100, η=0.1, optSplitPoint=no | 18138[s] | 4255[s] | 0.874114 (oob) | 0.873859
GBDT - gradient boosted decision tree | Bagging, size=128 | 99 epochs, featureSubspaceSize=20, maxTreeLeafes=200, η=0.1, optSplitPoint=no | 12400[s] | 3142[s] | 0.874153 (oob) | 0.873899
GBDT - gradient boosted decision tree | Bagging, size=128 | 255 epochs, featureSubspaceSize=20, maxTreeLeafes=50, η=0.1, optSplitPoint=no | 26235[s] | 5075[s] | 0.874103 (oob) | 0.873842
GBDT - gradient boosted decision tree | Bagging, size=128 | 393 epochs, featureSubspaceSize=20, maxTreeLeafes=30, η=0.1, optSplitPoint=no | 35390[s] | 5928[s] | 0.874112 (oob) | 0.873846
GBDT - gradient boosted decision tree | Bagging, size=32 | 41 epochs, featureSubspaceSize=20, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 8944[s] | 409[s] | 0.874924 (oob) | 0.8746
GBDT - gradient boosted decision tree | Bagging, size=32 | 40 epochs, featureSubspaceSize=10, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 4713[s] | 406[s] | 0.874819 (oob) | 0.874522
GBDT - gradient boosted decision tree | Bagging, size=32 | 41 epochs, featureSubspaceSize=8, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 4014[s] | 425[s] | 0.874841 (oob) | 0.874504
GBDT - gradient boosted decision tree | Bagging, size=32 | 42 epochs, featureSubspaceSize=7, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 3711[s] | 442[s] | 0.874767 (oob) | 0.874481
GBDT - gradient boosted decision tree | Bagging, size=32 | 41 epochs, featureSubspaceSize=6, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 3028[s] | 420[s] | 0.874814 (oob) | 0.874429
GBDT - gradient boosted decision tree | Bagging, size=32 | 43 epochs, featureSubspaceSize=5, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 2806[s] | 454[s] | 0.874736 (oob) | 0.874432
GBDT - gradient boosted decision tree | Bagging, size=32 | 44 epochs, featureSubspaceSize=4, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 2432[s] | 475[s] | 0.874775 (oob) | 0.874405
GBDT - gradient boosted decision tree | Bagging, size=32 | 45 epochs, featureSubspaceSize=3, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 2010[s] | 481[s] | 0.874774 (oob) | 0.874381
GBDT - gradient boosted decision tree | Bagging, size=32 | 48 epochs, featureSubspaceSize=2, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 1525[s] | 525[s] | 0.874784 (oob) | 0.874377
GBDT - gradient boosted decision tree | Bagging, size=32 | 64 epochs, featureSubspaceSize=1, maxTreeLeafes=500, η=0.1, optSplitPoint=yes | 1741[s] | 874[s] | 0.874838 (oob) | 0.874427
GBDT - gradient boosted decision tree | Bagging, size=32 | 58 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.1, optSplitPoint=yes | 2249[s] | 612[s] | 0.874783 (oob) | 0.87437
GBDT - gradient boosted decision tree | Bagging, size=32 | 63 epochs, featureSubspaceSize=2, maxTreeLeafes=200, η=0.1, optSplitPoint=yes | 2620[s] | 611[s] | 0.874767 (oob) | 0.874399
GBDT - gradient boosted decision tree | Bagging, size=32 | 94 epochs, featureSubspaceSize=2, maxTreeLeafes=100, η=0.1, optSplitPoint=yes | 5009[s] | 979[s] | 0.874934 (oob) | 0.874546
GBDT - gradient boosted decision tree | Bagging, size=32 | 118 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.05, optSplitPoint=yes | 4362[s] | 1361[s] | 0.87467 (oob) | 0.874352
GBDT - gradient boosted decision tree | Bagging, size=32 | 196 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.03, optSplitPoint=yes | 6978[s] | 2206[s] | 0.874624 (oob) | 0.87433
GBDT - gradient boosted decision tree | Bagging, size=32 | 305 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 11788[s] | 3540[s] | 0.874593 (oob) | 0.874309
GBDT - gradient boosted decision tree | Bagging, size=32 | 605 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.01, optSplitPoint=yes | 22811[s] | 6849[s] | 0.8746 (oob) | 0.874322
GBDT - gradient boosted decision tree | Bagging, size=32 | 1171 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.005, optSplitPoint=yes | 40364[s] | 12545[s] | 0.874597 (oob) | 0.874326
GBDT - gradient boosted decision tree | Bagging, size=16 | 274 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 3951[s] | 1400[s] | 0.874936 (oob) | 0.874381
GBDT - gradient boosted decision tree | Bagging, size=64 | 315 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 25219[s] | 7350[s] | 0.874554 (oob) | 0.874293
GBDT - gradient boosted decision tree | Bagging, size=128 | 319 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 52805[s] | 15070[s] | 0.874517 (oob) | 0.874288
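
The ELF parameters in the table map naturally onto common GBDT implementations: featureSubspaceSize is the number of candidate features per split, maxTreeLeafes the per-tree leaf limit, η the shrinkage, and epochs the number of boosting stages. A hedged sketch with scikit-learn's GradientBoostingRegressor, as an assumed stand-in for ELF's own C++ implementation (optSplitPoint has no direct equivalent there):

```python
# One GBDT member of a bagged ensemble, with the parameter mapping:
#   epochs -> n_estimators, featureSubspaceSize -> max_features,
#   maxTreeLeafes -> max_leaf_nodes, eta -> learning_rate.
from sklearn.ensemble import GradientBoostingRegressor

def make_gbdt():
    return GradientBoostingRegressor(
        n_estimators=255,    # "epochs": number of boosting stages
        max_features=20,     # featureSubspaceSize
        max_leaf_nodes=50,   # maxTreeLeafes
        learning_rate=0.1,   # eta
    )

# Plug into the bagging/oob sketch from the Blending section:
# models, oob_rmse = bagging_oob(make_gbdt, X_train, y_train, n_bags=128)
```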

Neural Networks

Neural networks are non-linear function approximators, see here.

model | prediction type | notes | training time | prediction time | cross-validation RMSE | test RMSE
NN - neural network | Retraining, 8-CV | 1005 epochs, Net: 19-30-1, η=5e-4, η-=5e-7, λ=0 | 5413[s] | 1[s] | 0.873633 | 0.873365 (qual 0.866345, 3[s])
NN - neural network | CrossFoldMean, 8-CV | 1005 epochs, Net: 19-30-1, η=5e-4, η-=5e-7, λ=0 | 3163[s] | 5[s] | 0.873633 | 0.873316 (qual 0.866316, 18[s])
NN - neural network | CrossFoldMean, 8-CV | 996 epochs, Net: 19-50-1, η=5e-4, η-=5e-7, λ=0 | 15162(core2)[s] | 11(core2)[s] | 0.873617 | 0.87325
NN - neural network | CrossFoldMean, 8-CV | 996 epochs, Net: 19-100-1, η=5e-4, η-=5e-7, λ=0 | 26129(core2)[s] | 16(core2)[s] | 0.873523 | 0.873255
NN - neural network | CrossFoldMean, 8-CV | 996 epochs, Net: 19-200-1, η=5e-4, η-=5e-7, λ=0 | 51858(core2)[s] | 32(core2)[s] | 0.873602 | 0.87326
NN - neural network | Bagging, size=32 | 1009 epochs, Net: 19-30-1, η=5e-4, η-=5e-7, λ=0 | 15568[s] | 18[s] | 0.87347 | 0.873191 (qual 0.866215, 74[s])
NN - neural network | Bagging, size=128 | 1010 epochs, Net: 19-30-1, η=5e-4, η-=5e-7, λ=0 | 62419[s] | 73[s] | 0.873436 | 0.873185
NN - neural network | Bagging, size=128 | 980 epochs, Net: 19-70-1, η=5e-4, η-=5e-7, λ=0 | 121049[s] | 126[s] | 0.87342 | 0.873163 (qual 0.866169, 524[s])
NN - neural network | Bagging, size=128 | 793 epochs, Net: 19-150-1, η=5e-4, η-=5e-7, λ=0 | 236993[s] | 245[s] | 0.873473 | 0.873169 (qual 0.866204, 981[s])
NN - neural network | Bagging, size=128 | 961 epochs, Net: 19-30-50-1, η=5e-4, η-=5e-7, λ=0 | 164416[s] | 155[s] | 0.873487 | 0.873174 (qual 0.866261, 622[s])
NN - neural network | Bagging, size=128 | 908 epochs, Net: 19-50-30-1, η=5e-4, η-=5e-7, λ=0 | 175132[s] | 167[s] | 0.873455 | 0.87318 (qual 0.866253, 703[s])
NN - neural network | Bagging, size=128 | 771 epochs, Net: 19-70-50-1, η=5e-4, η-=5e-7, λ=0 | 292926[s] | 240[s] | 0.873474 | 0.873217 (qual 0.866236, 984[s])
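
The Net: 19-30-1 notation denotes layer sizes; η- is the amount subtracted from the learning rate after each epoch. A minimal single-hidden-layer sketch in plain numpy, assuming sigmoid hidden units, a linear output, and per-sample SGD. ELF's actual training loop is optimized C++; this version is purely illustrative and would be slow on 704197 samples:

```python
# Single-hidden-layer net (e.g. Net: 19-30-1) trained with plain SGD on
# squared error; eta_minus is the per-epoch learning rate decrement.
import numpy as np

def train_nn(X, y, hidden=30, epochs=1005, eta=5e-4, eta_minus=5e-7, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, hidden);      b2 = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):                        # stochastic updates
            h = 1.0 / (1.0 + np.exp(-(X[i] @ W1 + b1)))     # sigmoid hidden layer
            err = (h @ W2 + b2) - y[i]                      # d(0.5*err^2)/d(output)
            gh = err * W2 * h * (1 - h)                     # backprop through sigmoid
            W2 -= eta * err * h;               b2 -= eta * err
            W1 -= eta * np.outer(X[i], gh);    b1 -= eta * gh
        eta -= eta_minus                                    # learning rate decay (eta-)
    return W1, b1, W2, b2

def predict_nn(params, X):
    W1, b1, W2, b2 = params
    h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
    return h @ W2 + b2
```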

Combination of NN+PR+GBDT

We use the out-of-bag estimate as the error measure for ensemble accuracy. Here are the dsc templates.

order | model | prediction type | notes | training time | prediction time | cross-validation RMSE | blend RMSE | test RMSE
1 | NN - neural network | Bagging, size=128 | 870 epochs, Net: 19-100-1, η=0.0005, η-=5e-7, λ=0, scale=3.6, offset=3.0 | 159973[s] | [s] | 0.87345 (oob) | 0.873445 |
2 | GBDT - gradient boosted decision tree | Bagging, size=128 | 226 epochs, featureSubspaceSize=20, maxTreeLeafes=50, η=0.1, optSplitPoint=no | 23834[s] | [s] | 0.874111 (oob) | 0.873387 |
3 | GBDT - gradient boosted decision tree | Bagging, size=128 | 267 epochs, featureSubspaceSize=2, maxTreeLeafes=300, η=0.02, optSplitPoint=yes | 29168[s] | [s] | 0.874603 (oob) | 0.873384 |
4 | PR - polynomial regression | Bagging, size=128 | 15 search epochs, λ=2.4e-06, polyOrder=1, crossInteractions=yes | 6674[s] | [s] | 0.874358 (oob) | 0.87336 |
5 | PR - polynomial regression | Bagging, size=128 | 15 search epochs, λ=0.054, polyOrder=3, crossInteractions=no | 1076[s] | [s] | 0.895951 (oob) | 0.873351 |
6 | NN - neural network | Bagging, size=128 | 998 epochs, Net: 19-100-1, η=0.0005, η-=5e-7, λ=0, scale=2.0, offset=3.0 | 169560[s] | [s] | 0.87345 (oob) | 0.873296 |
7 | NN - neural network | Bagging, size=128 | 952 epochs, Net: 19-50-30-1, η=0.0005, η-=5e-7, λ=0, scale=2.0, offset=3.0 | 179280[s] | 16200[s] | 0.873449 (oob) | 0.873227 | blend: 0.87297
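
Each row adds one bagged model to the ensemble; the blend RMSE column tracks the combined accuracy as members are added. A sketch of such a second-level combination, assuming the members' out-of-bag predictions are blended with the ridge regression from the Linear Regression section (the linked dsc templates define the actual setup):

```python
# Second-level combination: stack each member's predictions into a new feature
# matrix and blend linearly, reusing fit_linear_blend / predict_linear_blend.
import numpy as np

def combine_members(member_oob_preds, member_test_preds, y_train, lam=1e-6):
    """Each *_preds argument is a list of 1-D prediction arrays, one per member."""
    P_train = np.column_stack(member_oob_preds)   # oob predictions as features
    P_test = np.column_stack(member_test_preds)
    w = fit_linear_blend(P_train, y_train, lam)
    return predict_linear_blend(w, P_test)
```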