ELF - "Ensemble Learning Framework"
ELF is a stand-alone supervised machine learning framework written in C++. It solves both regression and classification problems, and its goal is to do so as accurately and as fast as possible. Ensemble learning has become very popular in recent years, and ELF supports ensembling in several ways.
The optimization target can be the RMSE, MAE, AUC or the classification error. ELF supports multi-class classification problems as well as multi-domain classification; multi-domain means that there is more than one label per example. ELF ships with well-implemented base learners. The ensemble functionality of the framework is realized with stacking, cascade learning and residual training. Stacking is a simple linear combination of model predictions. Cascade learning extends the features with predictions from other models. Parameter selection, i.e. the search for good metaparameters in order to control the overfitting of each individual model, can be done with cross-validation or bagging.
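To make the two combination schemes concrete, here is a minimal C++ sketch (illustrative only, not ELF's internal API): stacking forms a weighted sum of the base models' predictions, while cascade learning appends those predictions to the original feature vector before training the next model.

    #include <cstddef>
    #include <vector>

    // Stacking: blend = w1*p1 + ... + wk*pk, with the weights wi found
    // e.g. by linear regression on held-out predictions.
    double stackedPrediction(const std::vector<double>& modelPredictions,
                             const std::vector<double>& blendWeights)
    {
        double blend = 0.0;
        for (std::size_t i = 0; i < modelPredictions.size(); ++i)
            blend += blendWeights[i] * modelPredictions[i];
        return blend;
    }

    // Cascade learning: extend the feature vector with the predictions
    // of already trained models; the next model trains on the result.
    std::vector<double> cascadeFeatures(std::vector<double> features,
                                        const std::vector<double>& modelPredictions)
    {
        features.insert(features.end(),
                        modelPredictions.begin(), modelPredictions.end());
        return features;
    }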
- The Framework
  - Features and targets are stored as matrices (no sparse features possible!)
  - The size of the feature matrix is limited to 2^32 elements
  - Floating point precision can be single (4 bytes) or double (8 bytes)
  - Training speed is a main design goal
  - Accuracy is a main design goal
  - Many learners use Intel's performance libraries IPP and MKL (both come with the Intel C++ compiler package)
- Regression
  - All learners solve a regression problem
  - Any classification problem is transferred to a regression problem!
  - RMSE or MAE as optimization target
- Classification
  - RMSE, AUC (2-class) or classification error as optimization target
  - Multi-class classification (>2 classes)
  - Multi-domain classification (>1 label per example)
- Parameter selection - controlling overfitting
  - k-fold cross validation
  - Bagging (out-of-bag estimate)
  - Cross validation and bagging are parallelized via OpenMP
- Base Learners
  - Linear Regression
  - Polynomial Regression
  - K-Nearest Neighbors
  - Neural Networks
  - Gradient Boosted Decision Trees
  - Kernel Ridge Regression
- Ensemble Learning
  - Stacking (linear combination, lowering the RMSE)
  - Training on residuals
  - Cascade learning
  - Bagging (from parameter selection)
- Prediction
  - 3 modes (the two averaging modes are sketched after this list):
    - Retraining - retrain the model with the best metaparameters on all available data
    - CrossFoldMean - average prediction of the k models from k-fold cross validation
    - Bagging - average prediction of the k models from the bagging setup
  - All model parameters and combination weights are stored in binary files
  - Ready-to-predict state: makes predictions for arbitrary new test features from the whole ensemble
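The CrossFoldMean and Bagging prediction modes both reduce to averaging the outputs of the k models kept from cross validation or bagging. A minimal sketch (again illustrative, not ELF's internal API):

    #include <vector>

    // Average the predictions of the k stored models for one example,
    // as done by the CrossFoldMean and Bagging prediction modes.
    double averagedPrediction(const std::vector<double>& kModelPredictions)
    {
        double sum = 0.0;
        for (double p : kModelPredictions)
            sum += p;
        return sum / kModelPredictions.size();
    }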
The configuration of the learners (which algorithms with which parameters) is dataset dependent. A separate directory structure is needed for each dataset. In this directory a Master.dsc file must exist. The file ending *.dsc means the file is a description file (a plain text file). This section gives a short overview of the dsc file format. Lines beginning with # are comments. A value is set with e.g. value=1.23, with no spaces around the "=" sign. The Master.dsc file must begin in the first line with e.g. dataset=MNIST. The second line must be e.g. isClassificationDataset=1. Then a couple of further values are set. The algorithms, in their training order, must be specified after a line containing [ALGORITHMS].
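A minimal Master.dsc might therefore look like the following sketch. Only dataset=, isClassificationDataset= and the [ALGORITHMS] marker are documented above; the remaining values and the convention of listing the algorithms' dsc files line by line are assumptions for illustration.

    # Master.dsc (illustrative sketch)
    dataset=MNIST
    isClassificationDataset=1
    # further global values, written as name=value with no spaces
    [ALGORITHMS]
    # assumed: one algorithm description file per line, in training order
    NeuralNetwork_1.dsc
    LinearModel_2.dsc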
In an algorithm's dsc file (e.g. NeuralNetwork_1.dsc) there are sections for each variable type. The types are [int], [double], [bool] and [string]. The first line of an algorithm's description file is e.g. ALGORITHM=LinearModel. The second line is e.g. ID=1; the IDs of the algorithms are ascending, beginning with 1. The third line can be e.g. TRAIN_ON_FULLPREDICTOR=NeuralNetwork_1.dat, which means that the current algorithm trains on the residuals stored in the NeuralNetwork_1.dat file. The fourth line can be e.g. DISABLE=1, which assumes that this algorithm is already trained (helpful when training ensembles).
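Putting the header lines and type sections together, an algorithm description file could look like this sketch of a linear model that trains on the neural network's residuals. The metaparameter names inside the type sections are placeholders, not documented ELF options.

    # LinearModel_2.dsc (illustrative sketch)
    ALGORITHM=LinearModel
    ID=2
    TRAIN_ON_FULLPREDICTOR=NeuralNetwork_1.dat
    # DISABLE=1 would skip training, assuming this model is already trained
    [int]
    # integer metaparameters, e.g. maxIterations=100 (placeholder name)
    [double]
    # real-valued metaparameters, e.g. regularization=0.01 (placeholder name)
    [bool]
    [string]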
The dataset name and one of the characters [t,b,p] must be specified on the console in order to start training, blending or prediction. Blending means combining existing results (e.g. calculating linear regression coefficients). The following examples show the usage.
$ ./ELF LETTER t : start training
$ ./ELF LETTER b : start blending existing algorithms (seldom used)
$ ./ELF LETTER p : start prediction
ELF starts training all specified algorithms sequentially when ./ELF dataset t is entered into your console. The algorithms are set in the Master.dsc file after the line [ALGORITHMS]. By calling ./ELF dataset p, ELF starts to predict the predefined test set of the dataset.
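For the LETTER example above, a complete (hypothetical) session might look as follows; the per-dataset directory layout is inferred from the description above and may differ in detail.

    LETTER/
        Master.dsc
        NeuralNetwork_1.dsc
        LinearModel_2.dsc

    $ ./ELF LETTER t : trains NeuralNetwork_1, then LinearModel_2
    $ ./ELF LETTER p : predicts the predefined test set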