Usage

The configuration of the learners (which algorithms to use and with which parameters) is dataset dependent, so each dataset needs its own directory structure. This directory must contain a Master.dsc file. The extension .dsc marks a description file, which is a plain text file.

The following describes an example directory structure for the dataset "LETTER".

The four directories DataFiles, DscFiles, FullPredictorFiles and TempFiles are needed in order to start the training (their names can be changed in config.h). The dataset's main directory must contain the Master.dsc file as well as all algorithm description files. In the "LETTER" example there are six algorithm dsc files: GBDT_1.dsc, KernelRidgeRegression_1.dsc, KNearestNeighbor_1.dsc, LinearModel_1.dsc, NeuralNetwork_1.dsc and PolynomialRegression_1.dsc.
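A quick way to catch a missing directory before starting a long training run is to check the layout first. The following Python sketch assumes the default directory names listed above (they may differ if config.h was changed); the function name check_dataset_dir is hypothetical and not part of ELF. It does not check the algorithm dsc files, since those differ from dataset to dataset.

```python
from pathlib import Path

# Default subdirectory names as described above. ELF allows renaming them
# in config.h, so adjust this tuple if your build uses different names.
REQUIRED_DIRS = ("DataFiles", "DscFiles", "FullPredictorFiles", "TempFiles")


def check_dataset_dir(root):
    """Return a list of problems in a dataset directory (empty if the layout is OK)."""
    root = Path(root)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    for name in REQUIRED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}")
    # Master.dsc must sit directly in the dataset's main directory.
    if not (root / "Master.dsc").is_file():
        problems.append("missing file: Master.dsc")
    return problems
```

For example, check_dataset_dir("LETTER") returns an empty list when the layout is complete, and otherwise one message per missing item.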

Description file format *.dsc

This section gives a short overview of the dsc file format. Lines beginning with # are comments. A value is set with a name=value assignment, e.g. value=1.23; no spaces are allowed around the "=" sign.
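The two rules above (comment lines and name=value assignments without spaces) can be expressed as a small line parser. This is a hypothetical Python sketch for checking dsc files by hand, not ELF's own parser. It handles only these two rules and keeps every value as a string.

```python
def parse_dsc_line(line):
    """Parse one .dsc line.

    Returns None for blank lines and comments (lines starting with '#'),
    otherwise a (name, value) tuple with the value kept as a string.
    """
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    name, sep, value = line.partition("=")
    if not sep or not name:
        raise ValueError(f"expected name=value, got {line!r}")
    # The format forbids spaces around the '=' sign.
    if name != name.strip() or value != value.strip():
        raise ValueError(f"no spaces allowed around '=': {line!r}")
    return name, value
```

For example, parse_dsc_line("value=1.23") returns ("value", "1.23"), while parse_dsc_line("value = 1.23") raises a ValueError.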

The first line of Master.dsc must set the dataset name, e.g. dataset=MNIST, and the second line must state whether it is a classification dataset, e.g. isClassificationDataset=1. Several further values follow. Finally, the algorithms are listed in their training order after a line containing [ALGORITHMS].
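To illustrate the layout of Master.dsc, the sketch below checks the two fixed header lines, collects any other name=value settings, and returns the algorithm list in training order. It assumes that each line after [ALGORITHMS] names one algorithm description file (e.g. NeuralNetwork_1.dsc) and that comment lines may appear after the header. The further global settings are left out of the example because their names are not listed here; read_master_dsc is a hypothetical helper.

```python
def read_master_dsc(text):
    """Return (dataset, is_classification, settings, algorithms) from Master.dsc text."""
    lines = [l.strip() for l in text.splitlines()]
    if len(lines) < 2 or not lines[0].startswith("dataset="):
        raise ValueError("line 1 must be dataset=<name>")
    if not lines[1].startswith("isClassificationDataset="):
        raise ValueError("line 2 must be isClassificationDataset=<0|1>")
    dataset = lines[0].split("=", 1)[1]
    is_classification = lines[1].split("=", 1)[1] == "1"

    settings, algorithms = {}, []
    in_algorithms = False
    for line in lines[2:]:
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "[ALGORITHMS]":
            in_algorithms = True
        elif in_algorithms:
            algorithms.append(line)  # kept in training order
        else:
            name, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"expected name=value, got {line!r}")
            settings[name] = value
    return dataset, is_classification, settings, algorithms


EXAMPLE = """dataset=LETTER
isClassificationDataset=1
# further global settings would follow here
[ALGORITHMS]
NeuralNetwork_1.dsc
LinearModel_1.dsc
"""
```

With this example, read_master_dsc(EXAMPLE) yields the dataset name LETTER, a classification flag of True, no extra settings, and the two algorithms in the order listed.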

An algorithm's dsc file (e.g. NeuralNetwork_1.dsc) contains one section per variable type: [int], [double], [bool] and [string]. The first line of an algorithm's description file names the algorithm, e.g. ALGORITHM=LinearModel. The second line sets its ID, e.g. ID=1; IDs are ascending, starting at 1. An optional third line, e.g. TRAIN_ON_FULLPREDICTOR=NeuralNetwork_1.dat, makes the current algorithm train on the residuals stored in the file NeuralNetwork_1.dat. An optional fourth line, e.g. DISABLE=1, marks the algorithm as already trained, which is helpful when training ensembles.
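The header lines and typed sections of an algorithm's description file can be read as in the following Python sketch. The parameter names (exampleIntParam and so on) are placeholders, not real ELF parameters, and booleans are assumed to be written as 0 or 1, as in DISABLE=1. read_algorithm_dsc is a hypothetical helper, not ELF's own reader.

```python
# Converters for the four section types; booleans are assumed to be 0/1.
CONVERTERS = {
    "int": int,
    "double": float,
    "bool": lambda s: int(s) != 0,
    "string": str,
}


def read_algorithm_dsc(text):
    """Return (header, params) for an algorithm description file."""
    header, params = {}, {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            if section not in CONVERTERS:
                raise ValueError(f"unknown section [{section}]")
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {line!r}")
        if section is None:
            # Header lines before the first section: ALGORITHM, ID and the
            # optional TRAIN_ON_FULLPREDICTOR and DISABLE.
            header[name] = value
        else:
            params[name] = CONVERTERS[section](value)
    if list(header)[:2] != ["ALGORITHM", "ID"]:
        raise ValueError("the first two lines must be ALGORITHM=... and ID=...")
    return header, params


ALGO_EXAMPLE = """ALGORITHM=LinearModel
ID=1
TRAIN_ON_FULLPREDICTOR=NeuralNetwork_1.dat
[int]
exampleIntParam=10
[double]
exampleDoubleParam=0.01
[bool]
exampleBoolParam=1
[string]
exampleStringParam=abc
"""
```

Under these assumptions, header.get("DISABLE") == "1" would indicate an algorithm that is already trained.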

Start the ELF

To start training, blending or prediction, pass the dataset name and one of the characters t, b or p on the command line. Blending combines the results of existing algorithms (e.g. by calculating linear regression coefficients). The following examples show the usage:
$ ./ELF LETTER t : start training
$ ./ELF LETTER b : start blending existing algorithms (seldom used)
$ ./ELF LETTER p : start prediction
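Blending with linear regression means finding one weight per algorithm (plus an intercept) so that the weighted sum of the algorithms' predictions fits the targets as well as possible in the least-squares sense. The pure-Python sketch below solves the normal equations with Gaussian elimination. It illustrates the idea only: ELF's blender may use a different solver or add regularization, and blend_weights and blend are hypothetical names.

```python
def blend_weights(predictions, targets):
    """Least-squares blend weights [w0, w1, ..., wk], intercept first.

    predictions: k lists (one per algorithm), each with n predictions.
    Minimizes sum_i (targets[i] - w0 - sum_j w_j * predictions[j][i]) ** 2.
    """
    n = len(targets)
    rows = [[1.0] + [p[i] for p in predictions] for i in range(n)]  # design matrix
    m = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    a = [[sum(r[u] * r[v] for r in rows) for v in range(m)] for u in range(m)]
    b = [sum(r[u] * y for r, y in zip(rows, targets)) for u in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("predictions are linearly dependent")
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        w[r] = (b[r] - sum(a[r][c] * w[c] for c in range(r + 1, m))) / a[r][r]
    return w


def blend(predictions, weights):
    """Combine per-algorithm predictions with the given blend weights."""
    return [weights[0] + sum(w * p[i] for w, p in zip(weights[1:], predictions))
            for i in range(len(predictions[0]))]
```

In practice the weights are typically fitted on predictions the algorithms did not train on (e.g. a hold-out set), so that an overfitted algorithm does not receive too much weight.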

Entering ./ELF dataset t starts training all specified algorithms sequentially, in the order in which they are listed after the [ALGORITHMS] line in Master.dsc. Entering ./ELF dataset p makes the ELF predict the predefined test set of the dataset.
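When training and prediction are scripted, the two commands can be run back to back. This Python sketch assumes the ELF binary is ./ELF in the current working directory and that it returns a nonzero exit status on failure; elf_command and train_then_predict are hypothetical helpers.

```python
import subprocess

# Mode names mapped to the single-character arguments ELF expects.
MODES = {"train": "t", "blend": "b", "predict": "p"}


def elf_command(dataset, mode, binary="./ELF"):
    """Build the argument list for one ELF run, e.g. ['./ELF', 'LETTER', 't']."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    return [binary, dataset, MODES[mode]]


def train_then_predict(dataset, binary="./ELF"):
    """Train all algorithms, then predict the test set; stop if a run fails."""
    for mode in ("train", "predict"):
        # check=True raises CalledProcessError on a nonzero exit status.
        subprocess.run(elf_command(dataset, mode, binary), check=True)
```

For example, train_then_predict("LETTER") runs ./ELF LETTER t followed by ./ELF LETTER p.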