Reading a dataset

The DatasetReader class provides one method per supported dataset. All of these methods share the same parameter list, and each one must fill the output matrices with the values read from the dataset.


void DatasetReader::fooDataset ( string path, REAL* &train, REAL* &trainTarget, int* &trainLabel, REAL* &test, REAL* &testTarget, int* &testLabel, uint& nTrain, uint& nTest, int& nClass, int& nDomain, int& nFeat, REAL positiveTarget, REAL negativeTarget )


Every parameter passed by reference (marked with "&") is an output and must be filled in by the method. For regression datasets, int* &trainLabel and int* &testLabel are set to 0 (the null pointer).
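To illustrate the contract above, here is a minimal sketch of a reader for a regression dataset. Everything in it is hypothetical: readToyRegression is a free function standing in for a DatasetReader method, the REAL and uint typedefs are assumed, the data is synthetic rather than read from path, and the index formula i*nFeat + j is an assumed layout that should be checked against an existing reader.

```cpp
#include <cassert>
#include <string>

typedef float REAL;         // assumption: the framework defines REAL elsewhere
typedef unsigned int uint;  // assumption: matches the framework's uint

// Hypothetical helper: fills features and targets with synthetic values.
// Assumed layout: feature j of sample i is stored at index i*nFeat + j.
static void fillSynthetic(REAL* data, REAL* target, uint nSamples, int nFeat)
{
    assert(data != 0 && target != 0);
    for (uint i = 0; i < nSamples; ++i)
    {
        REAL sum = 0;
        for (int j = 0; j < nFeat; ++j)
        {
            data[i * nFeat + j] = static_cast<REAL>(i + j);
            sum += data[i * nFeat + j];
        }
        target[i] = sum;  // one target per sample, since nClass*nDomain == 1
    }
}

// Hypothetical stand-in with the same parameter list as the reader methods.
void readToyRegression(std::string path, REAL*& train, REAL*& trainTarget, int*& trainLabel,
                       REAL*& test, REAL*& testTarget, int*& testLabel,
                       uint& nTrain, uint& nTest, int& nClass, int& nDomain, int& nFeat,
                       REAL positiveTarget, REAL negativeTarget)
{
    (void)path;            // a real reader would open files below this directory
    (void)positiveTarget;  // only used in classification problems
    (void)negativeTarget;

    nTrain = 4;
    nTest = 2;
    nFeat = 2;
    nClass = 1;   // regression
    nDomain = 1;  // regression

    train = new REAL[nFeat * nTrain];
    trainTarget = new REAL[nClass * nDomain * nTrain];
    test = new REAL[nFeat * nTest];
    testTarget = new REAL[nClass * nDomain * nTest];
    fillSynthetic(train, trainTarget, nTrain, nFeat);
    fillSynthetic(test, testTarget, nTest, nFeat);

    // No labels in a regression problem.
    trainLabel = 0;
    testLabel = 0;
}
```

The caller owns the arrays afterwards and would release them with delete[].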

Description of parameters

"string path"
The path to the dataset directory, e.g. "./MNIST/DataFiles".


"REAL* &train"
An nFeat by nTrain matrix that holds the features for training. It is stored as a flat array and accessed row-wise through a single pointer. The method must allocate and fill it. The allocation is typically done with train = new REAL[nFeat*nTrain];
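As a rough sketch of this allocation, the hypothetical helper below (allocateFeatures is not part of DatasetReader) copies already-parsed samples into a flat array. The REAL typedef and the index formula i*nFeat + j are assumptions; the text does not pin down the exact element order, so compare against an existing reader method before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef float REAL;  // assumption: the framework defines REAL elsewhere

// Hypothetical helper: copies samples into a freshly allocated flat array.
// Assumed layout: the nFeat values of sample i are stored contiguously,
// so feature j of sample i lives at index i*nFeat + j.
REAL* allocateFeatures(const std::vector<std::vector<REAL> >& samples, int nFeat)
{
    const std::size_t width = static_cast<std::size_t>(nFeat);
    REAL* data = new REAL[width * samples.size()];
    for (std::size_t i = 0; i < samples.size(); ++i)
    {
        // Every sample must provide exactly nFeat values.
        assert(samples[i].size() == width);
        for (std::size_t j = 0; j < width; ++j)
            data[i * width + j] = samples[i][j];
    }
    return data;
}
```

A reader method could then assign train = allocateFeatures(samples, nFeat); and set nTrain to the number of samples.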


"REAL* &trainTarget"
An nClass*nDomain by nTrain matrix that holds the numeric targets. It is stored as a flat array and accessed row-wise through a single pointer. The method must allocate and fill it. For classification problems, use positiveTarget and negativeTarget as the encoding values.


"int* &trainLabel"
An nDomain by nTrain matrix, stored as a flat array and accessed row-wise through a single pointer. The method must allocate and fill it for classification tasks. Each element is a label in the range 0..nClass-1.
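The relation between the label matrix and the target matrix can be sketched as follows. This is only one plausible reading: the helper encodeTargets is hypothetical, the REAL typedef is assumed, and both the one-target-per-class encoding (positiveTarget for the true class, negativeTarget for all others) and the index formulas are assumptions rather than something the text above specifies.

```cpp
#include <cassert>
#include <cstddef>

typedef float REAL;  // assumption: the framework defines REAL elsewhere

// Hypothetical helper: builds a target matrix from a label matrix.
// Assumed layout: labels holds nDomain entries per sample; targets holds,
// per sample, nDomain consecutive groups of nClass values.
REAL* encodeTargets(const int* labels, std::size_t nSamples, int nClass, int nDomain,
                    REAL positiveTarget, REAL negativeTarget)
{
    const std::size_t width = static_cast<std::size_t>(nClass) * nDomain;
    REAL* targets = new REAL[width * nSamples];
    for (std::size_t i = 0; i < nSamples; ++i)
    {
        for (int d = 0; d < nDomain; ++d)
        {
            const int label = labels[i * nDomain + d];
            assert(label >= 0 && label < nClass);
            // The true class gets positiveTarget, every other class negativeTarget.
            for (int c = 0; c < nClass; ++c)
                targets[i * width + d * nClass + c] =
                    (c == label) ? positiveTarget : negativeTarget;
        }
    }
    return targets;
}
```

With this helper, a reader would fill trainLabel first and then derive trainTarget from it, so the two matrices cannot disagree.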


"REAL* &test"
The same for the test set; see the explanation of "REAL* &train". An nFeat by nTest matrix.


"REAL* &testTarget"
The same for the test set; see the explanation of "REAL* &trainTarget". An nClass*nDomain by nTest matrix.


"int* &testLabel"
The same for the test set; see the explanation of "int* &trainLabel". An nDomain by nTest matrix.


"uint& nTrain"
The number of samples in the training set.


"uint& nTest"
The number of samples in the test set.


"int& nClass"
For a regression problem, nClass=1. For a classification problem, it is the number of distinct labels.


"int& nDomain"
For a regression problem, nDomain=1. For a classification problem, it is the number of domains; the usual value is 1.


"int& nFeat"
The number of features in the training and test set.


"REAL positiveTarget"
Positive encoding value for targets in the trainTarget and testTarget matrices. Only used in classification problems.


"REAL negativeTarget"
Negative encoding value for targets in the trainTarget and testTarget matrices. Only used in classification problems.