Data preparation for NeurEco Classification with the command line interface

The command line interface expects the data for model construction or evaluation to be given as paths to the files that contain it.

  • The supported formats are:

    • CSV, with “;” or “,” as the separator

    • NumPy .npy

    • MATLAB MAT-files .mat

  • Files contain numerical data only; the allowed types are int, float and double

  • Each input file contains a table with:

    • as many lines as there are samples

    • as many columns as there are input features

    • an optional header line (CSV files only)

  • Each output file contains a table with:

    • as many lines as there are samples

    • as many columns as there are output features; for Classification, each output feature corresponds to one class

    • one-hot encoded outputs: each line contains ‘0’ in every position except one, which contains ‘1’. The position of the ‘1’ gives the class to which the sample on that line belongs.

    • an optional header line (CSV files only)

  • An input file and its corresponding output file must have the same number of samples

  • The data can be provided in chunks, split across multiple input and output files. In this case, make sure that each input file stays paired with the output file holding the outputs for the same samples
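A minimal sketch of preparing one input/output pair with NumPy, assuming the class labels start out as integers from 0 to K-1. The helper `one_hot`, the example values and the file names (`x_train.npy`, `y_train.csv` and so on) are hypothetical. A MAT-file could be written with `scipy.io.savemat`, but the variable name NeurEco expects inside a .mat file is not stated here, so that format is left out.

```python
import numpy as np

def one_hot(labels, n_classes=None):
    """Hypothetical helper: turn integer labels 0..K-1 into an (n_samples, K) one-hot table."""
    labels = np.asarray(labels, dtype=int).ravel()
    if n_classes is None:
        n_classes = int(labels.max()) + 1
    encoded = np.zeros((labels.size, n_classes), dtype=float)
    # Put a single 1 on each line, in the column of that sample's class
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Hypothetical example data: 4 samples, 3 input features, 3 classes
x = np.array([[0.1, 2.0, 5.0],
              [0.3, 1.5, 4.0],
              [0.2, 2.2, 6.1],
              [0.9, 0.4, 3.3]])
y = one_hot([0, 2, 1, 2], n_classes=3)

# NumPy .npy: one file per table, lines = samples, columns = features (or classes)
np.save("x_train.npy", x)
np.save("y_train.npy", y)

# CSV with ";" separator and one header line; comments="" keeps "#" out of the header
np.savetxt("x_train.csv", x, delimiter=";", header="f1;f2;f3", comments="", fmt="%.17g")
np.savetxt("y_train.csv", y, delimiter=";", header="c0;c1;c2", comments="", fmt="%d")
```

Writing the inputs with `%.17g` keeps full double precision in the CSV, while `%d` writes the one-hot outputs as plain 0 and 1.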
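Before calling the command line interface, it can help to check that a pair of files satisfies the rules listed above. The sketch below is not part of NeurEco: `load_table` and `check_pair` are hypothetical helpers that handle only .npy and CSV (with “;” or “,” and an optional header line, detected by whether the first line parses as numbers), and they assume a 1-D .npy array holds a single feature column.

```python
import numpy as np

def load_table(path):
    """Hypothetical helper: load a 2-D numeric table from a .npy or .csv file."""
    if path.endswith(".npy"):
        table = np.load(path)
    elif path.endswith(".csv"):
        with open(path) as f:
            first = f.readline()
        sep = ";" if ";" in first else ","
        # Treat the first line as a header if it cannot be parsed as numbers
        try:
            [float(v) for v in first.strip().split(sep)]
            skip = 0
        except ValueError:
            skip = 1
        table = np.loadtxt(path, delimiter=sep, skiprows=skip, ndmin=2)
    else:
        raise ValueError(f"unsupported file type: {path}")
    if table.ndim == 1:
        table = table.reshape(-1, 1)  # assume one feature column
    return table

def check_pair(input_path, output_path):
    """Return (n_samples, n_input_features, n_classes) or raise ValueError."""
    x = load_table(input_path)
    y = load_table(output_path)
    if x.shape[0] != y.shape[0]:
        raise ValueError(f"{x.shape[0]} input samples but {y.shape[0]} output samples")
    # Each output line must contain only 0s and 1s, with exactly one 1
    if not np.isin(y, (0, 1)).all() or not (y.sum(axis=1) == 1).all():
        raise ValueError("outputs are not one-hot encoded")
    return x.shape[0], x.shape[1], y.shape[1]
```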
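When the data is provided in chunks, the simplest way to preserve the correspondence between input and output files is to split both tables at the same sample boundaries and record the file names as pairs. A minimal sketch, assuming .npy output; `save_in_chunks` and its `chunk_<i>_x.npy` / `chunk_<i>_y.npy` naming scheme are hypothetical, and how the paired lists are passed to the command line interface is not covered here.

```python
import os
import numpy as np

def save_in_chunks(x, y, n_chunks, directory=".", prefix="chunk"):
    """Hypothetical helper: split matching inputs/outputs into n_chunks pairs of .npy files."""
    if len(x) != len(y):
        raise ValueError("inputs and outputs must have the same number of samples")
    pairs = []
    # x and y have the same number of lines, so np.array_split cuts them at identical
    # boundaries: chunk i of x and chunk i of y always hold the same samples, in order
    for i, (x_chunk, y_chunk) in enumerate(zip(np.array_split(x, n_chunks),
                                               np.array_split(y, n_chunks))):
        x_path = os.path.join(directory, f"{prefix}_{i}_x.npy")
        y_path = os.path.join(directory, f"{prefix}_{i}_y.npy")
        np.save(x_path, x_chunk)
        np.save(y_path, y_chunk)
        pairs.append((x_path, y_path))
    return pairs
```

Keeping the returned `(input, output)` pairs together, rather than two independent lists of paths, makes it harder to mismatch files later.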

There is no need to normalize the data: normalization is handled by NeurEco (see Data normalization for Tabular Regression).