Model Configurations: Convolutional Neural Network (CNN)

When you select the Convolutional Neural Network modeling method under Model > Configurations, the Model Configuration area of the <output> tab contains the following elements. For each output, there is a separate tab.

See Model Configurations (ASCMO-DYNAMIC) for a description of the Output Properties area and the button row at the bottom of the window.

Training Labels

Assign the labels you want to train the model on. If you use multiple labels, all data associated with at least one of the labels is used.

Assign a label by double-clicking the field and typing the name. Select the dataset from the list of suggestions.

Click the x on the label or press Del to remove the label.

Validation Labels

Assign the labels of the data you want to use as validation data. If you use multiple labels, all data associated with at least one of the labels is used.

Assign a label by double-clicking the field and typing the name. Select the dataset from the list of suggestions.

Click the x on the label or press Del to remove the label.

Note  

If the validation labels are not assigned to any data, the model is trained without validation. A message appears in the log window. You can assign labels to data in the Manage Datasets window.

Network Layout

Layer Type

Select the layer type: 1D Convolution or Temporal Convolution.

Number of Layers

Enter the number of layers. Depending on the number of layers, corresponding input fields are generated for the next four options.

Number of Filters

Specify the number of filters for each layer separately.

Kernel Size

Specify the kernel size for each layer separately.

Note  

The kernel size and dilation rate settings determine the length of the model's receptive field, i.e., how far into the past the model can look to compute the prediction for the current time step. See also, Model Prediction with Convolutional Neural Network (CNN).

Dilation Rate

Specify the dilation rate for each layer. Each temporal convolutional layer uses dilated Conv1d layers with dilation rate r, i.e., every r-th element of the input is used for convolution. For example, if you want to use the entire input sequence, you can use r = b^(i-1) for the i-th layer, where b is a natural number ≥ 2. The default is r = 1 for each layer.
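A minimal sketch in plain Python (illustrative only, not the tool's implementation) of a causal dilated 1D convolution, showing that only every r-th past element of the input enters each output value:

```python
def dilated_conv1d(x, kernel, dilation=1):
    """Causal 1D convolution with dilation: every `dilation`-th past
    element of x is combined with the kernel weights. No padding, so
    the output starts once the receptive field is filled."""
    k = len(kernel)
    span = (k - 1) * dilation          # how far back the kernel reaches
    return [
        sum(kernel[j] * x[t - (k - 1 - j) * dilation] for j in range(k))
        for t in range(span, len(x))
    ]

# A kernel of size 2 with dilation rate 2 combines x[t-2] and x[t]:
x = [0, 1, 2, 3, 4, 5]
print(dilated_conv1d(x, [1, 1], dilation=2))  # [2, 4, 6, 8]
```

With dilation rate 1 this reduces to an ordinary convolution over adjacent samples.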

Leaky ReLU Slope

Specify the slope of the Leaky ReLU activation function used in the CNN layers. The slope for negative inputs can be set in the interval from 0 (corresponding to ReLU activation) to 1 (corresponding to linear activation).

Opens a plot of the activation function with the current slope values. If different values are entered in the fields, one function is plotted for each different value.
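The activation function itself is simple enough to state directly (a sketch, matching the standard Leaky ReLU definition):

```python
def leaky_relu(x, slope):
    """Leaky ReLU: identity for non-negative inputs, slope * x for
    negative inputs. slope = 0 is plain ReLU, slope = 1 is linear."""
    return x if x >= 0 else slope * x

print(leaky_relu(2.0, 0.1))   # 2.0
print(leaky_relu(-2.0, 0.1))  # -0.2
print(leaky_relu(-2.0, 0.0))  # 0.0  (ReLU)
print(leaky_relu(-2.0, 1.0))  # -2.0 (linear)
```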

Dropout

Enter a value between 0 and 0.9 specifying the fraction of neurons to be randomly deactivated during training; the weights of these neurons are temporarily set to 0. This helps avoid overfitting. A value of 0 means dropout is not active; 0.1 means that 10% of the neurons are temporarily set to 0. The typical value range is [0, 0.2].
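As an illustration of the idea (the rescaling of surviving neurons shown here is the common "inverted dropout" convention and is an assumption, not a statement about the tool's internals):

```python
import random

def dropout(activations, rate, training=True):
    """Inverted-dropout sketch: during training each neuron is zeroed
    with probability `rate`; survivors are scaled by 1/(1-rate) so the
    expected activation is unchanged. At prediction time it is a no-op."""
    if not training or rate == 0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0 for a in activations]

random.seed(0)
out = dropout([1.0] * 10, rate=0.1)
print(sum(1 for a in out if a == 0.0), "of 10 neurons deactivated")
```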

Use Skip Connections

Activate to add the input of the layer to its output.

If the input and output sizes do not match, the input is processed by an additional convolutional layer before being added to the output. Skip connections have been shown to simplify training and improve network performance by allowing the flow of information to skip selected layers.
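The mechanism can be sketched as follows (illustrative; `project` stands for the additional convolutional layer mentioned above and is a hypothetical name):

```python
def skip_connection(x, layer, project=None):
    """Residual (skip) connection: output = layer(x) + shortcut.
    If layer(x) has a different size than x, the shortcut is first
    mapped to the output size by `project` (in a CNN, an additional
    convolutional layer)."""
    y = layer(x)
    shortcut = x if len(x) == len(y) else project(x)
    return [a + b for a, b in zip(y, shortcut)]

double = lambda v: [2.0 * e for e in v]            # stand-in "layer"
print(skip_connection([1.0, 2.0, 3.0], double))    # [3.0, 6.0, 9.0]
```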

Receptive Field

Displays the number of past time steps considered by the model. The value is updated dynamically depending on the values for kernel sizes and dilation rates. A value k means that inputs from the interval [t-(k-1), t] (though not necessarily all of them) are used to compute the output at time step t.

Opens a schematic representation of the receptive field generated for the specified kernel sizes and dilation rates.
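For stacked dilated causal convolutions, the receptive field follows a simple formula (a sketch under the assumption that each layer adds (kernel size - 1) × dilation past steps; the tool's displayed value may account for further details):

```python
def receptive_field(kernel_sizes, dilation_rates):
    """Receptive field k of stacked dilated 1D convolutions:
    k = 1 + sum over layers of (kernel_size - 1) * dilation."""
    return 1 + sum((k - 1) * r for k, r in zip(kernel_sizes, dilation_rates))

# 3 layers, kernel size 2, dilation rates 1, 2, 4 (i.e. r = 2^(i-1)):
print(receptive_field([2, 2, 2], [1, 2, 4]))  # 8
```

A value of 8 means that inputs from the interval [t-7, t] can influence the output at time step t.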

Number of Network Parameters

Dynamically shows the number of parameters used in model training for the current settings (including a potential initialization network).

Output Properties

Output Transformation

Select the transformation type of the output. Using a transformation can improve the model prediction. Not all transformations are available if the training data has negative or zero values.

You can select from the following choices:

  • none: no transformation
  • log(y): logarithm
  • Bounded: limited to lower and upper bound

    Click Edit to view the automatically selected bounds or to define the lower and upper bounds manually. To define them manually, deactivate the Automatic checkbox. The bounds must be in the range of the training data.

  • log(y+c): logarithm plus constant

    Click Edit to view the automatically selected log shift or to define a manual shift value. To define it manually, deactivate the Automatic checkbox.
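For illustration, the transformations can be sketched as follows. The log variants are standard; the mapping used for Bounded is an assumption (a logit mapping is one common choice), not a statement about the tool's internal formula:

```python
import math

def transform(y, kind, c=0.0, lower=None, upper=None):
    """Sketch of the output transformations:
    log       - logarithm, requires y > 0
    log_shift - logarithm plus constant c, requires y + c > 0
    bounded   - maps (lower, upper) to an unbounded scale; the logit
                shown here is an assumed, common choice"""
    if kind == "log":
        return math.log(y)
    if kind == "log_shift":
        return math.log(y + c)
    if kind == "bounded":
        p = (y - lower) / (upper - lower)
        return math.log(p / (1 - p))
    return y  # "none"

print(transform(1.0, "log"))                            # 0.0
print(transform(5.0, "bounded", lower=0.0, upper=10.0)) # 0.0 (midpoint)
```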

Training Properties

Continue Training

Activate the checkbox to continue an existing model training with its iterations, if possible, instead of starting a new training. You can change the training properties and continue; the network layout must remain the same. For example, you can train with a complex activation function, then switch to a more efficient one (for the ECU) and continue training seamlessly. Continue Training works best with Stochastic Gradient Descent as the optimizer (Manual Detailed Training Settings > Optimizer), which is selected automatically when Continue Training is activated.

Number of Multistarts

Enter the number of training repetitions with different starting values. A higher value can improve the model quality, but the model training then takes more time. The default value is 3.

Number of Iterations

Enter the number of iterations to be performed during model training; in deep learning, this is often referred to as the number of epochs. If the model performance on the validation data does not improve within 10 iterations, the training is aborted.

Training Loss

Select the type of training loss. Training loss is the criterion that is minimized during model training.

Absolute equals the RMSE calculation.

Relative puts the deviation in relation to the measured value.

Transformed behaves like Absolute, but performs the loss calculation in the transformed space, i.e., it depends on the output transformation.
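For illustration, the three loss types might be sketched as follows. These are common definitions; the tool's exact formulas are not reproduced here:

```python
import math

def training_loss(y_meas, y_pred, kind="absolute", transform=None):
    """Sketch of the three loss types (assumed formulas):
    absolute    - RMSE of the raw residuals
    relative    - RMSE of residuals divided by the measured value
    transformed - RMSE computed after applying the output transformation"""
    if kind == "transformed" and transform is not None:
        y_meas = [transform(y) for y in y_meas]
        y_pred = [transform(y) for y in y_pred]
    if kind == "relative":
        res = [(m - p) / m for m, p in zip(y_meas, y_pred)]
    else:
        res = [m - p for m, p in zip(y_meas, y_pred)]
    return math.sqrt(sum(r * r for r in res) / len(res))

print(training_loss([2.0, 4.0], [1.0, 3.0], "absolute"))  # 1.0
print(training_loss([2.0, 4.0], [1.0, 3.0], "relative"))  # ~0.395
```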

Snippet Length

Enter the length of the sequences into which the data is segmented. The underlying optimizer then receives batches of sequences of this fixed length, calculated as the sum of the entered value and the length of the receptive field. The default snippet length is 50.

If the length of the current data is around this number of steps or less, it is recommended to reduce the snippet length, although an optimization with Snippet Length = 50 is still possible. A smaller value trains faster.
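A sketch of the segmentation (illustrative only): each snippet covers the entered snippet length plus the receptive field, and the `step` parameter corresponds to the Steps Between Training Snippets setting described under the detailed training settings:

```python
def training_snippets(data_len, snippet_length, receptive_field, step=1):
    """Start positions of training snippets: each snippet covers
    snippet_length + receptive_field samples; start positions are
    taken every `step` samples."""
    total = snippet_length + receptive_field
    return list(range(0, data_len - total + 1, step))

# 200 samples, snippet length 50, receptive field 8:
print(len(training_snippets(200, 50, 8, step=1)))   # 143 snippets
print(len(training_snippets(200, 50, 8, step=10)))  # 15 snippets
```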

Plot RMSE during Training

Activate if you want the RMSE values for training data and validation data to be Closeddisplayed during model training.

Activate Detailed Training Settings

Activate the checkbox to display the ClosedDetailed Training Settings section.

  • Optimizer

    Select the optimizer used to train the model. If you activate the Continue Training checkbox, it is recommended to select Stochastic Gradient Descent (for continue).

  • Data Scaling

    Select the scaling variant for the model training.

    Before the model training, the training data is scaled internally so that the different input signals are mapped to a similar range. ASCMO-DYNAMIC offers three scaling types:

    Normalization: the data is scaled to an interval of length one contained in [-1, 1].

    Robust Normalization: preferable whenever there are outliers in the data.

    Standardization (Z-score normalization): the data is scaled to zero mean and unit standard deviation.

The detailed training settings are adjusted for each iteration. For the first iteration the Start Value is used, for the last iteration the Final Value. The values in between are interpolated.

  • No. of Optimizer Substeps

    Determines how many sequences of length Lookback Length are used for one optimizer update. The default value is 100. This setting corresponds to the batch size in deep learning: the larger the value, the smaller the batch size, and vice versa. If the number is small, the optimizer step is performed less frequently, and the training is therefore faster.

  • Steps Between Training Snippets

    Set the step size between training snippets. The value determines the distance between the start positions of the snippets selected for the model training: not every possible start position is used, but only every nth position. A larger value results in faster model training; the value 1 results in the best model. Values larger than 1 (e.g., 10) reduce memory consumption, but more iterations may be required.

  • Learning Rate

    Enter the size of the optimizer steps. The default value is 0.01. Valid value range is [0, 1].

    The larger the learning rate, the faster the training generally will be. However, convergence can be hindered, or even prevented, by large learning rates.
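The three Data Scaling variants described above can be sketched as follows. The formulas are common conventions and are assumptions here; the tool's exact definitions may differ (for example, in the quantiles used for robust normalization):

```python
import statistics

def scale(values, kind):
    """Sketch of the three scaling variants:
    normalization   - min/max scaling to an interval of length one
    robust          - median / interquartile range, less sensitive
                      to outliers (assumed formula)
    standardization - z-score: zero mean, unit standard deviation"""
    if kind == "normalization":
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    if kind == "robust":
        q1, med, q3 = statistics.quantiles(values, n=4)
        return [(v - med) / (q3 - q1) for v in values]
    if kind == "standardization":
        mu, sd = statistics.mean(values), statistics.stdev(values)
        return [(v - mu) / sd for v in values]
    raise ValueError(kind)

print(scale([0.0, 5.0, 10.0], "normalization"))  # [0.0, 0.5, 1.0]
```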

Highlight Model Deviations

If activated, the model deviation is highlighted in red in the plot.

Further settings for the highlighting are displayed:

 

Anomaly Percentile

Enter the percentile of reconstruction errors that are considered normal. If validation datasets are used for model selection, this value is calculated based on validation data, otherwise based on training data. The corresponding value is mapped to 0.5 in the anomaly score.

Smoothing

Enter the window size of a median filter, in data points. The signals can be smoothed for the anomaly prediction, which can lead to better results.
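A sketch of median-filter smoothing with both window modes (centered vs. past-only, matching the Smoothing Window Centered option below; illustrative, not the tool's implementation):

```python
from statistics import median

def median_smooth(signal, window, centered=True):
    """Median filter over a sliding window. Centered windows use
    points on both sides of the evaluation point; otherwise only
    past points inside the window are used."""
    out = []
    for t in range(len(signal)):
        if centered:
            half = window // 2
            lo, hi = max(0, t - half), min(len(signal), t + half + 1)
        else:
            lo, hi = max(0, t - window + 1), t + 1
        out.append(median(signal[lo:hi]))
    return out

sig = [0, 0, 9, 0, 0]          # single spike
print(median_smooth(sig, 3))   # [0, 0, 0, 0, 0] - spike removed
```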

Smoothing Window Centered

Activate if you want the smoothing window to be centered around the evaluation point, i.e. future points are also taken into account. If deactivated, only past points are used for calculations in the smoothing window.

Rounding Threshold

Enter a threshold for the anomaly prediction: predictions below the threshold are automatically set to 0, predictions at or above it to 1. The threshold is displayed in the Receiver Operating Characteristic (Model > Anomaly Detection: Receiver Operating Characteristic).
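In effect, the threshold binarizes the continuous anomaly score (a trivial sketch):

```python
def round_prediction(scores, threshold):
    """Binarize anomaly predictions: 0 below the threshold, 1 otherwise."""
    return [0 if s < threshold else 1 for s in scores]

print(round_prediction([0.2, 0.5, 0.8], threshold=0.5))  # [0, 1, 1]
```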

Inputs used in Model Training

This area lists all inputs of the model. You can activate or deactivate the checkbox of an input to include it in or exclude it from the model training.

By default, all inputs are included.

See also  

Model Configurations (ASCMO-DYNAMIC)

Model Prediction with Convolutional Neural Network (CNN)