Model Configurations: Recurrent Neural Network (RNN)

Model > Configurations

When you select the Recurrent Neural Network modeling method, the Model Configuration area of the <output> tab contains the following elements. There is a separate tab for each output.

See Model Configurations (ASCMO-DYNAMIC) for a description of the Output Properties area and the button row at the bottom of the window.

Training Labels

Assign the labels you want to train the model on. If you use multiple labels, all data associated with at least one of the labels is used.

Assign a label by double-clicking the field and typing the name. Select the dataset from the list of suggestions.

Use the x on the label or press Del to remove the label.

Validation Labels

Assign the labels of the data you want to use as validation data. If you use multiple labels, all data associated with at least one of the labels is used.

Assign a label by double-clicking the field and typing the name. Select the dataset from the list of suggestions.

Use the x on the label or press Del to remove the label.

Note  

If the validation labels are not assigned to any data, the model is trained without validation. A message appears in the log window. You can assign labels to data in the Manage Datasets window.

Network Layout

Layer Type

Select the type of network cells. See Model Predictions with Recurrent Neural Networks (RNN).

Number of Layers

Enter the number of layers. Depending on the number of layers, corresponding Cell Memory Size input fields are generated.

Cell Memory Size

Enter the cell memory size per node of the neural network. The typical value range is [5, 20]. The actual memory size depends on the cell type.

Activation Function

Select the activation function for all layers. The default is Tanh; a different function may be less accurate but more efficient to calculate. Click to the right of the drop-down to open the Activation Function Comparison window.

Recurrent Activation Function

Select the recurrent activation function for all layers. The default is Sigmoid; a different function may be less accurate but more efficient to calculate. Click to the right of the drop-down to open the Activation Function Comparison window.

Use Skip Connections

Activate to add the input of the layer to its output.

If the input and output sizes do not match, the input is processed by an additional convolutional layer before being added to the output. Skip connections have been shown to simplify training and improve network performance by allowing the flow of information to skip selected layers.
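
As a rough illustration (a minimal NumPy sketch, not ASCMO code; the projection weights and shapes are assumptions), the layer input is added to the layer output, with a per-time-step linear projection standing in for the additional convolutional layer when the sizes differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def skip_connection(x, layer_out, proj=None):
    # Add the layer input x to its output. If the feature sizes differ,
    # x is first mapped by a linear projection per time step, standing in
    # for the additional convolutional layer described above.
    if x.shape[-1] != layer_out.shape[-1]:
        if proj is None:
            # assumed (untrained) projection weights, for illustration only
            proj = rng.standard_normal((x.shape[-1], layer_out.shape[-1]))
        x = x @ proj
    return x + layer_out

# toy example: 100 time steps, a layer widening 3 inputs to 8 features
x = rng.standard_normal((100, 3))
layer_out = rng.standard_normal((100, 8))
print(skip_connection(x, layer_out).shape)  # (100, 8)
```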

Number of Network Parameters

Dynamically shows the number of parameters the model training uses for the current settings (including a potential initialization network).
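
For intuition, the parameter count of a stack of LSTM layers can be estimated as in the following sketch (not ASCMO's exact bookkeeping; the linear output layer and the layer sizes are assumptions, and a potential initialization network would add further parameters):

```python
def lstm_layer_params(n_in: int, n_cell: int) -> int:
    # an LSTM cell has 4 gates, each with input weights,
    # recurrent weights, and a bias vector
    return 4 * (n_in * n_cell + n_cell * n_cell + n_cell)

# example: two stacked LSTM layers with Cell Memory Size 10 on 5 inputs
n_inputs, layer_sizes = 5, [10, 10]
total, n_in = 0, n_inputs
for n_cell in layer_sizes:
    total += lstm_layer_params(n_in, n_cell)
    n_in = n_cell
total += n_in + 1  # assumed linear output layer (weights + bias)
print(total)       # 1491
```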

Output Properties

Output Transformation

Select the transformation type of the output. Using a transformation can improve the model prediction. Not all transformations are available if the training data has negative or zero values.

You can select from the following choices (a minimal sketch follows the list):

  • none: no transformation
  • log(y): logarithm
  • Bounded: limited to lower and upper bound

    Click Edit to view the automatically selected bounds or to define the lower and upper bounds manually. To define them manually, deactivate the Automatic checkbox. The bounds must be in the range of the training data.

  • log(y+c): logarithm plus constant

    Click Edit to view the automatically selected log shift or to define a manual shift value. To define it manually, deactivate the Automatic checkbox.
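
A minimal sketch of the transformations and their inverses, assuming a logit-style mapping for Bounded (the exact form ASCMO uses is not documented here):

```python
import numpy as np

y = np.array([0.5, 2.0, 7.5])  # toy training outputs

z_log = np.log(y)              # log(y): requires strictly positive data

c = 1.0                        # assumed shift; ASCMO can select it automatically
z_logc = np.log(y + c)         # log(y+c): applicable to data <= 0 as well

lo, hi = 0.0, 10.0             # Bounded: bounds covering the training data
z_bnd = np.log((y - lo) / (hi - y))   # assumed logit-style squashing

# after prediction, the transformation is inverted, e.g. for Bounded:
y_back = lo + (hi - lo) / (1.0 + np.exp(-z_bnd))
print(np.allclose(y, y_back))  # True
```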

Training Properties

Continue Training

Activate the checkbox to continue an existing model training and its iterations, if possible, instead of starting a new training. You can change the training properties and continue; the network layout must remain the same. For example, train with a complex activation function, then switch to a more efficient one (for the ECU) and continue training seamlessly. Continue Training works best with Stochastic Gradient Descent as the optimizer (Manual Detailed Training Settings > Optimizer), which is automatically selected when Continue Training is activated.

Number of Multistarts

Enter the number of training repetitions with different starting values. A higher value can improve the model quality, but the model training then takes more time. The default value is 3.
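
Conceptually, a multistart works like the following sketch; train_once and its RMSE are hypothetical stand-ins for one full training run, not an ASCMO API:

```python
import random

def train_once(seed):
    # hypothetical stand-in for one training run with new starting values
    random.seed(seed)
    return {"seed": seed, "rmse": random.uniform(0.1, 1.0)}

def multistart(n_starts=3):
    # repeat the training with different starting values, keep the best run
    runs = [train_once(seed) for seed in range(n_starts)]
    return min(runs, key=lambda run: run["rmse"])

print(multistart(3))
```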

Number of Iterations

Enter the number of iterations to be performed during model training. In deep learning, this is often referred to as the number of epochs. If the model performance on the validation data does not improve within 10 iterations, the training is aborted.

Training Loss

Select the type of training loss. Training loss is the criterion that is minimized during model training.

Absolute equals the RMSE calculation:

$L_{\mathrm{abs}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$

Relative puts this in relation to the measured value:

$L_{\mathrm{rel}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\frac{y_i - \hat{y}_i}{y_i}\right)^2}$

Transformed behaves like Absolute, but performs the loss calculation in the transformed space, i.e. it depends on the output transformation.
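
The three loss types can be sketched as follows (assuming the formulas above; eps and the choice of transform are illustration-only assumptions):

```python
import numpy as np

def absolute_loss(y, y_hat):
    # RMSE over all samples
    return np.sqrt(np.mean((y - y_hat) ** 2))

def relative_loss(y, y_hat, eps=1e-12):
    # RMSE of the errors divided by the measured values
    return np.sqrt(np.mean(((y - y_hat) / (y + eps)) ** 2))

def transformed_loss(y, y_hat, transform=np.log):
    # RMSE computed in the transformed space, e.g. after log(y)
    return np.sqrt(np.mean((transform(y) - transform(y_hat)) ** 2))

y, y_hat = np.array([1.0, 2.0, 4.0]), np.array([1.1, 1.8, 4.4])
print(absolute_loss(y, y_hat), relative_loss(y, y_hat))
```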

Lookback Length

Enter the length of the sequences into which the data is segmented. The underlying optimizer then receives batches of sequences of this fixed length. The default value is 100.

If the available data is only about this many steps long, or shorter, it is recommended to reduce the lookback length, even though an optimization with Lookback Length = 100 is still possible. A smaller value also trains faster.
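
A sketch of the segmentation, assuming simple sliding windows; the step argument anticipates the Steps Between Training Snippets setting described further below:

```python
import numpy as np

def make_snippets(signal, lookback=100, step=1):
    # cut a measured signal into fixed-length training snippets;
    # step > 1 uses only every step-th position as a snippet start
    starts = range(0, len(signal) - lookback + 1, step)
    return np.stack([signal[s:s + lookback] for s in starts])

data = np.sin(np.linspace(0, 20, 1000))          # toy measurement
snippets = make_snippets(data, lookback=100, step=10)
print(snippets.shape)                             # (91, 100)
```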

Dropout

Enter a value between 0 and 0.9 for the fraction of neurons to be randomly deactivated during training; that is, the weights of these neurons are temporarily set to 0. This helps avoid overfitting. A value of 0 means dropout is not active; 0.1 means 10% of the neurons are temporarily set to 0. The typical value range is [0, 0.2].

Recurrent Dropout

Enter a value between 0 and 0.9 for the fraction of recurrent state neurons to be deactivated during training; that is, the weights of these neurons are temporarily set to 0. This helps avoid overfitting. A value of 0 means recurrent dropout is not active; 0.1 means 10% of the neurons are temporarily set to 0. The typical value range is [0, 0.2].
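
Mechanically, dropout can be pictured like this sketch (inverted dropout with rescaling is a common implementation choice and an assumption here, not a statement about ASCMO internals):

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_dropout(activations, rate=0.1):
    # temporarily zero a fraction `rate` of the neurons; scaling the rest
    # by 1/(1-rate) keeps the expected activation level unchanged
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

h = np.ones((4, 5))            # toy layer activations
print(apply_dropout(h, 0.1))   # roughly 10% of the entries are 0
```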

Learn Initial States

If this checkbox is activated, the initial state of the RNN model is learned from the input and output values of the first time step, so that the initial prediction matches the given output.

When the model is exported, you can define a value as the desired initial output.

Plot RMSE during Training

Activate if you want the RMSE values for training data and validation data to be displayed during model training.

Activate Detailed Training Settings

Activate the checkbox to display the Detailed Training Settings section.

  • Optimizer

    Select the optimizer used to train the model. If you activate the Continue Training checkbox, it is recommended to select Stochastic Gradient Descent (for continue).

  • Data Scaling

    Select the scaling variant for the model training.

    Before the model training, the training data is scaled internally so that the different input signals are mapped to a similar range. ASCMO-DYNAMIC offers three scaling types:

    Normalization: scaled to an interval of length one contained in [-1, 1].

    Robust Normalization: preferable whenever there are outliers in the data.

    Standardization (Z-score normalization): scaled to zero mean and unit standard deviation.

    The detailed training settings are adjusted over the iterations: the first iteration uses the Start Value, the last iteration uses the Final Value, and the values in between are interpolated (see the sketch after this list).

  • No. of Optimizer Substeps

    Determines how many optimizer updates are performed per iteration: the available sequences of length Lookback Length are split into this many batches (the batch size concept used in deep learning). The default value is 100. The larger the value, the smaller the batch size, and vice versa. If the number is small, the optimizer step is performed less frequently and the training is therefore faster.

  • Steps Between Training Snippets

    Set the steps between training snippets. The value determines the step size between the start positions of the snippets selected for the model training (see the segmentation sketch under Lookback Length above): not every possible start position is used, but every nth position. A larger value results in faster model training. The value 1 results in the best model; values larger than 1 (e.g. 10) reduce memory consumption, but more iterations may be required.

  • Learning Rate

    Enter the size of the optimizer steps. The default value is 0.01. Valid value range is [0, 1].

    The larger the learning rate, the faster the training generally will be. However, convergence can be hindered, or even prevented, by large learning rates.
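
As mentioned above, settings such as the learning rate are interpolated from a Start Value to a Final Value over the iterations. A sketch, assuming linear interpolation (the actual scheme is not documented here):

```python
import numpy as np

def interpolated_setting(start, final, n_iterations):
    # first iteration uses start, last iteration uses final,
    # values in between are interpolated
    return np.linspace(start, final, n_iterations)

# e.g. decay the learning rate from 0.01 to 0.001 over 50 iterations
print(interpolated_setting(0.01, 0.001, 50)[:5])
```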

Highlight Model Deviations

If activated, the model deviation is highlighted in red in the plot.

Further settings for the highlighting are displayed:

Anomaly Percentile

Enter the percentile of reconstruction errors that are considered normal. If validation datasets are used for model selection, this value is calculated based on validation data, otherwise based on training data. The corresponding value is mapped to 0.5 in the anomaly score.
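
The mapping can be sketched as follows; only the anchoring of the percentile value at a score of 0.5 is documented, and the concrete score function below is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
errors = np.abs(rng.standard_normal(1000))  # toy reconstruction errors

percentile = 90.0                            # Anomaly Percentile setting
thr = np.percentile(errors, percentile)      # errors below thr count as normal

# assumed score function: monotone in the error, equal to 0.5 at thr
score = errors / (errors + thr)
print(float(score[np.abs(errors - thr).argmin()]))  # ~0.5
```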

Smoothing

Enter the window size, in data points, of a median filter. The signals can be smoothed for the anomaly prediction, which can lead to better results.

Smoothing Window Centered

Activate if you want the smoothing window to be centered around the evaluation point, i.e. future points are also taken into account. If deactivated, only past points are used for calculations in the smoothing window.
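
A sketch of the median smoothing, showing the difference between a centered and a trailing (past-only) window:

```python
import numpy as np

def median_smooth(x, window=5, centered=True):
    # median filter over a sliding window of data points;
    # centered=True also uses future points around the evaluation point,
    # centered=False uses only the current and past points
    out = np.empty(len(x))
    for i in range(len(x)):
        if centered:
            lo, hi = max(0, i - window // 2), min(len(x), i + window // 2 + 1)
        else:
            lo, hi = max(0, i - window + 1), i + 1
        out[i] = np.median(x[lo:hi])
    return out

signal = np.array([0, 0, 9, 0, 0, 1, 1, 1], dtype=float)
print(median_smooth(signal, window=3))  # the spike at index 2 is removed
```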

Rounding Threshold

Enter an anomaly prediction threshold: below the threshold, the model prediction is automatically set to 0, otherwise to 1. The threshold is displayed in the Receiver Operating Characteristic (Model > Anomaly Detection: Receiver Operating Characteristic).

Inputs used in Model Training

This area lists all inputs of the model. Activate or deactivate the checkbox of an input to include it in or exclude it from the model training.

By default, all inputs are included.

See also

Model Configurations (ASCMO-DYNAMIC)

Model Predictions with Recurrent Neural Networks (RNN)

Enabling the Advanced Settings