Overview: Model Type Descriptions

The overview below is grouped by modeling method. For each model/cell type, it lists a description, typical use cases ("Use for"), pros, and cons.

Static Modeling

(systems without inner states, time-independent or steady-state)

ASC GP (ASCMO Gaussian Process)

Description:

  • Gaussian Process, a probabilistic regression model.

  • One basis function per training data point, located at that data point.

  • If the number of training data points exceeds the maximum number of basis functions, a subset of points is selected, either randomly or with the GP SCS method; in that case, the Subset of Regressors algorithm is used.

  • The default squared exponential kernel is a good fit for many technical systems because the influence of an input on the output decays with the squared distance. For less smooth systems, the Matérn kernel can be selected. (A minimal regression sketch follows this entry.)

Use for:

  • This default model should be used when the number of training data points and the number of inputs are small to medium.

  • With 15 inputs, do not use more than 4000 training data points (use MLP instead, for example).

  • The number of inputs should not exceed 25-30.

Pros:

  • Good model quality.

  • Good generalization with a small number of training data points.

  • Shows model uncertainty (sigma).

  • Probabilistic training avoids overfitting.

  • Model quality can be assessed with the Leave-One-Out error, so all data can be used as training data without the need for validation data.

Cons:

  • High memory consumption during training (scales quadratically with the number of training data points).

  • The computational load for model prediction (inference) depends on the number of training data points, which can be large.
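A minimal sketch of Gaussian Process regression with a squared exponential (RBF) kernel, using scikit-learn rather than ASCMO itself; the library, the toy data, and all parameter values are illustrative assumptions, not ASCMO's implementation:

    # GP regression sketch (scikit-learn, not ASCMO): squared exponential
    # (RBF) kernel, one basis function per training data point.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(50, 1))                  # toy training inputs
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)   # noisy toy outputs

    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

    X_test = np.linspace(-3, 3, 7).reshape(-1, 1)
    mean, sigma = gp.predict(X_test, return_std=True)     # prediction + uncertainty
    print(mean.round(2), sigma.round(2))

Note how the prediction returns both a mean and a sigma; this per-point uncertainty is the practical benefit listed under Pros above.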

ASC GP SCS* (ASCMO Sparse Constant Sigma Gaussian Process)

Description:

  • A sparse Gaussian Process.

  • The number of basis functions can be smaller than the number of training data points.

  • Can optimize the positions of the basis functions.

Use for:

  • Use when a probabilistic model is wanted but a larger number of training data points is available.

Pros:

  • The number of basis functions can be chosen and is generally small.

  • Makes model prediction (inference) faster than a full GP.

Cons:

  • Model uncertainty (sigma) is not available.

Polynom*

Description:

  • Polynomial regression with an nth-degree polynomial function (see the sketch after this entry):

    y = a + b·x1 + c·x2 + d·x1·x2 + e·x1² + ...

  • Interactions (e.g., first-order x1·x2) and order (e.g., third-order x1³) can be selected.

  • Stepwise regression to automatically find the relevant terms is the default.

Use for:

  • Linear systems with few linear interactions between inputs.

Pros:

  • Extrapolation can be good.

  • Fast and simple.

Cons:

  • Poor model quality for nonlinear systems with interacting inputs.
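A minimal sketch of polynomial regression with interaction and squared terms, assuming scikit-learn; unlike ASCMO's default, no stepwise term selection is performed here, all generated terms are simply fitted by least squares:

    # Polynomial regression sketch (scikit-learn, not ASCMO's stepwise
    # algorithm): expand x1, x2 into all terms up to degree 2
    # (bias, linear, interaction, squared) and fit.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(100, 2))                 # toy inputs x1, x2
    y = 1 + 2*X[:, 0] - X[:, 1] + 3*X[:, 0]*X[:, 1]       # known polynomial

    poly = PolynomialFeatures(degree=2)                   # order and interactions
    model = LinearRegression().fit(poly.fit_transform(X), y)
    print(poly.get_feature_names_out(["x1", "x2"]))       # the generated terms
    print(model.coef_.round(2), model.intercept_.round(2))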

MLP (Multilayer Perceptron)

Description:

  • A multilayer neural network (see the sketch after this entry).

  • Activation functions can be selected.

Use for:

  • Many data points (>10,000).

  • Many inputs (>20).

Pros:

  • The model size can be chosen (number of neurons/layers), so it can be small.

  • Can be trained on a huge amount of data.

Cons:

  • Risk of overfitting; needs validation data to avoid it.
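A minimal sketch, assuming scikit-learn's MLPRegressor (not ASCMO's MLP): the network size is selected via hidden_layer_sizes, and a held-out validation split is used to detect overfitting; all sizes and values are illustrative:

    # MLP regression sketch (scikit-learn, not ASCMO): two hidden layers,
    # selectable activation, held-out validation data to catch overfitting.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(20000, 25))              # many points, many inputs
    y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]               # toy target

    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                                random_state=0)
    mlp = MLPRegressor(hidden_layer_sizes=(32, 32), activation="tanh",
                       max_iter=200, random_state=0).fit(X_tr, y_tr)
    print("train R^2:", mlp.score(X_tr, y_tr))    # a large gap between these
    print("valid R^2:", mlp.score(X_val, y_val))  # two scores signals overfitting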

ASC Compressed* (additional license required)

Description:

  • A model based on a squared exponential kernel.

  • The number of basis functions can be selected, and the positions of these basis functions are trained.

Use for:

  • Use when the model needs to be small (e.g., only 20 basis functions).

  • Complex behavior can be predicted.

Pros:

  • Model prediction (inference) works like ASC GP, so this model type can be integrated into the Bosch AMU (Advanced Modeling Unit for hardware-supported fast prediction on Bosch ECUs).

  • Small in comparison to a full Gaussian Process.

Cons:

  • Risk of overfitting; needs validation data to avoid it.

Classification (GP)*

Description:

  • A binary classification model.

  • The output channel must contain only zeros and ones.

  • The model uses a Gaussian Process for the classification task (see the sketch after this entry).

Use for:

  • Binary (two-class) classification.

  • Use when there are fewer than 4000 training data points with 15 inputs.

  • The number of inputs should not be greater than 20-30.

Pros:

  • Good generalization with a small number of training data points.

Cons:

  • Gaussian Process prediction (inference) has a higher computational load than other methods.
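A minimal sketch of binary GP classification, assuming scikit-learn's GaussianProcessClassifier rather than ASCMO; the kernel choice and toy data are illustrative:

    # Binary GP classification sketch (scikit-learn, not ASCMO): labels are
    # zeros and ones, an RBF-kernel Gaussian Process separates the classes.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)       # output channel: 0/1 only

    clf = GaussianProcessClassifier(kernel=RBF(length_scale=1.0)).fit(X, y)
    print(clf.predict([[0.5, 0.5], [-0.5, -0.5]]))    # class labels
    print(clf.predict_proba([[0.5, 0.5]]).round(2))   # class probabilities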

Classification (MLP)*

Description:

  • A binary classification model.

  • The output channel must contain only zeros and ones.

  • The model uses an MLP (Multilayer Perceptron) for the classification task.

Use for:

  • Binary (two-class) classification.

  • Works with a large number of training data points.

Pros:

  • Works with a large amount of data; the model complexity (memory) can be selected.

Cons:

  • Risk of overfitting; needs validation data to avoid it.

Classification (Random Decision Trees)

Description:

  • A binary classification model.

  • The output channel must contain only zeros and ones.

  • The model uses decision trees with feature and data bagging for classification (see the sketch after this entry). See Wikipedia: Decision tree and Random forest.

Use for:

  • Binary (two-class) classification.

  • Works with a large number of training data points.

Pros:

  • With a sufficient number of trees, the risk of overfitting is low.

  • If overfitting occurs, the depth of the trees can be reduced.

  • Prediction and the exported prediction code are simple.

  • Can be used for a large number of inputs.

Cons:

  • The model output is not continuous (a step function), which is bad for gradient-based optimization.
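A minimal sketch of classification with random decision trees, assuming scikit-learn's RandomForestClassifier (not ASCMO): data bagging comes from bootstrapping, feature bagging from max_features; all parameter values are illustrative:

    # Random decision trees sketch (scikit-learn, not ASCMO).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(5000, 10))
    y = (X[:, 0] * X[:, 1] > 0).astype(int)       # output channel: 0/1 only

    clf = RandomForestClassifier(
        n_estimators=200,       # enough trees: low risk of overfitting
        max_depth=8,            # reduce depth if overfitting occurs
        max_features="sqrt",    # feature bagging
        random_state=0).fit(X, y)
    print(clf.predict(X[:5]), clf.predict_proba(X[:5])[:, 1].round(2))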

Transient Modeling with NARX structure

(The NARX (Nonlinear AutoRegressive with eXogenous input) structure feeds a static model with delayed inputs and delayed outputs, so a static model can be used at the core; for the available core models, see Static Modeling. A minimal sketch of the regressor construction follows.)
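A minimal sketch of the NARX idea under simple assumptions (delay depths of 2 for input and output, a linear core model, a toy first-order system); the helper narx_features is illustrative, not ASCMO's implementation:

    # NARX regressor sketch: predict y(k) from delayed inputs u(k-1), u(k-2)
    # and delayed outputs y(k-1), y(k-2), so any static model can be the core.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def narx_features(u, y, nu=2, ny=2):
        """Stack [u(k-1)..u(k-nu), y(k-1)..y(k-ny)] as static model inputs."""
        k0 = max(nu, ny)
        rows = [np.concatenate([u[k - nu:k][::-1], y[k - ny:k][::-1]])
                for k in range(k0, len(y))]
        return np.array(rows), y[k0:]

    t = np.arange(500)
    u = np.sin(0.05 * t)                           # toy excitation signal
    y = np.zeros_like(u)
    for k in range(1, len(u)):                     # toy first-order system
        y[k] = 0.9 * y[k - 1] + 0.1 * u[k - 1]

    X, target = narx_features(u, y)
    core = LinearRegression().fit(X, target)       # any static model fits here
    print(core.coef_.round(3))                     # recovers the system dynamics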

Transient Modeling with Recurrent Neural Network (RNN)

LSTM Cell (Long Short-Term Memory)

Description:

  • A neural network (deep learning model) with multiple layers (see the sketch after this entry).

  • Activation functions can be selected.

  • Multiple gates allow adding and removing information ("forget gate").

  • The most complex RNN in ASCMO, with five gates per layer plus activation and recurrent activation functions. See Wikipedia: Long short-term memory.

Use for:

  • Use when learning long-term behavior.

Pros:

  • Good model quality.

Cons:

  • Higher complexity than other RNN types.
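A minimal sketch of an LSTM network for sequence regression, assuming Keras rather than ASCMO; note that Keras's standard LSTM layer has no peephole connections, so it corresponds to the "without peepholes" variant in the next entry. Shapes, sizes, and the toy target are illustrative:

    # LSTM sketch (Keras, not ASCMO): maps an input sequence (time steps x
    # channels) to an output sequence, learning long-term behavior via gates.
    import numpy as np
    import tensorflow as tf

    T, n_in = 100, 3                                   # sequence length, inputs
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(T, n_in)),
        tf.keras.layers.LSTM(16, activation="tanh",           # activation and
                             recurrent_activation="sigmoid",  # recurrent activation
                             return_sequences=True),
        tf.keras.layers.Dense(1),                      # one output channel
    ])
    model.compile(optimizer="adam", loss="mse")

    u = np.random.rand(32, T, n_in).astype("float32")  # toy batch of sequences
    y = u[:, :, :1].cumsum(axis=1) / T                 # toy target with memory
    model.fit(u, y, epochs=2, verbose=0)
    print(model.predict(u[:1], verbose=0).shape)       # (1, 100, 1)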

LSTM Cell without Peepholes

Description:

  • Same as the LSTM cell, but with fewer connections (the gate layers cannot see the cell state).

  • This was the first iteration of the LSTM idea.

Use for:

  • Use when learning long-term behavior.

Cons:

  • Might be worse than a full LSTM cell.

GRU Cell (Gated Recurrent Unit)

Description:

  • Medium-complexity RNN cell with 3 gates. See Wikipedia: Gated recurrent unit.

Use for:

  • Use when learning long-term behavior.

Pros:

  • Medium complexity, but can reach LSTM model quality.

GRU Cell (Reset After)

Description:

  • Reversed order of the reset and update gate compared to the standard GRU (a sketch contrasting both GRU variants follows this entry).

Use for:

  • Use when learning long-term behavior.
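A minimal sketch contrasting the two GRU variants, assuming Keras, whose GRU layer exposes a reset_after flag (True applies the reset gate after the recurrent matrix multiplication); all sizes are illustrative:

    # GRU sketch (Keras, not ASCMO): the reset_after flag switches between
    # the standard GRU convention and the "Reset After" variant.
    import tensorflow as tf

    def make_gru(reset_after):
        return tf.keras.Sequential([
            tf.keras.Input(shape=(100, 3)),            # (time steps, inputs)
            tf.keras.layers.GRU(16, reset_after=reset_after,
                                return_sequences=True),
            tf.keras.layers.Dense(1),
        ])

    standard = make_gru(reset_after=False)             # standard GRU
    reset_after = make_gru(reset_after=True)           # GRU (Reset After)
    print(standard.count_params(), reset_after.count_params())  # differ slightly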

 

 

RNN Cell

Description:

  • The most basic RNN cell, with only one gate (only one activation function).

Use for:

  • Use when only short-term effects are important and the system complexity is low.

Pros:

  • Low complexity.

Cons:

  • Short-term effects only.

Convolutional Neural Network (CNN)

1D Convolution

Description:

  • Deep neural network with convolution (see the sketch after this entry). See Wikipedia: Convolutional neural network.

Use for:

  • Alternative to an RNN (LSTM cell).

Pros:

  • Noise resistant.

  • In addition to features from the time dimension, it can also learn deep (spatial) features of the data.

  • Recovers from an invalid internal state when making predictions.

Cons:

  • LSTM often gives better results.
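A minimal sketch of a 1D CNN for sequence regression, assuming Keras; causal padding keeps the convolution from looking into the future, and all sizes are illustrative:

    # 1D CNN sketch (Keras, not ASCMO): convolutions over the time axis as
    # an alternative to an LSTM; each kernel sees a fixed window of samples.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(100, 3)),                 # (time steps, inputs)
        tf.keras.layers.Conv1D(16, kernel_size=8, padding="causal",
                               activation="relu"),      # time-dimension features
        tf.keras.layers.Conv1D(16, kernel_size=8, padding="causal",
                               activation="relu"),      # deeper features
        tf.keras.layers.Dense(1),                       # one output channel
    ])
    model.compile(optimizer="adam", loss="mse")
    model.summary()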

Ensemble Model

(Different types of models can be used.)

Description:

  • An ensemble model makes predictions with multiple models, especially different types of models, and outputs the average of all predictions (see the sketch after this entry).

Use for:

  • If the model quality is not good enough, an ensemble of multiple models may produce a better prediction when combined.

Pros:

  • Robustness.

Cons:

  • Multiple models must be trained.
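A minimal sketch of ensemble averaging, assuming scikit-learn models as members; the member types and toy data are illustrative:

    # Ensemble sketch: average the predictions of different model types
    # (here a GP and a polynomial model, as in the earlier sketches).
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(100, 2))
    y = np.sin(2 * X[:, 0]) + X[:, 1]

    models = [GaussianProcessRegressor().fit(X, y),
              make_pipeline(PolynomialFeatures(3), LinearRegression()).fit(X, y)]

    X_test = rng.uniform(-1, 1, size=(5, 2))
    ensemble = np.mean([m.predict(X_test) for m in models], axis=0)  # average
    print(ensemble.round(3))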

Anomaly Detection

Principal Component Analysis (PCA)

Description:

  • The "normal" behavior is learned from the training data; deviating behavior is flagged as an anomaly.

  • The input data is reduced with PCA and then reconstructed. The model output is the reconstruction error as a value from 0 to 1 (see the sketch after this entry).

Use for:

  • Detecting anomalies in the data.

Pros:

  • Trains fast.

Cons:

  • Non-linear dependencies are not reconstructed well.
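A minimal sketch of PCA-based anomaly detection, assuming scikit-learn; unlike ASCMO, the raw reconstruction error is printed here rather than a value scaled to 0 to 1:

    # PCA anomaly detection sketch (scikit-learn, not ASCMO): project to few
    # components, reconstruct, use the reconstruction error as the score.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    normal = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 10))

    pca = PCA(n_components=5).fit(normal)            # learn "normal" behavior

    def anomaly_score(x):
        recon = pca.inverse_transform(pca.transform(x))
        return np.linalg.norm(x - recon, axis=1)     # per-sample recon. error

    print(anomaly_score(normal[:3]).round(3))        # small: normal
    print(anomaly_score(rng.uniform(5, 6, (3, 10))).round(3))  # large: anomalous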

Autoencoder (RNN dependent)

Description:

  • The "normal" behavior is learned from the training data; deviating behavior is flagged as an anomaly.

  • The input data is reduced with an autoencoder and then reconstructed. The model output is the reconstruction error as a value from 0 to 1 (see the sketch after this entry). See Wikipedia: Autoencoder.

Use for:

  • Detecting anomalies in the data while taking the complex behavior of the system into account.

Pros:

  • Non-linear behavior can be reconstructed.

Cons:

  • Autoencoding with an RNN requires complex model training.
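A minimal sketch of an RNN-based autoencoder for anomaly detection, assuming Keras (LSTM encoder, RepeatVector, LSTM decoder); the architecture and toy data are illustrative, not ASCMO's implementation, and the raw reconstruction error is printed rather than a value scaled to 0 to 1:

    # RNN autoencoder sketch (Keras, not ASCMO): an LSTM encoder compresses
    # the sequence, an LSTM decoder reconstructs it; the reconstruction
    # error is the anomaly score.
    import numpy as np
    import tensorflow as tf

    T, n = 50, 3                                        # time steps, channels
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(T, n)),
        tf.keras.layers.LSTM(8),                        # encode to 8 values
        tf.keras.layers.RepeatVector(T),                # expand back over time
        tf.keras.layers.LSTM(8, return_sequences=True), # decode
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n)),
    ])
    model.compile(optimizer="adam", loss="mse")

    normal = np.sin(np.linspace(0, 10, T))[None, :, None].repeat(3, axis=2)
    normal = normal.repeat(64, axis=0).astype("float32")  # toy "normal" batch
    model.fit(normal, normal, epochs=5, verbose=0)        # learn to reconstruct

    score = np.mean((model.predict(normal[:1], verbose=0) - normal[:1])**2)
    print(score)                                          # low for normal data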

* For ASCMO-STATIC: The model is only available if you have enabled the advanced settings via File > Options > Advanced Settings (see also Enabling Advanced Settings).

See also  

ASCMO-STATIC:

Model Types of ASCMO-STATIC

ASCMO-DYNAMIC:

Model Configurations (ASCMO-DYNAMIC)

Modeling with NARX

Modeling with RNN

Model Prediction with Convolutional Neural Network (CNN)

Modeling with Anomaly Detection