Overview: Model Type Descriptions

The overview below is grouped by modeling method. For each model/cell type, it lists a description, typical use cases ("Use for"), pros, and cons.

Static Modeling

(systems without inner states, time-independent or steady-state)

ASC GP (ASCMO Gaussian Process)

Description:

  • Gaussian Process, a probabilistic regression model.

  • One basis function per training data point, located at that data point.

  • If the number of training data points exceeds the maximum number of basis functions, a subset of points is selected, either randomly or with the GP SCS method; in that case, the Subset of Regressors algorithm is used.

  • The default squared exponential kernel is a good fit for many technical systems because the influence of an input on the output decays with the squared distance. For less smooth systems, the Matérn kernel can be selected. (A minimal regression sketch follows this entry.)

Use for:

  • This default model should be used when the number of training data points and the number of inputs are small to medium.

  • With 15 inputs, do not use more than 4000 training data points (use MLP instead, for example).

  • The number of inputs should not exceed 25-30.

Pros:

  • Good model quality.

  • Good generalization with a small number of training data points.

  • Shows model uncertainty (sigma).

  • Probabilistic training avoids overfitting.

  • Model quality can be assessed with the Leave-One-Out error, so all data can be used as training data without the need for validation data.

Cons:

  • High memory consumption during training (scales quadratically with the number of training data points).

  • The computational load for model prediction (inference) depends on the number of training data points, which can be large.
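A minimal sketch of Gaussian Process regression with a squared exponential (RBF) kernel, using scikit-learn rather than ASCMO itself; the library, the toy data, and all parameter values are illustrative assumptions, not ASCMO's implementation:

    # GP regression sketch (scikit-learn, not ASCMO): squared exponential
    # (RBF) kernel, one basis function per training data point.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(50, 1))                  # toy training inputs
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)   # noisy toy outputs

    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

    X_test = np.linspace(-3, 3, 7).reshape(-1, 1)
    mean, sigma = gp.predict(X_test, return_std=True)     # prediction + uncertainty
    print(mean.round(2), sigma.round(2))

Note how the prediction returns both a mean and a sigma; this per-point uncertainty is the practical benefit listed under Pros above.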

ASC GP SCS* (ASCMO Sparse Constant Sigma Gaussian Process)

Description:

  • A sparse Gaussian Process.

  • The number of basis functions can be smaller than the number of training data points.

  • Can optimize the positions of the basis functions.

Use for:

  • Use when a probabilistic model is wanted but a larger number of training data points is available.

Pros:

  • The number of basis functions can be chosen and is generally small.

  • Makes model prediction (inference) faster than a full GP.

Cons:

  • Model uncertainty (sigma) is not available.

Polynom*

Description:

  • Polynomial regression with an nth-degree polynomial function (see the sketch after this entry):

    y = a + b·x1 + c·x2 + d·x1·x2 + e·x1² + ...

  • Interactions (e.g., first-order x1·x2) and order (e.g., third-order x1³) can be selected.

  • Stepwise regression to automatically find the relevant terms is the default.

Use for:

  • Linear systems with few linear interactions between inputs.

Pros:

  • Extrapolation can be good.

  • Fast and simple.

Cons:

  • Poor model quality for nonlinear systems with interacting inputs.
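A minimal sketch of polynomial regression with interaction and squared terms, assuming scikit-learn; unlike ASCMO's default, no stepwise term selection is performed here, all generated terms are simply fitted by least squares:

    # Polynomial regression sketch (scikit-learn, not ASCMO's stepwise
    # algorithm): expand x1, x2 into all terms up to degree 2
    # (bias, linear, interaction, squared) and fit.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(100, 2))                 # toy inputs x1, x2
    y = 1 + 2*X[:, 0] - X[:, 1] + 3*X[:, 0]*X[:, 1]       # known polynomial

    poly = PolynomialFeatures(degree=2)                   # order and interactions
    model = LinearRegression().fit(poly.fit_transform(X), y)
    print(poly.get_feature_names_out(["x1", "x2"]))       # the generated terms
    print(model.coef_.round(2), model.intercept_.round(2))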

MLP (Multilayer Perceptron)

Description:

  • A multilayer neural network (see the sketch after this entry).

  • Activation functions can be selected.

Use for:

  • Many data points (>10,000).

  • Many inputs (>20).

Pros:

  • The model size can be chosen (number of neurons/layers), so it can be small.

  • Can be trained on a huge amount of data.

Cons:

  • Risk of overfitting; needs validation data to avoid it.
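A minimal sketch, assuming scikit-learn's MLPRegressor (not ASCMO's MLP): the network size is selected via hidden_layer_sizes, and a held-out validation split is used to detect overfitting; all sizes and values are illustrative:

    # MLP regression sketch (scikit-learn, not ASCMO): two hidden layers,
    # selectable activation, held-out validation data to catch overfitting.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(20000, 25))              # many points, many inputs
    y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]               # toy target

    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                                random_state=0)
    mlp = MLPRegressor(hidden_layer_sizes=(32, 32), activation="tanh",
                       max_iter=200, random_state=0).fit(X_tr, y_tr)
    print("train R^2:", mlp.score(X_tr, y_tr))    # a large gap between these
    print("valid R^2:", mlp.score(X_val, y_val))  # two scores signals overfitting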

ASC Compressed* (additional license required)

Description:

  • A model based on a squared exponential kernel.

  • The number of basis functions can be selected, and the positions of these basis functions are trained.

Use for:

  • Use when the model needs to be small (e.g., only 20 basis functions).

  • Complex behavior can be predicted.

Pros:

  • Model prediction (inference) works like ASC GP, so this model type can be integrated into the Bosch AMU (Advanced Modeling Unit for hardware-supported fast prediction on Bosch ECUs).

  • Small in comparison to a full Gaussian Process.

Cons:

  • Risk of overfitting; needs validation data to avoid it.

Classification (GP)*

Description:

  • A binary classification model.

  • The output channel must contain only zeros and ones.

  • The model uses a Gaussian Process for the classification task (see the sketch after this entry).

Use for:

  • Binary (two-class) classification.

  • Use when there are fewer than 4000 training data points with 15 inputs.

  • The number of inputs should not be greater than 20-30.

Pros:

  • Good generalization with a small number of training data points.

Cons:

  • Gaussian Process prediction (inference) has a higher computational load than other methods.
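A minimal sketch of binary GP classification, assuming scikit-learn's GaussianProcessClassifier rather than ASCMO; the kernel choice and toy data are illustrative:

    # Binary GP classification sketch (scikit-learn, not ASCMO): labels are
    # zeros and ones, an RBF-kernel Gaussian Process separates the classes.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)       # output channel: 0/1 only

    clf = GaussianProcessClassifier(kernel=RBF(length_scale=1.0)).fit(X, y)
    print(clf.predict([[0.5, 0.5], [-0.5, -0.5]]))    # class labels
    print(clf.predict_proba([[0.5, 0.5]]).round(2))   # class probabilities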

Classification (MLP)*

Description:

  • A binary classification model.

  • The output channel must contain only zeros and ones.

  • The model uses an MLP (Multilayer Perceptron) for the classification task.

Use for:

  • Binary (two-class) classification.

  • Works with a large number of training data points.

Pros:

  • Works with a large amount of data; the model complexity (memory) can be selected.

Cons:

  • Risk of overfitting; needs validation data to avoid it.

Classification (Random Decision Trees)

Description:

  • A binary classification model.

  • The output channel must contain only zeros and ones.

  • The model uses decision trees with feature and data bagging for classification (see the sketch after this entry). See Wikipedia: Decision tree and Random forest.

Use for:

  • Binary (two-class) classification.

  • Works with a large number of training data points.

Pros:

  • With a sufficient number of trees, the risk of overfitting is low.

  • If overfitting occurs, the depth of the trees can be reduced.

  • Prediction and the exported prediction code are simple.

  • Can be used for a large number of inputs.

Cons:

  • The model output is not continuous (a step function), which is bad for gradient-based optimization.
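A minimal sketch of classification with random decision trees, assuming scikit-learn's RandomForestClassifier (not ASCMO): data bagging comes from bootstrapping, feature bagging from max_features; all parameter values are illustrative:

    # Random decision trees sketch (scikit-learn, not ASCMO).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(5000, 10))
    y = (X[:, 0] * X[:, 1] > 0).astype(int)       # output channel: 0/1 only

    clf = RandomForestClassifier(
        n_estimators=200,       # enough trees: low risk of overfitting
        max_depth=8,            # reduce depth if overfitting occurs
        max_features="sqrt",    # feature bagging
        random_state=0).fit(X, y)
    print(clf.predict(X[:5]), clf.predict_proba(X[:5])[:, 1].round(2))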

Transient Modeling with NARX structure

(The NARX (Nonlinear AutoRegressive with eXogenous input) structure feeds a static model with delayed inputs and delayed outputs, so a static model can be used at the core; for the available core models, see Static Modeling. A minimal sketch of the regressor construction follows.)
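A minimal sketch of the NARX idea under simple assumptions (delay depths of 2 for input and output, a linear core model, a toy first-order system); the helper narx_features is illustrative, not ASCMO's implementation:

    # NARX regressor sketch: predict y(k) from delayed inputs u(k-1), u(k-2)
    # and delayed outputs y(k-1), y(k-2), so any static model can be the core.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def narx_features(u, y, nu=2, ny=2):
        """Stack [u(k-1)..u(k-nu), y(k-1)..y(k-ny)] as static model inputs."""
        k0 = max(nu, ny)
        rows = [np.concatenate([u[k - nu:k][::-1], y[k - ny:k][::-1]])
                for k in range(k0, len(y))]
        return np.array(rows), y[k0:]

    t = np.arange(500)
    u = np.sin(0.05 * t)                           # toy excitation signal
    y = np.zeros_like(u)
    for k in range(1, len(u)):                     # toy first-order system
        y[k] = 0.9 * y[k - 1] + 0.1 * u[k - 1]

    X, target = narx_features(u, y)
    core = LinearRegression().fit(X, target)       # any static model fits here
    print(core.coef_.round(3))                     # recovers the system dynamics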

Transient Modeling with Recurrent Neural Network (RNN)

LSTM Cell (Long Short-Term Memory)

Description:

  • A neural network (deep learning model) with multiple layers (see the sketch after this entry).

  • Activation functions can be selected.

  • Multiple gates allow adding and removing information ("forget gate").

  • The most complex RNN in ASCMO, with five gates per layer plus activation and recurrent activation functions. See Wikipedia: Long short-term memory.

Use for:

  • Use when learning long-term behavior.

Pros:

  • Good model quality.

Cons:

  • Higher complexity than other RNN types.
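A minimal sketch of an LSTM network for sequence regression, assuming Keras rather than ASCMO; note that Keras's standard LSTM layer has no peephole connections, so it corresponds to the "without peepholes" variant in the next entry. Shapes, sizes, and the toy target are illustrative:

    # LSTM sketch (Keras, not ASCMO): maps an input sequence (time steps x
    # channels) to an output sequence, learning long-term behavior via gates.
    import numpy as np
    import tensorflow as tf

    T, n_in = 100, 3                                   # sequence length, inputs
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(T, n_in)),
        tf.keras.layers.LSTM(16, activation="tanh",           # activation and
                             recurrent_activation="sigmoid",  # recurrent activation
                             return_sequences=True),
        tf.keras.layers.Dense(1),                      # one output channel
    ])
    model.compile(optimizer="adam", loss="mse")

    u = np.random.rand(32, T, n_in).astype("float32")  # toy batch of sequences
    y = u[:, :, :1].cumsum(axis=1) / T                 # toy target with memory
    model.fit(u, y, epochs=2, verbose=0)
    print(model.predict(u[:1], verbose=0).shape)       # (1, 100, 1)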

LSTM Cell without Peepholes

Description:

  • Same as the LSTM cell, but with fewer connections (the gate layers cannot see the cell state).

  • This was the first iteration of the LSTM idea.

Use for:

  • Use when learning long-term behavior.

Cons:

  • Might be worse than a full LSTM cell.

GRU Cell (Gated Recurrent Unit)

Description:

  • Medium-complexity RNN cell with 3 gates. See Wikipedia: Gated recurrent unit.

Use for:

  • Use when learning long-term behavior.

Pros:

  • Medium complexity, but can reach LSTM model quality.

GRU Cell (Reset After)

Description:

  • Reversed order of the reset and update gate compared to the standard GRU (a sketch contrasting both GRU variants follows this entry).

Use for:

  • Use when learning long-term behavior.
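A minimal sketch contrasting the two GRU variants, assuming Keras, whose GRU layer exposes a reset_after flag (True applies the reset gate after the recurrent matrix multiplication); all sizes are illustrative:

    # GRU sketch (Keras, not ASCMO): the reset_after flag switches between
    # the standard GRU convention and the "Reset After" variant.
    import tensorflow as tf

    def make_gru(reset_after):
        return tf.keras.Sequential([
            tf.keras.Input(shape=(100, 3)),            # (time steps, inputs)
            tf.keras.layers.GRU(16, reset_after=reset_after,
                                return_sequences=True),
            tf.keras.layers.Dense(1),
        ])

    standard = make_gru(reset_after=False)             # standard GRU
    reset_after = make_gru(reset_after=True)           # GRU (Reset After)
    print(standard.count_params(), reset_after.count_params())  # differ slightly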

 

 

RNN Cell

Description:

  • The most basic RNN cell, with only one gate (only one activation function).

Use for:

  • Use when only short-term effects are important and the system complexity is low.

Pros:

  • Low complexity.

Cons:

  • Short-term effects only.

Convolutional Neural Network (CNN)

1D Convolution

Description:

  • Deep neural network with convolution (see the sketch after this entry). See Wikipedia: Convolutional neural network.

Use for:

  • Alternative to an RNN (LSTM cell).

Pros:

  • Noise resistant.

  • In addition to features from the time dimension, it can also learn deep (spatial) features of the data.

  • Recovers from an invalid internal state when making predictions.

Cons:

  • LSTM often gives better results.
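A minimal sketch of a 1D CNN for sequence regression, assuming Keras; causal padding keeps the convolution from looking into the future, and all sizes are illustrative:

    # 1D CNN sketch (Keras, not ASCMO): convolutions over the time axis as
    # an alternative to an LSTM; each kernel sees a fixed window of samples.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(100, 3)),                 # (time steps, inputs)
        tf.keras.layers.Conv1D(16, kernel_size=8, padding="causal",
                               activation="relu"),      # time-dimension features
        tf.keras.layers.Conv1D(16, kernel_size=8, padding="causal",
                               activation="relu"),      # deeper features
        tf.keras.layers.Dense(1),                       # one output channel
    ])
    model.compile(optimizer="adam", loss="mse")
    model.summary()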

Ensemble Model

(Different types of models can be used.)

Description:

  • An ensemble model makes predictions with multiple models, especially different types of models, and outputs the average of all predictions (see the sketch after this entry).

Use for:

  • If the model quality is not good enough, an ensemble of multiple models may produce a better prediction when combined.

Pros:

  • Robustness.

Cons:

  • Multiple models must be trained.
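A minimal sketch of ensemble averaging, assuming scikit-learn models as members; the member types and toy data are illustrative:

    # Ensemble sketch: average the predictions of different model types
    # (here a GP and a polynomial model, as in the earlier sketches).
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(100, 2))
    y = np.sin(2 * X[:, 0]) + X[:, 1]

    models = [GaussianProcessRegressor().fit(X, y),
              make_pipeline(PolynomialFeatures(3), LinearRegression()).fit(X, y)]

    X_test = rng.uniform(-1, 1, size=(5, 2))
    ensemble = np.mean([m.predict(X_test) for m in models], axis=0)  # average
    print(ensemble.round(3))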

Anomaly Detection

Principal Component Analysis (PCA)

Description:

  • The "normal" behavior is learned from the training data; deviating behavior is flagged as an anomaly.

  • The input data is reduced with PCA and then reconstructed. The model output is the reconstruction error as a value from 0 to 1 (see the sketch after this entry).

Use for:

  • Detecting anomalies in the data.

Pros:

  • Trains fast.

Cons:

  • Non-linear dependencies are not reconstructed well.
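A minimal sketch of PCA-based anomaly detection, assuming scikit-learn; unlike ASCMO, the raw reconstruction error is printed here rather than a value scaled to 0 to 1:

    # PCA anomaly detection sketch (scikit-learn, not ASCMO): project to few
    # components, reconstruct, use the reconstruction error as the score.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    normal = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 10))

    pca = PCA(n_components=5).fit(normal)            # learn "normal" behavior

    def anomaly_score(x):
        recon = pca.inverse_transform(pca.transform(x))
        return np.linalg.norm(x - recon, axis=1)     # per-sample recon. error

    print(anomaly_score(normal[:3]).round(3))        # small: normal
    print(anomaly_score(rng.uniform(5, 6, (3, 10))).round(3))  # large: anomalous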

Autoencoder (RNN dependent)

Description:

  • The "normal" behavior is learned from the training data; deviating behavior is flagged as an anomaly.

  • The input data is reduced with an autoencoder and then reconstructed. The model output is the reconstruction error as a value from 0 to 1 (see the sketch after this entry). See Wikipedia: Autoencoder.

Use for:

  • Detecting anomalies in the data while taking the complex behavior of the system into account.

Pros:

  • Non-linear behavior can be reconstructed.

Cons:

  • Autoencoding with an RNN requires complex model training.
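A minimal sketch of an RNN-based autoencoder for anomaly detection, assuming Keras (LSTM encoder, RepeatVector, LSTM decoder); the architecture and toy data are illustrative, not ASCMO's implementation, and the raw reconstruction error is printed rather than a value scaled to 0 to 1:

    # RNN autoencoder sketch (Keras, not ASCMO): an LSTM encoder compresses
    # the sequence, an LSTM decoder reconstructs it; the reconstruction
    # error is the anomaly score.
    import numpy as np
    import tensorflow as tf

    T, n = 50, 3                                        # time steps, channels
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(T, n)),
        tf.keras.layers.LSTM(8),                        # encode to 8 values
        tf.keras.layers.RepeatVector(T),                # expand back over time
        tf.keras.layers.LSTM(8, return_sequences=True), # decode
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n)),
    ])
    model.compile(optimizer="adam", loss="mse")

    normal = np.sin(np.linspace(0, 10, T))[None, :, None].repeat(3, axis=2)
    normal = normal.repeat(64, axis=0).astype("float32")  # toy "normal" batch
    model.fit(normal, normal, epochs=5, verbose=0)        # learn to reconstruct

    score = np.mean((model.predict(normal[:1], verbose=0) - normal[:1])**2)
    print(score)                                          # low for normal data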

* For ASCMO-STATIC: The model is only available if you have enabled the advanced settings via File > Options > Advanced Settings (see also Enabling Advanced Settings).

See also  

ASCMO-STATIC:

Model Types of ASCMO-STATIC

ASCMO-DYNAMIC:

Model Configurations (ASCMO-DYNAMIC)

Modeling with NARX

Modeling with RNN

Model Prediction with Convolutional Neural Network (CNN)

Modeling with Anomaly Detection