Stages of Operation
This section describes a typical lifecycle of a trainable unit.
Creation and Configuration
First, a trainable unit needs to be created. To do this, the Create action of the Machine Learning context is used.
Next, the trainable unit needs to be configured. The properties of a trainable unit are described in the corresponding section. When configuring them, choose the task and the algorithm that fit your business problem, and provide a Label Field Column Name that identifies the label column of your dataset (the column with the target variable). Every dataset passed to the trainable unit functions must contain a field with this name; otherwise, an error is displayed. Note that a trainable unit cannot be created if the Name and the Label Field Column Name properties are left empty.
If the data instances of the dataset are weighted, the corresponding setting must be set to true and a Weight Field Name must be provided. This name must differ from the Label Field Column Name; otherwise, an error occurs. The weight field must be of a numeric type.
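The configuration rules above can be summarized in a small, self-contained Python sketch. It is not Iotellect code: the function `validate_config`, its parameters, and the message texts are hypothetical, rows are modeled as plain dictionaries, and the sketch assumes that Name and Label Field Column Name are each required on their own.

```python
from numbers import Number


def validate_config(name, label_field, weighted=False, weight_field=None, rows=()):
    """Return a list of problems in a hypothetical trainable unit configuration."""
    problems = []
    if not name:
        problems.append("Name must not be empty")
    if not label_field:
        problems.append("Label Field Column Name must not be empty")

    if weighted:
        if not weight_field:
            problems.append("Weight Field Name is required for weighted instances")
        elif weight_field == label_field:
            problems.append("Weight Field Name must differ from the label field name")
        else:
            for i, row in enumerate(rows):
                value = row.get(weight_field)
                # bool is a subclass of int in Python, so exclude it explicitly
                if isinstance(value, bool) or not isinstance(value, Number):
                    problems.append(f"Row {i}: weight is not numeric")

    # Every dataset passed to the unit must contain the label field
    if label_field:
        for i, row in enumerate(rows):
            if label_field not in row:
                problems.append(f"Row {i}: missing label field '{label_field}'")
    return problems
```

An empty list means the hypothetical configuration and dataset satisfy all the rules listed in this section.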
Training
Once a trainable unit has been created and configured (its task and algorithm are specified, the algorithm hyperparameters are set, and internal filters are added if needed), it can be trained on a training set. The Train function serves this purpose.
The training set must be passed to the function as a data table, the format of which must satisfy the requirements given in the corresponding section.
As soon as the training process of a trainable unit is complete, its icon changes from the one that indicates an untrained trainable unit to the one that indicates a trained trainable unit. In addition, the state of the trainable unit is stored in the configuration storage. If the Iotellect Server needs to be restarted, each trainable unit restores its state after the reboot.
The effect of calling the Train function on an already trained trainable unit depends on whether this trainable unit is based on an updateable algorithm. If it is, the effect is cumulative, meaning that the trainable unit is updated with the new data rather than retrained. If a trainable unit does not use an updateable algorithm, it is retrained from scratch.
If the state of a trainable unit needs to be reset, the Reset function can be used. The function also has a wrapper action with the same name.
Note: If any option of a trainable unit is changed, its state is reset, and therefore it will have to be trained again.
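The difference between updating and retraining can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not the Iotellect implementation: `MeanModel` is a hypothetical regressor that predicts the mean of the labels it has seen, and its `updateable` flag stands in for the choice of algorithm.

```python
class MeanModel:
    """Toy regressor that predicts the mean of the labels it was trained on."""

    def __init__(self, updateable):
        self.updateable = updateable
        self.reset()

    def reset(self):
        # Return the model to the untrained state
        self.total = 0.0
        self.count = 0

    @property
    def trained(self):
        return self.count > 0

    def train(self, labels):
        if not self.updateable:
            # Non-updateable: discard the previous state and retrain from scratch
            self.reset()
        # Updateable: the new data is added to what was already learned
        self.total += sum(labels)
        self.count += len(labels)

    def predict(self):
        if not self.trained:
            raise RuntimeError("model is untrained")
        return self.total / self.count


incremental = MeanModel(updateable=True)
incremental.train([1, 2, 3])
incremental.train([10])   # cumulative: the mean now covers all four labels

batch = MeanModel(updateable=False)
batch.train([1, 2, 3])
batch.train([10])         # retrained: only the last training set counts
```

After the second call, the updateable model reflects both training sets, while the non-updateable one reflects only the most recent one. Calling `reset()` on either returns it to the untrained state.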
Evaluation
When the trainable unit is trained, it can be used to make predictions on new data. However, it is good practice to evaluate the performance of a machine learning model before starting to use it for predictions. The Evaluate function evaluates the performance of a trained unit against a test set and returns a set of evaluation metrics. The test set must be passed to the function as a data table, the format of which must satisfy the requirements given in the corresponding section.
The evaluation statistics are stored in the configuration storage. Therefore, if the Evaluate function is called more than once, the newly acquired evaluation metrics contribute to the overall statistics. If the evaluation statistics need to be reset, the Reset Evaluation function can be used. The function also has a wrapper action with the same name.
If the Evaluate function is called on an untrained trainable unit, an error message will be displayed.
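One way to picture statistics that accumulate across calls is a running tally. The sketch below is only an illustration: the class `EvaluationStats` is hypothetical, accuracy is used as a stand-in metric, and the way Iotellect actually combines metrics between calls is not described here.

```python
class EvaluationStats:
    """Running accuracy tally that accumulates across evaluate() calls."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Discard everything collected so far
        self.correct = 0
        self.total = 0

    @property
    def accuracy(self):
        # None means that no evaluation has been recorded yet
        return self.correct / self.total if self.total else None

    def evaluate(self, predicted, actual):
        if len(predicted) != len(actual):
            raise ValueError("predicted and actual must have the same length")
        self.correct += sum(p == a for p, a in zip(predicted, actual))
        self.total += len(actual)
        return self.accuracy


stats = EvaluationStats()
first = stats.evaluate(["a", "b"], ["a", "a"])    # 1 of 2 correct
second = stats.evaluate(["a", "a"], ["a", "a"])   # 3 of 4 correct overall
```

The second result covers both test sets, which is why a reset is needed before evaluating a unit on a fresh footing.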
Cross-Validation
Another option to evaluate a trainable unit is to use the Cross Validate function. The function performs k-fold cross-validation on the given dataset and returns the same set of evaluation metrics as the Evaluate function. The values of the evaluation metrics are averaged over the k folds. The number of folds k and the seed for the random number generator are specified in the corresponding configuration options. The number of folds must not exceed the number of records in the input dataset. The requirements for the input dataset are the same as for the Evaluate function.
Note that the Cross Validate function can be used even on an untrained trainable unit. The function does not change the state of a trainable unit (if it was untrained it remains untrained; if it was trained, no retraining is performed).
You might need to use cross-validation if you do not want to split your dataset into a training set and a test set (for example, if the dataset is too small to spare any data instances). If, based on the evaluation metrics returned by the Cross Validate function, the model proves to perform as desired, the whole dataset can be used as a training set to train the model.
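For readers unfamiliar with k-fold cross-validation, the following is a minimal generic sketch in Python. It does not reproduce how the Cross Validate function is implemented: the helper names, the fold assignment scheme, and the lower bound of two folds are assumptions made for illustration.

```python
import random
from collections import Counter
from statistics import mean


def k_fold_indices(n_records, k, seed):
    """Split record indices into k shuffled folds."""
    if not 2 <= k <= n_records:
        raise ValueError("k must be between 2 and the number of records")
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # the seed makes the split repeatable
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one
    return [indices[i::k] for i in range(k)]


def cross_validate(records, labels, k, seed, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average the scores."""
    scores = []
    for held_out in k_fold_indices(len(records), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(records)) if i not in held]
        # A fresh model is fitted for every fold; no existing model is modified
        model = fit([records[i] for i in train_idx], [labels[i] for i in train_idx])
        scores.append(score(model,
                            [records[i] for i in held_out],
                            [labels[i] for i in held_out]))
    return mean(scores)


def fit_majority(records, labels):
    """Toy 'algorithm': remember the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(model, records, labels):
    return sum(model == label for label in labels) / len(labels)
```

Because each fold trains its own temporary model, a procedure like this leaves any previously trained model untouched, which matches the behavior described for the Cross Validate function.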
Using the Trainable Unit to Make Predictions
Now it is time to use the trained and evaluated trainable unit for predictions. The Operate function serves this purpose. It is important that the format of the data table passed as the argument of the Operate function matches the format of the data table that was used to train the trainable unit. The Operate function can be used either on unlabeled data or on data with labels. When used on unlabeled data, the label column must be filled with NULL values. Using the function on data with labels may serve as another tool to evaluate the performance of the machine learning model. The predicted and the actual values (and the error for regression problems) are returned in the resulting data table together with the corresponding features (independent variables).
If the Operate function is called on an untrained trainable unit, an error message will be displayed.
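The shape of such a result can be sketched as follows. This is a hypothetical illustration rather than the Operate function itself: rows are modeled as dictionaries, `None` stands in for NULL, the output column names `predicted`, `actual`, and `error` are invented, and the error is assumed to be the predicted value minus the actual value.

```python
def operate(rows, label_field, predict):
    """Return each row's features together with the prediction, label, and error."""
    results = []
    for row in rows:
        features = {k: v for k, v in row.items() if k != label_field}
        actual = row.get(label_field)  # None stands in for a NULL label
        predicted = predict(features)
        # The error is only defined when the actual value is known
        error = None if actual is None else predicted - actual
        results.append({**features, "predicted": predicted,
                        "actual": actual, "error": error})
    return results


def double_x(features):
    """Toy regression 'model' used only for this example."""
    return 2 * features["x"]


table = [
    {"x": 1, "y": 3},      # labeled row: the error can be computed
    {"x": 2, "y": None},   # unlabeled row: the label column holds NULL
]
result = operate(table, "y", double_x)
```

For the labeled row, the result carries both the predicted and the actual value; for the unlabeled row, only the prediction is filled in.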