Algorithm Hyperparameters

Hyperparameters are parameters of a machine learning algorithm that are set prior to the commencement of the training process.

Parameters common to all algorithms:

Field Description	Field Name
Batch Size. The preferred number of instances to process if batch prediction is being performed. More or fewer instances may be provided, but this gives implementations a chance to specify a preferred batch size.	batchSize
Number of Decimal Places. The number of decimal places to be used for the output of numbers shown in the trained unit info that is returned by the Train function.	numDecimalPlaces

Parameters specific to an algorithm:

Linear Regression

Field Description	Field Name
Attribute Selection Method. Sets the method used to select attributes. Available methods are: no attribute selection, attribute selection using M5's method (step through the attributes removing the one with the smallest standardized coefficient until no improvement is observed in the estimate of the error given by the Akaike information criterion), and a greedy selection using the Akaike information metric.	attributeSelectionMethod
Eliminate Collinear Attributes. Sets whether or not collinear attributes are eliminated.	eliminateCollinearAttributes
Minimal. If enabled, means and standard deviations get discarded to conserve memory. As a consequence, the trained unit info that is returned by the Train function is truncated.	minimal
Ridge Parameter. The value of the ridge parameter for the L2 regularization.	ridge
Output Additional Statistics. Determines whether to output additional statistics (such as standard deviation of coefficients and t-statistics) in the trained unit info for regression analysis.	outputAdditionalStats

Logistic Regression

Field Description	Field Name
Maximum Number of Iterations. Maximum number of iterations to perform.	maxIts
Ridge Parameter. The value of the ridge parameter.	ridge
Use Conjugate Gradient Descent. Use conjugate gradient descent rather than BFGS updates; faster for problems with many parameters.	useConjugateGradientDescent

Multilayer Perceptron

Field Description	Field Name
Decay. If set to true, the learning rate decreases during training: the starting learning rate is divided by the epoch number to determine the current learning rate. This may help stop the network from diverging from the target output and may improve general performance.	decay
Hidden Layers. Defines the hidden layers of the neural network as a comma-separated list of positive whole numbers, one for each hidden layer. To have no hidden layers, enter a single 0. The following wildcard values are also available: 'a' = (attributes + classes) / 2, 'i' = attributes, 'o' = classes, 't' = attributes + classes.	hiddenLayers
Learning Rate. The amount the weights are updated.	learningRate
Momentum. Momentum applied to the weights during updating.	momentum
Nominal to Binary Filter. If enabled, the filter will be used to preprocess the instances. This could help improve performance if there are nominal attributes in the data.	nominalToBinaryFilter
Normalize Attributes. Determines whether to normalize the attributes. This could help improve performance of the network. Nominal attributes will be normalized as well (after they have been run through the nominal to binary filter if that is in use) so that the nominal values are between -1 and 1.	normalizeAttributes
Normalize Label Values. Determines whether to normalize the label column values if they are numeric. This could help improve performance of the network. The values are normalized to be between -1 and 1. Note that this normalization is internal only; the output is scaled back to the original range.	normalizeLabelValues
Reset. Setting this to true will allow the network to reset with a lower learning rate. If the network diverges from the answer this will automatically reset the network with a lower learning rate and begin training again. Note that if the network diverges but isn't allowed to reset it will fail the training process and return an error message.	reset
Random Number Seed. Seed used to initialize the random number generator. Random numbers are used for setting the initial weights of the connections between nodes, and also for shuffling the training data.	seed
Training Time. The number of epochs to train through. If the validation set size is non-zero, training can terminate early.	trainingTime
Validation Set Size. The percentage size of the validation set. Training continues until the error on the validation set is observed to be consistently getting worse, or until the training time is reached. If this is set to zero, no validation set is used and the network trains for the specified number of epochs.	validationSetSize
Validation Threshold. Used to terminate validation testing. The value here dictates how many times in a row the validation set error can get worse before training is terminated.	validationThreshold
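
The Decay, Hidden Layers, and Validation Threshold descriptions above can be read as the following minimal Python sketch. It is an illustration only, not the product's implementation: the function names are hypothetical, rounding down for the 'a' wildcard is an assumption, and the early-stopping check is one plausible reading of "how many times in a row the validation set error can get worse".

```python
from __future__ import annotations


def resolve_hidden_layers(spec: str, num_attributes: int, num_classes: int) -> list[int]:
    """Expand a hiddenLayers string such as "a,5" into a list of layer sizes."""
    wildcards = {
        "a": (num_attributes + num_classes) // 2,  # assumption: rounds down
        "i": num_attributes,
        "o": num_classes,
        "t": num_attributes + num_classes,
    }
    sizes = []
    for token in spec.split(","):
        token = token.strip()
        sizes.append(wildcards[token] if token in wildcards else int(token))
    # A single 0 means "no hidden layers".
    return [] if sizes == [0] else sizes


def decayed_learning_rate(starting_rate: float, epoch: int) -> float:
    """Current learning rate for a 1-based epoch number when decay is enabled."""
    if epoch < 1:
        raise ValueError("epoch numbers start at 1")
    return starting_rate / epoch


def should_stop_early(validation_errors: list[float], validation_threshold: int) -> bool:
    """Return True once the validation error has worsened
    validation_threshold times in a row (hypothetical interpretation)."""
    worse_in_a_row = 0
    for previous, current in zip(validation_errors, validation_errors[1:]):
        worse_in_a_row = worse_in_a_row + 1 if current > previous else 0
        if worse_in_a_row >= validation_threshold:
            return True
    return False
```

For example, with 10 attributes and 3 classes, a hiddenLayers value of "a,5" would resolve to two hidden layers of 6 and 5 nodes under these assumptions.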

Naive Bayes Classifier

Field Description	Field Name
Use Kernel Estimator. Determines whether to use a kernel estimator for numeric attributes rather than a normal distribution.	useKernelEstimator
Use Supervised Discretization. Determines whether to use supervised discretization to convert numeric attributes to nominal ones.	useSupervisedDiscretization

One-Class Support Vector Machine

Field Description	Field Name
Do Not Replace Missing Values. Determines whether to turn off automatic replacement of missing values. WARNING: set to true only if the data does not contain missing values.	doNotReplaceMissingValues
Kernel. The kernel to use.	svmKernel
Kernel Parameters. Parameters of the chosen kernel.	svmKernelParameters
Normalize. Determines whether to normalize the data.	normalize
Nu. The value of nu.	nu
Random Number Seed. Seed used to initialize the random number generator.	seed
Shrinking. Determines whether to use the shrinking heuristic.	shrinking
Tolerance Parameter. The tolerance of the termination criterion.	toleranceParameter

Random Forest

Field Description	Field Name
Bag Size Percentage. Size of each bag, as a percentage of the training set size.	bagSizePercent
Break Ties Randomly. Break ties randomly when several attributes look equally good.	breakTiesRandomly
Calculate Out-of-bag Error. Determines whether the out-of-bag error is calculated.	calcOutOfBag
Compute Attribute Importance. Compute attribute importance via mean impurity decrease.	computeAttributeImportance
Maximum Depth of the Tree. The maximum depth of the tree, 0 for unlimited.	maxDepth
Number of Execution Slots. The number of execution slots (threads) to use for constructing the ensemble.	numExecutionSlots
Number of Features. Sets the number of randomly chosen attributes. If 0, int(log_2(num_predictors) + 1) is used.	numFeatures
Number of Iterations. The number of iterations to be performed.	numIterations
Output Out-of-bag Complexity Statistics. Determines whether to output complexity-based statistics in the trained unit info when out-of-bag evaluation is performed.	outputOutOfBagComplexityStats
Output Classifiers. Determines whether to output the individual classifiers in the trained unit info.	outputClassifiers
Random Number Seed. Seed used to initialize the random number generator.	seed
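
As a worked example of the numFeatures default and the bagSizePercent setting, the sketch below computes both values in Python. The function names are hypothetical, and truncating the bag size to a whole number of instances is an assumption rather than documented behavior.

```python
import math


def default_num_features(num_predictors: int) -> int:
    """Value used when numFeatures is 0: int(log_2(num_predictors) + 1)."""
    return int(math.log2(num_predictors) + 1)


def bag_size(num_training_instances: int, bag_size_percent: float) -> int:
    """Number of instances in each bag (assumption: fractions are truncated)."""
    return int(num_training_instances * bag_size_percent / 100)
```

With 10 predictors, log_2(10) is roughly 3.32, so 4 attributes would be chosen at random for each split.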

Reduced Error Pruning (REP) Decision Tree

Field Description	Field Name
Initial Count. Initial class value count.	initialCount
Maximum Depth of the Tree. The maximum tree depth (-1 for no restriction).	maxDepth
Minimum Number of Instances. The minimum total weight of the instances in a leaf.	minNum
Minimum Proportion of the Variance. The minimum proportion of the variance on all the data that needs to be present at a node in order for splitting to be performed (used only for regression problems).	minVarianceProp
No Pruning. Determines whether pruning is turned off.	noPruning
Number of Folds. Determines the amount of data used for pruning. One fold is used for pruning, the rest for growing the rules.	numFolds
Random Number Seed. The seed used for random data shuffling.	seed
Spread Initial Count. Spread initial count across all values instead of using the count per value.	spreadInitialCount

Support Vector Machine

Field Description	Field Name
C. The complexity parameter C.	c
Filter Type. Determines how/if the data will be transformed.	filterType
Kernel. The kernel to use.	kernel
Kernel Parameters. Parameters of the chosen kernel (kernel-specific).	kernelParameters
Epsilon. The epsilon for round-off error.	epsilon
Tolerance Parameter. The tolerance parameter.	toleranceParameter
Build Calibration Models. Determines whether to fit calibration models to the SVM's outputs (for proper probability estimates).	buildClibrationModels
Calibrator. The calibration method to use. Visible only if buildClibrationModels is set to true.	calibrator
Calibrator Parameters. Parameters of the calibrator. Visible only if buildClibrationModels is set to true.	calibratorParameters
Number of Folds. The number of folds for cross-validation used to generate training data for calibration models (-1 means use training data). Visible only if buildClibrationModels is set to true.	calibNumFolds
Random Number Seed. Random number seed for the cross-validation used to generate training data for calibration models. Visible only if buildClibrationModels is set to true.	calibRandomSeed

Support Vector Regression

Field Description	Field Name
C. The complexity parameter C.	c
Filter Type. Determines how/if the data will be transformed.	filterType
Kernel. The kernel to use.	kernel
Kernel Parameters. Parameters of the chosen kernel (kernel-specific).	kernelParameters
Optimizer. The learning algorithm.	regOptimizer
Optimizer Parameters. Parameters of the Optimizer.	regOptimizerParameters

Stochastic Gradient Descent

Field Description	Field Name
Do Not Normalize. Determines whether normalization is turned off.	doNotNormalize
Do Not Replace Missing Values. Determines whether to turn off global replacement of missing values.	doNotReplaceMissingValues
Number of Epochs. The number of epochs to perform (batch learning). The total number of iterations is the number of epochs multiplied by the number of instances.	epochs
Lambda. The regularization constant.	lambda
Learning Rate. Determines the learning rate. If normalization is turned off, then the default learning rate will need to be reduced (e.g. set to 0.0001).	learningRate
Loss Function. The loss function to use.	lossFunction
Epsilon. The epsilon threshold for epsilon-insensitive and Huber loss. For epsilon-insensitive loss, an error with an absolute value less than this threshold has a loss of 0. For Huber loss, this is the boundary between the quadratic and linear parts of the loss function.	epsilon
Random Number Seed. Seed used to initialize the random number generator.	seed
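
The role of epsilon in the two loss functions, and the relationship between epochs and iterations, can be sketched in Python as follows. This uses the common textbook forms of the epsilon-insensitive and Huber losses, which are assumed here and may differ in scaling from what the algorithm uses internally; the function names are hypothetical.

```python
def epsilon_insensitive_loss(error: float, epsilon: float) -> float:
    """Zero inside the epsilon band, linear outside it."""
    return max(0.0, abs(error) - epsilon)


def huber_loss(error: float, epsilon: float) -> float:
    """Quadratic for |error| <= epsilon, linear beyond that boundary."""
    magnitude = abs(error)
    if magnitude <= epsilon:
        return 0.5 * error * error
    return epsilon * magnitude - 0.5 * epsilon * epsilon


def total_iterations(epochs: int, num_instances: int) -> int:
    """Total number of iterations: epochs multiplied by instances."""
    return epochs * num_instances
```

For instance, 500 epochs over 1,000 training instances amount to 500,000 iterations.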

Filtered Predictor

Field Description	Field Name
Algorithm. Base algorithm to be used.	algorithm
Base Algorithm Hyperparameters. Determines the parameters of the selected algorithm.	baseAlgorithmParameters
Filter. Filter to be used.	filter
Filter Parameters. Determines the parameters of the selected filter.	filterParameters
Random Number Seed. Seed used to initialize the random number generator.	seed

Hoeffding Tree

Field Description	Field Name
Grace Period. Number of instances (or total weight of instances) a leaf should observe between split attempts.	gracePeriod
Hoeffding Tie Threshold. Threshold below which a split will be forced to break ties.	hoeffdingTieThreshold
Leaf Prediction Strategy. The leaf prediction strategy to use.	leafPredictionStrategy
Naive Bayes Prediction Threshold. The number of instances (weight) a leaf should observe before allowing naive Bayes (adaptive) to make predictions.	naiveBayesPredictionThreshold
Print Leaf Models. Determines whether to output the leaf models in the trained unit info (naive Bayes leaves only).	outputLeafModels
Split Confidence. The allowable error in a split decision. Values closer to zero will take longer to decide.	splitConfidence
Splitting Criterion. The splitting criterion to use.	splitCriterion
Minimum Fraction Of Weight by Information Gain. Minimum fraction of weight required down at least two branches for information gain splitting.	minimumFractionOfWeightInfoGain

Multiclass Updateable Classifier

Field Description	Field Name
Base Algorithm. Base algorithm to be used.	baseAlgorithm
Base Algorithm Hyperparameters. Determines the parameters of the selected algorithm.	baseAlgorithmParameters
Method. Sets the method to use for transforming the multi-class problem into several 2-class ones.	method
Log Loss Decoding. Determines whether to use log loss decoding for random or exhaustive codes.	logLossDecoding
Width Factor. Sets the width multiplier when using random codes. The number of codes generated will be this number multiplied by the number of classes.	randomWidthFactor
Use Pairwise Coupling. Determines whether to use pairwise coupling.	usePairwiseCoupling
Random Number Seed. Seed used to initialize the random number generator.	seed

Kernel Parameters

Parameters common to all kernels:

Field Description	Field Name
Cache Size. The size of the cache (a prime number), 0 for full cache and -1 to turn it off.	kernelCacheSize

Parameters specific to a kernel:

Pearson VII Function Kernel

Field Description	Field Name
Omega. The omega value.	kernelOmega
Sigma. The sigma value.	kernelSigma

Polynomial and Normalized Polynomial Kernels

Field Description	Field Name
Degree. The exponent value.	kernelExponent
Use Lower Order. Determines whether to use lower-order terms.	kernelUseLowerOrder

Radial Basis Function (RBF) Kernel

Field Description	Field Name
Gamma. The gamma value.	kernelGamma
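
To show how the exponent, lower-order, and gamma settings typically enter a kernel computation, here is a minimal Python sketch. It uses common textbook definitions of the polynomial, normalized polynomial, and RBF kernels; these formulas and all function names are assumptions for illustration and are not taken from this reference. The Pearson VII function kernel is omitted.

```python
import math


def dot(x, y):
    """Dot product of two equal-length numeric sequences."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, kernel_exponent=1.0, kernel_use_lower_order=False):
    """(x.y)^p, or (x.y + 1)^p when lower-order terms are included."""
    base = dot(x, y) + 1.0 if kernel_use_lower_order else dot(x, y)
    return base ** kernel_exponent


def normalized_polynomial_kernel(x, y, kernel_exponent=2.0, kernel_use_lower_order=False):
    """Polynomial kernel scaled so that K(x, x) == 1."""
    k_xy = polynomial_kernel(x, y, kernel_exponent, kernel_use_lower_order)
    k_xx = polynomial_kernel(x, x, kernel_exponent, kernel_use_lower_order)
    k_yy = polynomial_kernel(y, y, kernel_exponent, kernel_use_lower_order)
    return k_xy / math.sqrt(k_xx * k_yy)


def rbf_kernel(x, y, kernel_gamma=0.01):
    """exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-kernel_gamma * squared_distance)
```

Under these definitions, a larger gamma makes the RBF kernel fall off faster with distance, and enabling lower-order terms adds a constant offset to the dot product before it is raised to the exponent.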

Kernel Parameters Used By the One-Class Support Vector Machine Algorithm

Parameters common to all kernels:

Field Description	Field Name
Cache Size. The cache size in MB.	kernelSvmCacheSize

Polynomial Kernel

Field Description	Field Name
Coefficient0. Independent term in kernel function.	kernelSvmCoefficient0
Degree. The exponent value.	kernelSvmDegree
Gamma. The gamma to use; if 0, then 1/max_index is used.	kernelSvmGamma

Radial Basis Function (RBF) Kernel

Field Description	Field Name
Gamma. The gamma to use; if 0, then 1/max_index is used.	kernelSvmGamma

Sigmoid Kernel

Field Description	Field Name
Coefficient0. Independent term in kernel function.	kernelSvmCoefficient0
Gamma. The gamma to use; if 0, then 1/max_index is used.	kernelSvmGamma
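
The gamma fallback and the role of the independent term can be sketched in Python as follows. The kernel formulas follow the widely used LIBSVM-style definitions, which is an assumption here, as is the reading of max_index as the highest attribute index (that is, the number of attributes). The function names are hypothetical.

```python
import math


def effective_gamma(kernel_svm_gamma: float, max_index: int) -> float:
    """Gamma actually used: 1/max_index when the configured value is 0."""
    return 1.0 / max_index if kernel_svm_gamma == 0 else kernel_svm_gamma


def dot(u, v):
    """Dot product of two equal-length numeric sequences."""
    return sum(a * b for a, b in zip(u, v))


def svm_polynomial_kernel(u, v, gamma, coefficient0, degree):
    """(gamma * u.v + coefficient0) ^ degree"""
    return (gamma * dot(u, v) + coefficient0) ** degree


def svm_sigmoid_kernel(u, v, gamma, coefficient0):
    """tanh(gamma * u.v + coefficient0)"""
    return math.tanh(gamma * dot(u, v) + coefficient0)
```

For a data set with 4 attributes and kernelSvmGamma left at 0, the effective gamma under this reading would be 0.25.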

Optimizer Parameters

Parameters common to all optimizers:

Field Description	Field Name
Epsilon. The epsilon for round-off error.	epsilon
Epsilon Parameter. The epsilon parameter of the epsilon insensitive loss function.	epsilonParameter
Random Number Seed. Seed used to initialize the random number generator.	seed

Parameters of the RegSMO Improved

Field Description	Field Name
Tolerance. Tolerance parameter used for checking the stopping criterion (b_up is less than b_low + 2*tol).	tolerance
Use Variant 1. Set to true to use variant 1 from the following paper; otherwise, variant 2 is used. S.K. Shevade, S.S. Keerthi, C. Bhattacharyya, K.R.K. Murthy: Improvements to the SMO Algorithm for SVM Regression. In: IEEE Transactions on Neural Networks, 1999.	useVariant1

Clustering Parameters

Parameters for clustering algorithms.

Simple K Means

Field Description	Field Name
Random Number Seed. The initial seed value for the random number generator used in the algorithm.	seed
Number of clusters. The number of clusters to be generated by the algorithm.	numClusters
Number of Execution Slots. The number of parallel executions that can be performed by the algorithm.	numExecutionSlots
Maximum number of iterations. The maximum number of iterations the algorithm can perform.	maxIterations
Faster distance calculations. A flag to indicate if faster distance calculation methods should be used.	fasterDistanceCalc
Do Not Replace Missing Values. A flag to indicate if missing values in the data should not be replaced.	dontReplaceMissingValues
Display standard deviations. A flag to indicate if the standard deviations should be displayed.	displayStdDevs
Canopy T1 distance. The T1 distance threshold to use when canopy clustering is applied.	canopyT1
Canopy T2 distance. The T2 distance threshold to use when canopy clustering is applied.	canopyT2
Canopy periodic pruning rate. The rate at which the canopy tree is pruned in each periodic pruning cycle.	canopyPeriodicPruningRate
Minimum canopy density. The minimum density of the canopy tree.	canopyMinimumCanopyDensity
Max number of canopies to hold in memory. The maximum number of canopies that can be held in memory at a given time.	canopyMaxNumCanopiesToHoldInMemory

Hierarchical Clustering

Field Description	Field Name
Number of clusters. The number of clusters to be generated by the algorithm.	numClusters
Distance is branch length. A flag to indicate if the distance between clusters should be represented as the length of the branch joining them.	distanceIsBranchLength
Print hierarchy in Newick format. A flag to indicate if the hierarchy should be printed in Newick format.	printNewick

Density Based Clustering

Field Description	Field Name
Number of clusters. The number of clusters to be generated by the algorithm.	numClusters
Minimum standard deviation. The minimum standard deviation of the clusters to be generated by the algorithm.	minStdDev

Filtered Predictor

Field Description	Field Name
Base algorithm. Base algorithm to use for filtering the data.	baseAlgorithm
Base algorithm hyperparameters. Hyperparameters of the base algorithm.	baseAlgorithmHyperparameters
Filter. The type of filter to apply to the data: replace missing values or remove missing values.	filterType
Filter Parameters. If replace missing values is selected for Filter, determines whether to ignore the label field. If `True`, the label field is temporarily unset before the filter is applied.	filterParameters
Random Number Seed. The initial seed value for the random number generator used in the algorithm.	randomNumberSeed
