Overfitting in ML – occurs often because neural networks tend to be high-variance models.
Several techniques can reduce this variance:
1. Regularization:
Regularization adds a penalty on the model's parameters to reduce its freedom, so the model is less likely to fit the noise in the training data.
- L1 regularization adds a penalty equal to the sum of the absolute values of the coefficients.
- L2 regularization adds a penalty equal to the sum of the squared values of the coefficients.
- Elastic-net regularization combines the L1 and L2 penalties, as sketched below.
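A minimal sketch of the three penalties using scikit-learn; the synthetic data and the alpha / l1_ratio values are illustrative assumptions, not from the text above:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                 # toy data for illustration
y = X[:, 0] * 3.0 + rng.normal(size=100)

# L1 (Lasso): penalty ~ alpha * sum(|coef|) -> drives many coefficients to exactly zero
l1 = Lasso(alpha=0.1).fit(X, y)

# L2 (Ridge): penalty ~ alpha * sum(coef**2) -> shrinks coefficients toward zero
l2 = Ridge(alpha=0.1).fit(X, y)

# Elastic-net: weighted mix of L1 and L2; l1_ratio controls the mix
en = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

print("non-zero L1 coefficients:", np.count_nonzero(l1.coef_))
```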
2. Cross-Validation:
Split the training data into two sets – training and validation.
Train the different candidate models on the training set, use the validation set only to measure each model's performance, and then select the model with the best validation score (a minimal sketch follows).
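A minimal hold-out validation sketch, assuming scikit-learn; the Ridge models with different alpha values simply stand in for "different models" and are an assumption for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(size=200)

# Split the training data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Train several candidate models on the training set only
candidates = {a: Ridge(alpha=a).fit(X_train, y_train) for a in (0.01, 0.1, 1.0, 10.0)}

# Use the validation set only to compare them, then keep the best one
best_alpha = max(candidates, key=lambda a: candidates[a].score(X_val, y_val))
print("selected alpha:", best_alpha)
```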
3. Dropout (also a form of ensemble learning, i.e. a combination of models):
Randomly turn off a fraction of the neurons at each training step, which forces the remaining neurons to learn more robust features on their own. Because a different subset of neurons is dropped at every step, each step effectively trains a different, thinner network configuration, so the final model behaves like an ensemble of these sub-networks. A minimal sketch is shown below.
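A minimal dropout sketch, assuming PyTorch; the layer sizes and the dropout rate of 0.5 are illustrative choices:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of activations at each training step
    nn.Linear(64, 1),
)

model.train()            # dropout active: a different neuron subset is dropped per step
x = torch.randn(8, 20)
print(model(x).shape)

model.eval()             # dropout disabled at inference: the full network is used
```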