SVM – Kernel Trick

* Support vector machines (SVMs) are supervised learning models that analyze data and are used for classification and regression.

* Training minimizes an objective with two parts. The first is the classification error, which penalizes points that are misclassified or that fall inside the margin. The second penalizes a narrow margin between the two sides, so among solutions with similar error the widest margin is the best.

* Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other. It outputs a hard label rather than a probability distribution over the categories, making it a non-probabilistic binary linear classifier.

* An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.
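The points above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a training algorithm: the weights `w`, the offset `b`, the penalty `C`, and the toy data are all made up for the example, and the objective shown is the common soft-margin form (half the squared norm of `w` plus `C` times the hinge loss).

```python
def decision(w, b, x):
    """Signed score w.x + b; its sign says which side of the gap x falls on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Hard label (+1 or -1), with no probability attached."""
    return 1 if decision(w, b, x) >= 0 else -1


def svm_objective(w, b, X, y, C=1.0):
    """Soft-margin objective: margin term plus C times the hinge error."""
    # A small ||w|| means a wide margin (margin width is 2 / ||w||).
    margin_term = 0.5 * sum(wi * wi for wi in w)
    # Hinge loss is zero for points on the correct side and outside the
    # margin, and grows for points inside the margin or misclassified.
    hinge = sum(max(0.0, 1.0 - yi * decision(w, b, xi)) for xi, yi in zip(X, y))
    return margin_term + C * hinge


# Hypothetical toy data: two categories labeled +1 and -1.
X = [(2.0, 2.0), (3.0, 1.0), (-2.0, -1.0), (-1.0, -3.0)]
y = [1, 1, -1, -1]

# Hypothetical separator chosen by hand, not learned.
w, b = (0.5, 0.5), 0.0

objective = svm_objective(w, b, X, y)
new_label = predict(w, b, (1.0, 1.0))
```

Here every training point lies outside the margin, so only the margin term contributes to `objective`. Adding a point such as `(0.5, 0.5)` with label `+1` would fall inside the margin and raise the hinge part.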

When a dividing line is not good enough – non-linear solution:

* In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is called the kernel trick, which implicitly maps the inputs into a high-dimensional feature space where a linear separator can be found. Seen back in the original, lower-dimensional space, that separator appears as a curved boundary such as a circle, parabola, or hyperbola.
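A small sketch may make the "implicit" part concrete. It assumes a degree-2 polynomial kernel `k(x, z) = (x.z)^2` on 2-D inputs, whose explicit feature map is `phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)`. The function names, the toy points, and the hand-picked plane are illustrative assumptions, not part of any library.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit map from 2-D input to the 3-D feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """Same inner product as dot(phi(x), phi(z)), without computing phi."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(z))  # inner product taken in the 3-D space
implicit = poly_kernel(x, z)    # same value using only the 2-D inputs

# Hypothetical data: label -1 near the origin, +1 farther out.
# No straight line in 2-D separates these two groups.
points = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.3), (2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]
labels = [-1, -1, -1, 1, 1, 1]

# In feature space a plane does separate them: w.phi(x) + b = x1^2 + x2^2 - 1.5.
# Back in the original 2-D space this plane is the circle x1^2 + x2^2 = 1.5.
w, b = (1.0, 0.0, 1.0), -1.5
predicted = [1 if dot(w, phi(p)) + b >= 0 else -1 for p in points]
```

The two inner products agree up to floating-point rounding, which is why a kernel SVM never has to build `phi(x)` explicitly. The separating plane is picked by hand here. An actual SVM would learn it from the data.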