# SVM 101

Posted by c cm on November 4, 2014

## Maximal Margin Classifier

hyperplane: $\beta_0 + \beta_1X_1 + ... + \beta_pX_p = 0$
maximal margin hyperplane (also known as the optimal separating hyperplane), which is the separating hyperplane that is farthest from the training observations.

Calculate the (perpendicular) distance from each training observation to a given separating hyperplane; margin is the smallest such distance is the minimal distance from the observations to the hyperplane.

$\underset{\beta_0, \beta}{max} M \\ s.t. \|\beta\| = 1 \\ y_i(\beta_0 + x_i^T\beta) \ge M,\ i = 1, ..., n$
or
$\underset{\beta_0, \beta}{min} \|\beta\| \\ s.t. y_i(\beta_0 + x_i^T\beta) \ge 1,\ i = 1, ..., n$

## Support Vector Classifier

$\underset{\beta_0, ..., \beta_p, \epsilon_1, ..., \epsilon_n}{max} M \\ s.t. \sum_{j=1}^p \beta_j^2 = 1 \\ y_i(\beta_0 + x_i^T\beta) \ge M(1-\epsilon_i) \ \\ \epsilon \ge 0, \sum_{i=1}^n \epsilon_i \le C$
or
$min \|\beta\| \\ s.t. y_i(\beta_0 + x_i^T\beta) \ge 1 - \epsilon_i,\ i = 1, ..., n\\ \epsilon \ge 0, \sum_{i=1}^n \epsilon_i \le C$
or
$min \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^n\epsilon_i \\ s.t. y_i(\beta_0 + x_i^T\beta) \ge 1 - \epsilon_i,\ i = 1, ..., n\\ \epsilon \ge 0$
Lagrange dual objective function
$L_P = \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^n\epsilon_i - \sum_{i=1}^n \alpha_i[y_i(\beta_0 + x_i^T\beta) - (1 - \epsilon_i)] - \sum_{i=1}^n \mu_i\epsilon_i \\ \beta = \sum_{i=1}^n \alpha_i y_i x_i \\ 0 = \sum_{i=1}^n \alpha_i y_i \\ \alpha_i = C - \mu_i \\ L_D = \sum_{i=1}^n\alpha_i - \frac{1}{2}\sum_{i=1}^n\sum_{i'=1}^n \alpha_i \alpha_{i'} y_i y_{i'} x_i^T x_{i'} \\$
the slack variable εi tells us where the ith observation is located, relative to the hyperplane and relative to the margin.

C as a budget for the amount that the margin can be violated by the n observations.

## Support Vector Machine

solution
$% + \beta_0 \\ K(x, x') = < h(x), h(x_i)> %]]>$

ref
Elements of Statistical Learning 