Unveiling the Magic of Support Vector Machines: A Beginner’s Guide to SVMs in Python
Support Vector Machines (SVMs) are a powerful supervised learning algorithm most commonly used for binary classification. The algorithm works by finding the hyperplane that best separates the data into two classes. In this tutorial, we'll build an SVM from scratch in Python, supporting both linear and non-linear kernels.
Understanding the Basics
1. Cost Function:
The cost function in SVM aims to minimize classification error while maximizing the margin between the two classes. For the soft-margin SVM it is usually expressed as:
minimize (1/2)||w||^2 + C Σ_i ξ_i, subject to y_i (w^T x_i + b) ≥ 1 - ξ_i and ξ_i ≥ 0,
where ||w|| is the norm of the weight vector, C is the penalty parameter, and the ξ_i are slack variables representing the misclassification error.
2. Kernel Function:
Kernels help in transforming the data to a higher dimension where it becomes linearly separable. Common kernels include:
Linear Kernel: K(x, y) = x^T y
Polynomial Kernel: K(x, y) = (1 + x^T y)^d
Radial Basis Function (RBF) Kernel: K(x, y) = exp(-γ ||x - y||^2)
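To build some intuition, here is a quick numeric check of the three kernels on a pair of 2-D points (the vectors, the degree d = 3, and γ = 0.1 are arbitrary illustration values, not anything required by the algorithm):

import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

print(np.dot(x, y))                               # linear kernel: 11.0
print((1 + np.dot(x, y)) ** 3)                    # polynomial kernel (d=3): 1728.0
print(np.exp(-0.1 * np.linalg.norm(x - y)**2))    # RBF kernel (gamma=0.1): ~0.449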
3. Optimization:
Training an SVM means solving a constrained optimization problem for the values of \( w \) and \( b \) that define the separating hyperplane. Dedicated algorithms such as Sequential Minimal Optimization (SMO) exist for this, but the problem can also be handed to a general-purpose quadratic programming (QP) solver, which is the approach we take below with CVXOPT.
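Concretely, the code below optimizes the dual form of the SVM problem, stated here for reference (it is equivalent to the cost function above):

maximize Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)
subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0

CVXOPT's qp solver expects the standard form minimize (1/2) x^T P x + q^T x subject to Gx ≤ h and Ax = b, which is why the training function builds P = (y y^T) * K, q = -1, G and h encoding 0 ≤ α_i ≤ C, and A = y^T with b = 0.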
Code Implementation
import numpy as np
from cvxopt import matrix as cvxopt_matrix
from cvxopt import solvers as cvxopt_solvers
# Generating linearly separable data
np.random.seed(0)
X = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
Y = np.array([-1.0] * 20 + [1.0] * 20)  # labels as a NumPy array, so we can index them with a boolean mask later
def linear_kernel(x, y):
    return np.dot(x, y)

def polynomial_kernel(x, y, p=3):
    return (1 + np.dot(x, y)) ** p

def rbf_kernel(x, y, gamma=0.1):
    return np.exp(-gamma * np.linalg.norm(x - y) ** 2)
def svm_train(X, Y, kernel, C=1.0):
    n, d = X.shape

    # Gram matrix of pairwise kernel evaluations
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])

    # Set up the dual problem in CVXOPT's standard QP form:
    # minimize (1/2) a^T P a + q^T a  subject to  G a <= h  and  A a = b
    P = cvxopt_matrix(np.outer(Y, Y) * K)
    q = cvxopt_matrix(-1 * np.ones(n))
    G = cvxopt_matrix(np.vstack((np.eye(n) * -1, np.eye(n))))    # encodes -a_i <= 0 and a_i <= C
    h = cvxopt_matrix(np.hstack((np.zeros(n), np.ones(n) * C)))
    A = cvxopt_matrix(Y, (1, n), 'd')                            # encodes sum_i a_i y_i = 0
    b = cvxopt_matrix(0.0)

    solution = cvxopt_solvers.qp(P, q, G, h, A, b)
    a = np.ravel(solution['x'])

    # Keep only the support vectors (non-negligible Lagrange multipliers)
    sv = a > 1e-5
    ind = np.arange(len(a))[sv]
    a = a[sv]
    sv_x = X[sv]
    sv_y = Y[sv]

    # Intercept, averaged over all support vectors
    b = 0
    for k in range(len(a)):
        b += sv_y[k]
        b -= np.sum(a * sv_y * K[ind[k], sv])
    b /= len(a)

    return a, sv_x, sv_y, b
# Training SVM with Linear Kernel
a, sv_x, sv_y, b = svm_train(X, Y, linear_kernel)
# To use other kernels, just replace 'linear_kernel' with 'polynomial_kernel' or 'rbf_kernel' in the svm_train function call.
In this code snippet:
We generated some linearly separable data.
Defined three kernel functions: linear, polynomial, and RBF.
Implemented the svm_train function to train our SVM model, using the CVXOPT library to solve the quadratic programming problem.
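The training function returns everything needed to classify new points, but no prediction routine is shown above. Here is a minimal sketch of one (the name svm_predict and its signature are my own choices, not part of the snippet above); it evaluates the standard SVM decision function Σ_i a_i y_i K(sv_i, x) + b and takes its sign:

def svm_predict(X_new, a, sv_x, sv_y, b, kernel):
    y_pred = []
    for x in X_new:
        # decision function: sum over support vectors of a_i * y_i * K(sv_i, x), plus the intercept
        s = sum(a_i * y_i * kernel(x_i, x) for a_i, y_i, x_i in zip(a, sv_y, sv_x)) + b
        y_pred.append(np.sign(s))
    return np.array(y_pred)

# Example: predictions on the training data should largely match Y
print(svm_predict(X, a, sv_x, sv_y, b, linear_kernel))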
Now you have a basic understanding of SVMs and a working implementation in Python. You can experiment with different kernels and see how they affect the decision boundary. Happy learning!