20/10/2015
#Support Vector Machine (SVM)
Last Lecture: Maximum margin problems
- Introduction of soft margin
- When there is mislabelled data the hyperplane can no longer cleanly split the data; the soft margin lets some points violate the margin while still maximising the margin distance
- Problem based on linear classification
Margin is straightforward to calculate in the linear case but not when the problem is non-linear:
![Non-linear margin](http://www.blaenkdenum.com/images/notes/machine-learning/support-vector-machines/x-space-non-linear-svm.png)
##Kernel Trick
φ: ℝ^(10^6) ⟼ ℝ^(10^100)
- φ(x): intractable (hard to calculate by itself)
Analogy: a GPU input vector which you cannot alter once it is being processed
##Kernelize the Algorithm
- Instead of operating in input space, change the x's to φ(x⁽ⁱ⁾)'s to move to feature space
ω: primal representation of the weight vector
α: dual representation of the same vector
ξᵢ: slack variables
ω, θ, ξ: the variables we optimise over; the ξᵢ are the penetration variables (they let points penetrate the margin)
y⁽ⁱ⁾ = ±1 for the "yes"/"no" class of each data point
Minimise over (ω, θ, ξ): ½‖ω‖² + c Σᵢ ξᵢ
Subject to: y⁽ⁱ⁾ (ω·x⁽ⁱ⁾ − θ) ≥ 1 − ξᵢ
And: ξᵢ ≥ 0
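A minimal sketch of this soft-margin primal on synthetic toy data, assuming the cvxpy package as a generic QP solver (not part of the lecture); the names `X`, `y`, `C` are illustrative:

```python
# Soft-margin SVM primal, solved directly as a QP with cvxpy (assumed available).
import numpy as np
import cvxpy as cp

# Synthetic toy data: m points in n dimensions, labels y in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 1.0, (20, 2)), rng.normal(-2.0, 1.0, (20, 2))])
y = np.array([+1.0] * 20 + [-1.0] * 20)
m, n = X.shape
C = 1.0                      # the penalty c on the slack variables

w = cp.Variable(n)           # omega, the primal weight vector
theta = cp.Variable()        # the threshold
xi = cp.Variable(m)          # the slack variables xi_i

objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w - theta) >= 1 - xi,   # y_i (w.x_i - theta) >= 1 - xi_i
               xi >= 0]
cp.Problem(objective, constraints).solve()
print("w =", w.value, "theta =", theta.value)
```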
Math Breakdown:
ω = Σᵢ αᵢ y⁽ⁱ⁾ φ(x⁽ⁱ⁾)
‖ω‖² = ω·ω
= (Σᵢ αᵢ y⁽ⁱ⁾ φ(x⁽ⁱ⁾)) · (Σⱼ αⱼ y⁽ʲ⁾ φ(x⁽ʲ⁾))
= Σᵢ,ⱼ αᵢ αⱼ y⁽ⁱ⁾ y⁽ʲ⁾ (φ(x⁽ⁱ⁾) · φ(x⁽ʲ⁾))
= Σᵢ,ⱼ αᵢ αⱼ y⁽ⁱ⁾ y⁽ʲ⁾ k(x⁽ⁱ⁾, x⁽ʲ⁾)
= αᵀ K̃ α, where K̃ᵢⱼ = y⁽ⁱ⁾ y⁽ʲ⁾ kᵢⱼ and kᵢⱼ = k(x⁽ⁱ⁾, x⁽ʲ⁾)
Substitute: ω·φ(x⁽ⁱ⁾) = Σⱼ αⱼ y⁽ʲ⁾ φ(x⁽ʲ⁾)·φ(x⁽ⁱ⁾) = Σⱼ αⱼ y⁽ʲ⁾ k(x⁽ⁱ⁾, x⁽ʲ⁾)
###Kernelized Algorithm
Minimise over (α, θ, ξ): ½ αᵀ K̃ α + c Σᵢ ξᵢ
Subject to: y⁽ⁱ⁾ (Σⱼ αⱼ y⁽ʲ⁾ k(x⁽ⁱ⁾, x⁽ʲ⁾) − θ) ≥ 1 − ξᵢ
And: ξᵢ ≥ 0
- The dual representation (α) is used in the quadratic programming problem instead of the primal representation (ω)
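Continuing the primal sketch above (same `X`, `y`, `m`, `C`), a hedged kernelised version that optimises α instead of ω; the degree-2 polynomial kernel is just an example choice:

```python
# Kernelised soft-margin problem: same QP structure, but over alpha.
import numpy as np
import cvxpy as cp

def k(u, v):
    return (u @ v + 1.0) ** 2        # example Mercer kernel (degree-2 polynomial)

K = np.array([[k(a, b) for b in X] for a in X])   # Gram matrix k_ij
K_tilde = np.outer(y, y) * K                      # entries y_i y_j k_ij

alpha = cp.Variable(m)
theta = cp.Variable()
xi = cp.Variable(m)

# Tiny jitter keeps the quadratic form numerically positive semi-definite.
objective = cp.Minimize(0.5 * cp.quad_form(alpha, K_tilde + 1e-6 * np.eye(m))
                        + C * cp.sum(xi))
# ((K * y) @ alpha)_i equals sum_j alpha_j y_j k(x_i, x_j)
constraints = [cp.multiply(y, (K * y) @ alpha - theta) >= 1 - xi,
               xi >= 0]
problem = cp.Problem(objective, constraints)
problem.solve()
print("optimal objective:", problem.value)
```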
##Kernel Function
###Mercer's Theorem
Pre-condition:
If k is symmetric:
k(u, v) = k(v, u)
and non-negative definite:
Σᵢ Σⱼ k(xᵢ, xⱼ) cᵢ cⱼ ≥ 0
for all finite sequences of points x₁, ..., xₙ of [a, b] and all choices of real numbers c₁, ..., cₙ
Post-condition:
⇒ ∃ φ s.t. k(u, v) = φ(u)·φ(v)
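As a quick illustration of the non-negative-definite condition (a numerical spot check, not a proof), one can build the Gram matrix of a candidate kernel on random points and inspect its eigenvalues; the helper below is made up for this sketch:

```python
# Numerical sanity check of Mercer's condition: the Gram matrix of a valid
# kernel should have no (meaningfully) negative eigenvalues.
import numpy as np

def gram(kernel, points):
    return np.array([[kernel(u, v) for v in points] for u in points])

rng = np.random.default_rng(1)
pts = rng.normal(size=(30, 5))

k_linear = lambda u, v: u @ v                 # a valid (identity/linear) kernel
k_bad = lambda u, v: -np.linalg.norm(u - v)   # generally NOT a Mercer kernel

for name, kern in [("linear", k_linear), ("negative distance", k_bad)]:
    min_eig = np.linalg.eigvalsh(gram(kern, pts)).min()
    print(name, "min eigenvalue:", min_eig)   # ~0 or positive vs. clearly negative
```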
Examples:
Identity Kernel:
k(u,v)=u·v
- takes O(n) work in n-space
Quadratic Kernel:
k(u, v) = (u·v)²
= (Σₖ uₖ vₖ)²
= (Σₖ uₖ vₖ)(Σₗ uₗ vₗ)
= Σₖ,ₗ (uₖ uₗ)(vₖ vₗ)
= Σₖ,ₗ φ(u)ₖₗ φ(v)ₖₗ, where φ(u)ₖₗ = uₖ uₗ
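A small numpy check of the expansion above: computing (u·v)² in input space agrees with the dot product of the explicit feature maps φ(u)ₖₗ = uₖuₗ (the example vectors are arbitrary):

```python
# Quadratic kernel vs. explicit feature map: both give the same number.
import numpy as np

def phi(u):
    return np.outer(u, u).ravel()    # all pairwise products u_k * u_l (n^2 features)

rng = np.random.default_rng(2)
u, v = rng.normal(size=4), rng.normal(size=4)

print((u @ v) ** 2)        # kernel trick: O(n) work in input space
print(phi(u) @ phi(v))     # same value via the n^2-dimensional feature space
```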
###Polynomial Kernel
For degree-d polynomials, the polynomial kernel is defined as:
k(x, y) = (x·y + c)ᵈ
where x and y are vectors in the input space and c ≥ 0 is a free parameter trading off the influence of higher-order versus lower-order terms in the polynomial.
φ: ℝⁿ ⟼ ℝ^D where D ≈ nᵈ/d! (roughly the number of degree-d monomial features)
- When we used the quadratic kernel we dropped all linear terms
- No arguments about which kernel function to use, as you can always use them both (add them up). Example:
k(u, v) = (u·v + 1)²
= (Σₖ uₖ vₖ + 1)(Σₗ uₗ vₗ + 1)
= Σₖ,ₗ uₖ uₗ vₖ vₗ + 2 Σₖ uₖ vₖ + 1
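A quick numerical check of this expansion, showing the constant term brings the linear kernel back alongside the quadratic one (arbitrary example vectors):

```python
# (u.v + 1)^2 = (u.v)^2 + 2*(u.v) + 1: quadratic + (scaled) linear + constant kernels.
import numpy as np

rng = np.random.default_rng(3)
u, v = rng.normal(size=4), rng.normal(size=4)

lhs = (u @ v + 1.0) ** 2
rhs = (u @ v) ** 2 + 2.0 * (u @ v) + 1.0
print(lhs, rhs)   # equal up to floating-point rounding
```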
##Gaussian Process
- Gaussian Kernel:
k(u, v) = e^(−d‖u−v‖²)
- Can be used to compute similarities between images
- The price for using this: the corresponding φ maps to an infinite-dimensional feature space
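A minimal sketch of the Gaussian kernel as a similarity score; the width d and the random "image" vectors below are just placeholders:

```python
# Gaussian kernel k(u, v) = exp(-d * ||u - v||^2) as a similarity measure.
import numpy as np

def gaussian_kernel(u, v, d=0.5):
    return np.exp(-d * np.sum((u - v) ** 2))

rng = np.random.default_rng(4)
img_a = rng.random((8, 8)).ravel()   # stand-ins for flattened image patches
img_b = rng.random((8, 8)).ravel()

print(gaussian_kernel(img_a, img_a))   # 1.0 for identical inputs
print(gaussian_kernel(img_a, img_b))   # decays towards 0 as inputs move apart
```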
##Kernel Function applications
- Find similarities between two pieces of text
When we finished the optimisation above:
Problem: there were no x's left, just i's and j's (the data only enters through k(x⁽ⁱ⁾, x⁽ʲ⁾))
When we want to embed the SVM in a system (e.g. a sneeze-detection function in a camera), classify a new x by checking whether ω·φ(x) ≥ θ:
ω·φ(x) = Σᵢ αᵢ y⁽ⁱ⁾ φ(x⁽ⁱ⁾)·φ(x) = Σᵢ αᵢ y⁽ⁱ⁾ k(x⁽ⁱ⁾, x)
Most αᵢ are zero, so only the terms with αᵢ ≠ 0 (the support vectors) need to be evaluated!
We only need to store the support vectors from the training set (the decisive sneeze/non-sneeze examples)
Only download these into the camera:
e.g. 200 of the 1000 training cases
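A sketch of that deployment idea, trained here with scikit-learn's SVC for convenience (it solves an equivalent dual formulation); the hypothetical "camera" only keeps the support vectors, their coefficients αᵢ y⁽ⁱ⁾, the threshold, and the kernel parameter:

```python
# Only the support vectors are needed at prediction time.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X_train = np.vstack([rng.normal(+1.5, 1.0, (500, 2)),    # "sneeze" examples
                     rng.normal(-1.5, 1.0, (500, 2))])   # "no sneeze" examples
y_train = np.array([+1] * 500 + [-1] * 500)

gamma = 0.5
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X_train, y_train)
print("training cases:", len(X_train), "support vectors:", len(clf.support_vectors_))

# Everything the camera has to store:
sv = clf.support_vectors_        # the x_i with alpha_i != 0
coef = clf.dual_coef_.ravel()    # alpha_i * y_i for those points
theta = -clf.intercept_[0]       # threshold in the lecture's sign convention

def camera_decision(x):
    # sum_i alpha_i y_i k(x_i, x) compared against theta
    k_vals = np.exp(-gamma * np.sum((sv - x) ** 2, axis=1))
    return 1 if coef @ k_vals - theta >= 0 else -1

x_new = np.array([1.0, 1.0])
print(camera_decision(x_new), clf.predict([x_new])[0])   # should agree
```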
##SVM Conclusions
Popular kernels: quite robust
- Reasons why people like SVMs:
Positives:
- Beautiful maths (Mercer's theorem, ...)
- Complexity depends on the number of support vectors
- Can work in very high-dimensional feature spaces as it only looks at a subset of the training vectors
- Turn-key (works with little tuning)
Criticism:
- Glorified template matching