
20/10/2015

#Support Vector Machine (SVM)

Last Lecture: Maximum margin problems

  • Introduction of soft margin
    When there is mislabelled data, no hyperplane can split it cleanly, so a soft margin is introduced: the hyperplane still maximises the margin distance, but some points are allowed to violate the margin

  • Problem based on linear classification
    The margin is straightforward to calculate in the linear case, but not when the problem is non-linear:

![Non-linear margin](http://www.blaenkdenum.com/images/notes/machine-learning/support-vector-machines/x-space-non-linear-svm.png)

##Kernel Trick

  • Map observations to a higher-dimensional feature space using a kernel function,
    e.g.
φ: ℝ^(10^6) ⟼ ℝ^(10^100)
  • φ(x) - intractable (hard to calculate by itself)
    Analogy - a GPU input vector, which you cannot alter once it is being processed

##Kernelize the Algorithm

  • Instead of operating in input space, change the x's to φ(x(i))'s to move to feature space

ω: primal representation of the weight vector
α: dual representation of the weight vector
ξᵢ: slack variable - measures how far point i penetrates the margin
ω, θ, ξ: the variables we minimise over
y(i) = ±1 for the "yes"/"no" class of data point i

Minimise(ω,θ,ξ):  
              ½‖ω‖² + c Σᵢ ξᵢ
Subject to:  
              y(i)(ω·x(i) − θ) ≥ 1 − ξᵢ
And:  
              ξᵢ ≥ 0

Math Breakdown:

 ω = Σᵢ αᵢ y(i) φ(x(i))

‖ω‖² = ω·ω = (Σᵢ αᵢ y(i) φ(x(i))) · (Σⱼ αⱼ y(j) φ(x(j)))

          = Σᵢ,ⱼ αᵢ αⱼ y(i) y(j) (φ(x(i))·φ(x(j)))

          = Σᵢ,ⱼ αᵢ αⱼ y(i) y(j) k(x(i), x(j))

          = αᵀ (y(i) y(j) kᵢ,ⱼ)ᵢ,ⱼ α

Substitute for ω·φ(x(i)):

ω·φ(x(i)) = Σⱼ αⱼ y(j) (φ(x(j))·φ(x(i)))

          = Σⱼ αⱼ y(j) k(x(i), x(j))

###Kernelized Algorithm

Minimise(α,θ,ξ):  
              ½ αᵀ (y(i) y(j) kᵢ,ⱼ)ᵢ,ⱼ α + c Σᵢ ξᵢ
Subject to:  
              y(i)(Σⱼ αⱼ y(j) k(x(i), x(j)) − θ) ≥ 1 − ξᵢ
And:  
              ξᵢ ≥ 0
  • The dual representation (α) is used for the quadratic programming problem, as opposed to the primal representation (ω)
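
As a concrete illustration of the kernelised objective, here is a minimal sketch (not from the lecture; the function names and the use of NumPy are my own) that builds the matrix (y(i) y(j) kᵢ,ⱼ)ᵢ,ⱼ and evaluates ½ αᵀQα + c Σᵢ ξᵢ, the quantity a quadratic programming solver would minimise:

```python
import numpy as np

def linear_kernel(u, v):
    # Identity kernel k(u, v) = u·v
    return u @ v

def gram_matrix(X, y, kernel=linear_kernel):
    """Q[i, j] = y(i) * y(j) * k(x(i), x(j)) for inputs X (n x d) and labels y in {-1, +1}."""
    n = X.shape[0]
    Q = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            Q[i, j] = y[i] * y[j] * kernel(X[i], X[j])
    return Q

def objective(alpha, xi, Q, c):
    # 1/2 * alpha^T Q alpha + c * sum_i xi_i  (the kernelised objective above)
    return 0.5 * alpha @ Q @ alpha + c * np.sum(xi)
```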

##Kernel Function

####Mercer's Theorem

Pre-condition:
If k is symmetric:

k(u, v) = k(v, u)

and non-negative definite:

Σᵢ,ⱼ cᵢ cⱼ k(xᵢ, xⱼ) ≥ 0

for all finite sequences of points x₁, ..., xₙ of [a, b] and all choices of real numbers c₁, ..., cₙ
Post-condition:

⇒ ∃ φ s.t. k(u, v) = φ(u)·φ(v)
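
A quick numerical illustration of Mercer's condition (my own sketch, not part of the lecture): for a valid kernel, the Gram matrix built from any finite set of points is symmetric and has no negative eigenvalues.

```python
import numpy as np

def k(u, v):
    # Quadratic kernel (shown below to be a valid kernel)
    return (u @ v) ** 2

X = np.random.randn(5, 3)  # five arbitrary points in R^3
K = np.array([[k(xi, xj) for xj in X] for xi in X])  # Gram matrix K[i, j] = k(x_i, x_j)

print(np.allclose(K, K.T))                      # symmetric
print(np.all(np.linalg.eigvalsh(K) >= -1e-9))   # non-negative definite (up to rounding)
```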

Examples:
Identity Kernel:

 k(u,v)=u·v
  • takes O(n) work in n-space
Quadratic Kernel:

 k(u,v) = (u·v)²

          = (Σₖ uₖvₖ)²

          = (Σₖ uₖvₖ)(Σₖ' uₖ'vₖ')

          = Σₖ,ₖ' (uₖuₖ')(vₖvₖ')

          = Σₖ,ₖ' φ(u)ₖ,ₖ' φ(v)ₖ,ₖ'
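
The derivation above says the quadratic kernel equals a dot product of explicit features φ(u)ₖ,ₖ' = uₖuₖ'. A small check (my own sketch) that the cheap O(n) kernel evaluation matches the O(n²) feature-space computation:

```python
import numpy as np

def phi(u):
    # Explicit feature map: all pairwise products u_k * u_k', flattened
    return np.outer(u, u).ravel()

u = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0, 2.0])

print((u @ v) ** 2)     # kernel trick: O(n) work in input space
print(phi(u) @ phi(v))  # same value via O(n^2) work in feature space
```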

###Polynomial Kernel

For degree-d polynomials, the polynomial kernel is defined as:

k(x, y) = (xᵀy + c)^d
where x and y are vectors in the input space and c ≥ 0 is a free parameter trading off the influence of higher-order versus lower-order terms in the polynomial.

φ: ℝⁿ ⟼ ℝ^(≈ nᵈ/d!)
  • When we use the plain quadratic kernel (u·v)², we drop all linear terms

  • There need be no argument about which kernel function to use, as you can always use both (add them up - the sum of two valid kernels is itself a valid kernel). Example:

 k(u,v) = (u·v + 1)²

        = (Σₖ uₖvₖ + 1)(Σₖ' uₖ'vₖ' + 1)

        = Σₖ,ₖ' uₖuₖ'vₖvₖ' + 2Σₖ uₖvₖ + 1

The +1 restores the linear and constant terms that the plain quadratic kernel dropped.
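
A small numeric check (my own sketch) that (u·v + 1)² really is the sum of a quadratic kernel, a scaled linear kernel and a constant kernel, which is why adding kernels together is always an option:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0, 2.0])

combined = (u @ v + 1) ** 2                  # (u·v + 1)^2
as_a_sum = (u @ v) ** 2 + 2 * (u @ v) + 1    # quadratic + 2*linear + constant

print(combined, as_a_sum)  # both print the same number
```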

##Gaussian Process

  • Gaussian Kernel:
k(u,v) = e^(−d‖u−v‖²)
  • Can be used to compute similarities between images
  • The price of using this kernel: φ maps to an infinite-dimensional feature space
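
A minimal sketch of the Gaussian kernel (the width parameter name d and its default value are my own assumptions):

```python
import numpy as np

def gaussian_kernel(u, v, d=0.5):
    # k(u, v) = exp(-d * ||u - v||^2); close to 1 for similar inputs, near 0 for distant ones
    return np.exp(-d * np.sum((u - v) ** 2))

print(gaussian_kernel(np.array([1.0, 2.0]), np.array([1.5, 1.0])))
```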

##Kernel Function applications

  • Find similarities between two pieces of text

*When we finished the optimisation above:
Problem: there are no x's left in the solution, just i's and j's - so how do we classify a new input x?

When we want to embed the SVM in a system (e.g. a sneeze-detection function in a camera), we classify a new input x by testing:

ω·φ(x) ≷ θ

ω·φ(x) = Σᵢ αᵢ y(i) (φ(x(i))·φ(x))
       = Σᵢ: αᵢ≠0  αᵢ y(i) k(x(i), x)

Most αᵢ are zero!

We only need to store the support vectors of people sneezing from the training set
Only download these into the camera:
e.g. 200/1000 training cases
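
A minimal sketch of the deployed decision rule (function names and the default kernel are my own; the stored arrays hold only the support vectors, i.e. the training cases with αᵢ ≠ 0):

```python
import numpy as np

def linear_kernel(u, v):
    return u @ v

def classify(x, support_X, support_y, support_alpha, theta, kernel=linear_kernel):
    """Return +1/-1 by comparing sum over {i: alpha_i != 0} of alpha_i * y(i) * k(x(i), x) against theta."""
    score = sum(a * y * kernel(xi, x)
                for a, y, xi in zip(support_alpha, support_y, support_X))
    return 1 if score >= theta else -1
```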

##SVM Conclusions

Popular kernels are quite robust
- Reasons why people like SVMs:

Positives:

  • Beautiful maths (Mercer's ...)
  • SVM cost depends on the number of support vectors
    • can work in very high-dimensional feature spaces, as it only looks at a subset of the training vectors
  • Turn-key (works out of the box)

Criticism:

  • Glorified template matching