Training algorithm:
(a) Linearly Separable Binary Classification:
The training data for the SVM are of the form {xi, yi}, where i = 1, 2, ..., 52 and yi ∈ {−1, +1}. Here xi is the input vector: the corresponding 5×7 (6×8) character grids are applied to the input as 1×35 (1×48) vectors.
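As a small illustration of this encoding (a sketch in Python with numpy; the variable names are mine, and the grid values here are random stand-ins for real character pixels):

    import numpy as np

    # A 5x7 binary character grid (1 = dark pixel, 0 = background);
    # random values stand in for a real scanned character.
    grid = np.random.randint(0, 2, size=(5, 7))

    # Flatten the grid row by row into the 1x35 input vector x_i for the SVM.
    x_i = grid.reshape(1, -1)
    print(x_i.shape)  # (1, 35)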
In SVM, we have some hyperplane which separates the positive (yi = +1) from the negative (yi = −1) examples (a separating hyperplane). The points x which lie on the hyperplane satisfy x · w + b = 0, where w is normal to the hyperplane, |b|/||w|| is the perpendicular distance from the hyperplane to the origin, and ||w|| is the Euclidean norm of w. Let d+ (d−) be the shortest distance from the separating hyperplane to the closest positive (negative) example. The "margin" of a separating hyperplane is defined to be d+ + d−. For the linearly separable case, the support vector algorithm simply looks for the separating hyperplane with the largest margin. This can be formulated as follows: suppose that all the training data satisfy the following constraints:

    xi · w + b ≥ +1   for yi = +1        (1)
    xi · w + b ≤ −1   for yi = −1        (2)

These can be combined into one set of inequalities:

    yi (xi · w + b) − 1 ≥ 0   for all i  (3)
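As a quick numerical illustration of Equation 3 (a Python sketch; the hyperplane parameters w and b here are made-up values, not a trained solution):

    import numpy as np

    # Made-up hyperplane parameters, for illustration only.
    w = np.array([1.0, -1.0])
    b = -0.5

    # Two labeled training points x_i with labels y_i in {-1, +1}.
    X = np.array([[2.0, 0.0],
                  [0.0, 2.0]])
    y = np.array([+1, -1])

    # Equation 3: y_i * (x_i . w + b) - 1 >= 0 must hold for every i.
    print(y * (X @ w + b) - 1 >= 0)  # [ True  True ]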
Now consider the points for which the equality in Equation 1 holds (requiring that there exists such a point is equivalent to choosing a scale for w and b). These points lie on the hyperplane H1: xi · w + b = +1, with normal w and perpendicular distance |1 − b|/||w|| from the origin. Similarly, the points for which the equality in Equation 2 holds lie on the hyperplane H2: xi · w + b = −1, again with normal w and perpendicular distance |−1 − b|/||w|| from the origin. Hence d+ = d− = 1/||w||, and the margin is simply 2/||w||.
H1 and H2 are parallel (they have the same normal) and no training points fall between them. Thus we can find the pair of hyperplanes which gives the maximum margin by minimizing ||w||², subject to the constraints in Equation 3.
Thus, the solution for a typical two-dimensional case has the form shown in the following figure. Those training points for which the equality in Equation 3 holds (i.e., those which wind up lying on one of the hyperplanes H1, H2), and whose removal would change the solution found, are called support vectors; they are indicated in Figure 7 by the extra circles [14].
Figure 7: Linear separating hyperplanes for the
separable case.
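As an illustrative sketch of this separable case (not the project's actual training code; scikit-learn is assumed, and the toy 2-D data are invented), a linear SVM recovers w, b, the support vectors, and the margin 2/||w||:

    import numpy as np
    from sklearn.svm import SVC

    # Toy linearly separable 2-D training data.
    X = np.array([[1.0, 1.0], [2.0, 2.5], [0.5, 2.0],
                  [3.0, 0.5], [4.0, 1.0], [3.5, -0.5]])
    y = np.array([-1, -1, -1, +1, +1, +1])

    clf = SVC(kernel='linear', C=1e6)  # a large C approximates the hard margin
    clf.fit(X, y)

    w = clf.coef_[0]        # normal vector of the separating hyperplane
    b = clf.intercept_[0]
    print("support vectors:\n", clf.support_vectors_)
    print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))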
To solve this minimization problem subject to the constraint above, the method of Lagrange multipliers is used.
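For reference, the standard primal Lagrangian for this problem (in LaTeX notation, following the usual formulation, e.g. [14]) introduces one non-negative multiplier \alpha_i per training constraint:

    L_P = \frac{1}{2}\lVert w \rVert^2
          - \sum_i \alpha_i \left[ y_i (x_i \cdot w + b) - 1 \right],
          \qquad \alpha_i \ge 0.

L_P is minimized with respect to w and b and maximized with respect to the \alpha_i; the training points that end up with \alpha_i > 0 are exactly the support vectors.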
(b) Non-Linearly Separable Binary Classification:
When the data set is not linearly separable, a kernel function is used to map the data to a higher-dimensional space (feature space). Common examples of kernel functions are the linear kernel K(xi, xj) = xi · xj, the polynomial kernel K(xi, xj) = (xi · xj + 1)^p, and the Gaussian (RBF) kernel K(xi, xj) = exp(−||xi − xj||²/(2σ²)). For this project, I have used the linear kernel function.
Figure 8: A kernel function can be used to map the data points to a higher-dimensional space.
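As a hedged sketch of why this helps (toy XOR-style data, not the project's character data; scikit-learn is assumed), a linear kernel cannot separate the four points below, while an RBF kernel can:

    import numpy as np
    from sklearn.svm import SVC

    # XOR-style data: not linearly separable in the original 2-D space.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([-1, +1, +1, -1])

    linear = SVC(kernel='linear', C=100.0).fit(X, y)
    rbf = SVC(kernel='rbf', C=100.0, gamma=1.0).fit(X, y)

    print("linear kernel training accuracy:", linear.score(X, y))  # at most 0.75
    print("RBF kernel training accuracy:   ", rbf.score(X, y))     # 1.0 here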
(c) Multi-Class Classification:
SVMs are inherently two-class classifiers. In particular, the most common technique in practice has been to build as many one-versus-rest classifiers as there are classes (commonly referred to as "one-versus-all" or OVA classification) and to choose the classifier with the largest positive output. In other words, this technique builds binary classifiers, each of which distinguishes one of the labels from the rest (one-versus-all). Classification of new instances in one-versus-all is done by a winner-takes-all strategy, in which the classifier with the highest output function assigns the class. Therefore, for training, K (the number of classes) different binary problems must be solved (K binary classifiers are required), classifying "class k" versus "the rest of the classes" for k = 1, 2, ..., K. For this project, I have 52 classifiers (because I have 52 classes).
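A minimal sketch of the one-versus-all training loop (illustrative only: scikit-learn is assumed, and random vectors stand in for the 52-class, 35-dimensional character data):

    import numpy as np
    from sklearn.svm import SVC

    def train_ova(X, labels, classes):
        """Train one binary linear SVM per class: 'class k' versus 'the rest'."""
        classifiers = {}
        for k in classes:
            y_k = np.where(labels == k, +1, -1)  # relabel: class k -> +1, rest -> -1
            classifiers[k] = SVC(kernel='linear').fit(X, y_k)
        return classifiers

    # Random stand-in data: 52 classes, 5 samples per class, 35 features each.
    rng = np.random.default_rng(0)
    X = rng.random((260, 35))
    labels = np.repeat(np.arange(52), 5)
    ova = train_ova(X, labels, classes=np.arange(52))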
Testing algorithm:
For testing, each test sample is assigned to the class that gives the largest (most positive) value of fc(x), where fc(x) is the solution from the c-th problem (classifier).
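Continuing the sketch from the training section (the "ova" dictionary and the feature vectors are the illustrative names introduced there), the winner-takes-all rule is a single argmax over the K decision values:

    import numpy as np

    def predict_ova(classifiers, x):
        # Evaluate f_c(x) for every binary classifier c and
        # return the class with the largest (most positive) value.
        scores = {k: clf.decision_function(x.reshape(1, -1))[0]
                  for k, clf in classifiers.items()}
        return max(scores, key=scores.get)

    # Usage with the classifiers trained above:
    # predicted_class = predict_ova(ova, X[0])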
