Lecture 5

Pattern Recognition

Classification

Feature Extraction

Classification: Training and Testing

Over-training and Under-training

It is easy to learn the training set: just remember every pattern. Unfortunately, this is of little use for classifying the test set; this is over-training. Similarly, if the system learns too little about the training set, it will classify badly; this is under-training.

The system must not overgeneralise or undergeneralise about patterns.
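For illustration, here is a minimal Python sketch (toy data and function names made up for this example) contrasting the two extremes: a classifier that merely memorises the training set, and one that learns almost nothing from it.

    # Toy 1-D data: feature value and class label (made up for illustration).
    train = [(0.1, 'A'), (0.2, 'A'), (0.8, 'B'), (0.9, 'B')]
    test  = [(0.15, 'A'), (0.85, 'B')]

    def memoriser(x):
        """Over-training: perfect recall of the training set, no generalisation."""
        for xi, label in train:
            if x == xi:
                return label
        return '?'  # an unseen feature value cannot be classified at all

    def majority(x):
        """Under-training: ignores the feature entirely and always guesses 'A'."""
        return 'A'

    for x, label in test:
        print(x, 'true:', label, 'memoriser:', memoriser(x), 'majority:', majority(x))

The memoriser is perfect on the patterns it has seen and useless on everything else; the majority guesser can never beat the base rate of the most common class.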

Bayesian Classification

For independent events:

Pr(A and B) = Pr(A) Pr(B)

Bayes Rule:

Pr(A | B) = Pr(B | A) Pr(A) / Pr(B)

So for pattern classification, let x be the feature vector and wi be class i. Then:

Pr(wi) is the prior probability of the pattern being in class i.

p(x | wi) is the probability density function for the feature vector given class i.

p(x) = Sumi( p(x | wi) Pr(wi) ) is the overall density of the feature vector.

From Bayes Rule:

Pr(wi | x) = p(x | wi) Pr(wi) / p(x)

The optimal decision is the i for which Pr(wi | x) is largest.
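As a sketch only: the decision rule takes a few lines of Python if we assume known priors and known class-conditional densities (here Gaussians with made-up parameters; as noted below, in practice these are exactly the quantities we lack).

    import math

    # Assumed (made-up) priors Pr(wi) and Gaussian parameters for p(x | wi).
    priors = {'w1': 0.6, 'w2': 0.4}
    params = {'w1': (0.0, 1.0), 'w2': (2.0, 1.0)}   # (mean, std) per class

    def density(x, mean, std):
        """Gaussian probability density function p(x | wi)."""
        return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

    def classify(x):
        # p(x) = Sumi( p(x | wi) Pr(wi) )
        px = sum(density(x, *params[w]) * priors[w] for w in priors)
        # Pr(wi | x) = p(x | wi) Pr(wi) / p(x); decide on the largest posterior
        posteriors = {w: density(x, *params[w]) * priors[w] / px for w in priors}
        return max(posteriors, key=posteriors.get)

    print(classify(0.8))   # -> 'w1' with these particular numbers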

Bayesian theory gives us the optimum decision boundary, but in practice we don't have enough information to calculate this boundary.

Nearest Neighbour Classification

A pattern is classified as belonging to the class of the training pattern that is closest to it. To measure closeness, use a distance metric such as one of the following (a short code sketch follows the list).

For a feature vector x = {x1, x2, x3, ..., xn}

and a training pattern t = {t1, t2, t3, ..., tn}:

  1. Euclidean distance:

     D^2 = Sumi( (xi - ti)^2 )

  2. Dot product distance (the normalised dot product, which is the cosine of the angle; larger means closer):

     D = Sumi(xi * ti) / (|x| * |t|)

  3. Angle between the vectors:

     D = cos^-1( Sumi(xi * ti) / (|x| * |t|) )
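A minimal Python sketch of these three measures, plus a simple nearest neighbour classifier built on the first one; the helper names and toy data are illustrative, not part of the notes.

    import math

    def euclidean_sq(x, t):
        """D^2 = Sumi( (xi - ti)^2 )"""
        return sum((xi - ti) ** 2 for xi, ti in zip(x, t))

    def cosine(x, t):
        """Normalised dot product: Sumi(xi * ti) / (|x| * |t|); larger = closer."""
        dot = sum(xi * ti for xi, ti in zip(x, t))
        mag_x = math.sqrt(sum(xi * xi for xi in x))
        mag_t = math.sqrt(sum(ti * ti for ti in t))
        return dot / (mag_x * mag_t)

    def angle(x, t):
        """Angle between the vectors: cos^-1 of the normalised dot product."""
        return math.acos(cosine(x, t))

    def nearest_neighbour(x, training):
        """Label x with the class of the closest training pattern."""
        pattern, label = min(training, key=lambda p: euclidean_sq(x, p[0]))
        return label

    training = [((0.0, 0.0), 'A'), ((1.0, 1.0), 'B')]
    print(angle((1.0, 0.0), (1.0, 1.0)))             # ~0.785 rad, i.e. 45 degrees
    print(nearest_neighbour((0.2, 0.1), training))   # -> 'A'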

Efficiency

Optimality

Nearest neighbour classification is prone to errors due to rogue patterns. A rogue pattern is a mislabelled training pattern.

K Nearest Neighbour

To reduce the problem caused by rogue patterns, use not just the single nearest neighbour but a group of them.

Using K neighbours, take a majority vote of their classes to give a classification.
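A minimal sketch under the same toy setup as above (the squared Euclidean distance is repeated so the snippet stands alone); it shows a single rogue pattern winning under 1-NN but being outvoted with K = 3.

    from collections import Counter

    def euclidean_sq(x, t):
        """D^2 = Sumi( (xi - ti)^2 ), as in the earlier sketch."""
        return sum((xi - ti) ** 2 for xi, ti in zip(x, t))

    def knn(x, training, k):
        """Majority vote over the classes of the K closest training patterns."""
        neighbours = sorted(training, key=lambda p: euclidean_sq(x, p[0]))[:k]
        votes = Counter(label for _, label in neighbours)
        return votes.most_common(1)[0][0]

    training = [((0.0, 0.0), 'A'), ((0.1, 0.1), 'A'),
                ((0.05, 0.0), 'B'),                  # a rogue: mislabelled pattern
                ((1.0, 1.0), 'B')]
    print(knn((0.04, 0.01), training, k=1))   # -> 'B': the rogue is the nearest
    print(knn((0.04, 0.01), training, k=3))   # -> 'A': the rogue is outvoted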

Optimality

Speeding up nearest neighbour methods

The biggest problem with this method is the time it takes to calculate the distances to all the training examples. Possible solutions are: