So far we have always known the correct classification for every member of the training set, this is not always the case.
Patterns are often created by processes we are not able to describe well. In these cases we can try to build a recogniser based on inherent properties of the patterns.
If we take speech signals and transform them to the frequency domain we get a speech spectrogram.
The spectrogram is a two dimensional signal showing how the frequencies change over time.
For human speech we find that speech consists of two of three bands called formants. We could try to classify speech by using the position of the formants. (this is not easy because there is a wide variation in way the formants change for a particular sound).
Since there is some inherent structure in the signal we can use unsupervised learning to reduce the quantity of information in the formants.
We need to construct a classifier that minimises not the error between output and target, but the error in the fit of a particular model to the data.
If we use a very complex model it will be able to fit the data quite well.
An example model is a set of points on the pattern space, each point represents the centre of a cluster. An algorithm must be devised to move the points so that an error is reduced.
The K means algorithm:
Assume that there are K clusters in the data, i.e. K classes.
where d is some distance metric.
i.e. y is in cluster i, if i has the closest mean.
For the Euclidean distance this is just the mean of all the patterns assigned to class i.
Repeat from 2 until
i.e. the clusters are not moving much.
Clustering is applicable to problems where
ISODATA is an algorithm based on K means that does not need to know the number of clusters. It starts with one cluster and splits/merges clusters according to certain parameters.
Amongst many others. To modify the K mans algorithm we need to add an extra step 4 which can split or merge clusters.
Clustering algorithms perform dimensionality reduction. They do this because they take a high dimensional pattern space and produce a lower dimensional space. Hopefully the lower dimensional space still contains most of the information from the original space.