Consider the plane shown above; this can be used to make decisions about patterns in the x-y plane. For every point (x, y) in the x-y plane there is a corresponding value of z such that (x, y, z) lies on the angled plane. If z is negative the point lies above the dotted line; otherwise it lies below it.

The equation for z is

*z=wx+vy+c*

We can generalise this to more dimensions:

*o = w_{1}i_{1} + w_{2}i_{2} + w_{3}i_{3} + ...*

*o = sum_{j=1..n}(w_{j}i_{j})*
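As a minimal sketch of this weighted sum (the function name and example values are illustrative, not from the original):

```python
# Weighted sum o = sum_j w_j * i_j over the inputs
def weighted_sum(weights, inputs):
    return sum(w * i for w, i in zip(weights, inputs))

o = weighted_sum([0.5, -1.0, 2.0], [1.0, 2.0, 3.0])  # 0.5 - 2.0 + 6.0 = 4.5
```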

o is the **output**, the w_{j} are the **weights** and the i_{j} are the **inputs**.

This can be drawn as shown below.

Generally we don't care about the size of the output; we want a yes/no answer for a class. So the weighted sum is passed through a squashing function:

*y = f(sum_{i}(w_{i}x_{i}) + theta)*

where theta is a threshold (bias) term.

The function in the box is called the sigmoid function and is:

*f(x) = 1/(1+exp(-ax))*
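A small sketch of this sigmoid (the name `sigmoid` and the default `a = 1` are assumptions):

```python
import math

# Sigmoid squashing function f(x) = 1 / (1 + exp(-a*x));
# a controls the steepness, a = 1 gives the standard logistic curve
def sigmoid(x, a=1.0):
    return 1.0 / (1.0 + math.exp(-a * x))
```

It maps any input to (0, 1): f(0) = 0.5, large positive inputs approach 1, large negative inputs approach 0.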

If t is the expected (target) output then there will be an error between t and the actual output y. To train the system we need to minimise this error. Over a set of training patterns indexed by k, the total error E is the sum of the individual errors:

*E = (1/2) sum_{k}(t^{k} - y^{k})^{2}*

*E = (1/2) sum_{k}(t^{k} - f(sum_{i}(w_{i}x_{i}^{k}) + theta))^{2}*

To train the classifier, we change each w_{i} so that E becomes smaller.

Now

*dE/dw_{i} = -sum_{k}(t^{k} - y^{k}) dy^{k}/dw_{i}*

*dE/dw_{i} = -sum_{k}(t^{k} - y^{k}) f'(sum_{i}(w_{i}x_{i}^{k}) + theta) x_{i}^{k}*

*dE/dw_{i} = -sum_{k}(d^{k}x_{i}^{k})*

Where

*d^{k} = (t^{k} - y^{k}) f'(sum_{i}(w_{i}x_{i}^{k}) + theta)*

*f(x) = 1/(1+exp(-ax))*

therefore (taking a = 1 for simplicity)

*f '(x) = f(x)(1 - f(x))*

so

*d^{k} = (t^{k} - y^{k}) y^{k}(1 - y^{k})*
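The identity f'(x) = f(x)(1 - f(x)) and the resulting d^{k} can be checked numerically; a sketch with a = 1 (function names are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# delta for one pattern: d = (t - y) * y * (1 - y)
def delta(t, y):
    return (t - y) * y * (1.0 - y)

# check f'(x) = f(x)(1 - f(x)) against a central-difference estimate
x, h = 0.3, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
analytic = sigmoid(x) * (1.0 - sigmoid(x))
```

The two derivative values agree to within the finite-difference error, confirming the identity used above.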

Weight updating rule:

*dw_{i} = n sum_{k}(d^{k}x_{i}^{k})*

where n (the learning rate) is a parameter that controls the speed of gradient descent. Note that because we want to reduce the error we move against the gradient, so the minus sign disappears.

For theta:

*dE/dtheta = -sum_{k}(d^{k})*

therefore

*dtheta = n sum_{k}(d^{k})*

This device is sometimes called a perceptron. The training rule is called the delta rule.

    repeat {
        set dw_i = 0 for i = 1..n; set dtheta = 0
        for every pattern x^k do {
            calculate y^k
            calculate d^k
            add n d^k x_i^k to dw_i for i = 1..n
            add n d^k to dtheta
        }
        add dw_i to w_i for i = 1..n
        add dtheta to theta
    } until E becomes small or changes little
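The batch training loop above can be sketched in Python as follows (the function names, learning rate, epoch count, and the AND training set are illustrative assumptions, not from the original):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(patterns, targets, eta=1.0, epochs=10000):
    """Batch delta-rule training of a single sigmoid unit."""
    n = len(patterns[0])
    w = [0.0] * n          # weights w_i
    theta = 0.0            # threshold (bias)
    for _ in range(epochs):
        dw = [0.0] * n     # accumulated weight changes
        dtheta = 0.0
        for x, t in zip(patterns, targets):
            # forward pass: y = f(sum_i w_i x_i + theta)
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + theta)
            # delta rule: d = (t - y) * y * (1 - y)
            d = (t - y) * y * (1.0 - y)
            for i in range(n):
                dw[i] += eta * d * x[i]
            dtheta += eta * d
        # apply accumulated updates once per epoch
        w = [wi + dwi for wi, dwi in zip(w, dw)]
        theta += dtheta
    return w, theta

# AND is linearly separable, so a single perceptron can learn it
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 0, 0, 1]
w, theta = train(X, T)
```

After training, the unit's output exceeds 0.5 only for the (1, 1) pattern, i.e. it has learned the AND decision boundary.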

This perceptron can only learn simple (linearly separable) problems.