The convolution operation is a weighted sum of all the pixels in the neighbourhood. For example, shown below is an area in an image and a weight array:
I_{00} 
I_{10} 
I_{20} 
M_{00} 
M_{10} 
M_{20} 

I_{01} 
I_{11} 
I_{21} 
M_{01} 
M_{11} 
M_{21} 

I_{02} 
I_{12} 
I_{22} 
M_{02} 
M_{12} 
M_{22} 
If I is a 3x3 neighbourhood from the image, and M is an array the same size as I, then a pixel I'_{11} to replace I_{11} is found using.
M is called the convolution mask, and by choosing values for the elements of M, many different operations may be performed. The convolution mask may be of any size, but is commonly 3x3, 5x5 or 7x7. The sum of the elements of M is commonly either 0 or 1. In order to see why, consider an area in an image with pixels of a uniform grey level.
If the sum of the elements in M is 0 then the operator produces an image in which with all pixels are zero. Thus such operators are used to detect properties of non uniform regions, for example edges.
If the sum of the elements in M is 1 then the operator
produces an image with all pixels unchanged. Thus such operators are used
to change the pixels in an image that are in non uniform areas. They may
be used, for example, to blur or sharpen an image.
The symbol used to denote a convolution is a star 
*
If all the mask coefficients are positive, then the result
will be a blurred image.
1/9 
1/9 
1/9 
1/12 
1/12 
1/12 

1/9 
1/9 
1/9 
or 
1/12 
1/3 
1/12 
1/9 
1/9 
1/9 
1/12 
1/12 
1/12 
The mask to the left uses the average in the neighbourhood.
The mask to the right uses a weighted average. Commonly, the weighted average
is calculated using a Gaussian distribution so that there is no sudden
discontinuity in the mask.
The gradient in an image is important because it gives information about edges. The first and second derivative are often used in edge detection.
An image is a two dimensional function, thus the
gradient at any point is a vector
The gradient at any point in an image is a vector given by:
The gradient in the x direction in an image I at a point
x,y may be approximated as:
g_{x}(x,y)=I(x+0.5,y)I(x0.5,y) and, since x and y are integers
g_{x}(x0.5,y)=I(x,y)I(x1,y)
or 2g_{x}(x,y)=I(x+1,y)I(x1,y)
Similarly the gradient in the y direction is:
g_{y}(x,y0.5)=I(x,y)I(x,y1)
or 2g_{y}(x,y)=I(x,y+1)I(x,y1)
The convolution masks for this operation are:
In the x direction In the y direction
1 
0 
1 
or 
1 
1 
1 
1 

0 
or 
1 

1 
Other masks are:
Prewitt Sobel
x mask y mask x
mask y
mask
1 
0 
1 
1 
1 
1 
1 
0 
1 
1 
2 
1 

1 
0 
1 
0 
0 
0 
2 
0 
2 
0 
0 
0 

1 
0 
1 
1 
1 
1 
1 
0 
1 
1 
2 
1 
Isotropic Roberts
x mask y mask
1 
0 
1 
1 

1 
1 
0 
0 
1 


0 

0 
0 
0 
0 
1 
1 
0 

1 
0 
1 
1 

1 
The magnitude and direction of the gradient may be found using
Often, only the magnitude of the gradient is required. In this case an approximation is usually used.
This needs only a single mask
0 
1 
1 
2 
The magnitude of the gradient may be used to detect edges by thresholding. However, the width of the line in the thresholded image will be proportional to the strength of the edge. If the position of the edge is required then the second derivative can be used. Any place where the second derivative crosses zero will be on an edge.
The centre of the edge occurs at the zero crossing
of the second derivative
In one dimension, an approximation to the second derivative can be obtained by using the approximation to the first derivative and applying it again.
f'(x)= f(x+0.5)f(x0.5)
f''(x)=f'(x+0.5)f'(x0.5)
f''(x)= f(x+1)f(x)(f(x)f(x1))
f''(x)=f(x+1)2f(x)+f(x1)
In two dimensions, the second derivative is the Laplacian .
Convolution Masks that will approximate the Laplacian
or the negative Laplacian are:
0 
1 
0 
0 
1 
0 

1 
4 
1 
or 
1 
4 
1 
0 
1 
0 
0 
1 
0 
The following mask will give the average for the pixels in a 3x3 neighbourhood.
1/9 
1/9 
1/9 
1/9 
1/9 
1/9 
1/9 
1/9 
1/9 
If this is applied to an image and then the Laplacian is applied, the result will be equivalent to using the following mask.
0 
1/9 
1/9 
1/9 
0 
1/9 
2/9 
1/9 
2/9 
1/9 
1/9 
1/9 
0 
1/9 
1/9 
1/9 
2/9 
1/9 
2/9 
1/9 
0 
1/9 
1/9 
1/9 
0 
Thus, two convolutions may be replaced by one, and vice versa. This is useful when two small convolution masks can be used to replace one large mask. This is only possible for certain masks. If a mask can be split into two one dimensional masks it is said to be separable. The Laplacian operator is not separable, but a similar mask is:
1 
1 
2 
1 

2 
followed by 
1 
2 
1 
is equivalent to 
2 
4 
2 
1 
1 
2 
1 
In order to be separable each row in the mask must be a multiple of all other rows and each column must be a multiple of all other columns.
Using separable masks it is possible to use large neighbourhood sizes because the number of computations is proportional to the width of the neighbourhood and not its area.
As described earlier, the edges in an image correspond to the zero crossings of the Laplacian. These may be found by examining the pixels in a 2x2 neighbourhood, i.e.
a 
b 

d 
c 
if ac<t or ab<t or ad<t there is an edge at a 
The vale of t must be set so that noise pixels do not produce edges.
The Laplacian locates edges, but it does not differentiate between edges of different widths. In practice some edges are more important than others. For example, given a face image a single hair will generate an edge as will the border between face and background. In order to detect edges with a large width, then a larger convolution mask must be necessary.
If the image is blurred by a Gaussian then the Laplacian will only detect wide edges. These two operations are usually combined by finding the Laplacian of a Gaussian operator.
thus and
This may be extended to two dimensions and used to generate a convolution mask.
The convolution operation is a linear process. If there is a conditional step in the process of obtaining a pixel value from a neighbourhood, then it is said to be non linear.
This maximum value in the neighbourhood is chosen. The maximum operation causes light areas to expand, it is sometimes called the dilation or expand operation. It is useful for displaying data which contains isolated points and in segmentation.
This operation involves choosing the minimum in the neighbourhood. Sometimes called an erode or shrink operation, it is often used in conjunction with the maximum operator to perform fuzzy morphological operations on grey scale images.
The most commonly used non linear neighbourhood operator is the median filter.
The pixel is replaced by the median value from within the neighbourhood. The pixel values must be sorted and the middle value chosen. For example, given the following neighbourhood:
10 
20 
30 

15 
10 
40 
These values are sorted to give 
10,10,15,18,20,20,30,40,100 
100 
18 
20 
In this case the median value is 20.
The median filter is important because it removes pixels with values that are not similar to those in the neighbourhood while preserving edges. To illustrate this, consider the one dimensional case shown below.
The example to the left shows a feature that has a size less than half the width of the neighbourhood. In this case there is no point where the median value will be taken from inside the feature. Thus it is removed.
The example to the right shows a feature that has a size greater than half the width of the neighbourhood. In this case the median will always be the same as the value in the centre of the neighbourhood. This feature remains intact.
This property of the median filter is important because other noise removal methods, for example blurring, will change edge features.
The disadvantage of the median filter is than it is computationally intensive because it involves sorting. It is possible to use a moving histogram to reduce the number of calculations for large neighbourhood sizes.
Original Image 1/4 of all pixels replaced by random values
3x3 Median Filter applied once 3x3 Median Filter applied
twice
The mode filter replaces a pixel with the most common pixel value in the neighbourhood. This is only useful if there are a small number of grey levels. If grey levels are used to represent segmented regions then noise can be removed by using the mode filter.
Instead of using all the pixels in the neighbourhood use the K pixels closest in value to the central pixel. Replace it with the average of these. For example if K=6:
10 
20 
30 
The 6 pixels nearest 10 are:  
15 
10 
40 
10,15,18,20,20 and 30  
100 
18 
20 
Average = 113/6 19 
This operator performs an operation without including extreme values (the 100 in this case).
Calculate the average for a number of neighbourhoods around a pixel and use the closest to the pixel value.
n1 
n2 

n5 

n3 
n4 

Calculate the average in each neighbourhood n1..n5 and
use the closest of these to the original pixel value.
The Laplacian operation may be used to enhance an image.
f(x) f''(x) f(x)f''(x)
If the second derivative of a function is subtracted from
the function then sharp transitions in the function are emphasised as shown
above. Similarly, the Laplacian may be used to enhance the edges in an
image.
0 
0 
0 
0 
1 
0 
0 
1 
0 

0 
1 
0 
 
1 
4 
1 
= 
1 
5 
1 
0 
0 
0 
0 
1 
0 
0 
1 
0 
Image Laplacian Edge Enhancement
Original Sharpened
using Laplacian
Often it is necessary to process binary images. Since each pixel may take only two values, the pixel is primarily used to denote the presence or absence of a feature. This feature may be the result of previous processing, for example edge detection. Often the binary image gives information about the shape of an object. Neighbourhood operators for binary images are called morphological operators because they deal with shape information.
In order to use morphological operators it is necessary to decide how one pixel is connected to another pixel.
The simplest method for deciding if a pixel is joined to one of its neighbours, is to look at all eight of the neighbours. If a pixel in this 8 neighbourhood is set, then it is joined to the central pixel. Eight connectedness correctly connects diagonal lines but it also connects the background across a diagonal line. Thus when eight connectedness is used the background must be treated differently to the foreground.
Pixels are four connected, if they are joined to the left,
right, above or below, but not diagonally.
Eight Connected Figure Four Connected Figure
Two important morphological operators are shrink and expand. Shrink will clear the pixels that have any neighbours clear. Expand will set the pixels that have any neighbours set.
An expand followed by a shrink is called a Close because it will remove small gaps between objects.
A shrink followed by and expand is called an open because it will keep small gaps between objects open.
Shrink, Expand, Open and Close are illustrated below.
original expand shrink close open
Shrink and expand are analogous to the neighbourhood operators min and max.
The edges in an image may be found by taking the original away from the expanded image, or taking the shrunk image away from the original. These correspond to the outer 4 connected edge and the inner 4 connected edge respectively.
The shrink operator may be used to reduce the size of a region of set pixels but if it is repeatedly applied then the region will eventually disappear. If the shape of the region is important then rather than shrink to nothing an operator should shrink the region to its skeleton.
The skeleton (or medial axis) is a set of points which are equidistant from the two or more closest edge points in the image.
Skeletonisation may be performed by repeatedly applying an algorithm which thins the image until the skeleton remains.
Binary morphological operators may be implemented by using
a lookup table. The pixels in the neighbourhood are combined to form a
binary word. This word will be between 0 and 511, for a 3x3 neighbourhood.
This word is used to address a table which contains a new value for the
central pixel.