Lecture 10

Convolution operations

The convolution operation is a weighted sum of all the pixels in the neighbourhood. For example, shown below is an area in an image and a weight array:

I00

I10

I20

M00

M10

M20

I01

I11

I21

M01

M11

M21

I02

I12

I22

M02

M12

M22

If I is a 3x3 neighbourhood from the image, and M is an array the same size as I, then a pixel I'11 to replace I11 is found using.

M is called the convolution mask, and by choosing values for the elements of M, many different operations may be performed. The convolution mask may be of any size, but is commonly 3x3, 5x5 or 7x7. The sum of the elements of M is commonly either 0 or 1. In order to see why, consider an area in an image with pixels of a uniform grey level.

If the sum of the elements in M is 0 then the operator produces an image in which with all pixels are zero. Thus such operators are used to detect properties of non uniform regions, for example edges.

If the sum of the elements in M is 1 then the operator produces an image with all pixels unchanged. Thus such operators are used to change the pixels in an image that are in non uniform areas. They may be used, for example, to blur or sharpen an image.

The symbol used to denote a convolution is a star -

*

Convolution Masks for common operations

Blurring

If all the mask coefficients are positive, then the result will be a blurred image.

1/9

1/9

1/9

1/12

1/12

1/12

1/9

1/9

1/9

or

1/12

1/3

1/12

1/9

1/9

1/9

1/12

1/12

1/12

The mask to the left uses the average in the neighbourhood. The mask to the right uses a weighted average. Commonly, the weighted average is calculated using a Gaussian distribution so that there is no sudden discontinuity in the mask.

Gradient

The gradient in an image is important because it gives information about edges. The first and second derivative are often used in edge detection.

First derivative

An image is a two dimensional function, thus the gradient at any point is a vector

The gradient at any point in an image is a vector given by:

The gradient in the x direction in an image I at a point x,y may be approximated as:

gx(x,y)=I(x+0.5,y)-I(x-0.5,y) and, since x and y are integers

gx(x-0.5,y)=I(x,y)-I(x-1,y) or 2gx(x,y)=I(x+1,y)-I(x-1,y)

Similarly the gradient in the y direction is:

gy(x,y-0.5)=I(x,y)-I(x,y-1) or 2gy(x,y)=I(x,y+1)-I(x,y-1)

The convolution masks for this operation are:

In the x direction In the y direction

-1

0

1

or

-1

1

-1

-1

0

or

1

1

Other masks are:

Prewitt                                     Sobel
x mask       y mask         x mask              y mask

-1

0

1

-1

-1

-1

-1

0

1

-1

-2

-1

-1

0

1

0

0

0

-2

0

2

0

0

0

-1

0

1

1

1

1

-1

0

1

1

2

1

Isotropic                                  Roberts
x mask        y mask

-1

0

1

-1

-1

1

0

0

1

0

0

0

0

0

-1

-1

0

-1

0

1

1

1


The magnitude and direction of the gradient may be found using

Often, only the magnitude of the gradient is required. In this case an approximation is usually used.

This needs only a single mask

0

-1

-1

2

Second derivative

The magnitude of the gradient may be used to detect edges by thresholding. However, the width of the line in the thresholded image will be proportional to the strength of the edge. If the position of the edge is required then the second derivative can be used. Any place where the second derivative crosses zero will be on an edge.

The centre of the edge occurs at the zero crossing of the second derivative

Laplacian

In one dimension, an approximation to the second derivative can be obtained by using the approximation to the first derivative and applying it again.

f'(x)= f(x+0.5)-f(x-0.5)

f''(x)=f'(x+0.5)-f'(x-0.5)

f''(x)= f(x+1)-f(x)-(f(x)-f(x-1))

f''(x)=f(x+1)-2f(x)+f(x-1)

In two dimensions, the second derivative is the Laplacian -.

Convolution Masks that will approximate the Laplacian or the negative Laplacian are:

0

1

0

0

-1

0

1

-4

1

or

-1

4

-1

0

1

0

0

-1

0

Combining convolution masks

The following mask will give the average for the pixels in a 3x3 neighbourhood.

1/9

1/9

1/9

1/9

1/9

1/9

1/9

1/9

1/9

If this is applied to an image and then the Laplacian is applied, the result will be equivalent to using the following mask.

0

-1/9

-1/9

-1/9

0

-1/9

2/9

1/9

2/9

-1/9

-1/9

1/9

0

1/9

-1/9

-1/9

2/9

1/9

2/9

-1/9

0

-1/9

-1/9

-1/9

0

Thus, two convolutions may be replaced by one, and vice versa. This is useful when two small convolution masks can be used to replace one large mask. This is only possible for certain masks. If a mask can be split into two one dimensional masks it is said to be separable. The Laplacian operator is not separable, but a similar mask is:

-1

1

-2

1

2

followed by

-1

2

-1

is equivalent to

-2

4

-2

-1

1

-2

1

In order to be separable each row in the mask must be a multiple of all other rows and each column must be a multiple of all other columns.

Using separable masks it is possible to use large neighbourhood sizes because the number of computations is proportional to the width of the neighbourhood and not its area.

Edge Detection Using the Laplacian

The zero crossings of Laplacian

As described earlier, the edges in an image correspond to the zero crossings of the Laplacian. These may be found by examining the pixels in a 2x2 neighbourhood, i.e.

a

b

d

c

if ac<t or ab<t or ad<t there is an edge at a

The vale of t must be set so that noise pixels do not produce edges.

Laplacian of a Gaussian

The Laplacian locates edges, but it does not differentiate between edges of different widths. In practice some edges are more important than others. For example, given a face image a single hair will generate an edge as will the border between face and background. In order to detect edges with a large width, then a larger convolution mask must be necessary.

If the image is blurred by a Gaussian then the Laplacian will only detect wide edges. These two operations are usually combined by finding the Laplacian of a Gaussian operator.

thus and

This may be extended to two dimensions and used to generate a convolution mask.

Non Linear Neighbourhood Operations.

The convolution operation is a linear process. If there is a conditional step in the process of obtaining a pixel value from a neighbourhood, then it is said to be non linear.

Maximum

This maximum value in the neighbourhood is chosen. The maximum operation causes light areas to expand, it is sometimes called the dilation or expand operation. It is useful for displaying data which contains isolated points and in segmentation.

Minimum

This operation involves choosing the minimum in the neighbourhood. Sometimes called an erode or shrink operation, it is often used in conjunction with the maximum operator to perform fuzzy morphological operations on grey scale images.

Median Filter

The most commonly used non linear neighbourhood operator is the median filter.

The pixel is replaced by the median value from within the neighbourhood. The pixel values must be sorted and the middle value chosen. For example, given the following neighbourhood:

10

20

30

15

10

40

These values are sorted to give

10,10,15,18,20,20,30,40,100

100

18

20

In this case the median value is 20.

The median filter is important because it removes pixels with values that are not similar to those in the neighbourhood while preserving edges. To illustrate this, consider the one dimensional case shown below.


The example to the left shows a feature that has a size less than half the width of the neighbourhood. In this case there is no point where the median value will be taken from inside the feature. Thus it is removed.

The example to the right shows a feature that has a size greater than half the width of the neighbourhood. In this case the median will always be the same as the value in the centre of the neighbourhood. This feature remains intact.

This property of the median filter is important because other noise removal methods, for example blurring, will change edge features.

The disadvantage of the median filter is than it is computationally intensive because it involves sorting. It is possible to use a moving histogram to reduce the number of calculations for large neighbourhood sizes.

Example of the Median Filter

Original Image 1/4 of all pixels replaced by random values

3x3 Median Filter applied once 3x3 Median Filter applied twice

The Mode Filter

The mode filter replaces a pixel with the most common pixel value in the neighbourhood. This is only useful if there are a small number of grey levels. If grey levels are used to represent segmented regions then noise can be removed by using the mode filter.

K Nearest Neighbour

Instead of using all the pixels in the neighbourhood use the K pixels closest in value to the central pixel. Replace it with the average of these. For example if K=6:

10

20

30

The 6 pixels nearest 10 are:

15

10

40

10,15,18,20,20 and 30

100

18

20

Average = 113/6 19

This operator performs an operation without including extreme values (the 100 in this case).

Maximum Homogenity

Calculate the average for a number of neighbourhoods around a pixel and use the closest to the pixel value.

n1

n2

n5

n3

n4

Calculate the average in each neighbourhood n1..n5 and use the closest of these to the original pixel value.

Image Enhancement using the Laplacian

The Laplacian operation may be used to enhance an image.

f(x) f''(x) f(x)-f''(x)

If the second derivative of a function is subtracted from the function then sharp transitions in the function are emphasised as shown above. Similarly, the Laplacian may be used to enhance the edges in an image.

0

0

0

0

1

0

0

-1

0

0

1

0

-

1

-4

1

=

-1

5

-1

0

0

0

0

1

0

0

-1

0

Image Laplacian Edge Enhancement

Example

Original                    Sharpened using Laplacian

Binary neighbourhood operators

Often it is necessary to process binary images. Since each pixel may take only two values, the pixel is primarily used to denote the presence or absence of a feature. This feature may be the result of previous processing, for example edge detection. Often the binary image gives information about the shape of an object. Neighbourhood operators for binary images are called morphological operators because they deal with shape information.

Connectivity

In order to use morphological operators it is necessary to decide how one pixel is connected to another pixel.

Eight Connected

The simplest method for deciding if a pixel is joined to one of its neighbours, is to look at all eight of the neighbours. If a pixel in this 8 neighbourhood is set, then it is joined to the central pixel. Eight connectedness correctly connects diagonal lines but it also connects the background across a diagonal line. Thus when eight connectedness is used the background must be treated differently to the foreground.

Four Connected

Pixels are four connected, if they are joined to the left, right, above or below, but not diagonally.

                    

Eight Connected Figure    Four Connected Figure

Shrink and Expand

Two important morphological operators are shrink and expand. Shrink will clear the pixels that have any neighbours clear. Expand will set the pixels that have any neighbours set.

Close and Open

An expand followed by a shrink is called a Close because it will remove small gaps between objects.

A shrink followed by and expand is called an open because it will keep small gaps between objects open.

Shrink, Expand, Open and Close are illustrated below.

original expand shrink close open

Shrink and expand are analogous to the neighbourhood operators min and max.

The edges in an image may be found by taking the original away from the expanded image, or taking the shrunk image away from the original. These correspond to the outer 4 connected edge and the inner 4 connected edge respectively.

Skeletonisation.

The shrink operator may be used to reduce the size of a region of set pixels but if it is repeatedly applied then the region will eventually disappear. If the shape of the region is important then rather than shrink to nothing an operator should shrink the region to its skeleton.

The skeleton (or medial axis) is a set of points which are equidistant from the two or more closest edge points in the image.

Skeletonisation may be performed by repeatedly applying an algorithm which thins the image until the skeleton remains.

Implementation

Binary morphological operators may be implemented by using a lookup table. The pixels in the neighbourhood are combined to form a binary word. This word will be between 0 and 511, for a 3x3 neighbourhood. This word is used to address a table which contains a new value for the central pixel.