Lecture 17

Using data structures to store information.

When a data structure, such as a tree or linked list is used to store information, the data is split into two parts.
  1. Key Data
  2. Non Key Data
Key data is information used to order the data and non key data is the rest. The key should be chosen carefully so that it uniquely identifies the data. examples:
Key Non Key
Student ID Name,Address,Marks
Course Code Name,Department,Campus
License Plate  Make,Model,Colour
Using a data structure such a a tree, it is easy to search for data using the key. We can still perform searches based on non key data but it will be more time consuming because the whole tree must be searched.

Hash Tables

It is often the case that data must be stored so that searching is very fast. The response time of a system is often determined by the speed at which it can retrieve data.

A tree is an efficient data structure, but it is not the fastest possible method for data retrieval. The fastest method would be to use an indexed array. If the whole key is converted into a number then that number could be used as the index into an array. Searching would simply involve looking up data in the array. If there are only limited values for the key then this could be useful, unfortunately this is rarely possible.

A hash table is a combination of an indexed array and a linked list. The key is used to generate a number (called the hash value) ,this number will not be unique to every key. The function used to generate the number from the key is called the hashing function. Since the hash value is not unique there will be a number of keys for each hash value. These are stored as a linked list. The hash value is used to index an array and then the linked list for that value is searched.

For example:

A simple hash function for words may be the ASCII value of the first character. The table will have 256 entries and each linked list will be a list of words which have the same first character.

The efficiency of hashing will depend on how randomly the hash function distributes key values into hash values. Our simple example would not be very efficient because there are more words starting with 't' than with '&'. A better hash function may be to add together all the characters in the key and use this value modulus 256.

Abstract Data Types.

In 'C' we can define new types using 'typedef struct'. This only creates a part of the type - how the type is stored. Along with this we must also write functions to perform operation of the data type.

Some languages provide a more complete way of creating new types - Abstract Data Types (ADTs). An ADT comprises the storage of the type and the operations allowed on the type. Languages such as Ada,C++, Haskell and Java allow ADTs. In C we must use functions.

Example: Complex numbers:

This program contains a data structure and some functions that perform operations on it. These definitions allow us to use complex numbers without having to worry about how to perform arithmetic on them. The main program performs a fourier transform, it would be considerably longer and harder to follow without the ADT.