Large memories built from DRAM (dynamic RAM) are slow (~100 ns)
compared to our pipeline cycle time of ~10 ns.
Small memories built from SRAM (static RAM) are fast (~10 ns)
but are roughly 10 times as expensive.
Memory is built as a hierarchy:
disk is the slowest and cheapest,
SRAM the fastest and most expensive.
A library is a good example of a memory hierarchy.
It is very important to have access to all books in a library, but at any one time you will only need a few books on one particular subject, and of those books maybe a few pages in each.
If you leave the books open, on your desk, at these particular pages, you have very fast access to the information.
If you are prepared to turn the pages you get slower access to other information in the book.
If you are prepared to go to the library you get even slower access to lots more information.
The same principle applies to memory.
The Principle of Locality states that programs access a relatively small portion of their address space at any instant of time.
Temporal locality - if an item is referenced, it will tend to be referenced again soon
Spatial locality - if an item is referenced, items whose addresses are close by will tend to be referenced soon.
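Both kinds of locality can be seen in a simple loop. In this hypothetical sketch, the running total is reused on every iteration (temporal locality) and the array elements are visited at consecutive addresses (spatial locality):

```python
def sum_array(a):
    total = 0        # 'total' is reused every iteration: temporal locality
    for x in a:      # consecutive elements of 'a': spatial locality
        total += x
    return total

print(sum_array([1, 2, 3, 4]))  # 10
```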
A Hit is when the processor requests data that is in the faster memory (cache).
Hit Rate is the percentage of accesses that are found in the cache.
Hit Time is the time taken to access the cache.
A Miss is when the processor requests data that is not in the cache.
Miss Rate is (1 - Hit Rate).
Miss Penalty is the time taken to move data from memory to cache.
The Block Size is the amount of data that is transferred from memory to cache when a miss occurs.
Typical values are:
Block size 4 to 128 bytes
Hit time 1 cycle
Miss Penalty 8 cycles
Miss Rate 10%
Cache Size 256KB
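From these typical values we can estimate the average memory access time (a standard figure of merit, not named in the notes): every access pays the hit time, and the fraction of accesses that miss additionally pays the miss penalty.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    # Every access pays the hit time; misses additionally pay the penalty.
    return hit_time + miss_rate * miss_penalty

# Using the typical values above: 1 + 0.10 * 8
print(average_access_time(1, 0.10, 8))  # 1.8 cycles
```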
When the processor wants to access data item x5:
It first checks whether x5 is already in the cache.
If it isn't, it retrieves x5 from memory and inserts it into the cache.
How does the processor know if the data is in the cache?
How does the processor know its position in the cache?
The cache is ordered so that there is a relation between memory addresses and cache structure.
The position or index of a data item in the cache is determined directly from its memory address.
Consider a cache of 8 entries each entry 32 bits wide.
Direct mapping will place data from memory into locations in the cache at positions corresponding to the low-order 3 bits of their memory address:
location   data   cache index
000000      23    000
000001      44    001
000010      13    010
000011      48    011
000100      17    100
000101      56    101
000110      44    110
000111      22    111
001000      39    000
010101      29    101
111111      33    111
The three low-order bits of the address give the position (index) in the cache.
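Extracting the index is just a matter of masking off the low-order bits. A minimal sketch (the function name is illustrative):

```python
def cache_index(address, index_bits=3):
    # The low-order bits select the cache line (8 entries -> 3 index bits).
    return address & ((1 << index_bits) - 1)

print(format(cache_index(0b001000), '03b'))  # 000
print(format(cache_index(0b010101), '03b'))  # 101
```

Note that 001000 and 000000 map to the same index, 000: many addresses share one cache position.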
When the processor is first switched on, the cache will not contain valid data. So a valid bit is associated with each cache location, and is set when data is written.
Many memory locations are mapped to the same cache location.
So we also need to know which memory location is currently held in the cache.
We do this with cache tags.
The cache tag is the remaining (high-order) bits of the address:
cache   cache   cache   corresponding
tag     index   data    memory address
000     000     23      000000
100     001     28      100001
011     010     34      011010
000     011     48      000011
000     100     17      000100
010     101     29      010101
100     110     92      100110
111     111     33      111111
To discover whether a data reference hits or misses the cache:
1. find the cache index part of the address
2. match the cache tag stored at that index against the cache tag part of the address

For address 100110: cache index is 110 -> stored cache tag is 100
cache tag part of 100110 is 100
100 == 100 -> hit

For address 010011: cache index is 011 -> stored cache tag is 000
cache tag part of 010011 is 010
000 != 010 -> miss
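The two steps above, plus the valid bit and the update on a miss, can be sketched as a small direct-mapped cache model (the class and method names are illustrative, not from the notes):

```python
class DirectMappedCache:
    # Minimal sketch of a direct-mapped cache: 8 entries, 3 index bits,
    # the remaining address bits form the tag.
    def __init__(self, index_bits=3):
        self.index_bits = index_bits
        self.valid = [False] * (1 << index_bits)  # all entries invalid at power-on
        self.tag = [0] * (1 << index_bits)

    def access(self, address):
        index = address & ((1 << self.index_bits) - 1)  # low-order bits
        tag = address >> self.index_bits                # remaining bits
        if self.valid[index] and self.tag[index] == tag:
            return "hit"
        # Miss: fetch from memory and replace the old entry at this index.
        self.valid[index] = True
        self.tag[index] = tag
        return "miss"

cache = DirectMappedCache()
print(cache.access(0b100110))  # miss (cache starts empty)
print(cache.access(0b100110))  # hit  (same address, same tag)
print(cache.access(0b010011))  # miss (index 011 now holds tag 010)
print(cache.access(0b000011))  # miss (tag 000 != stored tag 010)
```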
To increase the size of the cache we can either increase the number of positions (entries), or increase the amount of data stored at each position.
Example: a cache with 32 entries, each holding 32 bytes of data.
Increasing the number of bytes in each cache entry incurs a penalty on a miss: more bytes need to be copied from memory into the cache. Spatial locality means that this is not necessarily a bad thing!
Using the cache index to find the correct cache line before comparing the cache tag involves two sequential steps.
A fully associative cache does away with the separate index and folds the cache index bits into the cache tag.
To find out whether the data is in the cache:
compare the cache tag part of the memory address with all the cache tags in the cache.
This matching is done in parallel in a fully associative cache.
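In hardware all the comparators fire at once; in software we can only model that parallel compare sequentially. A minimal sketch (the function name is illustrative):

```python
def fully_associative_lookup(stored_tags, address_tag):
    # Hardware compares the address tag against every stored tag in
    # parallel; any() models the OR of all the comparator outputs.
    return any(tag == address_tag for tag in stored_tags)

print(fully_associative_lookup([0b000101, 0b110010], 0b110010))  # True
print(fully_associative_lookup([0b000101, 0b110010], 0b111111))  # False
```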
In a directly mapped cache:
If there is a cache miss then the cache is updated and the new entry replaces the old.
In a fully associative cache:
If there is a cache miss then the new entry can be placed anywhere.
Where do we put it?
Using the Least Recently Used algorithm - hardware keeps track of which cache entry has not been accessed for the longest time.
This can be tricky to implement!
A simple pseudo-LRU algorithm is:
Keep an index into the cache.
If the cache entry at that index is accessed, add one to the index (wrapping around).
If a replacement is needed, choose the cache entry at the index.
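The three steps above can be sketched directly (class and method names are illustrative): the index only advances when the entry it points at is used, so the victim is never an entry that was just accessed.

```python
class PseudoLRU:
    # Sketch of the pseudo-LRU scheme above: a single index for the cache.
    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.index = 0

    def accessed(self, entry):
        # If the entry the index points at is used, move the index on,
        # so we never victimise a just-used entry.
        if entry == self.index:
            self.index = (self.index + 1) % self.num_entries

    def victim(self):
        # On a miss, replace the entry the index currently points at.
        return self.index

lru = PseudoLRU(4)
lru.accessed(0)      # entry 0 was just used, so the index moves on
print(lru.victim())  # 1
```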
Reading from the cache is straightforward. Writing to the cache is not.
How do we keep cache data and memory data the same?
Write Back policy:
Keep an extra bit in each cache entry to mark whether the entry has been written to.
When the cache entry is replaced, write it back to memory if it has changed.
Write Through policy:
Always write to memory as well as to cache.
Use a write buffer between the cache and memory so the processor does not have to wait for the write to complete.
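A minimal sketch of write-through with a write buffer (class and method names are illustrative): every write updates the cache and is queued; a memory controller drains the queue in the background while the processor carries on.

```python
from collections import deque

class WriteThroughCache:
    # Sketch of the write-through policy: every write goes to the cache
    # and into a write buffer bound for memory.
    def __init__(self):
        self.cache = {}
        self.write_buffer = deque()

    def write(self, address, value):
        self.cache[address] = value
        self.write_buffer.append((address, value))  # memory updated later

    def drain_one(self, memory):
        # The memory controller empties the buffer in the background.
        if self.write_buffer:
            address, value = self.write_buffer.popleft()
            memory[address] = value

memory = {}
c = WriteThroughCache()
c.write(0b000100, 17)    # cache updated immediately, write queued
c.drain_one(memory)      # buffer drained into memory
print(memory[0b000100])  # 17
```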
Hardware 20