
Big Picture

We will now wind down a little bit to prepare for the break and for the second half of the semester.

We have created a simple processor with both the datapath and the control. Its operations are rather complex. To make matters worse, some results are looped back into the datapath (e.g., writing a result back to a register).

This means that a single instruction has to perform the following steps:

  1. Read contents of one or more storage elements (i.e., register/memory).
  2. Perform computation through some combinational logic.
  3. Write results to one or more storage elements (i.e., register/memory).
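To make the three steps concrete, here is a minimal Python sketch of one R-format add passing through them. It assumes a hypothetical MIPS-style register file modelled as a list of 32 integers, with register 0 hard-wired to zero; the function name `execute_add` is illustrative only, not part of any real simulator.

```python
# Minimal sketch: one R-format "add rd, rs, rt" split into the three steps.
# Assumption: a hypothetical MIPS-style register file stored as a list of
# 32 ints, where register 0 ($zero) always reads as 0.

def execute_add(regs, rd, rs, rt):
    # 1. Read: fetch both source operands from storage (the register file).
    a = regs[rs]
    b = regs[rt]
    # 2. Compute: pass them through combinational logic (the ALU).
    result = (a + b) & 0xFFFFFFFF  # wrap around to 32 bits
    # 3. Write: store the result back into storage, unless rd is $zero.
    if rd != 0:
        regs[rd] = result
    return regs


regs = [0] * 32
regs[9] = 5    # $t1
regs[10] = 7   # $t2
execute_add(regs, rd=8, rs=9, rt=10)  # add $t0, $t1, $t2
print(regs[8])  # prints 12
```

Because the operands are read before the result is written, an instruction such as `add $t1, $t1, $t2` still sees the old value of `$t1`, which is exactly the ordering the single-cycle design below has to guarantee within one clock period.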

Single Cycle

A naive implementation is to perform all of the steps above within a single clock period. This means that we have to split a single clock cycle into 3 parts. One possibility, which prevents a storage element from being read while it is being written, is as follows:

[Figure: Single Period]

This leads to certain problems, or rather, shortcomings. The main one is that the clock period has to be long enough for the slowest instruction. Consider the following example:

Single Cycle Clock Speed

Assuming all other delays are negligible, suppose each component has the following latency:

  • Memory: 2ns
  • ALU/Adder: 2ns
  • Register: 1ns

We can then compute how long each instruction takes (all times in ns; "-" means the component is not used):

Instruction  Inst. Mem  Reg. Read  ALU  Data Mem  Reg. Write  Total
R-Format     2          1          2    -         1           6ns
lw           2          1          2    2         1           8ns
sw           2          1          2    2         -           7ns
beq          2          1          2    -         -           5ns
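The totals in the table are just the sums of the latencies of the components each instruction passes through. A minimal Python sketch of that arithmetic follows; the dictionary keys such as `inst_mem` and the helper `instruction_time` are hypothetical names chosen for illustration.

```python
# Component latencies in ns, from the list above.
LATENCY = {"inst_mem": 2, "reg_read": 1, "alu": 2, "data_mem": 2, "reg_write": 1}

# Components each instruction uses, in order (from the table above).
USES = {
    "R-Format": ["inst_mem", "reg_read", "alu", "reg_write"],
    "lw":       ["inst_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw":       ["inst_mem", "reg_read", "alu", "data_mem"],
    "beq":      ["inst_mem", "reg_read", "alu"],
}


def instruction_time(instr):
    """Total latency when the components are used one after another."""
    return sum(LATENCY[unit] for unit in USES[instr])


for instr in USES:
    print(f"{instr:8s} {instruction_time(instr)}ns")
```

Running it reproduces the Total column: 6ns, 8ns, 7ns and 5ns.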

Since every instruction must complete within one clock cycle, the clock period has to be as long as the slowest instruction, which in this case is lw. This means that every instruction takes 8ns, even beq, which needs only 5ns, so every instruction suffers a long cycle time.
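The cost of fitting the clock to lw can be sketched in a few lines of Python. This assumes the per-instruction totals from the table above; the variable names are illustrative only.

```python
# Per-instruction latencies in ns, from the table above.
TOTAL_NS = {"R-Format": 6, "lw": 8, "sw": 7, "beq": 5}

# In a single-cycle design every instruction gets one full clock period,
# so the period must fit the slowest instruction.
clock_period = max(TOTAL_NS.values())  # 8 ns, set by lw
clock_rate_mhz = 1000 / clock_period   # 1 / 8 ns = 125 MHz

print(f"clock period {clock_period}ns -> {clock_rate_mhz:.0f} MHz")
for instr, needed in TOTAL_NS.items():
    idle = clock_period - needed
    print(f"{instr:8s} needs {needed}ns, idles {idle}ns of the {clock_period}ns cycle")
```

Every instruction except lw spends part of its cycle doing nothing useful: beq, for example, idles for 3ns out of every 8ns.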

Solution

There are a few possible solutions to this. We will discuss a few here, and one of them will be covered in more detail in the second half of the semester.

Multicycle

The first solution is to break each instruction up into execution steps. The simplest way is to use the same steps as the stages in the table above (instruction memory, register read, ALU, data memory, register write), which gives 5 steps per instruction. Each instruction then takes up to 5 execution steps, where each step takes 1 clock cycle.

The advantage is that each clock cycle is shorter (i.e., the clock is faster), but each instruction takes more clock cycles to execute. This is advantageous when instructions can take a variable number of clock cycles to complete, so that, for example, beq does not have to wait as long as lw.
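Here is a minimal Python sketch of multicycle timing with the latencies used earlier. It assumes that each step takes exactly one clock cycle, that the clock period is set by the slowest single step, and that an instruction skips the steps it does not use (as in the single-cycle table). The names `STEPS` and `multicycle_time` are hypothetical.

```python
# Steps (one clock cycle each) that each instruction needs, assuming it
# skips the stages it does not use, as in the single-cycle table.
STEPS = {
    "R-Format": ["inst_mem", "reg_read", "alu", "reg_write"],
    "lw":       ["inst_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw":       ["inst_mem", "reg_read", "alu", "data_mem"],
    "beq":      ["inst_mem", "reg_read", "alu"],
}
LATENCY = {"inst_mem": 2, "reg_read": 1, "alu": 2, "data_mem": 2, "reg_write": 1}

# The clock period now only has to fit the slowest single step.
clock_period = max(LATENCY.values())  # 2 ns


def multicycle_time(instr):
    """Time in ns: number of steps times the (shorter) clock period."""
    return len(STEPS[instr]) * clock_period


for instr in STEPS:
    cycles = len(STEPS[instr])
    print(f"{instr:8s} {cycles} cycles x {clock_period}ns = {multicycle_time(instr)}ns")
```

Under these assumptions beq drops to 6ns and R-Format and sw to 8ns, while lw rises to 10ns because its 1ns register steps are padded to a full 2ns cycle. Whether multicycle wins overall therefore depends on how often each instruction appears in a program.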

Pipelining

Pipelining is similar to multicycle, but we take further steps to optimise. Consider what happens while an instruction is in the ALU stage: the instruction memory is idle! That is a waste of a perfectly good component.

So, what can we do? If instructions are executed sequentially, we can fetch and decode the next instruction while the current one is still executing. This is the essence of pipelining: components that would otherwise sit idle are put to use.
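The overlap can be sketched in Python for an *ideal* pipeline, in which one new instruction enters every cycle and nothing ever stalls. It deliberately ignores the complications mentioned below. It uses the usual IF/ID/EX/MEM/WB labels for the five stages in the table above, and the helper names are hypothetical.

```python
# Minimal sketch of an ideal 5-stage pipeline: one instruction enters per
# cycle and there are no stalls (hazards are deliberately ignored).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles to finish n instructions: fill the pipe, then one per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def print_diagram(n_instructions):
    """Show which stage each instruction occupies in each cycle."""
    total = pipeline_cycles(n_instructions)
    for i in range(n_instructions):
        row = ["   "] * total
        for s, name in enumerate(STAGES):
            row[i + s] = f"{name:<3}"  # instruction i is in stage s at cycle i + s
        print(f"inst {i}: " + " ".join(row))


print_diagram(3)
print(pipeline_cycles(3))  # prints 7
```

With the 2ns multicycle clock, 1000 instructions would ideally take (5 + 999) x 2ns = 2008ns, compared with 1000 x 8ns = 8000ns on the single-cycle design. Real pipelines fall short of this ideal.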

However, this leads to complications which we will discuss in the second half of the semester.