

## **Chapter Roadmap**

- How to design an ALU?
- MIPS datapath
- Pipelining
- Hazards
- Real stuff



### Introduction

- CPU performance factors
  - Instruction count
    - Determined by ISA and compiler
  - CPI and Cycle time
    - Determined by CPU hardware
- We will examine two MIPS implementations
  - A simplified version
  - A more realistic pipelined version
- Simple subset, shows most aspects
  - Memory reference: I w, sw
  - Arithmetic/logical: add, sub, and, or, sI t
  - Control transfer: beq, j





### **Possible Representations**

| Sign Mag. | Two's Comp. | One's Comp. |  |
|-----------|-------------|-------------|--|
|           | 1000 = -8   |             |  |
| 1111 = -7 | 1001= -7    | 1000 = -7   |  |
| 1110 = -6 | 1010 = -6   | 1001 = -6   |  |
| 1101 = -5 | 1011 = -5   | 1010 = -5   |  |
| 1100 = -4 | 1100 = -4   | 1011 = -4   |  |
| 1011 = -3 | 1101 = -3   | 1100 = -3   |  |
| 1010 = -2 | 1110 = -2   | 1101 = -2   |  |
| 1001 = -1 | 1111 = -1   | 1110 = -1   |  |
| 1000 = -0 |             | 1111 = -0   |  |
| 0000 = +0 | 0000 = 0    | 0000 = +0   |  |
| 0001 = +1 | 0001 = +1   | 0001 = +1   |  |
| 0010 = +2 | 0010 = +2   | 0010 = +2   |  |
| 0011 = +3 | 0011 = +3   | 0011 = +3   |  |
| 0100 = +4 | 0100 = +4   | 0100 = +4   |  |
| 0101 = +5 | 0101 = +5   | 0101 = +5   |  |
| 0110 = +6 | 0110 = +6   | 0110 = +6   |  |
| 0111 = +7 | 0111 = +7   | 0111 = +7   |  |

- Issues:
  - balance
  - number of zeros
  - ease of operations
- Which one is best? Why?

## **MIPS** Representations

```
32-bit signed numbers (2's complement):
```

What if the bit string represented addresses?

need operations that also deal with only positive (unsigned) integers

### **Two's Complement Operations**

- Negating a two's complement number complement all the bits and then add a 1
  - remember: "negate" and "invert" are quite different!
  - Starting from LSb, all 0's as is, first 1 as is, then invert
- Converting n-bit numbers into numbers with more than n bits:
  - MIPS 16-bit immediate gets converted to 32 bits for arithmetic
  - sign extend copy the most significant bit (the sign bit) into the other bits

sign extension versus zero extend (1b vs. 1bu)



#### **Addition & Subtraction**

Just like in grade school (carry/borrow 1s)

- Two's complement operations are easy
  - do subtraction by negating and then adding

$$\begin{array}{ccc}
0111 & \to & 0111 \\
- & 0110 & \to & + 1010 \\
\hline
0001 & & 1 & 0001
\end{array}$$

- Overflow (result too large for finite computer word)
  - e.g., adding two n-bit numbers does not yield an n-bit number
     0111

### **Building a 1-bit Binary Adder**



| Α | В | carry_in | carry_out | S |
|---|---|----------|-----------|---|
| 0 | 0 | 0        | 0         | 0 |
| 0 | 0 | 1        | 0         | 1 |
| 0 | 1 | 0        | 0         | 1 |
| 0 | 1 | 1        | 1         | 0 |
| 1 | 0 | 0        | 0         | 1 |
| 1 | 0 | 1        | 1         | 0 |
| 1 | 1 | 0        | 1         | 0 |
| 1 | 1 | 1        | 1         | 1 |

S = A xor B xor carry\_in carry\_out = A&B | A&carry\_in | B&carry\_in (majority function)

- □ How can we use it to build a 32-bit adder?
- How can we modify it easily to build an adder/subtractor?

# **Building 32-bit Adder**



- □ Just connect the carry-out of the least significant bit FA to the carry-in of the next least significant bit and connect . . .
  - □ Ripple Carry Adder (RCA)
  - advantage: simple logic, so small (low cost)
    - disadvantage: slow and lots of glitching (so lots of energy consumption)

#### A 32-bit Ripple Carry Adder/Subtractor add/sub $c_0$ =carry\_in □ Remember 2's 1-bit complement is just FΑ complement all the bits C<sub>1</sub> control 1-bit (0=add,1=sub) FΑ $B_0$ if control = 0, B<sub>1</sub> $!B_0$ if control = 1 $C_2$ 1-bit FΑ add a 1 in the least $C_3$ significant bit 0111 C<sub>31</sub> - 0110 1-bit FΑ ↓ c<sub>32</sub>=carry\_out



#### **Overflow Detection and Effects**

- Overflow: the result is too large to represent in the number of bits allocated
- When adding operands with different signs, overflow cannot occur! Overflow occurs when
  - adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtract a negative from a positive gives a negative
  - or, subtract a positive from a negative gives a positive
- On overflow, an exception (interrupt) occurs
  - Control jumps to predefined address for exception
  - Interrupted address (address of instruction causing the overflow) is saved for possible resumption
- Don't always want to detect (interrupt on) overflow

















| s2 | s1 | s0 | c_in | result    | function         |
|----|----|----|------|-----------|------------------|
| 0  | 0  | 0  | 0    | А         | transfer A       |
| 0  | 0  | 0  | 1    | A + 1     | increment A      |
| 0  | 0  | 1  | 0    | A + B     | add              |
| 0  | 0  | 1  | 1    | A + B + 1 | add with carry   |
| 0  | 1  | 0  | 0    | A – B – 1 | subt with borrow |
| 0  | 1  | 0  | 1    | A – B     | subtract         |
| 0  | 1  | 1  | 0    | A – 1     | decrement A      |
| 0  | 1  | 1  | 1    | Α         | transfer A       |
| 1  | 0  | 0  | х    | A or B    | or               |
| 1  | 0  | 1  | Х    | A xor B   | xor              |
| 1  | 1  | 0  | Х    | A and B   | and              |
| 1  | 1  | 1  | Х    | !A        | complement A     |



#### **Overflow Detection**

- Overflow occurs when the result is too large to represent in the number of bits allocated
  - adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtract a negative from a positive gives a negative
  - or, subtract a positive from a negative gives a positive
- On your own: Prove you can detect overflow by:
  - Carry into MSB xor Carry out of MSB
  - + 0 0 1 1 3
- 1 1 0 0 -4 + 1 0 1 1 -5





#### **Instruction Execution**

- PC → instruction memory, fetch instruction
- All but jump, need to access data from a register and use the ALU.
- Depending on instruction class
  - Use ALU to calculate
    - Arithmetic result
    - Memory address for load/store
    - Branch target address beq \$r1, \$r2, LOC
  - Access data memory for load/store
  - PC ← target address or PC + 4









# **Logic Design Basics**

- Information encoded in binary
  - Low voltage = 0, High voltage = 1
  - One wire per bit
  - Multi-bit data encoded on multi-wire buses
- Combinational element
  - Operate on data
  - Output is a function of input
- State (sequential) elements
  - Store information



Chapter 4 — The Processor — 29

#### **Combinational Elements**

- AND-gateAdder

- Multiplexer
  - Y = S ? I1 : I0

$$\begin{array}{c}
10 \longrightarrow M \\
11 \longrightarrow X
\end{array}$$



- Arithmetic/Logic Unit
  - Y = F(A, B)





## **Sequential Elements**

- Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1







Chapter 4 — The Processor — 31

# **Sequential Elements**

- Register with write control
  - Only updates on clock edge when write control input is 1
  - Used when stored value is required later







## **Clocking Methodology**

- Combinational logic transforms data during clock cycles
  - Between clock edges
  - Input from state elements, output to state element
  - Longest delay determines clock period



# **Building a Datapath**

- Datapath
  - Elements that process data and addresses in the CPU
    - Registers, ALUs, mux's, memories, ...
- We will build a MIPS datapath incrementally
  - Refining the overview design







#### **Load/Store Instructions**

- Read register operands
- Calculate address using 16-bit offset
  - Use ALU, but sign-extend offset
- Load: Read memory and update register
- Store: Write register value to memory



#### **Branch Instructions**

- Read register operands
- Compare operands
  beq \$t1, \$t2, offset
  - Use ALU, subtract and check Zero output
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    - Already calculated by instruction fetch





# **Composing the Elements**

- First-cut data path does an instruction in one clock cycle
  - Each datapath element can only do one function at a time
  - Hence, we need separate instruction and data memories
- Use multiplexers where alternate data sources are used for different instructions







# **ALU Control**

ALU used for

Load/Store: F = add

■ Branch: F = subtract

R-type: F depends on funct field

| ALU control | Function         |
|-------------|------------------|
| 0000        | AND              |
| 0001        | OR               |
| 0010        | add              |
| 0110        | subtract         |
| 0111        | set-on-less-than |
| 1100        | NOR              |

