# Static vs. Dynamic Scheduling

- Dynamic Scheduling
  - Fast
  - Requires complex hardware
  - More power consumption
  - May result in a slower clock
- Static Scheduling
  - Done in S/W (compiler)
  - Maybe not as fast
  - Simpler processor design (less complex)



- In Simple pipelines, instructions are issued in order.
- If an instruction stalls, all instructions after it are stalled too (could be O.K. to execute).
- DIV.D F0,F2,F4
- ADD.D F10,F0,F8
- SUB.D F12,F8,F14



 Rearrange order of instructions to reduce stalls while maintaining data flow

## Advantages:

- Compiler doesn't need to have knowledge of microarchitecture
- Handles cases where dependencies are unknown at compile time

## Disadvantage:

- Substantial increase in hardware complexity
- Complicates exceptions



- Rearrange order of instructions to reduce stalls while maintaining data flow
- Instructions are issued in program order
- But, the instruction begins execution as soon as its operand are ready
- Out of order execution → out of order completion
- DIV.D F0,F2,F4
- ADD.D F6,F0,F8
- SUB.D F8,F10,F14
- MUL.D F6,F10,F8

Antidependence

Output Dependence



- To allow dynamic scheduling, split the ID stage in the simple MIPS pipeline into 2 stages
  - Issue: Decode and check for structural hazards
  - Read operand: wait for data hazard → read operand
- Instruction fetch stage before issue, and execution starts after read operand.
- Instructions pass through the issue stage in order, they can be delayed or pass each other at the read operand stage.



- Major complication for exception handling.
- Must preserve the exception behavior as if the instructions are executed in the program order.
- May delay notification until the processor knows the instruction is the next one completed.
- Imprecise exception may occur
  - Later instructions (in program order) may have been completed already.
  - Earlier instructions may have not been completed



# Register Renaming

Example:

```
DIV.D F0,F2,F4

ADD.D F6,F0,F8

S.D F6,0(R1)

SUB.D F8,F10,F14

antidependence

MUL.D F6,F10,F8
```

+ name dependence with F6

# Register Renaming

## Example:

```
      DIV.D
      F0,F2,F4

      ADD.D
      F6,F0,F8

      S.D
      F6,0(R1)

      SUB.D
      F8,F10,F14

      MUL.D
      F6,F10,F8

      DIV.D
      F0,F2,F4

      ADD.D
      S,F0,F8

      S.D
      S,0(R1)

      SUB.D
      T,F10,F14

      MUL.D
      F6,F10,T
```

Now only RAW hazards remain, which can be strictly ordered

# Register Renaming

- Register renaming is provided by reservation stations (RS)
  - Contains:
    - The instruction
    - Buffered operand values (when available)
    - Reservation station number of instruction providing the operand values
  - RS fetches and buffers an operand as soon as it becomes available (not necessarily involving register file)
  - Pending instructions designate the RS to which they will send their output
    - Result values broadcast on a result bus, called the common data bus (CDB)
  - Only the last output updates the register file
  - As instructions are issued, the register specifiers are renamed with the reservation station
  - May be more reservation stations than registers



Load and store buffers

Contain data and addresses, act like reservation

stations

Top-level design:



From instruction unit



### Three Steps:

- Issue
  - Get next instruction from FIFO queue
  - If available RS, issue the instruction to the RS with operand values if available
  - If no RS is available, stall the instruction issue

#### Execute

- When operand becomes available, store it in any reservation stations waiting for it
- When all operands are ready, issue the instruction
- Loads and store maintained in program order through effective address
- No instruction allowed to initiate execution until all branches that proceed it in program order have completed
- Write result
  - Write result on CDB into reservation stations and store buffers
    - (Stores must wait until address and value are received)



Op: Operation to perform in the unit (e.g., + or –)

Vj, Vk: Value of Source operands

Store buffers has V field, result to be stored

Qj, Qk: Reservation stations producing source registers (value to be written)

- Note: Qj,Qk=0 → ready
- Store buffers only have Qj for RS producing result

A: Used to hold info for the load store (initially immediate, then effective address)

Busy: Indicates reservation station or FU is busy

Register result status— Qi indicates which functional unit will write each register, 0 means no write to this register



# **Example**

|             |           | Instruction status |         |              |  |  |  |  |  |
|-------------|-----------|--------------------|---------|--------------|--|--|--|--|--|
| Instruction |           | Issue              | Execute | Write Result |  |  |  |  |  |
| L.D         | F6,32(R2) |                    | √       | √            |  |  |  |  |  |
| L.D         | F2,44(R3) | V                  | √       |              |  |  |  |  |  |
| MUL.D       | F0,F2,F4  | $\sqrt{}$          |         |              |  |  |  |  |  |
| SUB.D       | F8,F2,F6  | V                  |         |              |  |  |  |  |  |
| DIV.D       | F10,F0,F6 | V                  |         |              |  |  |  |  |  |
| ADD.D       | F6,F8,F2  | V                  |         |              |  |  |  |  |  |

| Name  | Reservation stations |      |    |                    |       |       |               |  |  |  |  |
|-------|----------------------|------|----|--------------------|-------|-------|---------------|--|--|--|--|
|       | Busy                 | Ор   | Vj | Vk                 | Qj    | Qk    | Α             |  |  |  |  |
| Load1 | No                   |      |    |                    |       |       |               |  |  |  |  |
| Load2 | Yes                  | Load |    |                    |       |       | 44 + Regs[R3] |  |  |  |  |
| Add1  | Yes                  | SUB  |    | Mem[32 + Regs[R2]] | Load2 |       |               |  |  |  |  |
| Add2  | Yes                  | ADD  |    |                    | Add1  | Load2 |               |  |  |  |  |
| Add3  | No                   |      |    |                    |       |       |               |  |  |  |  |
| Mult1 | Yes                  | MUL  |    | Regs[F4]           | Load2 |       |               |  |  |  |  |
| Mult2 | Yes                  | DIV  |    | Mem[32 + Regs[R2]] | Mult1 |       |               |  |  |  |  |

|       | Register status |       |    |      |      |       |     |  |     |  |  |  |
|-------|-----------------|-------|----|------|------|-------|-----|--|-----|--|--|--|
| Field | F0              | F2    | F4 | F6   | F8   | F10   | F12 |  | F30 |  |  |  |
| Qi    | Mult1           | Load2 |    | Add2 | Add1 | Mult2 |     |  |     |  |  |  |

# **Dealing with WAR**

| nstruction status |  |
|-------------------|--|
|-------------------|--|

| Instruct | ion       |  |
|----------|-----------|--|
| L.D      | F6,32(R2) |  |
| L.D      | F2,44(R3) |  |
| MUL.D    | F0,F2,F4  |  |
| SUB.D    | F8,F2,F6  |  |
| DIV.D    | F10,F0,F6 |  |
| ADD.D    | F6,F8,F2  |  |
|          |           |  |

The processor issues both DIV and ADD although there is a WAR hazard.

If F6 is ready when DIV is issued, its value is read and stored in the RS (ADD may change it that is O.K.)

| Name  | Busy | Op   |
|-------|------|------|
| Load1 | No   |      |
| Load2 | Yes  | Load |
| Add1  | Yes  | SUB  |
| Add2  | Yes  | ADD  |
| Add3  | No   |      |
| Mult1 | Yes  | MUL  |

Yes

DIV

Mult2

If not ready, RS will read it from the FU producing it, again ADD may change F6 since we will read it from the FU not F6

Load2

Mult1

|       |       | Register status |    |      |      |       |     |  |     |  |  |  |  |
|-------|-------|-----------------|----|------|------|-------|-----|--|-----|--|--|--|--|
| Field | F0    | F2              | F4 | F6   | F8   | F10   | F12 |  | F30 |  |  |  |  |
| Qi    | Mult1 | Load2           |    | Add2 | Add1 | Mult2 |     |  |     |  |  |  |  |

Regs [F4]

Mem[32 + Regs[R2]]

#### Instruction stream



### Register result status:



Clock cycle counter





Clock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Load2 Load1

## Note: Can have multiple loads outstanding





- Note: registers names are removed ("renamed") in Reservation Stations; MULT issued
- Load1 completing; who is waiting for Load1?





Load2 completing; what is waiting for Load2?



### • Timer starts down for Add1, Mult1



```
Instruction status:
                                 Exec
                                        Write
   Instruction
                       k
                           Issue
                                 Comp Result
                                                          Busy Address
   LD
            F6
                 34 +
                       R2
                                    3
                                          4
                                                    Load1
                                                            No
   LD
            F2
                 45 +
                       R3
                                    4
                                          5
                                                    Load2
                                                            No
   MULTD
            F0
                 F2
                       F4
                                                    Load3
                                                            No
   SUBD
            F8
                 F6
                       F2
   DIVD
           F10
                 F0
                       F6
                       F2
                 F8
   ADDD
            F6
                             6
Reservation Stations:
                                   SI
                                         S2
                                               RS
                                                      RS
                                   Vi
                                         Vk
          Time Name Busy Op
                                               Q_{j}
                                                      Qk
                      V_{es} SURD M(A1) M(A2)
              1 Add1
                Add2
                      Yes ADDD
                                        M(A2) Add1
                Add3
                      No
                      Yes MULTD M(A2) R(F4)
              9 Mult1
                Mult2
                          DIVD
                                        M(A1) Mult1
                      Yes
Register result status:
                                                           F10
   Clock
                            F0
                                   F2
                                         F4
                                               F6
                                                     F8
                                                                  F12
                                                                              F30
                      FU
                           Mult1 M(A2)
                                               Add2
                                                     Add1
      6
                                                           Mult2
```

### Issue ADDD here despite name dependency on F6?



### Add1 (SUBD) completing; what is waiting for it?



```
Instruction status:
                                  Exec Write
                           Issue Comp Result
                                                           Busy Address
   Instruction
                       k
  LD
            F6
                 34 +
                       R2
                                          4
                                                             No
                                                     Load1
                              1
   LD
            F2
                 45+
                       R3
                                    4
                                          5
                                                     Load2
                                                             No
   MULTD
            F0
                 F2
                       F4
                                                     Load3
                                                            No
   SUBD
            F8
                 F6
                              4
                                    7
                                          8
   DIVD
           F10
                 F0
                              5
                       F6
   ADDD
            F6
                 F8
                       F2
                              6
Reservation Stations:
                                   S1
                                          S2
                                                RS
                                                      RS
                                          Vk
                                                Q_i
           Time Name Busy
                            Op
                                                      Ok
                Add1
                       No
                      Yes ADDD (M-M)
              2 Add2
                                        M(A2)
                Add3
                       No
                      Yes MULTD M(A2) R(F4)
              7 Mult1
                Mult2
                      Yes DIVD
                                        M(A1) Mult1
```

 Clock
 F0
 F2
 F4
 F6
 F8
 F10
 F12
 ...
 F30

 8
 FU
 Mult1
 M(A2)
 Add2
 Mult2
 Mult2
 ...
 F30

```
Instruction status:
                                   Exec
                                         Write
                             Issue Comp Result
                                                              Busy Address
   Instruction
                        k
   LD
            F6
                  34 +
                                      3
                                            4
                                                                No
                        R2
                                                       Load1
                               1
   LD
            F2
                  45 +
                        R3
                                                       Load2
                                                               No
                                      4
                                            5
   MULTD
            FO
                  F2
                        F4
                                                       Load3
                                                               No
                               4
                                      7
   SUBD
            F8
                  F6
                        F2
                                            8
   DIVD
            F10
                  F<sub>0</sub>
                        F<sub>6</sub>
                               5
   ADDD
            F6
                  F8
                        F2
                               6
Reservation Stations:
                                           S2
                                     SI
                                                  RS
                                                         RS
           Time Name Busy
                                     V_j
                                            Vk
                                                  Q_j
                                                         Ok
                              Op
                Add1
                        No
               1 Add2
                       Yes ADDD (M-M) M(A2)
                Add3
                       No
               6 Mult1
                       Yes MULTDM(A2) R(F4)
                Mult2
                       Yes DIVD
                                          M(A1) Mult1
```

Clock F0 F2 F4 F6 F8 F10 F12 ... F30

9 FU Mult1 M(A2) Add2 (M-M) Mult2 ... F30

| In             | structio    | n sta | tus:   |            |       | Exec  | Write      |       |       |      |         |
|----------------|-------------|-------|--------|------------|-------|-------|------------|-------|-------|------|---------|
|                | Instruction | on    | j      | k          | Issue | Comp  | Result     |       |       | Busy | Address |
|                | LD          | F6    | 34+    | R2         | 1     | 3     | 4          |       | Load1 | No   |         |
|                | LD          | F2    | 45+    | <b>R</b> 3 | 2     | 4     | 5          |       | Load2 | No   |         |
|                | MULTD       | FO    | F2     | F4         | 3     |       |            |       | Load3 | No   |         |
|                | SUBD        | F8    | F6     | F2         | 4     | 7     | 8          |       |       |      |         |
|                | DIVD        | F10   | FO     | F6         | 5     |       |            |       |       |      |         |
|                | ADDD        | F6    | F8     | F2         | 6     | 10    |            |       |       |      |         |
| $R_{\epsilon}$ | eservati    | on St | ations | s:         |       | S1    | <i>S</i> 2 | RS    | RS    |      |         |
|                |             | Time  | Name   | Busy       | Op    | Vj    | Vk         | Qj    | Qk    |      |         |
|                |             |       | Add1   | No         |       |       |            |       |       |      |         |
|                |             | 0     | Add2   | Yes        | ADDD  | (M-M) | M(A2)      |       |       |      |         |
|                |             |       | Add3   | No         |       |       |            |       |       |      |         |
|                |             | 5     | Mult1  | Yes        | MULTD | M(A2) | R(F4)      |       |       |      |         |
|                |             |       | Mult2  | Yes        | DIVD  |       | M(A1)      | Mult1 |       | ]    |         |

## Add2 (ADDD) completing; what is waiting for it?



| Instructio | n sta | tus:      |      |       | Exec  | Write      |       |       |      |         |
|------------|-------|-----------|------|-------|-------|------------|-------|-------|------|---------|
| Instructi  | on    | $\dot{j}$ | k    | Issue | Comp  | Result     |       |       | Busy | Address |
| LD         | F6    | 34+       | R2   | 1     | 3     | 4          |       | Load1 | No   |         |
| LD         | F2    | 45+       | R3   | 2     | 4     | 5          |       | Load2 | No   |         |
| MULTD      | F0    | F2        | F4   | 3     |       |            |       | Load3 | No   |         |
| SUBD       | F8    | F6        | F2   | 4     | 7     | 8          |       |       |      |         |
| DIVD       | F10   | FO        | F6   | 5     |       |            |       |       |      |         |
| ADDD       | F6    | F8        | F2   | 6     | 10    | 11         |       |       |      |         |
| Reservati  | on St | ations    | s:   |       | S1    | <i>S</i> 2 | RS    | RS    |      |         |
|            | Time  | Name      | Busy | Op    | Vj    | Vk         | Qj    | Qk    | _    |         |
|            |       | Add1      | No   |       |       |            |       |       |      |         |
|            |       | Add2      | No   |       |       |            |       |       |      |         |
|            |       | Add3      | No   |       |       |            |       |       |      |         |
|            | 4     | Mult1     | Yes  | MULTI | M(A2) | R(F4)      |       |       |      |         |
|            |       | Mult2     | Yes  | DIVD  |       | M(A1)      | Mult1 |       | ]    |         |

- Write result of ADDD here?
- All quick instructions completes in this cycle!



```
Instruction status:
                                    Exec Write
                             Issue Comp Result
                                                               Busy Address
   Instruction
                         k
   LD
            F6
                  34 +
                                      3
                                             4
                                                                 No
                        R2
                                1
                                                        Load1
  LD
            F2
                  45 +
                        R3
                                2
                                                        Load2
                                                                No
                                      4
                                             5
   MULTD
            FO
                  F2
                        F4
                                                        Load3
                                                                No
                               4
                                      7
   SUBD
            F8
                  F6
                        F2
                                             8
   DIVD
            F10
                  F<sub>0</sub>
                        F<sub>6</sub>
   ADDD
            F6
                  F8
                        F2
                               6
                                      10
                                             11
Reservation Stations:
                                            S2
                                      SI
                                                   RS
                                                          RS
           Time Name Busy
                              Op
                                      V_j
                                            Vk
                                                   Q_j
                                                          Ok
                 Add1
                        No
                 Add2
                        No
                 Add3
                        No
               3 Mult1
                        Yes MULTDM(A2) R(F4)
                 Mult2
                        Yes DIVD
                                           M(A1) Mult1
```

Clock F0 F2 F4 F6 F8 F10 F12 ... F30 12 FU Mult1 M(A2) (M-M+M (M-M) Mult2 ... F30

```
Instruction status:
                                    Exec Write
                             Issue Comp Result
                                                               Busy Address
   Instruction
                         k
   LD
            F6
                  34 +
                                      3
                                             4
                                                                 No
                        R2
                                1
                                                        Load1
   LD
            F2
                  45 +
                        R3
                                2
                                                        Load2
                                                                No
                                      4
                                             5
   MULTD
            FO
                  F2
                        F4
                                                        Load3
                                                                No
                  F6
                               4
                                      7
   SUBD
            F8
                        F2
                                             8
   DIVD
            F10
                  F<sub>0</sub>
                        F<sub>6</sub>
   ADDD
            F6
                  F8
                        F2
                               6
                                      10
                                             11
Reservation Stations:
                                            S2
                                      SI
                                                   RS
                                                          RS
           Time Name Busy
                              Op
                                      V_j
                                            Vk
                                                   Q_j
                                                          Ok
                 Add1
                        No
                 Add2
                        No
                 Add3
                        No
               2 Mult1
                        Yes MULTDM(A2) R(F4)
                 Mult2
                        Yes DIVD
                                           M(A1) Mult1
```

Clock F0 F2 F4 F6 F8 F10 F12 ... F30 13 FU Mult1 M(A2) (M-M+N (M-M) Mult2 ... F30

```
Instruction status:
                                    Exec Write
                             Issue Comp Result
                                                               Busy Address
   Instruction
                         k
   LD
            F6
                  34 +
                                      3
                                             4
                                                                 No
                        R2
                                1
                                                        Load1
   LD
            F2
                  45 +
                        R3
                                2
                                                        Load2
                                                                No
                                      4
                                             5
   MULTD
            FO
                  F2
                        F4
                                                        Load3
                                                                No
                  F6
                               4
                                      7
   SUBD
            F8
                        F2
                                             8
   DIVD
            F10
                  F<sub>0</sub>
                        F<sub>6</sub>
   ADDD
            F6
                  F8
                        F2
                               6
                                      10
                                             11
Reservation Stations:
                                            S2
                                      SI
                                                   RS
                                                          RS
           Time Name Busy
                              Op
                                      V_j
                                            Vk
                                                   Q_j
                                                          Ok
                 Add1
                        No
                 Add2
                        No
                 Add3
                        No
               1 Mult1
                        Yes MULTDM(A2) R(F4)
                 Mult2
                        Yes DIVD
                                           M(A1) Mult1
```

| Ins | tructio    | n sta | tus:      |                  |       | Exec  | Write         |       |       |      |         |
|-----|------------|-------|-----------|------------------|-------|-------|---------------|-------|-------|------|---------|
| ]   | Instructio | on    | $\dot{J}$ | $\boldsymbol{k}$ | Issue | Comp  | Result        |       |       | Busy | Address |
| ]   | LD         | F6    | 34+       | R2               | 1     | 3     | 4             |       | Load1 | No   |         |
| ]   | LD         | F2    | 45+       | <b>R</b> 3       | 2     | 4     | 5             |       | Load2 | No   |         |
| ]   | MULTD      | FO    | F2        | F4               | 3     | 15    |               |       | Load3 | No   |         |
|     | SUBD       | F8    | F6        | F2               | 4     | 7     | 8             |       |       |      |         |
| ]   | DIVD       | F10   | FO        | F6               | 5     |       |               |       |       |      |         |
| 1   | ADDD       | F6    | F8        | F2               | 6     | 10    | 11            |       |       |      |         |
| Res | servatio   | on St | ations    | s:               |       | S1    | <i>S</i> 2    | RS    | RS    |      |         |
|     |            | Time  | Name      | Busy             | Op    | Vj    | Vk            | Qj    | Qk    |      |         |
|     |            |       | Add1      | No               |       |       |               |       |       |      |         |
|     |            |       | Add2      | No               |       |       |               |       |       |      |         |
|     |            |       | Add3      | No               |       |       |               |       |       |      |         |
|     |            | 0     | Mult1     | Yes              | MULTI | M(A2) | <b>R</b> (F4) |       |       |      |         |
|     |            |       | Mult2     | Yes              | DIVD  |       | M(A1)         | Mult1 |       | ]    |         |

Clock F0 F2 F4 F6 F8 F10 F12 ... F30 15 FU Mult 1 M(A2) (M-M+N(M-M)) Mult 2

## • Mult1 (MULTD) completing; who is waiting for it?



```
Instruction status:
                                   Exec
                                         Write
   Instruction
                        k
                            Issue
                                   Comp Result
                                                            Busy Address
   LD
            F6
                 34 +
                       R2
                                     3
                                           4
                                                      Load1
                                                              No
   LD
            F2
                 45 +
                       R3
                                     4
                                           5
                                                      Load2
                                                              No
                              2
   MULTD
                       F4
                              3
                                     15
                                                      Load3
            F0
                  F2
                                           16
                                                              No
                                           8
   SUBD
            F8
                  F6
                        F2
                              4
                                     7
   DIVD
            F10
                  F0
                              5
                        F6
   ADDD
                  F8
            F6
                        F2
                              6
                                     10
                                           11
Reservation Stations:
                                    SI
                                           S2
                                                 RS
                                                       RS
                                     V_{j}
                                           Vk
                                                 Q_j
                                                       Qk
           Time Name Busy
                             Op
                Add1
                       No
                Add2
                       No
                Add3
                       No
                Mult1
                       No
             40 Mult2
                            DIVD
                                   M*F4 M(A1)
                      Yes
Register result status:
                                    F2
                                                       F8
                                                             F10
                                                                    F12
   Clock
                             F0
                                          F4
                                                 F6
                                                                                 F30
```

## Just waiting for Mult2 (DIVD) to complete

M\*F4

FU

M(A2)



**16** 

(M-M+N(M-M) Mult2





```
Instruction status:
                                    Exec Write
                              Issue Comp Result
                                                               Busy Address
   Instruction
                         k
   LD
             F6
                  34 +
                                       3
                                             4
                                                                 No
                        R2
                                1
                                                        Load1
   LD
            F2
                  45 +
                        R3
                                                        Load2
                                                                 No
                                       4
   MULTD
            FO
                  F2
                        F4
                                3
                                      15
                                             16
                                                        Load3
                                                                 No
                  F6
                                4
   SUBD
            F8
                        F2
   DIVD
            F10
                  F<sub>0</sub>
                        F<sub>6</sub>
   ADDD
            F6
                   F8
                         F2
                                6
                                      10
                                             11
Reservation Stations:
                                            S2
                                      SI
                                                   RS
                                                          RS
           Time Name Busy
                              Op
                                      V_j
                                             Vk
                                                   Q_j
                                                          Qk
                 Add1
                        No
                 Add2
                        No
                 Add3
                        No
                 Mult1
                        No
               1 Mult2
                             DIVD M*F4 M(A1)
                        Yes
```

Clock F0 F2 F4 F6 F8 F10 F12 ... F30
55 FU M\*F4 M(A2) (M-M+N (M-M) Mult2

```
Instruction status:
                                   Exec Write
                                                             Busy Address
   Instruction
                        k
                            Issue
                                   Comp Result
   LD
            F6
                 34+
                        R2
                                     3
                                            4
                                                      Load1
                                                              No
   LD
            F2
                 45 +
                        R3
                                     4
                                            5
                                                      Load2
                                                              No
                               3
   MULTD
            F0
                        F4
                                           16
                                                      Load3
                  F2
                                     15
                                                              No
                                            8
   SUBD
            F8
                  F6
                        F2
                               4
   DIVD
            F10
                  F0
                               5
                                     56
                        F6
   ADDD
                  F8
                        F2
                               6
                                     10
                                           11
            F6
Reservation Stations:
                                    S1
                                           S2
                                                 RS
                                                        RS
                                     V_{j}
                                           Vk
                                                  Q_j
                                                        Qk
           Time Name Busy
                             Op
                Add1
                        No
                Add2
                        No
                Add3
                        No
                Mult1
                        No
               0 Mult2
                                   M*F4 M(A1)
                       Yes
                            DIVD
```



 Once again: In-order issue, out-of-order execution and out-of-order completion.

*F*2

M\*F4 M(A2)

F0

FU

Clock

56

F4

*F6* 

F10

F12

F8

(M-M+N (M-M) Result

F30

- Load and stores could be done out of order provided they access different memory locations.
- If they access same location, must preserve order (WAR, RAW, or WAW).
- If address calculation is done in program order, load/store can check if any uncompleted load/store share the same address
- Either wait or forward if possible.



## **Hardware-Based Speculation**

- Execute instructions along predicted execution paths but only commit the results if prediction was correct
- Instruction commit: allowing an instruction to update the register file when instruction is no longer speculative
- Need an additional piece of hardware to prevent any irrevocable action until an instruction commits
- Need to separate executing the instruction to pass data to other instructions from completing (performing operations that can not be undone)



- Register values and memory values are not written until an instruction commits
- On misprediction:
  - Speculated entries in ROB are cleared
- Exceptions:
  - Not recognized until it is ready to commit



- Reorder buffer holds the result of instruction between completion and commit (and supply them to any instruction who needs them just like the RS in Tomasulo's)
- Four fields:
  - Instruction type: branch/store/register
  - Destination field: register number or memory address
  - Value field: output value
  - Ready field: completed execution?
- Modify reservation stations:
  - Operand source is now reorder buffer instead of functional unit (results are tagged with ROB entry #)



- Register values and memory values are not written until an instruction commits
- On misprediction:
  - Speculated entries in ROB are cleared
- Exceptions:
  - Not recognized until it is ready to commit
- 4 stages
  - Issue
  - Execute
  - Write Result
  - Commit



#### Issue

- If empty RS and ROB entry → Issue; else stall
- Send operands to RS if available in registers or ROB
- The number of the ROB entry allocated to instruction is sent to RS to tag the results with
- If operands are not available yet, the ROB entry is sent to the RS to wait for results on the CDB



#### Execute

- If one or more operands are not available, monitor the CDB.
- When the result is broadcast on the CDB (we know that from the ROB entry tag) copy it
- When all operands are ready, start execution

#### Write Result

- When execution is completed, broadcast the result on the CDB tagged with ROB entry #
- Results are copied to ROB entry and all waiting RS
- Execute out of order, commit in order.



- When an instruction reaches the head of the ROB and the result is ready in the buffer,
  - If ALU op write it to the register file and remove instruction from ROB
  - If the instruction is a store, write it to the memory and remove the instruction from the ROB
  - If the instruction is a branch, if prediction is correct, remove it from the ROB. If misprediction flush the ROB and start from the correct successor.



# **Overview of Design**





### Multiple Issue and Static Scheduling

 To achieve CPI < 1, need to complete multiple instructions per clock

#### Solutions:

- Statically scheduled superscalar processors
- VLIW (very long instruction word) processors
- dynamically scheduled superscalar processors



## **Multiple Issue**

| Common name                  | Issue<br>structure | Hazard<br>detection   | Scheduling               | Distinguishing characteristic                                             | Examples                                                                             |
|------------------------------|--------------------|-----------------------|--------------------------|---------------------------------------------------------------------------|--------------------------------------------------------------------------------------|
| Superscalar<br>(static)      | Dynamic            | Hardware              | Static                   | In-order execution                                                        | Mostly in the<br>embedded space:<br>MIPS and ARM,<br>including the ARM<br>Coretex A8 |
| Superscalar<br>(dynamic)     | Dynamic            | Hardware              | Dynamic                  | Some out-of-order execution, but no speculation                           | None at the present                                                                  |
| Superscalar<br>(speculative) | Dynamic            | Hardware              | Dynamic with speculation | Out-of-order execution with speculation                                   | Intel Core i3, i5, i7;<br>AMD Phenom; IBM<br>Power 7                                 |
| VLIW/LIW                     | Static             | Primarily software    | Static                   | All hazards determined<br>and indicated by compiler<br>(often implicitly) | Most examples are in signal processing, such as the TI C6x                           |
| EPIC                         | Primarily static   | Primarily<br>software | Mostly static            | All hazards determined and indicated explicitly by the compiler           | Itanium                                                                              |



- Package multiple operations into one instruction
- Example VLIW processor:
  - One integer instruction (or branch)
  - Two independent floating-point operations
  - Two independent memory references
- Must be enough parallelism in code to fill the available slots



- Disadvantages:
  - Statically finding parallelism
  - Code size
  - No hazard detection hardware
  - Binary code compatibility

- Package multiple operations into one instruction
- Example VLIW processor:
  - One integer instruction (or branch)
  - Two independent floating-point operations
  - Two independent memory references
- Must be enough parallelism in the code to fill the available slots



- Disadvantages:
  - Statically finding parallelism
  - Code size
  - No hazard detection hardware
  - Binary code compatibility

# **VLIW Example**

| Source instruction | Instruction using result | Latency |
|--------------------|--------------------------|---------|
| FP ALU OP          | FP ALU OP                | 3       |
| FP ALU OP          | Store double             | 2       |
| Load double        | FP ALU OP                | 1       |
| Load Double        | Store double             | 0       |

|                      | F0,0(R1)  | L.D    | Loop: |
|----------------------|-----------|--------|-------|
| For (I=1000;I>0;I++) | F4,F0,F2  | ADD.D  |       |
| ,                    | 0(R1),F4  | S.D    |       |
| x[l]=x[l]+s;         | R1,R1,#-8 | DADDUI |       |
|                      | 1,R2,Loop | BNE R  |       |

# **VLIW Example**

 Assume that w can schedule 2 memory operations, 2 FP operations, and one integer or branch

| Memory reference 1       | <i>Memory</i> reference 2  | FP<br>2 operation 1 | FP<br>op. 2   | Int. op/<br>branch | Clock  |
|--------------------------|----------------------------|---------------------|---------------|--------------------|--------|
| LD F <sub>0.0</sub> (R1) | LD F6,-8(R1)               |                     |               |                    | 1      |
| LD F10,-16(R1)           | LD <del>F14, 2</del> 4(R1) |                     |               |                    | 2      |
| LD F18,-32(R1)           | LD F22,-40(R1)             | ADDD F4,F0,F2       | ADDD F8,F6,F2 | 2                  | 3      |
| LD F26,-48(R1)           |                            | ADDD F12,F10,F2     | ADDD F16,F14  | ,F2                | 4      |
|                          |                            | ADDD F20,F18,F2     | ADDD F24,F22  | ,F2                | 5      |
| SD 0(R1),F4              | SD -8(R1),F8               | ADDD F28,F26,F2     |               |                    | 6      |
| SD -16(R1),F12           | SD -24(R1),F16             |                     |               | DADD R1,R1,#       | ±-56 7 |
| SD 24(R1),F20            | SD 16(R1),F24              |                     |               |                    | 8      |
| SD 8(R1),F28             |                            |                     |               | BNEZ R1,LOC        | OP 9   |



#### Dynamic Scheduling, Multiple Issue, and Speculation

- Modern microarchitectures:
  - Dynamic scheduling + multiple issue + speculation
- Two approaches:
  - Assign reservation stations and update pipeline control table in half clock cycles
    - Only supports 2 instructions/clock
  - Design logic to handle any possible dependencies between the instructions
  - Hybrid approaches
- Issue logic can become bottleneck



# **Multiple Issue**

- Limit the number of instructions of a given class that can be issued in a "bundle"
  - I.e. on FP, one integer, one load, one store
- Examine all the dependencies amoung the instructions in the bundle
- If dependencies exist in bundle, encode them in reservation stations
- Also need multiple completion/commit



## **Example**

Loop: LD R2,0(R1) ;R2=array element

DADDIU R2,R2,#1 ;increment R2

SD R2,0(R1) ;store result

DADDIU R1,R1,#8 ;increment pointer

BNE R2,R3,LOOP ; branch if not last element



# **Example (No Speculation)**

| Iteration<br>number | Instruct | tions      | Issues at<br>clock cycle<br>number | Executes at<br>clock cycle<br>number | Memory<br>access at<br>clock cycle<br>number | Write CDB at<br>clock cycle<br>number | Comment          |
|---------------------|----------|------------|------------------------------------|--------------------------------------|----------------------------------------------|---------------------------------------|------------------|
| 1                   | LD       | R2,0(R1)   | 1                                  | 2                                    | 3                                            | 4                                     | First issue      |
| 1                   | DADDIU   | R2,R2,#1   | 1                                  | 5 ←                                  |                                              | 6                                     | Wait for LW      |
| 1                   | SD       | R2,0(R1)   | 2                                  | 3                                    | 7 ←                                          |                                       | Wait for DADDIU  |
| 1                   | DADDIU   | R1,R1,#8   | 2                                  | 3                                    |                                              | 4                                     | Execute directly |
| 1                   | BNE      | R2,R3,LOOP | 3                                  | 7                                    |                                              |                                       | Wait for DADDIU  |
| 2                   | LD       | R2,0(R1)   | 4                                  | 8                                    | 9                                            | 10                                    | Wait for BNE     |
| 2                   | DADDIU   | R2,R2,#1   | 4                                  | 11                                   |                                              | 12                                    | Wait for LW      |
| 2                   | SD       | R2,0(R1)   | 5                                  | 9                                    | 13                                           |                                       | Wait for DADDIU  |
| 2                   | DADDIU   | R1,R1,#8   | 5                                  | 8                                    |                                              | 9                                     | Wait for BNE     |
| 2                   | BNE      | R2,R3,LOOP | 6                                  | 13                                   |                                              |                                       | Wait for DADDIU  |
| 3                   | LD       | R2,0(R1)   | 7                                  | 14                                   | 15                                           | 16                                    | Wait for BNE     |
| 3                   | DADDIU   | R2,R2,#1   | 7                                  | 17                                   |                                              | 18                                    | Wait for LW      |
| 3                   | SD       | R2,0(R1)   | 8                                  | 15                                   | 19                                           |                                       | Wait for DADDIU  |
| 3                   | DADDIU   | R1,R1,#8   | 8                                  | 14                                   |                                              | 15                                    | Wait for BNE     |
| 3                   | BNE      | R2,R3,L00P | 9                                  | 19                                   |                                              |                                       | Wait for DADDIU  |



# **Example**

| Iteration<br>number | Instruct | tions      | Issues<br>at clock<br>number | Executes<br>at clock<br>number | Read access<br>at clock<br>number | Write<br>CDB at<br>clock<br>number | Commits<br>at clock<br>number | Comment           |
|---------------------|----------|------------|------------------------------|--------------------------------|-----------------------------------|------------------------------------|-------------------------------|-------------------|
| 1                   | LD       | R2,0(R1)   | 1                            | 2                              | 3                                 | 4                                  | 5                             | First issue       |
| 1                   | DADDIU   | R2,R2,#1   | 1                            | 5 <del>←</del>                 |                                   | <u></u>                            | 7                             | Wait for LW       |
| 1                   | SD       | R2,0(R1)   | 2                            | 3                              |                                   |                                    | 7                             | Wait for DADDIU   |
| 1                   | DADDIU   | R1,R1,#8   | 2                            | 3                              |                                   | 4                                  | 8                             | Commit in order   |
| 1                   | BNE      | R2,R3,L00P | 3                            | 7 🗸                            |                                   |                                    | 8                             | Wait for DADDIU   |
| 2                   | LD       | R2,0(R1)   | 4                            | 5                              | 6                                 | 7                                  | 9                             | No execute delay  |
| 2                   | DADDIU   | R2,R2,#1   | 4                            | 8                              |                                   | 9                                  | 10                            | Wait for LW       |
| 2                   | SD       | R2,0(R1)   | 5                            | 6                              |                                   |                                    | 10                            | Wait for DADDIU   |
| 2                   | DADDIU   | R1,R1,#8   | 5                            | 6                              |                                   | 7                                  | 11                            | Commit in order   |
| 2                   | BNE      | R2,R3,L00P | 6                            | 10                             |                                   |                                    | 11                            | Wait for DADDIU   |
| 3                   | LD       | R2,0(R1)   | 7                            | 8                              | 9                                 | 10                                 | 12                            | Earliest possible |
| 3                   | DADDIU   | R2,R2,#1   | 7                            | 11                             |                                   | 12                                 | 13                            | Wait for LW       |
| 3                   | SD       | R2,0(R1)   | 8                            | 9                              |                                   |                                    | 13                            | Wait for DADDIU   |
| 3                   | DADDIU   | R1,R1,#8   | 8                            | 9                              |                                   | 10                                 | 14                            | Executes earlier  |
| 3                   | BNE      | R2,R3,L00P | 9                            | 13                             |                                   |                                    | 14                            | Wait for DADDIU   |



## **Branch-Target Buffer**

- Need high instruction bandwidth!
  - Branch-Target buffers
    - Next PC prediction buffer, indexed by current PC





