# Static vs. Dynamic Scheduling

- Dynamic Scheduling
  - Fast
  - Requires complex hardware
  - More power consumption
  - May result in a slower clock
- Static Scheduling
  - Done in S/W (compiler)
  - Maybe not as fast
  - Simpler processor design (less complex)

N/<

Copyright © 2012, Elsevier Inc. All rights reserved.

61

## **Dynamic Scheduling**

- In Simple pipelines, instructions are issued in order.
- If an instruction stalls, all instructions after it are stalled too (could be O.K. to execute).
- DIV.D F0,F2,F4
- ADD.D F10,F0,F8
- SUB.D F12,F8,F14



M<

Copyright © 2012, Elsevier Inc. All rights reserved.

62

## **Dynamic Scheduling**

- Rearrange order of instructions to reduce stalls while maintaining data flow
- Advantages:
  - Compiler doesn't need to have knowledge of microarchitecture
  - Handles cases where dependencies are unknown at compile time
- Disadvantage:
  - Substantial increase in hardware complexity
  - Complicates exceptions

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

### **Dynamic Scheduling**

- Rearrange order of instructions to reduce stalls while maintaining data flow
- Instructions are issued in program order
- But, the instruction begins execution as soon as its operand are ready
- Out of order execution → out of order completion
- DIV.D F0.F2.F4
- ADD.D F6,F0,F8

Antidependence

■ SUB.D **F8**,F10,F14

Output Dependence

■ MUL.D F6,F10,F8

N/I<

Copyright © 2012, Elsevier Inc. All rights reserved.

64

### **Dynamic Scheduling**

- To allow dynamic scheduling, split the ID stage in the simple MIPS pipeline into 2 stages
  - Issue: Decode and check for structural hazards
  - Read operand: wait for data hazard → read operand
- Instruction fetch stage before issue, and execution starts after read operand.
- Instructions pass through the issue stage in order, they can be delayed or pass each other at the read operand stage.

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

65

### **Dynamic Scheduling**

- Major complication for exception handling.
- Must preserve the exception behavior as if the instructions are executed in the program order.
- May delay notification until the processor knows the instruction is the next one completed.
- Imprecise exception may occur
  - Later instructions (in program order) may have been completed already.
  - Earlier instructions may have not been completed

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

| _ | <br> |  | <br> |  |
|---|------|--|------|--|
| _ |      |  |      |  |
| _ |      |  |      |  |
| _ |      |  |      |  |
|   |      |  |      |  |
|   |      |  |      |  |
|   |      |  |      |  |
| _ |      |  |      |  |
|   |      |  |      |  |
|   |      |  |      |  |
| _ |      |  |      |  |
| _ |      |  |      |  |
| _ |      |  |      |  |
| _ |      |  |      |  |
| _ |      |  |      |  |
| _ | <br> |  | <br> |  |
|   | <br> |  | <br> |  |
|   |      |  |      |  |
|   |      |  |      |  |

# Register Renaming • Example: DIV.D F0,F2,F4 ADD.D F6,F0,F8 S.D F6,0(R1) SUB.D F8,F10,F14 MUL.D F6,F10,F8 + name dependence with F6

Copyright © 2012, Elsevier Inc. All rights reserved.

## **Register Renaming**

Example:

 DIV.D
 F0,F2,F4
 DIV.D
 F0,F2,F4

 ADD.D
 F6,F0,F8
 ADD.D
 S,F0,F8

 S.D
 F6,0(R1)
 S.D
 S,0(R1)

 SUB.D
 F8,F10,F14
 SUB.D
 T,F10,F14

 MUL.D
 F6,F10,F8
 MUL.D
 F6,F10,T

Now only RAW hazards remain, which can be strictly ordered

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

### **Register Renaming**

- Register renaming is provided by reservation stations (RS)
  - Contains:
    - The instruction
    - Buffered operand values (when available)
    - Reservation station number of instruction providing the operand values
  - RS fetches and buffers an operand as soon as it becomes available (not necessarily involving register file)
  - Pending instructions designate the RS to which they will send their output
    - Result values broadcast on a result bus, called the common data bus (CDB)
  - Only the last output updates the register file
  - As instructions are issued, the register specifiers are renamed with the reservation station
  - May be more reservation stations than registers

| ₩. |  |
|----|--|
|    |  |
|    |  |

Copyright © 2012, Elsevier Inc. All rights reserved.

| • |      |
|---|------|
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   | <br> |
|   | <br> |
|   |      |
|   |      |
|   |      |
|   | <br> |
|   | •    |
|   |      |
|   |      |
|   |      |
|   | <br> |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |
|   |      |

# Load and store buffers Contain data and addresses, act like reservation stations Top-level design: Top-level design:

# **Tomasulo's Algorithm**

■ Three Steps:

- Issue
  - Get next instruction from FIFO queue
  - If available RS, issue the instruction to the RS with operand values if available
  - . If no RS is available, stall the instruction issue
- Execute
  - When operand becomes available, store it in any reservation stations waiting for it
  - When all operands are ready, issue the instruction
  - Loads and store maintained in program order through effective address
  - No instruction allowed to initiate execution until all branches that proceed it in program order have completed
- Write result
  - Write result on CDB into reservation stations and store buffers
    - (Stores must wait until address and value are received)

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

71

# **Tomasulo's Algorithm**

Op: Operation to perform in the unit (e.g., + or -)

Vj, Vk: Value of Source operands

■ Store buffers has V field, result to be stored

Qj, Qk. Reservation stations producing source registers (value to be written)

- Note: Qj,Qk=0 → ready
- Store buffers only have Qj for RS producing result
- A: Used to hold info for the load store (initially immediate, then effective address)

Busy: Indicates reservation station or FU is busy
Register result status— Qi indicates which functional unit
will write each register, 0 means no write to this register

**M<** 

Copyright © 2012, Elsevier Inc. All rights reserved.

| • |  |  |  |
|---|--|--|--|
| • |  |  |  |
|   |  |  |  |
|   |  |  |  |
|   |  |  |  |
|   |  |  |  |
|   |  |  |  |
|   |  |  |  |
|   |  |  |  |
|   |  |  |  |
|   |  |  |  |
|   |  |  |  |
|   |  |  |  |

























```
nstruction status:
                                             Exec Write
                                                                      Busy Address
Load1 No
Load2 No
Load3 No
    Instruction j
LD F6 34+
                                     Issue Comp Result
                              R2
                                                       4
5
    LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
    DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations:
                                               SI S2 RS
             | Time | Name | Busy | Op | Vj | Vk | Add1 | No | O Add2 | Yes | ADD | (M-M) | M(A2) | Add3 | No | S | Mult1 | Yes | MULTD | M(A2) | R(F4) | Mult2 | Yes | DIVD | M(A1) |
Register result status:
                            F0 F2 F4 F6 F8 F10 F12
FU Mult1 M(A2) Add2 (M-M) Mult2
   Clock
10
                                                                                                    ... F30
 • Add2 (ADDD) completing; what is waiting for it?
M<
                                Copyright © 2012, Elsevier Inc. All rights reserved.
```







| Instruction                                      | n sta                                                | tus:   |      |       | Exec  | Write  |       |       |       |         |   |     |
|--------------------------------------------------|------------------------------------------------------|--------|------|-------|-------|--------|-------|-------|-------|---------|---|-----|
| Instructio                                       | n                                                    | i      | k    | Issue | Comp  | Result |       |       | Busy  | Address |   |     |
| LD                                               | F6                                                   | 34+    | R2   | 1     | 3     | 4      |       | Load1 | No    |         | ı |     |
| LD                                               | F2                                                   | 45+    | R3   | 2     | 4     | 5      |       | Load2 | No    |         |   |     |
| MULTD                                            | FO                                                   | F2     | F4   | 3     | 15    |        |       | Load3 | No    |         |   |     |
| SUBD                                             | F8                                                   | F6     | F2   | 4     | 7     | 8      |       |       |       |         |   |     |
| DIVD                                             | F10                                                  | F0     | F6   | 5     |       |        |       |       |       |         |   |     |
| ADDD                                             | F6                                                   | F8     | F2   | 6     | 10    | 11     |       |       |       |         |   |     |
| Reservatio                                       | n St                                                 | ation. | s:   |       | SI    | S2     | RS    | RS    |       |         |   |     |
|                                                  | Time                                                 | Name   | Busy | Op    | $V_j$ | Vk     | Qj    | Qk    |       |         |   |     |
| 1                                                |                                                      | Add1   | No   |       |       |        |       |       |       |         |   |     |
| l .                                              |                                                      | Add2   | No   |       |       |        |       |       |       |         |   |     |
| l                                                |                                                      | Add3   | No   |       |       |        |       |       |       |         |   |     |
| l                                                | (                                                    | Mult1  |      | MULTI | M(A2) |        |       |       |       |         |   |     |
| l                                                |                                                      | Mult2  | Yes  | DIVD  |       | M(A1)  | Mult1 |       | l     |         |   |     |
| Register r                                       | esuli                                                | statu  | s:   |       |       |        |       |       |       |         |   |     |
| Clock                                            |                                                      |        |      | FO    | F2    | F4     | F6    | F8    | F10   | F12     |   | F30 |
| 15                                               |                                                      |        | FU   | Mult1 | M(A2) | (1     | M-M+N | (M-M) | Mult2 |         |   |     |
| Mult1 (MULTD) completing; who is waiting for it? |                                                      |        |      |       |       |        |       |       |       |         |   |     |
| M<                                               | Copyright © 2012, Elsevier Inc. All rights reserved. |        |      |       |       |        |       |       |       |         |   |     |

```
Busy Address
Load1 No
Load2 No
Load3 No
  nstruction status:
  Exec Write
                            Issue Comp Result
                                          4
5
16
8
  SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations:
                                    S1 S2 RS RS
          Time Name Busy Op
Add1 No
Add2 No
Add3 No
Mult1 No
                                    V_j
                                          Vk
                                                Q_j
                                                       Qk
             40 Mult2 Yes
                            DIVD M*F4 M(A1)
Register result status:
                       F0 F2 F4 F6 F8 F10 F12
FU M*F4 M(A2) (M-M+N (M-M) Mult2
   Clock
16
                                                                                F30
• Just waiting for Mult2 (DIVD) to complete
M<
                     Copyright © 2012, Elsevier Inc. All rights reserved.
```

Copyright © 2012, Elsevier Inc. All rights reserved. 92





# **Tomasulo's Algorithm**

- Load and stores could be done out of order provided they access different memory locations.
- If they access same location, must preserve order (WAR, RAW, or WAW).
- If address calculation is done in program order, load/store can check if any uncompleted load/store share the same address
- Either wait or forward if possible.

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

### **Hardware-Based Speculation**

- Execute instructions along predicted execution paths but only commit the results if prediction was correct
- Instruction commit: allowing an instruction to update the register file when instruction is no longer speculative
- Need an additional piece of hardware to prevent any irrevocable action until an instruction commits
- Need to separate executing the instruction to pass data to other instructions from completing (performing operations that can not be undone)

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

97

### **Reorder Buffer**

- Register values and memory values are not written until an instruction commits
- On misprediction:
  - Speculated entries in ROB are cleared
- Exceptions:
  - Not recognized until it is ready to commit

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

98

### **Reorder Buffer**

- Reorder buffer holds the result of instruction between completion and commit (and supply them to any instruction who needs them just like the RS in Tomasulo's)
- Four fields:
  - Instruction type: branch/store/register
  - Destination field: register number or memory address
  - Value field: output value
  - Ready field: completed execution?
- Modify reservation stations:
  - Operand source is now reorder buffer instead of functional unit (results are tagged with ROB entry #)

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

### **Reorder Buffer**

- Register values and memory values are not written until an instruction commits
- On misprediction:
  - Speculated entries in ROB are cleared
- Exceptions:
  - Not recognized until it is ready to commit
- 4 stages
  - Issue
  - Execute
  - Write Result
  - Commit

N/<

Copyright © 2012, Elsevier Inc. All rights reserved.

100

### **Reorder Buffer**

- Issue
  - $\blacksquare$  If empty RS and ROB entry  $\Rightarrow$  Issue; else stall
  - Send operands to RS if available in registers or ROB
  - The number of the ROB entry allocated to instruction is sent to RS to tag the results with
  - If operands are not available yet, the ROB entry is sent to the RS to wait for results on the CDB

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

101

# **Reorder Buffer**

- Execute
  - If one or more operands are not available, monitor the CDB.
  - When the result is broadcast on the CDB (we know that from the ROB entry tag) copy it
  - When all operands are ready, start execution
- Write Result
  - When execution is completed, broadcast the result on the CDB tagged with ROB entry #
- Results are copied to ROB entry and all waiting RS
- Execute out of order, commit in order.

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

10:

| • |  |  |
|---|--|--|
|   |  |  |
| • |  |  |
|   |  |  |
| • |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
| • |  |  |
|   |  |  |
| • |  |  |
|   |  |  |

### **Reorder buffer**

- When an instruction reaches the head of the ROB and the result is ready in the buffer,
  - If ALU op write it to the register file and remove instruction from ROB
  - If the instruction is a store, write it to the memory and remove the instruction from the ROB
  - If the instruction is a branch, if prediction is correct, remove it from the ROB. If misprediction flush the ROB and start from the correct successor.

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

102



### **Multiple Issue and Static Scheduling**

- To achieve CPI < 1, need to complete multiple instructions per clock
- Solutions:
  - Statically scheduled superscalar processors
  - VLIW (very long instruction word) processors
  - dynamically scheduled superscalar processors

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

| Superscular Dynamic Hardware Static In-order execution Mostly in the curbedded of MIPS and A including the Information of the I | pace:<br>RM, |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | c ARM        |
| speculation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | present      |
| Superscalar Dynamic Hardware Dynamic with Out-of-order execution speculation with speculation AMD Phen Power 7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |              |
| VLIW/LIW Static Primarily Static All hazards determined software and indicated by compiler signal proce (often implicitly) such as the                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | ssing.       |
| EPIC Primarily static Primarily Mostly static All hazards determined Itanium and indicated explicitly by the compiler                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |              |

### **VLIW Processors**

- Package multiple operations into one instruction
- Example VLIW processor:
  - One integer instruction (or branch)
  - Two independent floating-point operations
  - Two independent memory references
- Must be enough parallelism in code to fill the available slots

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

107

### **VLIW Processors**

- Disadvantages:
  - Statically finding parallelism
  - Code size
  - No hazard detection hardware
  - Binary code compatibility

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

### **VLIW Processors**

- Package multiple operations into one instruction
- Example VLIW processor:
  - One integer instruction (or branch)
  - Two independent floating-point operations
  - Two independent memory references
- Must be enough parallelism in the code to fill the available slots

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

100

### **VLIW Processors**

- Disadvantages:
  - Statically finding parallelism
  - Code size
  - No hazard detection hardware
  - Binary code compatibility

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

110

# **VLIW Example**

| <ul> <li>Source instruction</li> </ul> | Instruction using result | Latency |
|----------------------------------------|--------------------------|---------|
| <ul> <li>FP ALU OP</li> </ul>          | FP ALU OP                | 3       |
| <ul> <li>FP ALU OP</li> </ul>          | Store double             | 2       |
| <ul> <li>Load double</li> </ul>        | FP ALU OP                | 1       |
| <ul> <li>Load Double</li> </ul>        | Store double             | 0       |
| Loop: L.D                              | F0.0(R1)                 |         |

ADD.D F4,F0,F2 For (I=1000;I>0;I++)
S.D 0(R1),F4
DADDUI R1,R1,#-8 X[I]=x[I]+S;
BNE R 1,R2,Loop

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

| <br>            |             |
|-----------------|-------------|
| <br><del></del> | <del></del> |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |
|                 |             |

## **VLIW Example**

 Assume that w can schedule 2 memory operations, 2 FP operations, and one integer or branch

| Memory reference 1 | Memory reference 2 | FP<br>operation 1 | FP<br>op. 2    | Int. op/ 0<br>branch | Clock |
|--------------------|--------------------|-------------------|----------------|----------------------|-------|
| LD F0.0(R1)        | LD F6,-8(R1)       |                   |                |                      | 1     |
| LD F10,-16(R1)     | LD F14, 24(R1)     |                   |                |                      | 2     |
| LD F18,-32(R1)     | LD F22,-40(R1)     | ADDD F4,F0,F2     | ADDD F8,F6,F2  |                      | 3     |
| LD F26,-48(R1)     |                    | ADDD F12,F10,F2   | ADDD F16,F14,F | 2                    | 4     |
|                    |                    | ADDD F20,F18,F2   | ADDD F24,F22,F | 2                    | 5     |
| SD 0(R1),F4        | SD -8(R1),F8       | ADDD F28,F26,F2   |                |                      | 6     |
| SD -16(R1),F12     | SD -24(R1),F16     |                   |                | DADD R1,R1,#-5       | 56 7  |
| SD 24(R1),F20      | SD 16(R1),F24      |                   |                |                      | 8     |
| SD 8(R1),F28       |                    |                   |                | BNEZ R1,LOOF         | 9     |
|                    |                    |                   |                |                      |       |

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

112

### **Dynamic Scheduling, Multiple Issue, and Speculation**

- Modern microarchitectures:
  - Dynamic scheduling + multiple issue + speculation
- Two approaches:
  - Assign reservation stations and update pipeline control table in half clock cycles
    - Only supports 2 instructions/clock
  - Design logic to handle any possible dependencies between the instructions
  - Hybrid approaches
- Issue logic can become bottleneck

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

111

# **Multiple Issue**

- Limit the number of instructions of a given class that can be issued in a "bundle"
  - I.e. on FP, one integer, one load, one store
- Examine all the dependencies amoung the instructions in the bundle
- If dependencies exist in bundle, encode them in reservation stations
- Also need multiple completion/commit

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

| • |  |  |
|---|--|--|
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
| • |  |  |
|   |  |  |
| • |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |
|   |  |  |

# Example

Loop: LD R2,0(R1)

;R2=array element

DADDIU R2,R2,#1 ;increment R2

SD R2,0(R1)

;store result DADDIU R1,R1,#8 ;increment pointer

BNE R2,R3,LOOP ;branch if not last element

Copyright © 2012, Elsevier Inc. All rights reserved.

# **Example (No Speculation)**

| Iteration<br>number | Instruct | tions      | Issues at<br>clock cycle<br>number | Executes at<br>clock cycle<br>number | Memory<br>access at<br>clock cycle<br>number | Write CDB at<br>clock cycle<br>number | Comment          |
|---------------------|----------|------------|------------------------------------|--------------------------------------|----------------------------------------------|---------------------------------------|------------------|
| 1                   | LD       | R2,0(R1)   | - 1                                | 2                                    | 3                                            | 4                                     | First issue      |
| 1                   | DADDIU   | R2,R2,#1   | - 1                                | 5 ←                                  |                                              | 6                                     | Wait for LW      |
| 1                   | SD       | R2,0(R1)   | 2                                  | 3                                    | 7 <                                          |                                       | Wait for DADD10  |
| 1                   | DADDIU   | R1,R1,#8   | 2                                  | 3                                    |                                              | 4                                     | Execute directly |
| 1                   | BNE      | R2,R3,L00P | 3                                  | 7                                    |                                              |                                       | Wait for DADD10  |
| 2                   | LD       | R2,0(R1)   | 4                                  | 8                                    | 9                                            | 10                                    | Wait for BNE     |
| 2                   | DADDIU   | R2,R2,#1   | 4                                  | 11                                   |                                              | 12                                    | Wait for LW      |
| 2                   | SD       | R2,0(R1)   | 5                                  | 9                                    | 13                                           |                                       | Wait for DADD1U  |
| 2                   | DADDIU   | R1,R1,#8   | 5                                  | 8                                    |                                              | 9                                     | Wait for BNE     |
| 2                   | BNE      | R2,R3,L00P | 6                                  | 13                                   |                                              |                                       | Wait for DADD10  |
| 3                   | LD       | R2,0(R1)   | 7                                  | 14                                   | 15                                           | 16                                    | Wait for BNE     |
| 3                   | DADDIU   | R2,R2,#1   | 7                                  | 17                                   |                                              | 18                                    | Wait for LW      |
| 3                   | SD       | R2,0(R1)   | 8                                  | 15                                   | 19                                           |                                       | Wait for DADD10  |
| 3                   | DADDIU   | R1,R1,#8   | 8                                  | 14                                   |                                              | 15                                    | Wait for BNE     |
| 3                   | BNE      | R2,R3,L00P | 9                                  | 19                                   |                                              |                                       | Wait for DADD10  |

M<

Copyright © 2012, Elsevier Inc. All rights reserved.

# **Example**

| Iteration<br>number | Instruct | tions      | Issues<br>at clock<br>number | Executes<br>at clock<br>number | Read access<br>at clock<br>number | Write<br>CDB at<br>clock<br>number | Commits<br>at clock<br>number | Comment           |
|---------------------|----------|------------|------------------------------|--------------------------------|-----------------------------------|------------------------------------|-------------------------------|-------------------|
| 1                   | LD       | R2,0(R1)   | 1                            | 2                              | 3                                 | - 4                                | 5                             | First issue       |
| 1                   | DADDIU   | R2,R2,#1   | 1                            | 5 4                            |                                   | 6                                  | 7                             | Wait for LW       |
| 1                   | SD       | R2,0(R1)   | 2                            | 3                              |                                   |                                    | 7                             | Wait for DADDIU   |
| 1                   | DADDIU   | R1,R1,#8   | 2                            | 3                              |                                   | 4                                  | 8                             | Commit in order   |
| 1                   | BNE      | R2,R3,L00P | 3                            | 7 6                            |                                   |                                    | 8                             | Wait for DADDIU   |
| 2                   | LD       | R2,0(R1)   | 4                            | 5                              | 6                                 | 7                                  | 9                             | No execute delay  |
| 2                   | DADDIU   | R2,R2,#1   | 4                            | 8                              |                                   | 9                                  | 10                            | Wait for LW       |
| 2                   | SD       | R2,0(R1)   | 5                            | 6                              |                                   |                                    | 10                            | Wait for DADDIU   |
| 2                   | DADDIU   | R1,R1,#8   | 5                            | 6                              |                                   | 7                                  | 11                            | Commit in order   |
| 2                   | BNE      | R2,R3,L00P | 6                            | 10                             |                                   |                                    | 11                            | Wait for DADDIU   |
| 3                   | LD       | R2,0(R1)   | 7                            | 8                              | 9                                 | 10                                 | 12                            | Earliest possible |
| 3                   | DADDIU   | R2,R2,#1   | 7                            | 11                             |                                   | 12                                 | 13                            | Wait for LW       |
| 3                   | SD       | R2,0(R1)   | 8                            | 9                              |                                   |                                    | 13                            | Wait for DADDIU   |
| 3                   | DADDIU   | R1,R1,#8   | 8                            | 9                              |                                   | 10                                 | 14                            | Executes earlier  |
| 3                   | BNE      | R2,R3,L00P | 9                            | 13                             |                                   |                                    | 14                            | Wait for DADDIU   |

Copyright © 2012, Elsevier Inc. All rights reserved.

