

















| Examp                                                                                 | les                                                                  |                                                                                                                                                                       | Introdu |
|---------------------------------------------------------------------------------------|----------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
| Example 1     DADDU     BEQZ     DSUBU L:     OR                                      | <u>-</u><br>R1,R2,R3<br>R4,L<br>R1,R1,R6<br>R7,R1,R8                 | <ul> <li>OR instruction dependent<br/>on DADDU and DSUBU</li> <li>Preserving the order alone<br/>is not sufficient (must have<br/>the correct value in R1)</li> </ul> | laction |
| <ul> <li>Example 2<br/>DADDU<br/>BEQZ<br/>DSUBU<br/>DADDU<br/>skip:<br/>OR</li> </ul> | <u>-</u><br>R1,R2,R3<br>R12,skip<br>R4,R5,R6<br>R5,R4,R9<br>R7,R8,R9 | <ul> <li>Assume R4 isn't used after skip</li> <li>Possible to move DSUBU before the branch</li> </ul>                                                                 |         |
| M<                                                                                    | Copyright @                                                          | ∋ 2012, Elsevier Inc. All rights reserved.                                                                                                                            | 10      |

| •  | <ul> <li>ompiler Techn</li> <li>Pipeline scheduli</li> <li>Separate dependinstruction by the instruction</li> </ul> | iques for Ex<br>ing<br>dent instruction fro<br>pipeline latency | posing ILP<br>om the source<br>of the source | Compiler Techniques |
|----|---------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------|----------------------------------------------|---------------------|
| 1  | Example:<br>for (i=999; i>=0; i=i<br>x[i] = x[i] + s;                                                               | -1) No depen<br>between i<br>MIPS cod                           | dence<br>terations<br>e?                     |                     |
|    | Instruction producing result                                                                                        | Instruction using result                                        | Latency in clock cycles                      |                     |
|    | FP ALU op                                                                                                           | Another FP ALU op                                               | 3                                            |                     |
|    | FP ALU op                                                                                                           | Store double                                                    | 2                                            |                     |
|    | Load double                                                                                                         | FP ALU op                                                       | 1                                            |                     |
|    | Load double                                                                                                         | Store double                                                    | 0                                            |                     |
| MK | Copyright ©                                                                                                         | 2012, Elsevier Inc. All rights rese                             | rved.                                        | 11                  |

| P  | ipeline                                                                         | e Stal                                                           | s                                   |                                      | Compi          |
|----|---------------------------------------------------------------------------------|------------------------------------------------------------------|-------------------------------------|--------------------------------------|----------------|
| Lo | op: L.D<br>stall<br>ADD.D<br>stall<br>stall<br>S.D<br>DADDL<br>stall (as<br>BNE | F0,0(R1)<br>F4,F0,F2<br>F4,0(R1)<br>II R1,R1,#-8<br>sume integet | r load latency is 1)                | 1<br>2<br>3<br>4<br>5<br>6<br>7<br>8 | ler Techniques |
|    |                                                                                 | lucing result                                                    | Instruction using result            | Latency in clock cycles              |                |
|    | FP ALU op                                                                       |                                                                  | Another FP ALU op                   | 3                                    |                |
|    | FP ALU op                                                                       |                                                                  | Store double                        | 2                                    |                |
|    | Load double                                                                     |                                                                  | FP ALU op                           | 1                                    |                |
|    | Load double                                                                     |                                                                  | Store double                        | 0                                    |                |
|    |                                                                                 | Copyright ©                                                      | 2012, Elsevier Inc. All rights rese | rved.                                | 12             |

| <b>Pipeline Sch</b>          | eduling                             |                         |
|------------------------------|-------------------------------------|-------------------------|
| Sahadulad aada:              |                                     |                         |
| Scheduled Code.              |                                     | 1                       |
| DADDUI R1 R1 #-8             |                                     | 2                       |
| ADD.D F4.F0.F2               |                                     | 3                       |
| stall                        |                                     | 4                       |
| stall                        |                                     | 5                       |
| S.D F4,8(R1)                 |                                     | 6                       |
| BNE R1,R2,Loop               |                                     | 7                       |
| Instruction producing result | Instruction using result            | Latency in clock cycles |
| FP ALU op                    | Another FP ALU op                   | 3                       |
| FP ALU op                    | Store double                        | 2                       |
| Load double                  | FP ALU op                           | 1                       |
| Load double                  | Store double                        | 0                       |
| /                            |                                     |                         |
| Copyright ©                  | 2012, Elsevier Inc. All rights rese | erved.                  |



| <b>Loo</b> | <b>p Unr</b><br>beline so                                                                                     | olling/P                                                                                                                                                                                | Pipeline Scheduling<br>e unrolled loop: | Compiler Techn |
|------------|---------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------|----------------|
| Loop:      | L.D<br>L.D<br>L.D<br>ADD.D<br>ADD.D<br>ADD.D<br>ADD.D<br>S.D<br>S.D<br>S.D<br>S.D<br>S.D<br>S.D<br>S.D<br>BNE | F0,0(R1)<br>F6,-8(R1)<br>F10,-16(R1)<br>F14,-24(R1)<br>F4,F0,F2<br>F8,F6,F2<br>F12,F10,F2<br>F16,F14,F2<br>F4,0(R1)<br>F8,-8(R1)<br>R1,R1,#-32<br>F12,16(R1)<br>F16,8(R1)<br>R1,R2,Loop | Loop iterations are<br>independent      | iques          |
|            |                                                                                                               | Copyright © 20                                                                                                                                                                          | 112, Elsevier Inc. All rights reserved. | 15             |



























| S                                                            | oftw                                                                                               | are Pip                                                                                                                                                      | line                             | es                                                                 |                              |                                                                                                                                             |                                                                 |    |
|--------------------------------------------------------------|----------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------|--------------------------------------------------------------------|------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------|----|
|                                                              |                                                                                                    | Loop:                                                                                                                                                        | L.D<br>ADD<br>S.D<br>DADD<br>BNE | .D                                                                 | F0,(<br>F4,]<br>F4,(<br>R1,] | 0(R1)<br>F0,F2<br>0(R1)<br>R1,#-8                                                                                                           |                                                                 |    |
| Bef<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10<br>11 | ADD.D<br>S.D<br>L.D<br>ADD.D<br>S.D<br>L.D<br>ADD.D<br>S.D<br>L.D<br>ADD.D<br>S.D<br>DADDUI<br>BNE | rolled 3 times<br>F0,0(R1)<br>F4,F0,F2<br>F4,0(R1)<br>F0,-8(R1)<br>F4,F0,F2<br>F4,-8(R1)<br>F0,-16(R1)<br>F4,F0,F2<br>F4,-16(R1)<br>R1,R1,#-24<br>R1,R2,LOOP | Afte<br>1<br>2<br>3<br>4<br>5    | ADD<br>L.D<br>S.D<br>L.D<br>L.D<br>DAD<br>BNE<br>S.D<br>ADD<br>S.D | .D<br>.D<br>DUI              | e Pipelined V<br>F0,0(R1)<br>F4,F0,F2<br>F0,-8(R1)<br>F4,F0,F2<br>F0,-16(R1)<br>R1,R1,#-8<br>R1,R2,LOO<br>F4,0(R1)<br>F4,F0,F2<br>F4,-8(R1) | ersion<br>;Stores M[i]<br>;Adds to M[i-1<br>);Loads M[i-2]<br>P | 1  |
| M<                                                           |                                                                                                    | Copyright © 2                                                                                                                                                | 2012, Else                       | evier Inc                                                          | . All rig                    | ghts reserved.                                                                                                                              |                                                                 | 29 |

















| E        | xan                  | nple                  |                                  |                                                           |                                  |       |    |
|----------|----------------------|-----------------------|----------------------------------|-----------------------------------------------------------|----------------------------------|-------|----|
| B1<br>B2 | if (a<br>d:<br>if (a | l==0)<br>=1;<br>l==1) | BNEZ<br>DADD<br>L1: DADD<br>BNEZ | R1, L1 ; d<br>R1, R0, #1 ; Y<br>R3, R1, #-1<br>R3, L2 ; b | l == 0 ?<br>/ES d==1<br>2 (bb!=2 | )     |    |
|          | {                    |                       | L2:                              | If b1 not tak                                             | (en h2i                          | s     |    |
|          |                      |                       |                                  | taken for su                                              | ure                              | 5     |    |
| Init     | ial d                | d==0?                 | B1                               | d befoe b2                                                | d==1                             | b2    |    |
| 0        |                      | Y                     | NO                               | 1                                                         | Y                                | NO    |    |
| 1        |                      | Ν                     | Taken                            | 1                                                         | Y                                | NO    |    |
| 2        |                      | Ν                     | Taken                            | 2                                                         | Ν                                | Taken |    |
|          |                      | Co                    | pyright © 2012, Elsevier I       | nc. All rights reserved.                                  |                                  |       | 38 |



| E>       | kam            | pl           | e            |                   |                         |              |              |                |
|----------|----------------|--------------|--------------|-------------------|-------------------------|--------------|--------------|----------------|
| Init     | ial d          | d==          | =0?          | B1                | d befo                  | oe b2        | d==          | :1 b2          |
| 0        |                | Ì            | Y            | NO                | 1                       |              | Y            | NO             |
| 1        |                | 1            | N            | Taken             | 1                       |              | Y            | NO             |
| 2        |                | 1            | N            | Taken             | 2                       |              | Ν            | Taken          |
| d        | b1<br>Pred     |              | b1<br>action | newb1<br>pred     | b2<br>pred              | k            | o2<br>action | new b2<br>pred |
| 2        | NT/N           | ۲ 🗙          | Т            | T/NT              | NT/ <mark>NT</mark>     | x            | Т            | NT/T           |
| 0        | T/NT           | $\checkmark$ | NT           | T/NT              | NT/T                    | $\checkmark$ | NT           | NT/T           |
| 2        | T/NT           | $\checkmark$ | Т            | T/NT              | NT/T                    | 1            | Т            | NT/T           |
| 0        | T /NT          | $\checkmark$ | NT           | T/NT              | NT/T                    | $\checkmark$ | NT           | NT/T           |
| Mi<br>on | spredic<br>lly | tion         | on first t   | ry                |                         |              |              |                |
| /<       |                |              | Copyrid      | aht © 2012. Elsev | vier Inc. All rights re | eserved.     |              |                |

























