| Data D  | ependei                              | nce                                                         |    |
|---------|--------------------------------------|-------------------------------------------------------------|----|
| - Loop: | L.D<br>ADD.D<br>S.D<br>DADDUI<br>BNE | F0,0(R1)<br>F4,F0,F2<br>F4,0(R1)<br>R1,R1,#-8<br>R1,R2,Loop |    |
| M<      | Copyright © 2012,                    | Elsevier Inc. All rights reserved.                          | 13 |











| Pipelin                                 | e Stal                   | for(i=10                  | )00;i>0;i)<br>x[i]=x[i]+A                            |
|-----------------------------------------|--------------------------|---------------------------|------------------------------------------------------|
|                                         | F4,0(R1)<br>JI R1,R1,#-8 | xt instruction is branch) | 1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>9<br>10 |
| Instruction pro                         | ducing result            | Instruction using result  | Latency in clock cycles                              |
| FP ALU op                               |                          | Another FP ALU op         | 3                                                    |
| FP ALU op<br>Load double<br>Load double |                          | Store double              | 2                                                    |
|                                         |                          | FP ALU op                 |                                                      |
|                                         |                          | Store double              | 0                                                    |



| Pipeline Sch                 | eduling                  |                         |
|------------------------------|--------------------------|-------------------------|
| Scheduled code:              |                          |                         |
| Loop: L.D F0,0(R1)           |                          | 1                       |
| DADDUI R1,R1,#-8             |                          | 2                       |
| ADD.D F4,F0,F2               |                          | 3                       |
| stall                        |                          | 4                       |
| stall                        |                          | 5                       |
| S.D F4,8(R1)                 | Now 6                    | 6                       |
| BNE R1,R2,Loop               |                          |                         |
| Instruction producing result | Instruction using result | Latency in clock cycles |
| FP ALU op                    | Another FP ALU op        | 3                       |
| FP ALU op                    | Store double             | 2                       |
| Load double                  | FP ALU op                | 1                       |
|                              | Store double             | 0                       |
| Load double                  |                          |                         |







|       |                                                                                                        |                                                               | peline Scheduling                    | Compiler Techniques |
|-------|--------------------------------------------------------------------------------------------------------|---------------------------------------------------------------|--------------------------------------|---------------------|
| Loop: | L.D<br>L.D<br>L.D<br>ADD.D<br>ADD.D<br>ADD.D<br>ADD.D<br>S.D<br>S.D<br>S.D<br>S.D<br>S.D<br>S.D<br>BNE | F8,F6,F2<br>F12,F10,F2<br>F16,F14,F2<br>F4,0(R1)<br>F8,-8(R1) | Loop iterations are<br>independent   | lues                |
| M<    |                                                                                                        | Copyright © 2012                                              | , Elsevier Inc. All rights reserved. | 19                  |







## Loop Level Parallelsim

- Loop-carried Dependence: A data dependence between different loop iterations (data produced in earlier iteration used in a later one).
- LLP analysis is important in software optimizations such as loop unrolling since it usually requires loop iterations to be independent.
- LLP analysis is normally done at the source code level or close to it since assembly language and target machine code generation introduces loop-carried name dependence in the registers used for addressing and incrementing.
- Instruction level parallelism (ILP) analysis, on the other hand, is usually done when instructions are generated by the compiler

Copyright © 2012, Elsevier Inc. All rights reserved.

M<























Copyright © 2012, Elsevier Inc. All rights reserved.

## **Dependence Analysis**

- Sometimes, *points-to* analysis might help.
- We might be able to answer *simpler* questions, or get some hints.

Copyright © 2012, Elsevier Inc. All rights reserved.

- Do 2 pointers point to the same list?
- Type information
- Information derived when the object was allocated
- Pointer assignments

M<



| S                                         | Software Piplines                                                 |                                                                                                                                                              |                                                                              |                                         |                                   |                                                              |
|-------------------------------------------|-------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------|-----------------------------------|--------------------------------------------------------------|
|                                           |                                                                   | Loop:                                                                                                                                                        | L.D<br>ADD.D<br>S.D<br>DADDUI<br>BNE                                         | F4,1<br>F4,0                            | 0(R1)<br>F0,F2<br>0(R1)<br>R1,#-8 |                                                              |
| 1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9 | L.D<br>ADD.D<br>S.D<br>L.D<br>ADD.D<br>S.D<br>L.D<br>ADD.D<br>S.D | rolled 3 times<br>F0,0(R1)<br>F4,F0,F2<br>F4,0(R1)<br>F0,-8(R1)<br>F4,F0,F2<br>F4,-8(R1)<br>F0,-16(R1)<br>F4,F0,F2<br>F4,-16(R1)<br>R1,R1,#-24<br>R1,R2,LOOP | L.:<br>ADI<br>L.:<br>1 S.I<br>2 ADI<br>3 L.I<br>4 DAI<br>5 BNI<br>S.I<br>ADI | D<br>D.D<br>D<br>D.D<br>D.D<br>D<br>DUI | F4,F0,F2                          | )<br>;Stores M[i]<br>;Adds to M[i-1]<br>);Loads M[i-2]<br>PP |
| M<                                        |                                                                   | Copyright © 2                                                                                                                                                | 2012, Elsevier In                                                            | c. All rig                              | ghts reserved.                    | 33                                                           |









