York University -- EECS



# **EECS4201 Computer Architecture**

- Instructor
  - Mokhtar Aboelaze
  - Office LAS2026 Phone ext: 40607
- Research interests
  - Computer Architecture
  - Low power architecture
  - Embedded systems
  - FPGA (in embedded applications)


Copyright © 2012, Elsevier Inc. All rights reserved.


## **EECS4201 Computer Architecture**

- Text
  - Computer Architecture: A Quantitative Approach, Hennessy & Patterson, 5<sup>th</sup> Ed.
- Class Meeting
  - Tuesdays, Thursdays 10:00-11:30 CB 120
- Office Hours
  - Tuesdays, Thursdays 1:00-3:00pm or by appointment


# **Grading EECS4201**

Grades are distributed as follows

HW/Assignments
Quizzes
Midterm
Paper review – groups of 2
10%
10%

- Final 40%


# **Grading EECS5501**

Grades are distributed as follows

HW/Assignments
Quizzes
Midterm
10%
20%

- Project 20%

- Final 35%


# **Assumptions**

- EECS2021 or equivalent
  - Assembly language
  - RISC architecture
  - ALU architecture
  - Pipelining and hazards
  - Memory hierarchy and cache organization


#### **Computer Architecture**

- Why study computer architecture
- Hardware/Architecture
  - Design better, faster, cheaper computers that use as little energy as possible
- Software
  - Understand the architecture to squeeze as much performance for your code as possible


## **Computer Technology**

- Performance improvements:
  - Improvements in semiconductor technology
    - Feature size, clock speed
  - Improvements in computer architectures
    - Enabled by HLL compilers, UNIX
    - Led to RISC architectures
  - Together have enabled:
    - Lightweight computers
    - Productivity-based managed/interpreted programming languages




#### **Current Trends in Architecture**

- Cannot continue to leverage Instruction-Level parallelism (ILP)
  - Single processor performance improvement ended in 2003
- New models for performance:
  - Data-level parallelism (DLP)
  - Thread-level parallelism (TLP)
  - Request-level parallelism (RLP)
- These require explicit restructuring of the application


#### **Classes of Computers**

- Personal Mobile Device (PMD)
  - e.g. smart phones, tablet computers
  - Emphasis on energy efficiency and real-time
- Desktop Computing
  - Emphasis on price-performance
- Servers
  - Emphasis on availability, scalability, throughput
- Clusters / Warehouse Scale Computers
  - Used for "Software as a Service (SaaS)"
  - Emphasis on availability and price-performance
  - Sub-class: Supercomputers, emphasis: floating-point performance and fast internal networks
- Embedded Computers
  - Emphasis: price


#### **Parallelism**

- Classes of parallelism in applications:
  - Data-Level Parallelism (DLP)
  - Task-Level Parallelism (TLP)
- Classes of architectural parallelism:
  - Instruction-Level Parallelism (ILP)
  - Vector architectures/Graphic Processor Units (GPUs)
  - Thread-Level Parallelism
  - Request-Level Parallelism


## Flynn's Taxonomy

- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data streams (SIMD)
  - Vector architectures
  - Multimedia extensions
  - Graphics processor units
- Multiple instruction streams, single data stream (MISD)
  - No commercial implementation
- Multiple instruction streams, multiple data streams (MIMD)
  - Tightly-coupled MIMD
  - Loosely-coupled MIMD


## **Defining Computer Architecture**

- "Old" view of computer architecture:
  - Instruction Set Architecture (ISA) design
  - i.e. decisions regarding:
    - registers, memory addressing, addressing modes, instruction operands, available operations, control flow instructions, instruction encoding
- "Real" computer architecture:
  - Specific requirements of the target machine
  - Design to maximize performance within constraints: cost, power, and availability
  - Includes ISA, microarchitecture, hardware


#### Trends in Technology

- Integrated circuit technology
  - Transistor density: 35%/year
  - Die size: 10-20%/year
  - Integration overall: 40-55%/year
- DRAM capacity: 25-40%/year (slowing)
- Flash capacity: 50-60%/year
  - 15-20X cheaper/bit than DRAM
- Magnetic disk technology: 40%/year
  - 15-25X cheaper/bit than Flash
  - 300-500X cheaper/bit than DRAM
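To get a feel for what these annual rates mean, the sketch below converts a compound growth rate into a doubling time and a ten-year growth factor. The helper names are made up for illustration, and where the slide gives a range the low end is used as an assumption.

```python
import math


def doubling_time_years(annual_rate: float) -> float:
    """Years to double at a constant compound rate (0.35 means 35%/year)."""
    return math.log(2) / math.log(1 + annual_rate)


def growth_factor(annual_rate: float, years: float) -> float:
    """Total multiplicative growth after `years` of compounding."""
    return (1 + annual_rate) ** years


# Rates from the slide; for ranges the low end is assumed.
rates = {
    "Transistor density": 0.35,
    "DRAM capacity (low end)": 0.25,
    "Flash capacity (low end)": 0.50,
    "Magnetic disk": 0.40,
}

for name, rate in rates.items():
    print(f"{name}: doubles every {doubling_time_years(rate):.2f} years, "
          f"{growth_factor(rate, 10):.1f}x in 10 years")
```

At 35%/year, for example, density doubles roughly every 2.3 years.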


## **Bandwidth and Latency**

- Bandwidth or throughput
  - Total work done in a given time
  - 10,000-25,000X improvement for processors
  - 300-1200X improvement for memory and disks
- Latency or response time
  - Time between start and completion of an event
  - 30-80X improvement for processors
  - 6-8X improvement for memory and disks




## **Transistors and Wires**

- Feature size
  - Minimum size of transistor or wire in x or y dimension
  - 10 microns in 1971 to 0.032 microns in 2011
  - Transistor performance scales linearly
    - Wire delay does not improve with feature size!
  - Integration density scales quadratically


#### Power and Energy

- Problem: Get power in, get power out
- Thermal Design Power (TDP)
  - Characterizes sustained power consumption
  - Used as target for power supply and cooling system
  - Lower than peak power, higher than average power consumption
- Clock rate can be reduced dynamically to limit power consumption
- Energy per task is often a better measurement


## **Dynamic Energy and Power**

- Dynamic energy
  - Transistor switch from 0 -> 1 or 1 -> 0
  - ½ × Capacitive load × Voltage<sup>2</sup>
- Dynamic power
  - ½ × Capacitive load × Voltage<sup>2</sup> × Frequency switched
- Reducing clock rate reduces power, not energy
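The sketch below plugs numbers into the two formulas to show why halving the clock rate halves power but leaves the energy of a fixed task unchanged, while lowering the voltage reduces both. The capacitance, voltage, frequency, and transition count are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def dynamic_energy(cap_load_f: float, voltage_v: float) -> float:
    """Energy of one 0->1 or 1->0 transition, in joules."""
    return 0.5 * cap_load_f * voltage_v ** 2


def dynamic_power(cap_load_f: float, voltage_v: float, freq_hz: float) -> float:
    """Power when transitions happen at `freq_hz`, in watts."""
    return dynamic_energy(cap_load_f, voltage_v) * freq_hz


# Hypothetical operating point (not from the slides).
C, V, F = 1e-9, 1.0, 2e9
TRANSITIONS = 1e9            # work needed by the task, fixed regardless of clock

p_full = dynamic_power(C, V, F)
p_half = dynamic_power(C, V, F / 2)

# Energy = power x time; at half the clock the task takes twice as long.
e_full = p_full * (TRANSITIONS / F)
e_half = p_half * (TRANSITIONS / (F / 2))

print(f"Power: {p_full:.2f} W at full clock, {p_half:.2f} W at half clock")
print(f"Task energy: {e_full:.2f} J vs {e_half:.2f} J")

# Lowering the voltage by 15% does reduce energy, by a factor of 0.85^2.
ratio = dynamic_energy(C, 0.85 * V) / dynamic_energy(C, V)
print(f"Energy ratio at 85% voltage: {ratio:.4f}")
```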






#### **Static Power**

- Static power consumption
  - Current<sub>static</sub> × Voltage
  - Scales with number of transistors
  - To reduce: power gating


## **Trends in Cost**

- Cost driven down by learning curve
  - Yield
- DRAM: price closely tracks cost
- Microprocessors: price depends on volume
  - 10% less for each doubling of volume


# **Integrated Circuit Cost**

#### Integrated circuit

$$\text{Cost of integrated circuit} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}$$

$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$$

$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$$

Bose-Einstein formula:

$$\text{Die yield} = \text{Wafer yield} \times \frac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^N}$$

- Defects per unit area = 0.016-0.057 defects per square cm (2010)
- N = process-complexity factor = 11.5-15.5 (40 nm, 2010)
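The formulas above chain together naturally, so a small worked example may help. The sketch assumes a 30 cm wafer, a 1.5 cm × 1.5 cm die, a defect density of 0.031 per cm² and N = 13.5 (both inside the 2010 ranges quoted above), 100% wafer yield, and a hypothetical wafer cost of $5500. Function names are illustrative.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Wafer area over die area, minus the dies lost along the round edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(defects_per_cm2: float, die_area_cm2: float, n: float,
              wafer_yield: float = 1.0) -> float:
    """Bose-Einstein yield model from the slide."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


def cost_of_die(wafer_cost: float, wafer_diameter_cm: float, die_area_cm2: float,
                defects_per_cm2: float, n: float, wafer_yield: float = 1.0) -> float:
    """Wafer cost divided by the number of good dies."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2, n, wafer_yield))
    return wafer_cost / good_dies


# Assumed inputs (wafer cost is hypothetical).
area = 1.5 * 1.5
print(f"Dies per wafer: {dies_per_wafer(30, area):.0f}")
print(f"Die yield:      {die_yield(0.031, area, 13.5):.2f}")
print(f"Cost per die:   ${cost_of_die(5500, 30, area, 0.031, 13.5):.2f}")
```

Because yield falls off steeply with die area, doubling the die area more than doubles the cost per good die under this model.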




#### **Dependability**

- Module reliability
  - Mean time to failure (MTTF)
  - Mean time to repair (MTTR)
  - Mean time between failures (MTBF) = MTTF + MTTR
  - Availability = MTTF / MTBF
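A minimal sketch of the availability calculation, using a hypothetical module with an MTTF of 1,000,000 hours and an MTTR of 24 hours. The expected-downtime helper is an addition for illustration and assumes a 8760-hour year.

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures = time to failure + time to repair."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the module is delivering service."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)


def downtime_hours_per_year(mttf_hours: float, mttr_hours: float) -> float:
    """Expected unavailable hours in a 8760-hour year (illustrative helper)."""
    return (1 - availability(mttf_hours, mttr_hours)) * 8760


# Hypothetical module: fails once per million hours, takes a day to repair.
a = availability(1_000_000, 24)
print(f"Availability: {a:.6f}")
print(f"Expected downtime: {downtime_hours_per_year(1_000_000, 24):.2f} h/year")
```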


# **Measuring Performance**

- Typical performance metrics:
  - Response time
  - Throughput
- Speedup of X relative to Y
  - Execution time<sub>Y</sub> / Execution time<sub>X</sub>
- Execution time
  - Wall clock time: includes all system overheads
  - CPU time: only computation time
- Benchmarks
  - Kernels (e.g. matrix multiply)
  - Toy programs (e.g. sorting)
  - Synthetic benchmarks (e.g. Dhrystone)
  - Benchmark suites (e.g. SPEC06fp, TPC-C)


#### **Reporting Performance**

- Must be reproducible
- Give a complete description of the computer and the compiler flags.
- Usually reported relative to the execution time on a reference machine: SPECRatio<sub>A</sub> = T<sub>ref</sub> / T<sub>A</sub>.
- Summarize the ratios across benchmarks with the geometric mean
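The sketch below computes SPECRatios and their geometric mean for two machines. All execution times are invented for illustration. It also checks one property that makes the geometric mean a natural choice here: the ratio of the two machines' geometric means equals the geometric mean of their per-benchmark time ratios, so the comparison does not depend on which reference machine was used.

```python
import math


def spec_ratio(t_ref: float, t_machine: float) -> float:
    """Reference time divided by measured time; larger is faster."""
    return t_ref / t_machine


def geometric_mean(values: list[float]) -> float:
    """n-th root of the product, computed via logs for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical execution times in seconds for three benchmarks.
t_ref = [100.0, 200.0, 400.0]
t_a = [50.0, 50.0, 100.0]
t_b = [25.0, 50.0, 50.0]

ratios_a = [spec_ratio(r, t) for r, t in zip(t_ref, t_a)]
ratios_b = [spec_ratio(r, t) for r, t in zip(t_ref, t_b)]
gm_a, gm_b = geometric_mean(ratios_a), geometric_mean(ratios_b)

# The same comparison computed without any reference machine.
direct = geometric_mean([a / b for a, b in zip(t_a, t_b)])

print(f"GM(A) = {gm_a:.3f}, GM(B) = {gm_b:.3f}")
print(f"GM(B)/GM(A) = {gm_b / gm_a:.3f}, direct = {direct:.3f}")
```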




# **Principles of Computer Design**


- The Processor Performance Equation

$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$

$$\text{CPU time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$

$$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}}$$

$$\text{CPU time} = \text{Instruction count} \times \text{Cycles per instruction} \times \text{Clock cycle time}$$

$$\frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}} = \text{CPU time}$$
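As a quick check of the equation, the sketch below computes CPU time both ways (cycles × cycle time, and instruction count × CPI / clock rate) for a hypothetical program of one billion instructions with a CPI of 2 on a 2 GHz clock. The numbers and function names are illustrative only.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time, in seconds."""
    clock_cycle_time = 1.0 / clock_rate_hz
    return instruction_count * cpi * clock_cycle_time


def cpi_from_cycles(clock_cycles: float, instruction_count: float) -> float:
    """Average cycles per instruction for a whole program."""
    return clock_cycles / instruction_count


# Hypothetical program and machine.
IC, CPI, RATE = 1e9, 2.0, 2e9
cycles = IC * CPI

print(f"CPU time from IC x CPI / rate: {cpu_time(IC, CPI, RATE):.2f} s")
print(f"CPU time from cycles / rate:   {cycles / RATE:.2f} s")
print(f"CPI recovered from cycles:     {cpi_from_cycles(cycles, IC):.1f}")
```

The three factors are driven by different things: instruction count by the ISA and compiler, CPI by the ISA and organization, and clock cycle time by the organization and hardware technology.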


# **Principles of Computer Design**


- Different instruction types having different CPIs

$$\text{CPU clock cycles} = \sum_{i=1}^{n} IC_i \times CPI_i$$

$$\text{CPU time} = \left(\sum_{i=1}^{n} IC_i \times CPI_i\right) \times \text{Clock cycle time}$$
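The per-class sum can be sketched as follows. The instruction mix (three classes with their counts and CPIs) and the 1 ns clock cycle are hypothetical values, chosen so the result is easy to verify by hand.

```python
# Each entry is (IC_i, CPI_i) for one instruction class. Values are hypothetical.
MIX = [
    (50_000_000, 1),   # ALU operations
    (30_000_000, 2),   # loads/stores
    (20_000_000, 3),   # branches
]


def total_cycles(mix: list[tuple[int, int]]) -> int:
    """CPU clock cycles = sum over classes of IC_i x CPI_i."""
    return sum(ic * cpi for ic, cpi in mix)


def average_cpi(mix: list[tuple[int, int]]) -> float:
    """Overall CPI = total cycles / total instruction count."""
    return total_cycles(mix) / sum(ic for ic, _ in mix)


def cpu_time_for_mix(mix: list[tuple[int, int]], clock_cycle_time_s: float) -> float:
    """CPU time = (sum of IC_i x CPI_i) x clock cycle time."""
    return total_cycles(mix) * clock_cycle_time_s


print(f"Total cycles: {total_cycles(MIX):,}")
print(f"Average CPI:  {average_cpi(MIX):.2f}")
print(f"CPU time:     {cpu_time_for_mix(MIX, 1e-9):.3f} s at a 1 ns cycle")
```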




