

### NOREBA: A Compiler-Informed Non-speculative Out-of-Order Commit Processor

Ali Hajiabadi<sup>+</sup>, Andreas Diavastos<sup>+</sup>, Trevor E. Carlson<sup>+</sup>

National University of Singapore
Universitat Politècnica de Catalunya





ASPLOS '21

## NOREBA: Goal

- Current processors hold on to resources longer than necessary
- NOREBA implements an intelligent resource management technique based on true branch dependencies
- $\rightarrow$ Performance Improvement
- $\rightarrow$ Low Power and Area Overheads





## General-Purpose Out-of-Order Processors

- End of Moore's law <u>requires</u> efficient computing
- However, general-purpose CPUs still have a significant impact on overall performance: <u>hard-to-parallelize work is left for the CPU<sup>1</sup></u>
- Re-thinking the traditional design to unlock efficiency:
  - <u>Co-design the different layers of the system</u>













































































#### **In-Order Commit is conservative**

What if Inst9, Inst10, Inst11, and Inst12 are independent from Branch?
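The question above can be made concrete with a toy model. The sketch below is not NOREBA's mechanism; it only contrasts the two commit policies. It assumes unlimited commit width, that a branch resolves in the cycle its execution finishes, and that each instruction carries a hypothetical `dep_branch` tag naming the branch it truly depends on. The cycle numbers and `Inst13` are made up for illustration, and exceptions and memory ordering are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Inst:
    name: str
    done: int                         # cycle in which execution finishes
    dep_branch: Optional[str] = None  # hypothetical tag: branch this instruction truly depends on


def in_order_commit(window: List[Inst]) -> Dict[str, int]:
    """An instruction retires only after every older instruction has retired."""
    commit: Dict[str, int] = {}
    prev = 0
    for inst in window:  # window is in program order, oldest first
        prev = max(prev, inst.done)
        commit[inst.name] = prev
    return commit


def ooo_commit(window: List[Inst]) -> Dict[str, int]:
    """An instruction retires once it has finished and the branch it
    depends on (if any) has resolved; independent instructions do not wait."""
    done = {inst.name: inst.done for inst in window}
    commit: Dict[str, int] = {}
    for inst in window:
        if inst.dep_branch is None:
            commit[inst.name] = inst.done
        else:
            commit[inst.name] = max(inst.done, done[inst.dep_branch])
    return commit


# A slow branch (for example, waiting on a cache miss) followed by younger instructions.
window = [
    Inst("Branch", done=20),
    Inst("Inst9", done=3),
    Inst("Inst10", done=4),
    Inst("Inst11", done=5),
    Inst("Inst12", done=6),
    Inst("Inst13", done=7, dep_branch="Branch"),  # truly depends on the branch
]

print("in-order:", in_order_commit(window))
print("OoO     :", ooo_commit(window))
```

In this toy window, in-order commit holds Inst9 to Inst12 until cycle 20, while the dependency-aware policy releases them at cycles 3 to 6 and only Inst13 waits for the branch.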










## NOREBA: HW/SW Co-operative OoO-Commit

#### **Questions**

- How to detect branch dependencies *non-speculatively*?
- How to implement OoO-commit *efficiently*?
- How to handle *exceptions* and context switches?





## NOREBA: Static Compiler Analysis





## NOREBA: Microarchitecture



- Tracking true branch dependencies informed by the compiler for each instruction
- A lightweight implementation for OoO-commit that provides more opportunities to release resources
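What a compiler-provided "true branch dependency" tag might encode can be sketched as follows. This is an illustrative approximation, not the analysis pass from the paper: it assumes straight-line code with a single branch, a hypothetical tuple format `(name, dest, srcs, guard)`, and it ignores dependencies through memory. An instruction is treated as dependent if it sits in the region guarded by the branch, or if it reads a register whose value may differ with the branch outcome.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical mini-IR: (name, destination register, source registers, guarding branch)
Instr = Tuple[str, Optional[str], List[str], Optional[str]]


def tag_branch_dependents(program: List[Instr], branch: str) -> List[str]:
    """Return the names of instructions that depend on `branch`."""
    tainted: Set[str] = set()  # registers whose value may depend on the branch outcome
    dependent: List[str] = []
    for name, dest, srcs, guard in program:
        control_dep = guard == branch                # inside the guarded region
        data_dep = any(s in tainted for s in srcs)   # reads a branch-dependent value
        dep = control_dep or data_dep
        if dep:
            dependent.append(name)
        if dest is not None:
            if dep:
                tainted.add(dest)      # result inherits the dependency
            else:
                tainted.discard(dest)  # unconditional overwrite clears it
    return dependent


program: List[Instr] = [
    ("br",    None, ["r1"], None),  # the branch itself
    ("add",   "r2", ["r3"], "br"),  # guarded by the branch: control-dependent
    ("mul",   "r4", ["r2"], None),  # after the join, reads r2: data-dependent
    ("load",  "r5", ["r6"], None),  # independent
    ("sub",   "r2", ["r5"], None),  # overwrites r2 unconditionally
    ("store", None, ["r2"], None),  # reads the fresh r2: independent
]

print(tag_branch_dependents(program, "br"))
```

Only `add` and `mul` are tagged here; the remaining instructions could, under this simplified model, commit without waiting for `br` to resolve.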



## NOREBA: Challenge in Exception Handling

- Need to save and restore the state of the OoO-committed instructions



instructions for communicating with the OS



## Evaluation: Setup

- Simulation: gem5
- Compiler: LLVM-10
- Benchmarks:
  - SPEC CPU2006: C/C++ programs, each run for a single representative region of 1B instructions (selected with SimPoint)
  - *MiBench*: entire program runs

| Parameter                   | Value            |
|-----------------------------|------------------|
| L1d/i size                  | 32 KB, 4 cycles  |
| L2 size                     | 256 KB, 12 cycles |
| L3 size                     | 1 MB, 36 cycles  |
| Fetch/dispatch/commit width | 4/4/4            |
| Branch Predictor            | TAGE-SC-L-8KB    |
| Prefetcher                  | DCPT             |
| **Selective ROB**           |                  |
| ROB'                        | 224 entries      |
| BR-CQ                       | 2 $\times$ 8 entries |
| PR-CQ                       | 8 entries        |
| BIT/CQT size                | 8                |
| CIT size                    | 128              |
| Baseline ROB                | 224 entries      |
| IQ/LQ/SQ/RF                 | 68/72/56/168     |



## Evaluation: Performance



**Reaches ~95% of the performance of a fully branch-speculative OoO-commit implementation**



## Evaluation: Critical Branches



More dependent instructions and less critical → Fewer opportunities

~1.1× improvement for bzip2



## Evaluation: Size of Resources



We are close to aggressive and branch speculative OoO-commit (~95% of SpeculativeBR OoO-C)

#### Higher performance for bigger cores with more resources: <u>NOREBA continues to scale</u>



## Evaluation: NOREBA and Prefetchers



Additive effect of combining NOREBA and prefetchers

→ Higher Performance using both

Prefetching lets execution continue, while NOREBA lets instructions continue to commit



## Evaluation: Power and Area Overhead



4% power overhead, 8% area overhead

Low overhead for the extra performance (~22% on average, up to 230%)
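As a rough back-of-envelope reading of these averages (an assumption-laden estimate, not a result reported in the talk): if the ~22% improvement is taken as a 1.22× speedup and the 4% power overhead applies to the same baseline, then performance per watt and energy per program relative to the baseline would be approximately

$$
\frac{\text{perf}}{\text{power}} \approx \frac{1.22}{1.04} \approx 1.17,
\qquad
\frac{E_{\text{NOREBA}}}{E_{\text{base}}} = \frac{P_{\text{NOREBA}}}{P_{\text{base}}} \cdot \frac{t_{\text{NOREBA}}}{t_{\text{base}}} \approx \frac{1.04}{1.22} \approx 0.85
$$

The symbols $P$, $t$, and $E$ (average power, execution time, energy) are introduced here only for this estimate.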



## NOREBA: Overview of the Design

Implementation is able to handle *exceptions* and context switches





## Conclusion

- Efficient interaction between different layers of the system <u>unlocks</u> efficiency and performance for general-purpose processors
- NOREBA provides a HW/SW co-design solution that enables OoO-commit and better resource management
  - 22% performance improvement over the baseline, achieving 95% of the performance of the aggressive branch-speculative OoO-commit implementation
  - Low power and area overheads (~4% power, and ~8% area overhead)



# Thanks for your attention
