

## Exploiting the Microarchitecture: Transient Execution Attacks

Michael Schwarz (@misc0110)

April 11, 2019

Graz University of Technology

Who am I?



## Michael Schwarz

PhD candidate @ Graz University of Technology

Twitter: @misc0110

michael.schwarz@iaik.tugraz.at







[Figure: TV news headline, "Computer chip flaws impact billions of devices"]



- Bug-free software does not mean safe execution
- Information leaks due to underlying hardware
- Exploit leakage through side-effects



- Power consumption
- Execution time
- CPU caches





- Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC, ...)
- Interface between hardware and software
- Microarchitecture is an ISA implementation

















Caches and buffers



**Predictors** 



- Transparent for the programmer
- Timing optimizations → side-channel leakage

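```
printf("%d", i); // first access to i: cache miss, data is fetched from DRAM
printf("%d", i); // second access to i: cache hit, no DRAM access, much faster
```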





















## Caching speeds up Memory Accesses
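The effect can be sketched with a toy direct-mapped cache model. This is a simulation under stated assumptions, not a measurement: the line size, the number of lines, and both latencies are illustrative values, and `toy_cache` and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE   64u   /* bytes per cache line (assumed, typical value) */
#define NUM_LINES   256u  /* toy cache size in lines (assumed) */
#define HIT_CYCLES  4u    /* illustrative latency of a cache hit */
#define MISS_CYCLES 200u  /* illustrative latency of a DRAM access */

typedef struct {
    bool     valid[NUM_LINES];
    uint64_t line[NUM_LINES]; /* memory line currently held by each slot */
} toy_cache;

/* Empties the cache, like flushing every line. */
static void toy_cache_init(toy_cache *c) {
    assert(c != NULL);
    for (size_t i = 0; i < NUM_LINES; i++) c->valid[i] = false;
}

/* Simulates one memory access and returns its latency in cycles. */
static unsigned toy_cache_access(toy_cache *c, uint64_t addr) {
    assert(c != NULL);
    uint64_t line = addr / LINE_SIZE;
    size_t slot = (size_t)(line % NUM_LINES);
    if (c->valid[slot] && c->line[slot] == line) return HIT_CYCLES;
    c->valid[slot] = true; /* miss: fetch the line and keep it for next time */
    c->line[slot] = line;
    return MISS_CYCLES;
}
```

The first access to an address returns `MISS_CYCLES`, a repeated access returns `HIT_CYCLES`. That latency difference is what a cache attack measures.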































```
char array[256 * 4096]; // 256 pages of memory
*(volatile char*) 0;    // raise_exception(): the null-pointer access faults
array[84 * 4096] = 0;   // architecturally unreachable, touches page 84
```









- "Unreachable" code line was actually executed
- Exception was only thrown afterwards
- Out-of-order instructions leave microarchitectural traces
- Give such instructions a name: transient instructions



- Add another layer of indirection to test

```
char array[256 * 4096]; // 256 pages of memory
```

Then check whether any part of array is cached





- Index of cache hit reveals data
- Permission check is in some cases too late
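How the index of the cache hit reveals the data can be sketched with a simulated probe array. This is a model only: the cache state is a plain boolean per page, and `probe_state`, `probe_flush`, `probe_encode`, and `probe_decode` are hypothetical names. A real attack would flush with `clflush` and time each reload instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 256u

/* Toy stand-in for the cache state of the 256-page probe array:
 * page_cached[p] is true if page p was touched since the last flush. */
typedef struct {
    bool page_cached[NUM_PAGES];
} probe_state;

/* Models flushing the whole probe array from the cache. */
static void probe_flush(probe_state *s) {
    assert(s != NULL);
    for (size_t p = 0; p < NUM_PAGES; p++) s->page_cached[p] = false;
}

/* Models the transient access array[data * 4096]:
 * nothing is stored architecturally, only the cache state changes. */
static void probe_encode(probe_state *s, uint8_t data) {
    size_t offset = (size_t)data * PAGE_SIZE;
    s->page_cached[offset / PAGE_SIZE] = true;
}

/* Models the reload step: returns the index of the cached page, or -1. */
static int probe_decode(const probe_state *s) {
    for (size_t p = 0; p < NUM_PAGES; p++)
        if (s->page_cached[p]) return (int)p;
    return -1;
}
```

After `probe_flush`, a call to `probe_encode(&s, 84)` makes `probe_decode(&s)` return 84. One page per possible byte value keeps the 256 candidates apart.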



- CPU uses data in out-of-order execution before permission check
- Meltdown can read any kernel address
- Physical memory is usually mapped in kernel
- → Read arbitrary memory
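Why a kernel read becomes an arbitrary memory read can be sketched as a simple address computation. The base address below is an assumption: it is the documented Linux x86-64 default for the direct-physical map with 4-level paging and without KASLR, and other systems or configurations differ. `phys_to_direct_map` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed start of the direct-physical map (Linux x86-64 default,
 * 4-level paging, no KASLR). Treat as illustrative only. */
#define DIRECT_MAP_BASE 0xffff888000000000ULL
#define DIRECT_MAP_SIZE (64ULL << 40) /* 64 TiB window (assumed) */

/* Returns the kernel virtual address that aliases a physical address. */
static uint64_t phys_to_direct_map(uint64_t phys) {
    assert(phys < DIRECT_MAP_SIZE);
    return DIRECT_MAP_BASE + phys;
}
```

Every physical address then has a kernel virtual alias, so leaking kernel addresses is enough to leak the memory of other processes as well.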



- It was assumed that Meltdown can only read data from the L1 cache
- Leakage from L3 or memory is possible, just slower
- Even leakage of UC (uncacheable) memory regions...
  - ...if the other hyperthread (legally) accesses the data
  - → ...leaks from the line fill buffer

- Kernel addresses in user space are a problem
- Why don't we take the kernel addresses...
- ...and remove them if not needed?
- User accessible check in hardware is not reliable



## Kernel View



## User View





- Linux: Kernel Page-table Isolation (KPTI)
- Apple: Released updates
- Windows: Kernel Virtual Address (KVA) Shadow



- Meltdown fully mitigated in software
- Problem seemed to be solved
- No attack surface left
- That is what everyone thought





- Meltdown is a whole category of vulnerabilities
- Not only the user-accessible check
- Looking closer at the check...



- CPU uses virtual address spaces to isolate processes
- Physical memory is organized in page frames
- Virtual memory pages are mapped to page frames using page tables
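The mapping can be made concrete by splitting a virtual address into its page-table indices. The sketch assumes x86-64 with 4-level paging and 4 KiB pages (512 entries per table); the field names follow common x86-64 terminology, and `split_virtual_address` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u    /* 4 KiB pages */
#define INDEX_BITS 9u     /* 2^9 = 512 entries per table */
#define INDEX_MASK 0x1FFu

typedef struct {
    unsigned pml4, pdpt, pd, pt; /* index into each table level, 0..511 */
    unsigned offset;             /* byte offset within the page frame */
} va_parts;

/* Splits a virtual address into the indices used during a page-table walk. */
static va_parts split_virtual_address(uint64_t va) {
    va_parts p;
    p.offset = (unsigned)(va & ((1u << PAGE_SHIFT) - 1u));
    p.pt     = (unsigned)((va >> PAGE_SHIFT) & INDEX_MASK);
    p.pd     = (unsigned)((va >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK);
    p.pdpt   = (unsigned)((va >> (PAGE_SHIFT + 2u * INDEX_BITS)) & INDEX_MASK);
    p.pml4   = (unsigned)((va >> (PAGE_SHIFT + 3u * INDEX_BITS)) & INDEX_MASK);
    assert(p.pt <= INDEX_MASK && p.offset < (1u << PAGE_SHIFT));
    return p;
}
```

The `pt` field selects the page-table entry that holds the page frame and its permission bits.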





User/Supervisor bit defines in which privilege level the page can be accessed





Present bit is the next obvious bit



- An even worse bug → Foreshadow-NG/L1TF
- Exploitable from VMs
- Allows leaking data from the L1 cache
- Same mechanism as Meltdown
- Just a different bit in the PTE
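The two bits discussed above can be sketched as the architectural check a user-mode access has to pass. Bit positions follow the x86-64 page-table entry layout (Present is bit 0, User/Supervisor is bit 2); `make_pte` and `user_may_access` are hypothetical helpers, and the sketch models only the intended outcome, not the transient behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0) /* P bit: the mapping is valid */
#define PTE_USER    (1ULL << 2) /* U/S bit: set means user-accessible */

/* Builds a toy PTE from a 4 KiB-aligned frame address and flag bits. */
static uint64_t make_pte(uint64_t frame_addr, uint64_t flags) {
    assert((frame_addr & 0xFFFULL) == 0);
    return frame_addr | flags;
}

/* Architectural result of a user-mode access through this PTE. */
static bool user_may_access(uint64_t pte) {
    if (!(pte & PTE_PRESENT)) return false; /* page fault: not present */
    if (!(pte & PTE_USER)) return false;    /* page fault: supervisor only */
    return true;
}
```

Meltdown targets the case where the User/Supervisor check fails, and Foreshadow targets the case where the Present check fails. In both cases the fault is raised, but only after dependent instructions have already used the data.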

[Figure: page table with entries PTE 0 ... PTE #PTI ... PTE 511, the selected entry marked "not present", shown next to the L1 cache]







- KAISER/KPTI/KVA does not help
- Only software workarounds
  - → Flush L1 on VM entry
  - → Disable HyperThreading
- Workarounds might not be complete

[Figure: execution timeline around a page fault at operation #n (axes: operations over time; forwarded value: data)]











[Figure: Meltdown tree, a classification of variants starting from the transient cause]









- Meltdown is not a fully solved issue
- The tree is extensible
- More Meltdown-type issues to come
- Silicon fixes might not be complete



- Meltdown is not the only transient execution attack
- Spectre is a second class of transient execution attacks
- Instead of faults, exploit control (or data) flow predictions



- CPU tries to predict the future (branch predictor), ...
  - ... based on events learned in the past
- Speculative execution of instructions
- If the prediction was correct, ...
  - ... very fast
  - otherwise: Discard results
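Learning from past events can be sketched with the textbook 2-bit saturating counter. This is an assumption for illustration: real predictors are considerably more complex and largely undocumented, and `two_bit_predictor`, `predict_taken`, and `train` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Counter states: 0 and 1 predict "not taken", 2 and 3 predict "taken". */
typedef struct {
    unsigned state;
} two_bit_predictor;

/* Returns the prediction for the next execution of the branch. */
static bool predict_taken(const two_bit_predictor *p) {
    assert(p->state <= 3u);
    return p->state >= 2u;
}

/* Updates the counter with the actual outcome of the branch. */
static void train(two_bit_predictor *p, bool taken) {
    if (taken && p->state < 3u) p->state++;
    else if (!taken && p->state > 0u) p->state--;
}
```

After three taken outcomes the counter saturates at 3, so a single not-taken outcome does not flip the prediction. An attacker who controls the outcomes can therefore steer what the predictor will guess next.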





















































[Figure: execution timeline in which operation #n continues on a prediction (axes: operations over time)]













- Many predictors in modern CPUs
  - Branch taken/not taken (Pattern History Table, PHT)
  - Call/Jump destination (Branch Target Buffer, BTB)
  - Function return destination (Return Stack Buffer, RSB)
  - Load matches previous store (store-to-load forwarding, STL)
- Most are even shared among processes
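For the first predictor in the list, the commonly shown bounds-check pattern looks roughly as follows. This is a sketch of the code shape only: the names `array1`, `array2`, `victim_function`, and `train_victim` are illustrative, and running it leaks nothing by itself, since the architectural result for an out-of-bounds index is always 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

static uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
static size_t  array1_size = 16;
static uint8_t array2[256 * PAGE_SIZE]; /* probe array, zero-initialized */

/* Architecturally safe: out-of-bounds indices never reach the loads.
 * If the branch is mispredicted as "in bounds", both loads may still run
 * transiently and leave a cache trace in array2. */
static uint8_t victim_function(size_t x) {
    uint8_t result = 0;
    if (x < array1_size) {
        result = array2[(size_t)array1[x] * PAGE_SIZE];
    }
    return result;
}

/* Calls the victim with valid indices only, the way an attacker would
 * train the branch towards "taken". */
static void train_victim(size_t rounds) {
    for (size_t i = 0; i < rounds; i++) {
        uint8_t r = victim_function(i % array1_size);
        assert(r == 0); /* array2 holds only zeros */
    }
}
```

After training, a call with an out-of-bounds `x` still returns 0. On an affected CPU, the page of `array2` selected by the out-of-bounds byte may nevertheless end up in the cache.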











## Spectre Mistraining

[Figure: classification tree of transient execution attacks, starting from the transient cause]











- Spectre is not a bug
- It is a useful optimization
- → Cannot simply fix it (as with Meltdown)
- Workarounds for critical code parts
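One such workaround for a critical bounds check is branchless index masking, sketched below. It is loosely modeled on the idea behind the Linux kernel's `array_index_nospec`, but `index_mask` and `load_clamped` are hypothetical helpers. A production version must also stop the compiler from optimizing the mask away, which this sketch does not attempt.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Returns all-ones if index < size and 0 otherwise, without a branch.
 * Only valid while both values fit in the lower half of size_t's range. */
static size_t index_mask(size_t index, size_t size) {
    const unsigned shift = (unsigned)(sizeof(size_t) * CHAR_BIT - 1u);
    assert(index <= SIZE_MAX / 2u && size <= SIZE_MAX / 2u);
    /* The top bit is set exactly when size - 1 - index wraps around. */
    size_t msb = (index | (size - 1u - index)) >> shift;
    return msb - 1u; /* 0 - 1 wraps to all-ones, 1 - 1 gives 0 */
}

/* Bounds-checked load whose index is also clamped by data flow. */
static uint8_t load_clamped(const uint8_t *array, size_t size, size_t index) {
    if (index < size) {
        index &= index_mask(index, size); /* becomes 0 on a mispredicted path */
        return array[index];
    }
    return 0;
}
```

Because the mask is computed from the values rather than predicted, an out-of-bounds index collapses to 0 even if the branch is mispredicted.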

#### Spectre defenses in 3 categories:



- C1: Mitigating or reducing the accuracy of covert channels
- C2: Mitigating or aborting speculation
- C3: Ensuring secret data cannot be reached
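As a small illustration of the first category, a timer can be coarsened so that a cache hit and a cache miss fall into the same tick. This is a sketch only: `coarsen_timestamp` is a hypothetical helper and the granularity is an arbitrary assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds a timestamp down to a multiple of granularity_ns. */
static uint64_t coarsen_timestamp(uint64_t t_ns, uint64_t granularity_ns) {
    assert(granularity_ns > 0);
    return t_ns - (t_ns % granularity_ns);
}
```

With a granularity of 100000 ns, timestamps that differ by a few hundred nanoseconds usually map to the same value. This lowers the accuracy of the channel but does not remove it, since an attacker can repeat measurements or build another timing source.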



















- Many countermeasures only consider the cache to get data...
- ...but there are other possibilities, e.g.,
  - Port contention (SMoTherSpectre)
  - AVX (NetSpectre)
- Cache is just the easiest

Written by Michael Larabel in Linux Kernel on 24 November 2018 at 09:00 AM EST.

On Friday marked the release of the Linux 4.19.4 kernel as well as 4.14.83 and 4.9.139.

Greg Kroah-Hartman issued this latest round of stable point releases as basic maintenance updates. While these point releases don't tend to be too notable and generally go unmentioned on Phoronix, this round is worth pointing out since 4.19.4 and 4.14.83 are the releases that end up reverting the STIBP behavior that applied Single Thread Indirect Branch Predictors to all processes on supported systems. That is what was introduced in Linux 4.20 and then back-ported to the 4.19/4.14 LTS branches, which in turn hurt the performance a lot. So for now the code is removed.

#### Retpoline (compiler extension)

```
{...}
```

- → Always predict to enter an endless loop
- What if someone decides to fix the wrong prediction?



- Current mitigations are either incomplete or cost performance
- → More research required
  - Both on attacks and defenses
- → Efficient defenses only possible when attacks are known





- Transient Execution Attacks are...
  - ...a novel class of attacks
  - ...extremely powerful
  - ...only at the beginning
- Many optimizations introduce side channels → now exploitable


