

# Side-Channel Lab I

**Michael Schwarz** 

Security Week Graz 2019

Michael Schwarz — Security Week Graz 2019














www.tugraz.at





- everyday hardware: servers, workstations, laptops, smartphones...

- remote side-channel attacks


Side channels

- safe software infrastructure  $\rightarrow$  no bugs, e.g., Heartbleed
- does not mean safe execution
- information leaks because of the hardware it runs on
- no "bug" in the sense of a mistake  $\rightarrow$  lots of performance optimizations
- $\rightarrow$ leaks crypto keys and sensitive information, e.g., keystrokes and mouse movements





#### Why target the cache?

- shared across cores
- fast
- $\rightarrow$  fast cross-core attacks!



- caches improve performance
- SRAM is expensive  $\rightarrow$  small caches
- different timings for memory accesses
  - data is **cached**  $\rightarrow$  cache hit  $\rightarrow$  **fast**
  - data is **not cached**  $\rightarrow$  cache miss  $\rightarrow$  **slow**







- L1 and L2 are private
- last-level cache:
  - divided in slices
  - shared across cores
  - inclusive

















## **Inclusive property**



- inclusive LLC: superset of L1 and L2
- data evicted from the LLC is also evicted from L1 and L2
- a core can evict lines in the private L1 of another core



On current Intel CPUs:

- Registers: 0-1 cycle
- L1 cache: 4 cycles
- L2 cache: 12 cycles
- L3 cache: 26-31 cycles
- DRAM memory: >120 cycles

How every timing attack works:

- learn timing of different corner cases
- later, we recognize these corner cases by timing only

1. build two cases: cache hits and cache misses
2. time each case many times (get rid of noise)
3. we have a histogram!
4. find a threshold to distinguish the two cases

Measuring cache hits:

1. measure time
2. access variable (always cache hit)
3. measure time
4. update histogram with delta



Measuring cache misses:

1. measure time
2. access variable (always cache miss)
3. measure time
4. update histogram with delta
5. flush variable (clflush instruction)
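The two measurement loops above can be sketched in C as follows. This is a minimal sketch, assuming an x86 CPU and GCC or Clang (`x86intrin.h`); the names `probe`, `time_access`, `measure`, `SAMPLES`, and `MAX_CYCLES` are illustrative and not taken from the lab code. The fences around `__rdtsc()` are one possible way to keep the load between the two timestamps.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, _mm_clflush, _mm_mfence (x86 only) */

#define SAMPLES 100000
#define MAX_CYCLES 600 /* histogram width; slower samples land in the last bin */

static char probe[4096]; /* the variable whose access time we measure */

/* Time a single load of *p in cycles. */
static uint64_t time_access(volatile char *p) {
    _mm_mfence();
    uint64_t start = __rdtsc();   /* 1. measure time */
    _mm_mfence();
    (void)*p;                     /* 2. access variable */
    _mm_mfence();
    uint64_t end = __rdtsc();     /* 3. measure time */
    _mm_mfence();
    return end - start;
}

/*
 * Add SAMPLES timings to hist[] (MAX_CYCLES zero-initialized bins).
 * force_miss == 0: the line stays cached, so every sample is a hit.
 * force_miss != 0: the line is flushed after every sample, so every
 * sample is a miss.
 */
static void measure(uint32_t *hist, int force_miss) {
    volatile char *p = probe;
    if (force_miss)
        _mm_clflush(probe);       /* first sample must already be a miss */
    else
        (void)*p;                 /* first sample must already be a hit */
    _mm_mfence();
    for (int i = 0; i < SAMPLES; i++) {
        uint64_t delta = time_access(p);
        hist[delta < MAX_CYCLES ? delta : MAX_CYCLES - 1]++; /* 4. update */
        if (force_miss) {
            _mm_clflush(probe);   /* 5. flush variable */
            _mm_mfence();
        }
    }
}
```

Calling `measure(hit_hist, 0)` and `measure(miss_hist, 1)` on two zeroed arrays of `MAX_CYCLES` counters yields the two histograms; printing both side by side usually shows two clearly separated peaks.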

# Time to code

## Accurate timings

- very short timings
- rdtsc instruction: cycle-accurate timestamps

[...] rdtsc function() rdtsc [...]

- do you measure what you *think* you measure?
- **out-of-order** execution  $\rightarrow$  what is really executed

| possible order 1 | possible order 2 | possible order 3 |
|------------------|------------------|------------------|
| rdtsc            | rdtsc            | rdtsc            |
| function()       | [...]            | rdtsc            |
| [...]            | rdtsc            | function()       |
| rdtsc            | function()       | [...]            |



- use pseudo-serializing instruction rdtscp (recent CPUs)
- and/or use serializing instructions like cpuid
- and/or use fences like mfence
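The three options could look like this in C. This is a sketch under assumptions: x86-64, GCC or Clang, and helper names (`ts_rdtscp`, `ts_cpuid`, `ts_mfence`) that are made up for illustration. Which variant gives the cleanest histogram depends on the CPU, so it is worth trying more than one.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, __rdtscp, _mm_mfence, _mm_lfence */

/* Option 1: rdtscp waits until earlier instructions have executed. */
static inline uint64_t ts_rdtscp(void) {
    unsigned int aux;            /* receives IA32_TSC_AUX; not used here */
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();                /* keep later instructions from starting early */
    return t;
}

/* Option 2: cpuid is a serializing instruction; read the TSC right after it. */
static inline uint64_t ts_cpuid(void) {
    unsigned int a, b, c, d;
    __asm__ volatile("cpuid"
                     : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                     : "a"(0), "c"(0)
                     : "memory");
    (void)a; (void)b; (void)c; (void)d;
    return __rdtsc();
}

/* Option 3: memory fences on both sides of rdtsc. */
static inline uint64_t ts_mfence(void) {
    _mm_mfence();
    uint64_t t = __rdtsc();
    _mm_mfence();
    return t;
}
```

A measurement then brackets the code under test with two calls to the same helper and takes the difference of the two timestamps.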

Intel, How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures White Paper, December 2010.






- set the threshold as high as possible
- most cache hits are below it
- no cache miss is below it
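One way to turn these three rules into code is sketched below. The function names and the convention "an access counts as a hit if its time is strictly below the threshold" are assumptions for illustration. Taking the smallest observed miss timing gives the highest threshold with no miss below it; the second function reports how many hits that threshold still captures.

```c
#include <stddef.h>
#include <stdint.h>

/* Highest threshold with no miss sample below it: the minimum miss timing. */
static uint64_t find_threshold(const uint64_t *miss, size_t n_miss) {
    uint64_t threshold = UINT64_MAX;
    for (size_t i = 0; i < n_miss; i++)
        if (miss[i] < threshold)
            threshold = miss[i];
    return threshold;
}

/* Fraction of hit samples that fall strictly below the threshold. */
static double hit_coverage(const uint64_t *hit, size_t n_hit, uint64_t threshold) {
    size_t below = 0;
    for (size_t i = 0; i < n_hit; i++)
        if (hit[i] < threshold)
            below++;
    return n_hit ? (double)below / (double)n_hit : 0.0;
}
```

A single unusually fast miss sample pulls this threshold down, so in practice it can help to inspect the histogram by eye before trusting the computed value.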



- Hit  $\rightarrow$  Data is fetched from buffers, L1, L2, or L3
- Miss  $\rightarrow$  Data is fetched from DRAM



- cache attacks  $\rightarrow$  exploit timing differences of memory accesses
- attacker monitors which lines are accessed, not the content
- covert channel: two processes communicating with each other
  - not allowed to do so, e.g., across VMs
- side-channel attack: one malicious process spies on benign processes
  - e.g., steals crypto keys, spies on keystrokes


















# Signatures (RSA)

# $M = C^d \mod n$
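A common textbook way to compute $C^d \bmod n$ is square-and-multiply, sketched below. The slides do not show which implementation they attack, so treat this as an assumed example: the function names are illustrative and the operands are limited to $n < 2^{32}$ so that plain 64-bit arithmetic suffices. The point is that the multiply step runs only for the 1-bits of the secret exponent `d`, which makes the executed code (and the cache lines it touches) depend on the key.

```c
#include <stdint.h>

/* (a * b) mod n; no overflow as long as a, b < n and n < 2^32. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n) {
    return (a * b) % n;
}

/* Left-to-right square-and-multiply: returns c^d mod n (0 < n < 2^32). */
static uint64_t modexp(uint64_t c, uint64_t d, uint64_t n) {
    uint64_t m = 1 % n;
    c %= n;
    for (int bit = 63; bit >= 0; bit--) {
        m = mulmod(m, m, n);      /* square: executed for every bit */
        if ((d >> bit) & 1)
            m = mulmod(m, c, n);  /* multiply: executed only for 1-bits of d */
    }
    return m;
}
```

An attacker who can tell whether the multiply code ran between two squarings learns the exponent bit by bit.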


























# Time to code

- locate **key-dependent** memory accesses
- How to locate key-dependent memory accesses?



- It's complicated:
  - Large binaries and libraries (third-party code)
  - Many libraries (gedit: 60MB)
  - Closed-source / unknown binaries
  - Self-compiled binaries
- Difficult to find all exploitable addresses

**Profiling Phase**

- Preprocessing step to find exploitable addresses automatically
  - w.r.t. "events" (keystrokes, encryptions, ...)
  - called "Cache Template"

**Exploitation Phase**

- Monitor exploitable addresses

# **Profiling Phase**

Figure: attacker address space, victim address space, and the cache.

1. Cache is empty
2. Attacker triggers an event
3. Attacker checks one address for cache hits ("Reload")
4. Update number of cache hits per event
5. Attacker flushes shared memory
6. Repeat for higher accuracy
7. Continue with next address
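The profiling loop can be sketched independently of the hardware by passing the three primitives in as callbacks. Everything named here (`profile`, the callback types, and the small simulated victim) is hypothetical and only serves to make the sketch runnable; in the lab, the callbacks would wrap the real event trigger and the Flush+Reload primitives.

```c
#include <stddef.h>

/* Hypothetical hooks supplied by the caller (not from the slides). */
typedef void (*trigger_fn)(void);         /* makes the victim handle one event */
typedef int  (*reload_fn)(size_t offset); /* returns 1 if the reload was a hit */
typedef void (*flush_fn)(size_t offset);  /* removes the line from the cache */

/* For each offset, count how many of `runs` triggered events caused a hit. */
static void profile(const size_t *offsets, size_t n, unsigned runs,
                    trigger_fn trigger, reload_fn reload, flush_fn flush,
                    unsigned *hits) {
    for (size_t i = 0; i < n; i++) {      /* continue with next address */
        hits[i] = 0;
        flush(offsets[i]);                /* start with the line not cached */
        for (unsigned r = 0; r < runs; r++) {   /* repeat for accuracy */
            trigger();                    /* attacker triggers an event */
            if (reload(offsets[i]))       /* "Reload": was it a hit? */
                hits[i]++;                /* update hit count */
            flush(offsets[i]);            /* flush shared memory */
        }
    }
}

/* Tiny simulated victim with four cache lines; the event touches line 2. */
static int sim_cached[4];
static void sim_trigger(void) { sim_cached[2] = 1; }
static int sim_reload(size_t o) { int hit = sim_cached[o]; sim_cached[o] = 1; return hit; }
static void sim_flush(size_t o) { sim_cached[o] = 0; }
```

With the simulated victim, profiling offsets 0 to 3 for 200 runs reports 200 hits for offset 2 and none for the others, which mirrors the kind of hit ratio the template records per (address, event) pair.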

```
$> ps -A | grep gedit
$> cat /proc/<pid>/maps
00400000-00489000 r-xp 00000000 fd:01 396356 /usr/bin/gedit
7f5a96991000-7f5a96a51000 r-xp 00000000 fd:01 399365 /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30
...
```

Columns: memory range, access rights, offset, device, inode, file name
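If the address range needs to be picked up programmatically rather than by reading the output, a line of `/proc/<pid>/maps` can be parsed with `sscanf`. This is a minimal sketch; the struct and function names are made up for illustration, and it assumes a 64-bit `unsigned long`.

```c
#include <stdio.h>

struct map_entry {
    unsigned long start, end; /* memory range */
    char perms[5];            /* access rights, e.g. "r-xp" */
    unsigned long offset;     /* offset into the mapped file */
    char dev[16];             /* device, e.g. "fd:01" */
    unsigned long inode;
    char path[256];           /* file name; empty for anonymous mappings */
};

/* Parse one line of /proc/<pid>/maps; returns 1 on success, 0 otherwise. */
static int parse_maps_line(const char *line, struct map_entry *e) {
    e->path[0] = '\0';
    int fields = sscanf(line, "%lx-%lx %4s %lx %15s %lu %255[^\n]",
                        &e->start, &e->end, e->perms, &e->offset,
                        e->dev, &e->inode, e->path);
    return fields >= 6; /* the file name is optional */
}
```

For the libgdk line above, `e.end - e.start` gives the size of the executable mapping, which bounds the offsets worth profiling.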

```
$> cd practicals/02_cache_template_attacks/
$> make
$> # start the targeted program (e.g., gedit)
$> sleep 2; ./profiling /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30
```

... and hold down a key in the target program. Save the addresses with peaks!



```
$> # ./spy <file> <offset>
$> ./spy /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30 336896
Monitoring offset 336896
Hit #0
Hit #1
Hit #2
```

# Time to code


### Cache Template Attack Demo

# **Profiling Phase: 1 Event, 1 Address**

Example: Cache Hit Ratio for (0x7c800, n): 200 / 200

# **Profiling Phase: All Events, 1 Address**

Example: Cache Hit Ratio for (0x7c800, u): 13 / 200

Distinguish n from other keys by monitoring 0x7c800

# **Profiling Phase: All Events, All Addresses**




D. Gruss, R. Spreitzer, and S. Mangard. Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches. In: USENIX Security Symposium. 2015.



# Side-Channel Lab II

**Michael Schwarz** 

Security Week Graz 2019




- Two programs would like to communicate but are not allowed to do so
  - either because there is no communication channel...
  - ...or the channels are monitored and programs are stopped on communication attempts
- Use side channels and stay stealthy


# **Covert channel**








| method           | raw capacity | error rate | true capacity | environment |
|------------------|--------------|------------|---------------|-------------|
| F+F [Gru+16]     | 3968Kbps     | 0.840%     | 3690Kbps      | native      |
| F+R [Gru+16]     | 2384Kbps     | 0.005%     | 2382Kbps      | native      |
| E+R [Lip+16]     | 1141Kbps     | 1.100%    | 1041Kbps      | native |
| P+P [Mau+17]     | 601Kbps      | 0.000%    | 601Kbps       | native |
| P+P [Liu+15]     | 600Kbps      | 1.000%    | 552Kbps       | virt   |
| P+P [Mau+17]     | 362Kbps      | 0.000%    | 362Kbps       | virt   |
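The true-capacity column is consistent with modelling the covert channel as a binary symmetric channel. The slide does not state the formula, so this is an assumption that happens to reproduce the numbers. With raw capacity $C$ and bit error rate $p$, the true capacity $T$ is

$$
T = C \cdot \bigl(1 + p \log_2 p + (1 - p) \log_2 (1 - p)\bigr)
$$

For the F+F row, $C = 3968$ and $p = 0.0084$ give $p \log_2 p + (1-p)\log_2(1-p) \approx -0.070$, so $T \approx 3968 \cdot 0.930 \approx 3690$ Kbps. For an error-free channel ($p = 0$) the bracket is 1 and $T = C$.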

















# Sending Data (easy but inefficient)






# Time to code

# **Operating Systems 101**






# **Memory Isolation**










- Kernel is isolated from user space
- This isolation is a combination of hardware and software
- User applications cannot access anything from the kernel





- CPUs support virtual address spaces to isolate processes
- Physical memory is organized in page frames
- Virtual memory pages are mapped to page frames using page tables

### Address Translation on x86-64








- The User/Supervisor bit defines at which privilege level the page can be accessed
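To make the translation concrete, the sketch below splits a virtual address into the four table indices and the page offset, assuming 4-level paging with 4 KiB pages, and checks the User/Supervisor bit (bit 2) of a page-table entry. The struct and function names are illustrative.

```c
#include <stdint.h>

struct va_parts {
    unsigned pml4;   /* bits 47-39: index into the PML4 */
    unsigned pdpt;   /* bits 38-30: index into the page directory pointer table */
    unsigned pd;     /* bits 29-21: index into the page directory */
    unsigned pt;     /* bits 20-12: index into the page table */
    unsigned offset; /* bits 11-0: offset within the 4 KiB page */
};

/* Split a 48-bit x86-64 virtual address into its translation indices. */
static struct va_parts split_vaddr(uint64_t va) {
    struct va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1ff);
    p.pdpt   = (unsigned)((va >> 30) & 0x1ff);
    p.pd     = (unsigned)((va >> 21) & 0x1ff);
    p.pt     = (unsigned)((va >> 12) & 0x1ff);
    p.offset = (unsigned)(va & 0xfff);
    return p;
}

/* Bit 2 of an entry is the User/Supervisor bit: 1 means user-accessible. */
static int pte_user_accessible(uint64_t pte) {
    return (int)((pte >> 2) & 1);
}
```

Each 9-bit index selects one of 512 entries in its table, and a page is reachable from user space only if the User/Supervisor bit is set at every level of the walk.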

## **Direct-physical map**



- Kernel is typically mapped into every address space
- Entire physical memory is mapped in the kernel






### Loading an address

































- Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC, ...)
- Serves as the interface between hardware and software
- Microarchitecture is an actual implementation of the ISA





| IF | ID | EX | MEM | WB  |     |     |     |    |
|----|----|----|-----|-----|-----|-----|-----|----|
|    | IF | ID | EX  | MEM | WB  |     |     |    |
|    |    | IF | ID  | EX  | MEM | WB  |     |    |
|    |    |    | IF  | ID  | EX  | MEM | WB  |    |
|    |    |    |     | IF  | ID  | EX  | MEM | WB |

- Instructions are...
  - fetched (IF) from the L1 Instruction Cache
  - decoded (ID)
  - executed (EX) by execution units
- Memory access is performed (MEM)
- Architectural register file is updated (WB)


- Instructions are executed in-order
- Pipeline stalls when stages are not ready
- If data is not cached, we need to wait

#### **Out-of-order Execution**







Instructions are

- fetched and decoded in the front-end
- dispatched to the backend
- processed by individual execution units



Instructions

- are executed out-of-order
- wait until their dependencies are ready
  - Later instructions might execute before earlier instructions
- retire in-order
  - State becomes architecturally visible
- Exceptions are checked during retirement
  - Flush pipeline and recover state

# The state does not become architecturally visible but ...













- New code

```
char data = 'S'; // a "secret" value
// ...
*(volatile char*) 0;
array[data * 4096] = 0;
```

- Luckily we know how to catch a segfault
- Then check whether any part of array is cached
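Catching the segfault could look like the sketch below, which uses a POSIX signal handler together with `sigsetjmp`/`siglongjmp`. The names `faulting_access`, `on_segv`, and `recover` are illustrative, and `data` is an `unsigned char` here (unlike the slide) so that the array index cannot become negative. Checking which page of `array` ended up cached is not shown; it would time one access per page, as in the earlier hit/miss measurement.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf recover;
static char array[256 * 4096]; /* one page per possible byte value */

/* SIGSEGV handler: jump back to the recovery point instead of crashing. */
static void on_segv(int sig) {
    (void)sig;
    siglongjmp(recover, 1);
}

/*
 * Read from addr (expected to fault), then touch the array page selected by
 * data. Returns 1 if the fault was caught, 0 if the read did not fault.
 */
static int faulting_access(volatile char *addr, unsigned char data) {
    signal(SIGSEGV, on_segv);
    if (sigsetjmp(recover, 1) != 0)
        return 1;                   /* we arrived here from the handler */
    (void)*addr;                    /* architecturally raises SIGSEGV */
    array[(size_t)data * 4096] = 0; /* may still execute transiently */
    return 0;
}
```

To keep the compiler from reasoning about a constant null pointer, the address is best read from a `volatile` variable, for example `volatile char *volatile bad = NULL; faulting_access(bad, 'S');`.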

#### Checking the array







## Time to code



- Add another layer of indirection to test




#### Which address?



- Check /proc/kallsyms

`sudo cat /proc/kallsyms | grep banner`

- or check /proc/<pid>/pagemap and print address

```
printf("target: %p\n",
    libsc_get_physical_address(ctx, vaddr));
```
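For readers who want to see what such a pagemap lookup involves, here is a hand-written sketch. It is not the implementation of `libsc_get_physical_address`; the function names are made up, and 4 KiB pages are assumed. Each page has one 64-bit pagemap entry: bit 63 says whether the page is present, and bits 0-54 hold the page frame number. On recent Linux kernels the frame number reads as zero without root privileges (CAP_SYS_ADMIN).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Turn a pagemap entry plus the virtual address into a physical address. */
static uint64_t entry_to_paddr(uint64_t entry, uint64_t vaddr) {
    if (!(entry >> 63))
        return 0;                                /* page not present */
    uint64_t pfn = entry & ((1ULL << 55) - 1);   /* bits 0-54 */
    return (pfn << PAGE_SHIFT) | (vaddr & ((1ULL << PAGE_SHIFT) - 1));
}

/* Look up vaddr of the calling process; returns 0 on any failure. */
static uint64_t virt_to_phys(uint64_t vaddr) {
    uint64_t entry = 0;
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return 0;
    off_t pos = (off_t)((vaddr >> PAGE_SHIFT) * sizeof(entry));
    ssize_t got = pread(fd, &entry, sizeof(entry), pos);
    close(fd);
    return got == (ssize_t)sizeof(entry) ? entry_to_paddr(entry, vaddr) : 0;
}
```

On Linux, adding the resulting physical address to the base of the kernel's direct-physical map gives a kernel virtual address for the same data.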

- or start at a random address and iterate

## Time to code





















































Figure: operation #n and its prediction























- Many predictors in modern CPUs
  - Branch taken/not taken (PHT)
  - Call/Jump destination (BTB)
  - Function return destination (RSB)
  - Load matches previous store (STL)
- Most are even shared among processes










Shared Branch Prediction State






## Time to code




- [Gru+16] D. Gruss, C. Maurice, K. Wagner, and S. Mangard. Flush+Flush: A Fast and Stealthy Cache Attack. In: DIMVA. 2016.
- [Lip+16] M. Lipp, D. Gruss, R. Spreitzer, C. Maurice, and S. Mangard. ARMageddon: Cache Attacks on Mobile Devices. In: USENIX Security Symposium. 2016.
- [Liu+15] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee. Last-Level Cache Side-Channel Attacks are Practical. In: S&P. 2015.
- [Mau+17] C. Maurice, M. Weber, M. Schwarz, L. Giner, D. Gruss, C. Alberto Boano, S. Mangard, and K. Römer. Hello from the Other Side: SSH over Robust Cache Covert Channels in the Cloud. In: NDSS. 2017.