
DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks
Open, Normal, Public

Description

Paper and Code:

https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/pessl

Test PoC: https://github.com/IAIK/drama

Summary:

This work builds on Rowhammer. An attacker running an unprivileged process in a VM is able to log keystroke events for the entire system.

"In this attack, the spy and the victim can run on separate CPUs and do not share memory, i.e., no access to shared libraries and no page deduplication between VMs."

Mitigation:

Running stress -m 2 in parallel (i.e., the attacker’s core is under stress) made any measurements impossible. While no false positive detections occurred, only 9 events were correctly detected. Thus, our attack is susceptible to noise, especially if the attacker only gets a fraction of CPU time on its core.

or

NUMA with non-interleaved memory, combined with CPU pinning, is also described as a valid mitigation. The problem is that multi-NUMA environments exist, for the most part, only on server systems, and two protection domains are not enough for VM-based OSs.

The memory-stress solution is too expensive for battery life and of questionable effectiveness.

The solution must be on the host, out of reach of malicious code in the VM.


Conversation with Daniel Gruss (researcher):

https://www.whonix.org/pipermail/whonix-devel/2016-August/000707.html
https://www.whonix.org/pipermail/whonix-devel/2016-August/000709.html
https://www.whonix.org/pipermail/whonix-devel/2016-August/000710.html
https://www.whonix.org/pipermail/whonix-devel/2016-August/000711.html
https://www.whonix.org/pipermail/whonix-devel/2016-August/000712.html
https://www.whonix.org/pipermail/whonix-devel/2016-August/000717.html
https://www.whonix.org/pipermail/whonix-devel/2016-August/000722.html

Details

Impact
Normal

Event Timeline

HulaHoop created this task. Aug 18 2016, 2:33 PM
HulaHoop renamed this task to DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks. Aug 18 2016, 2:34 PM

@ethanwhite

Though this ticket is unrelated to your research, it would be great if you could give some insight on the battery-life/system-strain cost of simulating it.

The proposed solution:

stress -m 2 in parallel on the host.

Patrick updated the task description. Aug 19 2016, 8:50 PM

I didn't notice a very important point about this class of attacks and have been mistakenly conflating this definition with side-channel attacks, which are more relevant (and deadly) to the Whonix threat model. In summary: covert channels require colluding malicious code on both sides of a barrier, while side-channel attacks [2] don't.

We should distinguish between local covert channel attacks and network-based ones. The network-based ones are very dangerous because the artificial signals created on the machine leak into the network traffic, which is immediately observable and collectable by a network GPA.

We need to decide how relevant local covert channels are for Whonix. In our threat model we define a host or GW compromise as fatal, so this becomes irrelevant. Under what scenarios does this threat become plausible? The only example that comes to mind is an infected anonymous VM receiving private information that can deanonymize a user from a snooping process running in a clearnet VM (this includes JS code). Is this something we should defend against?


[1] From a paper cited in the DRAMA paper:

http://www.cs.wm.edu/~hnw/paper/HyperWhisper.pdf

Whispers in the Hyper-space: High-speed Covert Channel Attacks in the Cloud

6.1.1 Attack Scenario
Covert channel attacks are distinct from a seemingly similar threat, side channel attacks [22, 24]. Side channels extrapolate information by observing an unknowing sender, while covert channels transfer data between two collaborating parties. As a result, a successful covert channel attack requires an “insider” to function as a data source. However, this additional requirement does not significantly reduce the usefulness of covert channels in data theft attacks. Data theft attacks are normally launched in two steps, infiltration and exfiltration. In the infiltration step, attackers leverage multiple attack vectors, such as buffer overflow [4], VM image pollution [2, 26], and various social engineering techniques [15, 27], to place “insiders” in the victim and gain partial control over it. And then, in the exfiltration step, the “insiders” try to traffic sensitive information from the victim back to the attackers. Because the “insiders” usually would only have very limited control of the victim, their behaviors are subjected to strict security surveillance, e.g., firewall, network intrusion detection, traffic logging, etc. Therefore, covert channels become ideal choices for secret data transmissions under such circumstances.

[2] Cryptographers' answers to side channels are to pay attention to crypto library timing info, use crypto hardware acceleration, and also CPU pinning.

I re-read the DRAMA paper and there are multiple attacks outlined: one covert channel, one side channel. The keystroke sniffing is a side channel, so the ticket is still valid either way.

HulaHoop added a comment (edited). Aug 23 2016, 7:29 PM

It's not enough to pin vCPUs. The host must be restricted from accessing the pCPUs assigned to a VM. [1]

The host should be allowed to access the GW's CPU too, since it's part of the TCB.


[1] http://redhatstackblog.redhat.com/2015/05/05/cpu-pinning-and-numa-topology-awareness-in-openstack-compute/

"At this point if we created a guest we would already see some changes in the XML, pinning the guest vCPU(s) to the cores listed in vcpu_pin_set:

<vcpu placement='static' cpuset='2-3,6-7'>1</vcpu>

Now that we have set up the guest virtual machine instances so that they will only be allowed to run on cores 2, 3, 6, and 7 we must also set up the host processes so that they will not run on these cores – restricting themselves instead to cores 0, 1, 4, and 5. To do this we must set the isolcpus kernel argument – adding this requires editing the system’s boot configuration.

On the Red Hat Enterprise Linux 7 systems used in this example this is done using grubby to edit the configuration:

grubby --update-kernel=ALL --args="isolcpus=2,3,6,7" "

https://serverfault.com/a/343771 - useful KVM information

HulaHoop added a comment (edited). Aug 24 2016, 2:14 PM

The choices for systems with a single NUMA node are:

Detection: https://dreamsofastone.blogspot.co.at/2015/11/detecting-stealth-mode-cache-attacks.html

Using an extremely resource-demanding stress daemon, with unreliable results.

Advise the user to not perform keyboard tasks of different security levels concurrently. At least pause suspicious VMs.

Even if a user opts for a more expensive machine with two NUMA nodes, that is still a big restriction on multi-VM systems like Qubes, which make use of many protection domains.


We may have to deal with it using stress because the side channel can sniff keystroke timings.

The network-based ones are very dangerous because the artificial signals created on the machine leak into the network traffic, which is immediately observable and collectable by a network GPA.

We need to decide how relevant local covert channels are for Whonix.

I think this is a place to return to the weakest-point principle. There are many things an adversary can do if they've achieved local (non-root, but not browser-sandboxed) code execution; for example, they could sit in a loop uploading screenshots to a server, or helpfully rsync all the user's files to a machine they control.

I'd suggest that covert channels that require local code execution (as before, not root, but not browser-sandboxed) are, at least for the moment, definitely not the weakest point; as outlined above, the adversary could partake in many, much more dangerous, actions if given local code execution.

With that said, I do think this is something we want to address; the side channel in particular could be devastating.

Though this ticket is unrelated to your research, it would be great if you could give some insight on the battery-life/system-strain cost of simulating it.
stress -m 2 in parallel on the host

It's almost precisely as bad as stress --cpu 8. It divides my battery life in three.

We may have to deal with it using stress because the side channel can sniff keystroke timings.

Isn't this a cache-related attack? If so, in theory, couldn't we just have the hypervisor flush the cache every time it context switches a logical CPU?

I'd suggest that covert channels that require local code execution (as before, not root, but not browser-sandboxed) are, at least for the moment, definitely not the weakest point; as outlined above, the adversary could partake in many, much more dangerous, actions if given local code execution.

Makes sense. That's why exploitation prevention (grsec) is a very important part of defense, besides containment after the fact.

With that said, I do think this is something we want to address; the side channel in particular could be devastating.

Yes

It divides my battery life in three.

Thanks for looking at it. It's impractical, then.

Isn't this a cache-related attack? If so, in theory, couldn't we just have the hypervisor flush the cache every time it context switches a logical CPU?

AFAIK (please feel free to correct me), the only thing these attacks depend on is sharing the same DRAM module's row buffer instead of a CPU cache.

The malicious code abuses cache flushing or eviction to access DRAM as much as possible. It reverse-engineers the physical address mapping (automated within seconds) because this info is not readily available in a VM, then causes row conflicts with victim processes, allowing it to measure and spy on row access times.

https://www.usenix.org/sites/default/files/conference/protected-files/security16_slides_pessl.pdf


I am currently contacting the researchers. There is likely some solution that isn't as resource intensive. It combines restricting cache flushing (clflush), blocking timing sources (removing guest timers), and making sure there is no hyperthreading/multithreading for guests so they can't use it as an alternative to timers.

HulaHoop updated the task description. Aug 29 2016, 2:11 PM
HulaHoop updated the task description. Aug 29 2016, 11:39 PM

Done. Applied all the measures from the last comment:

https://github.com/Whonix/whonix-libvirt/commit/df7d27d77da6f4570de3a7173f624fa0794663ee
https://github.com/Whonix/whonix-libvirt/commit/731b891f4b1486c77990e65e85cf24a3b57dc4b3
https://github.com/Whonix/whonix-libvirt/commit/773a83d14bd6927fcd03344ecba35985766545a6
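For reference, timer sources are removed from a guest through the <clock> element of the libvirt domain XML. A hedged sketch of the kind of stanza involved (timer names per the libvirt domain format; the exact set actually disabled is in the commits above):

```xml
<clock offset='utc'>
  <!-- Deny the guest precise timer sources. Each <timer> element's
       present='no' tells libvirt/QEMU not to expose that clock. -->
  <timer name='hpet' present='no'/>
  <timer name='kvmclock' present='no'/>
  <timer name='pit' present='no'/>
</clock>
```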

Stripped all external timers available to the guest. Only the coarse and inaccurate refined-jiffies clocksource inside the guest kernel is available:

https://github.com/torvalds/linux/blob/master/kernel/time/jiffies.c

Coarse timers are useless for attackers. They need at least subnanosecond accuracy.

As per section 3.3 of https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_lipp.pdf

The only thing remaining is to make sure no true multithreading is available in the guest.

QEMU executes code in a single-threaded environment:

https://archive.is/zgPF3

QEMU actually uses a hybrid architecture that combines event-driven programming with threads. It makes sense to do this because an event loop cannot take advantage of multiple cores since it only has a single thread of execution. In addition, sometimes it is simpler to write a dedicated thread to offload one specific task rather than integrate it into an event-driven architecture. Nevertheless, the core of QEMU is event-driven and most code executes in that environment.

AFAICT this is likely to stay that way long-term. Stick with single-vCPU environments:

http://www.linux-kvm.org/images/1/17/Kvm-forum-2013-Effective-multithreading-in-QEMU.pdf

TL;DR

Multithreading is not a problem in KVM.