Some notes on identifying exit and hypercall handlers in HyperV

Background

Not too long ago, a friend’s coworker wanted to know how to find the VM exit dispatching function in HyperV. I don’t have any interest in HyperV but decided to explore the question out of curiosity. It is not a difficult problem, but it can be somewhat annoying if you have not done it before. Dmytro Oleksiuk, Gerhart X, and Saar Amar made references to it in [1, 3, 6]; however, there is no explanation of how they arrived at it. This blog entry outlines the steps I took to identify the code.

Different paths to the same destination

When my friend asked this question, I just referred him to Dmytro’s awesome project where he did a DMA attack and backdoored HyperV by patching its vmexit handler [6]. The source code for this specific part is in DmaBackdoorHv.c; the byte pattern he used to find the dispatcher is on lines 432-454. I will save you the time and give the pattern now: 4c 89 79 78. This gets you to a function that eventually calls into the massive function that handles the vmexits. I am not sure how he arrived at this byte pattern, but it works very well.
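For illustration, the pattern search can be sketched in a few lines of Python. The bytes 4c 89 79 78 decode to mov [rcx+78h], r15. The helper name find_pattern and the synthetic sample buffer are assumptions made for this sketch; they are not taken from DmaBackdoorHv.c. A raw byte scan can also hit data or the middle of another instruction, so each hit should be confirmed in a disassembler.

```python
from typing import List


def find_pattern(image: bytes, pattern: bytes) -> List[int]:
    """Return every offset in `image` at which `pattern` occurs."""
    offsets = []
    pos = image.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(pattern, pos + 1)
    return offsets


# 4c 89 79 78 decodes to: mov [rcx+78h], r15
PATTERN = bytes.fromhex("4c897978")

# Synthetic stand-in for the hypervisor image (hypothetical data, not real code).
sample = b"\x90\x90" + PATTERN + b"\xc3" + PATTERN

for offset in find_pattern(sample, PATTERN):
    print(hex(offset))
```

On a real image you would replace sample with the bytes of the hypervisor binary's code section and translate each file offset back to a virtual address.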

Gerhart’s blog outlines the steps he took to diff a HyperV patch [3]. While at it, he also shows the flow from the vmexit handler to the actual hypercall implementation. It is for an older version of HyperV, but the concepts still apply.

Saar’s excellent blog entry discusses the various HyperV components, their relationship with each other, and how one might start researching them [1]. The “MSRs” section briefly states that MSR accesses are handled inside the vmexit handler with a large switch/case (one per supported MSR). He even shows a partial HexRays decompilation of the MSR exit handler (MSR 0x40000200).

One of my approaches was to search for the MSR handler. I first searched for the immediate 0xc0000082 but there were a bunch of hits and the xrefs would just take you down a deep rabbit hole. I needed to narrow it down to something more unique and decided to use 0x40000200 as mentioned in Saar’s blog. This resulted in 2 clean hits; if you xref on one of these and go up two levels, you will get to the vmexit handler.

The previous method leverages some HyperV-specific knowledge, i.e., there exists a synthetic MSR that is unique to HyperV. If you did not know this, it might have taken a bit longer. An alternate approach I used was to leverage the Intel VT-x semantics. We know that whenever a vmexit occurs, the CPU transitions into hypervisor mode and the exit handler is executed; in order for the hypervisor to know which handler to call, it must know the exit reason. There is only one way to read the exit reason: VMREAD. The Intel manual states that the exit reason is stored in the VMCS field with encoding 0x4402 (see page B-7 of [4]). Hence, all we have to do is search for that immediate and we should have our answer.
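Outside of a disassembler, the immediate search could be approximated roughly as follows. This sketch assumes the compiler loads the field encoding with a plain mov r32, imm32 (opcodes B8 through BF followed by a little-endian 32-bit immediate); other encodings are possible and would be missed. The function name find_mov_imm32 and the sample bytes are hypothetical.

```python
import struct
from typing import Iterator, Tuple

EXIT_REASON = 0x4402  # VMCS field encoding for the exit reason


def find_mov_imm32(image: bytes, imm: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, register index) for each `mov r32, imm32` loading `imm`.

    Register index 0 is eax, 1 is ecx, and so on (ignoring any REX prefix).
    """
    needle = struct.pack("<I", imm)
    pos = image.find(needle, 1)  # start at 1 so an opcode byte can precede it
    while pos != -1:
        opcode = image[pos - 1]
        if 0xB8 <= opcode <= 0xBF:
            yield pos - 1, opcode - 0xB8
        pos = image.find(needle, pos + 1)


# Synthetic sample: nop ; mov eax, 4402h ; vmread rcx, rax
sample = bytes.fromhex("90" "b802440000" "0f78c1")

for offset, reg in find_mov_imm32(sample, EXIT_REASON):
    print(hex(offset), reg)
```

As with any raw byte scan, a hit is only a candidate; checking that a VMREAD follows shortly after is a reasonable sanity test.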

Applying this method, we immediately find the main vmexit handler with the massive switch/case statement that Saar mentioned in his blog. Hypercalls are also triggered via vmexits so we can also identify the hypercall handlers from here. All we need to do is find the VMCALL exit reason (18); see page C-1 of [4] for other exit codes. Here is an example from HyperV:

<main vmexit handler>
...
mov     eax, 4402h
vmread  rcx, rax
...
    case 18:
...
        hypercallHandler(v4, v4, _R8, v38);
        goto LABEL_217;
...

<hypercallHandler>:
...
mov     rax, rbx
lea     rcx, off_FFFFF80000C00000  <-- base of hypercall table
and     eax, 3FFFh                 <-- keep the low 14 bits as the table index
lea     rax, [rax+rax*2]           <-- index * 3
lea     rdx, [rcx+rax*8]           <-- entry = base + index * 24
mov     rax, [rsi+108h]
movzx   ecx, word ptr [rdx+14h]
inc     qword ptr [rax+rcx*8+1000h]
...

The hypercall table details and structures are described in [2, 3]. I noticed that the hypercall table is stored in the CONST segment of the hypervisor binary. It seems to be the only data there. This is convenient: once you locate the CONST segment, you have found the hypercall table.
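The index arithmetic in the hypercallHandler listing can be restated as a small Python sketch. The mask (3FFFh) and the 24-byte stride (index * 3 * 8) are read straight off the disassembly above; the function and constant names are made up for illustration, and nothing here describes the layout of the fields inside an entry.

```python
INDEX_MASK = 0x3FFF  # and eax, 3FFFh
ENTRY_SIZE = 24      # lea rax, [rax+rax*2] ; lea rdx, [rcx+rax*8]  => index * 3 * 8


def hypercall_entry_address(table_base: int, hypercall_input: int) -> int:
    """Return the address of the table entry selected by `hypercall_input`."""
    index = hypercall_input & INDEX_MASK
    return table_base + index * ENTRY_SIZE


# Table base taken from the listing above (lea rcx, off_FFFFF80000C00000).
TABLE_BASE = 0xFFFFF80000C00000

print(hex(hypercall_entry_address(TABLE_BASE, 2)))
```

With the base address in hand, walking the table in steps of ENTRY_SIZE gives one candidate entry per hypercall index.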

Addendum

Some people read an earlier draft and kindly shared their own insights and feedback. They are shared here for your benefit.

Saar noted that he identified and recovered the hypercall structures using a similar approach; in addition, he discovered all of the other vmexit handlers by cross-referencing the exit codes from arch/x86/include/asm/vmx.h in [5]. This header is a great resource for those doing hypervisor development and reverse engineering. You can import most of this header into IDA and save yourself lots of typing. His Twitter feed has many interesting HyperV discoveries.
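To give a feel for what cross-referencing the exit codes looks like, here is a minimal Python sketch that decodes a raw exit reason value. The basic exit reason lives in bits 15:0 and bit 31 flags a VM-entry failure, per the Intel manual. The handful of names below follow the EXIT_REASON_* constants in vmx.h with the prefix dropped; the table is deliberately partial and the function name is hypothetical.

```python
from typing import Tuple

# Partial table; the full list is in Appendix C of the Intel manual and in vmx.h.
EXIT_REASONS = {
    10: "CPUID",
    18: "VMCALL",
    28: "CR_ACCESS",
    31: "MSR_READ",
    32: "MSR_WRITE",
    33: "INVALID_STATE",
    48: "EPT_VIOLATION",
}

VM_ENTRY_FAILURE = 0x80000000  # bit 31 of the exit reason field


def decode_exit_reason(raw: int) -> Tuple[int, str, bool]:
    """Split a raw exit reason into (basic reason, name, VM-entry failure flag)."""
    basic = raw & 0xFFFF
    name = EXIT_REASONS.get(basic, "unknown")
    return basic, name, bool(raw & VM_ENTRY_FAILURE)


print(decode_exit_reason(18))
print(decode_exit_reason(0x80000021))
```

In the decompiled switch statement, each case label corresponds to one of these basic exit reasons, which is how case 18 maps to the hypercall path.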

Satoshi Tanda suggested searching for VMRESUME as another way to find the dispatching code. The rationale is that the hypervisor must eventually return to the guest and VMRESUME does exactly that; the exit handling code is usually around this instruction. This is an excellent idea and it works in practice. Along similar lines, Dmytro shared with me that he searched for the VMWRITE instruction that writes to the HOST_RIP field (encoding 0x6c16) in the VMCS.
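The VMRESUME search can be sketched the same way as a raw opcode scan. VMRESUME encodes as 0F 01 C3 and VMLAUNCH as 0F 01 C2; VMLAUNCH is included here only as an assumption that it may sit near the same code. The helper name and the sample buffer are hypothetical, and hits in data or mid-instruction are possible, so treat the results as candidates.

```python
import re
from typing import List, Tuple

VMX_OPCODES = {
    bytes.fromhex("0f01c2"): "VMLAUNCH",
    bytes.fromhex("0f01c3"): "VMRESUME",
}


def find_vmx_instructions(image: bytes) -> List[Tuple[int, str]]:
    """Return sorted (offset, mnemonic) pairs for each opcode match in `image`."""
    hits = []
    for opcode, mnemonic in VMX_OPCODES.items():
        for match in re.finditer(re.escape(opcode), image):
            hits.append((match.start(), mnemonic))
    return sorted(hits)


# Synthetic sample: nop ; vmlaunch ; int3 ; vmresume
sample = bytes.fromhex("90" "0f01c2" "cc" "0f01c3")

for offset, mnemonic in find_vmx_instructions(sample):
    print(hex(offset), mnemonic)
```

From each VMRESUME candidate, the exit handling code is usually found by looking at the surrounding function and its callees in a disassembler.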

Gerhart told me that he recovered some of the data structures using dynamic analysis. Basically, he injected code into the hypervisor and read the vmexit handlers. This was something I never considered.

It is interesting to see how various people approach the problem 🙂

Hypervisor training class

If you are interested in hypervisors and how they can be used for security applications, please consider our class on hypervisor development. For the first time ever, Satoshi Tanda and I will be offering a class on hypervisor development! We will explain all the concepts needed to write a hypervisor on Intel VT-x from scratch. We have been talking about it for a long time and are excited to finally offer it. You can read about it and register here.

References

[2] Amar, S. 2018. Twitter posting. https://twitter.com/AmarSaar/status/1024766795444453376

[3] Gerhart, X. 2017. Hyper-V debugging for beginners. Part 2, or half disclosure of MS13-092. http://hvinternals.blogspot.com/2017/10/hyper-v-debugging-for-beginners-part-2.html

[4] Intel 2017. Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3: System Programming Guide (order number 325384-065US). Intel.

[5] Linux 2018. Linux kernel source (commit f99e3daf94ff35dd4a878d32ff66e1fd35223ad6). https://github.com/torvalds/linux/

[6] Oleksiuk, D. 2017. PCI Express DIY hacking toolkit for Xilinx SP605. https://github.com/Cr4sh/s6_pcie_microblaze