Analysis of an Integer Truncation Vulnerability in Chrome PDFium

Author: Liu Ke (刘科)

0x01. Overview

chromium:697847 is a heap overflow in PDFium caused by integer truncation (an unsigned long is assigned to a uint32). This is a brief write-up.

Root cause:

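PDFium uses zlib's inflate interface to decompress data;
zlib stores the size of the decompressed data in total_out, a variable of type unsigned long;
PDFium receives the value of total_out in a variable of type uint32;
in a 64-bit environment where unsigned long is 64 bits wide (as on 64-bit Linux), the value is truncated once the decompressed size reaches 4GB or more, i.e. goes beyond the uint32 range;
PDFium then allocates a heap block using the truncated value and copies the decompressed data into it, which overflows the heap.

0x02. Vulnerability analysis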

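2.1 Crash information

Build PDFium with AddressSanitizer enabled on 64-bit Ubuntu, then run the resulting pdfium_test on the PoC file from the original bug report. It produces the following crash report (simplified):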

==43290==ERROR: AddressSanitizer: heap-buffer-overflow on address 
    0x60200000ecd1 at pc 0x0000004a5dad bp 0x7ffdcfa78c10 sp 0x7ffdcfa783c0
WRITE of size 8349028 at 0x60200000ecd1 thread T0
    #0 0x4a5dac in __asan_memcpy 
    #1 0x8e5d80 in (anonymous namespace)::FlateUncompress() 
        pdfium/core/fxcodec/codec/fx_codec_flate.cpp:603:9
    #2 0x8e5d80 in CCodec_FlateModule::FlateOrLZWDecode() 
    #3 0x6f3710 in FPDFAPI_FlateOrLZWDecode() 
    #4 0x6f49a3 in PDF_DataDecode()
    #5 0x6db1b5 in CPDF_StreamAcc::LoadAllData()
    #6 0x7a7e58 in CPDF_ContentParser::Start()
    #7 0x628de1 in CPDF_Page::StartParse()
    #8 0x628de1 in CPDF_Page::ParseContent()
    #9 0x50b0ab in FPDF_LoadPage()
    #10 0x4f8173 in GetPageForIndex()
    #11 0x4f8aea in RenderPage()
    #12 0x4fc753 in RenderPdf()
    #13 ...

0x60200000ecd1 is located 0 bytes to the right of 
    1-byte region [0x60200000ecd0,0x60200000ecd1)
allocated by thread T0 here:
    #0 0x4bc230 in calloc()
    #1 0x8e5c58 in FX_AllocOrDie()
    #2 0x8e5c58 in (anonymous namespace)::FlateUncompress()
        pdfium/core/fxcodec/codec/fx_codec_flate.cpp:595
    #3 0x8e5c58 in CCodec_FlateModule::FlateOrLZWDecode()
    #4 0x6f3710 in FPDFAPI_FlateOrLZWDecode()
    #5 0x6f49a3 in PDF_DataDecode()
    #6 0x6db1b5 in CPDF_StreamAcc::LoadAllData()

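This shows a heap overflow: the destination block is only 1 byte ([0x60200000ecd0,0x60200000ecd1)), yet the program tries to write 8349028 bytes into it via memcpy. In summary:

the memcpy call is in FlateUncompress (line 603 of core/fxcodec/codec/fx_codec_flate.cpp);
the heap block is allocated in FlateUncompress as well (line 595 of fx_codec_flate.cpp).

2.2 PoC analysis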

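The PoC file from the original bug report is very simple: object 4 contains 0x3FB2B2 bytes of data and its /Filter is /FlateDecode, meaning the data was compressed with zlib's deflate algorithm and has to be decompressed with zlib's inflate.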
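
As a rough sanity check on how such a small stream can be dangerous, the expansion factor can be computed from two numbers that appear in this article: the 0x3FB2B2-byte stream size and the total_out value of 0x100000000 seen in the debugger session in section 2.5. The helper and constant names below are invented for this sketch.

```cpp
#include <cstdint>

// Hypothetical helper: integer expansion factor of a compressed stream.
// Returns 0 for an empty input to avoid dividing by zero.
constexpr std::uint64_t ExpansionFactor(std::uint64_t compressed_size,
                                        std::uint64_t decompressed_size) {
  return compressed_size == 0 ? 0 : decompressed_size / compressed_size;
}

// Values taken from the article.
constexpr std::uint64_t kPocStreamSize = 0x3FB2B2;       // 4174514 bytes in
constexpr std::uint64_t kPocTotalOut = 0x100000000ULL;   // 4 GiB out
constexpr std::uint64_t kPocFactor =
    ExpansionFactor(kPocStreamSize, kPocTotalOut);
```

The factor works out to 1028, which sits just under the roughly 1032:1 maximum that zlib's documentation gives for deflate. In other words, a stream of about 4 MB is close to the smallest input that can push the output past the 4GB boundary.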

2.3 zlib analysis

First, look at z_stream, the key structure zlib uses when decompressing data (note that total_out has type unsigned long):

typedef struct z_stream_s {
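    z_const Bytef *next_in; /* where the data to be decompressed is read from */
    uInt     avail_in;      /* how many bytes are still waiting to be decompressed */
    uLong    total_in;      /* bytes processed so far (compressed input) */

    Bytef    *next_out;     /* where the decompressed data is written */
    uInt     avail_out;     /* how many more decompressed bytes can be stored */
    uLong    total_out;     /* bytes processed so far (decompressed output) */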

    z_const char *msg;
    struct internal_state FAR *state;

    alloc_func zalloc;
    free_func  zfree; 
    voidpf     opaque;

    int     data_type;
    uLong   adler;
    uLong   reserved;
} z_stream;

When zlib's inflate is called to decompress data, the members of z_stream are updated accordingly. The one to watch here is total_out.

int ZEXPORT inflate(z_streamp strm, int flush)
{
    // ......
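    in -= strm->avail_in;       // compressed bytes consumed by this call
    out -= strm->avail_out;     // decompressed bytes produced by this call
    strm->total_in += in;       // accumulate the input total
    strm->total_out += out;     // accumulate the output total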
    // ......
}
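
To make this bookkeeping concrete, here is a small model of the two counters. It is not zlib code: StreamCounters, AfterCall and RunCalls are names invented for this sketch, and the 64-bit fields stand in for unsigned long as it behaves on 64-bit Linux.

```cpp
#include <cstdint>

// Simplified stand-in for the two z_stream counters discussed above.
// NOT zlib's struct; 64-bit fields model unsigned long on 64-bit Linux.
struct StreamCounters {
  std::uint64_t total_in = 0;
  std::uint64_t total_out = 0;
};

// Mimics the tail of inflate(): add the bytes consumed and produced
// by a single call to the running totals.
constexpr StreamCounters AfterCall(StreamCounters s, std::uint64_t in,
                                   std::uint64_t out) {
  s.total_in += in;
  s.total_out += out;
  return s;
}

// Applies `calls` identical inflate-like calls and returns the totals.
constexpr StreamCounters RunCalls(std::uint64_t calls,
                                  std::uint64_t in_per_call,
                                  std::uint64_t out_per_call) {
  StreamCounters s{};
  for (std::uint64_t i = 0; i < calls; ++i) {
    s = AfterCall(s, in_per_call, out_per_call);
  }
  return s;
}
```

For example, RunCalls(4, 1, 0x40000000).total_out is 0x100000000: the counter itself is fine, but the value no longer fits in 32 bits.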

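2.4 FlateUncompress analysis

To keep the code readable, the parts of FlateUncompress that do not matter here have been removed.

The code below shows the decompression process. The data is decompressed chunk by chunk, and the results are stored in result_tmp_bufs.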

  // Initialize
  void* context = FPDFAPI_FlateInit(my_alloc_func, my_free_func);

  // Set the input data (the data to be decompressed)
  FPDFAPI_FlateInput(context, src_buf, src_size);

  // Decompress the input chunk by chunk
  std::vector<uint8_t*> result_tmp_bufs;
  uint8_t* cur_buf = guess_buf.release();
  while (1) {
    // Call inflate to decompress one chunk
    int32_t ret = FPDFAPI_FlateOutput(context, cur_buf, buf_size);
    // Space left in cur_buf
    int32_t avail_buf_size = FPDFAPI_FlateGetAvailOut(context);

    // inflate returned something other than Z_OK
    // (an error, or Z_STREAM_END at the end of the stream)
    if (ret != Z_OK) {
      last_buf_size = buf_size - avail_buf_size;
      result_tmp_bufs.push_back(cur_buf);
      break;
    }

    // cur_buf still has room, so decompression is complete
    if (avail_buf_size != 0) {
      last_buf_size = buf_size - avail_buf_size;
      result_tmp_bufs.push_back(cur_buf);
      break;
    }

    // Store the current result and prepare for the next round
    result_tmp_bufs.push_back(cur_buf);
    cur_buf = FX_Alloc(uint8_t, buf_size + 1);
    cur_buf[buf_size] = '\0';
  }
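
The shape of result_tmp_bufs after this loop can be modelled without zlib at all. The sketch below is a simplification under stated assumptions: ChunkLayout and LayoutFor are invented names, and the model ignores inflate's return code, treating the first pass that leaves room in the buffer as the last one.

```cpp
#include <cstdint>

struct ChunkLayout {
  std::uint64_t chunk_count;    // entries that end up in result_tmp_bufs
  std::uint64_t last_buf_size;  // bytes used in the final chunk
};

// Models the loop above: every pass fills one buf_size buffer, and the loop
// stops on the first pass that does not fill its buffer completely.
constexpr ChunkLayout LayoutFor(std::uint64_t total, std::uint64_t buf_size) {
  if (buf_size == 0) return ChunkLayout{0, 0};  // avoid looping forever
  std::uint64_t chunks = 0;
  std::uint64_t remaining = total;
  while (true) {
    const std::uint64_t produced = remaining < buf_size ? remaining : buf_size;
    remaining -= produced;
    ++chunks;
    if (produced < buf_size) return ChunkLayout{chunks, produced};
  }
}
```

Taking buf_size to be 8349028 (the write size reported by AddressSanitizer in section 2.1, which is an inference rather than something read from the source) and a total of 0x100000000 bytes, LayoutFor gives 515 chunks with 3566904 bytes in the last one.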

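Because the data is decompressed in chunks, the chunks have to be concatenated afterwards. However, FPDFAPI_FlateGetTotalOut returns int and dest_size is a uint32, so the value is truncated, and the heap block allocated by FX_Alloc(uint8_t, dest_size) cannot hold all of the decompressed data.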

  // Total size of the decompressed data
  dest_size = FPDFAPI_FlateGetTotalOut(context);

  if (result_tmp_bufs.size() == 1) {
    // Only one chunk
    dest_buf = result_tmp_bufs[0];
  } else {
    // Multiple chunks
    // Allocate a heap block based on dest_size
    uint8_t* result_buf = FX_Alloc(uint8_t, dest_size);
    uint32_t result_pos = 0;
    // Copy the data
    for (size_t i = 0; i < result_tmp_bufs.size(); i++) {
      uint8_t* tmp_buf = result_tmp_bufs[i];
      uint32_t tmp_buf_size = buf_size;
      if (i == result_tmp_bufs.size() - 1) {
        tmp_buf_size = last_buf_size;
      }
      // Crash
      FXSYS_memcpy(result_buf + result_pos, tmp_buf, tmp_buf_size);
      result_pos += tmp_buf_size;
      FX_Free(result_tmp_bufs[i]);
    }
    dest_buf = result_buf;
  }
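
One way to avoid sizing the buffer from a truncated value is to compute the required size in 64 bits and stop if it does not fit in the 32-bit variable. This is only a sketch of that idea, not the fix PDFium adopted, and both function names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: bytes the concatenation loop will copy, computed in
// 64 bits. All chunks except the last hold buf_size bytes.
constexpr std::uint64_t BytesToConcatenate(std::uint64_t chunk_count,
                                           std::uint64_t buf_size,
                                           std::uint64_t last_buf_size) {
  return chunk_count == 0 ? 0 : (chunk_count - 1) * buf_size + last_buf_size;
}

// Hypothetical guard: true only if n can be stored in a uint32_t unchanged.
constexpr bool FitsInUint32(std::uint64_t n) {
  return n <= std::numeric_limits<std::uint32_t>::max();
}
```

A caller would check FitsInUint32(BytesToConcatenate(...)) before allocating and treat a false result as a decode failure instead of allocating a buffer that is too small.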

2.5 Debugging with gdb

Set a breakpoint on FlateUncompress to see the data to be decompressed and its size (0x3FB2B2, as mentioned earlier):

(gdb) b (anonymous namespace)::FlateUncompress
(gdb) r crash.pdf

Breakpoint 1, (anonymous namespace)::FlateUncompress (
    src_buf=0xa672b0 "x\234\354\301\201",   // compressed data
    src_size=4174514,                       // data size
    orig_size=0, 
    dest_buf=@0x7fffffffd350: 0x0, 
    dest_size=@0x7fffffffd2f0: 4294967295, 
    offset=@0x7fffffffd184: 0)
    at ../../core/fxcodec/codec/fx_codec_flate.cpp:508

(gdb) p /x src_size
$1 = 0x3fb2b2

(gdb) x /40xb src_buf
0xa672b0:    0x78    0x9c    0xec    0xc1    0x81    0x00    0x00    0x00
0xa672b8:    0x00    0x80    0x20    0xd6    0xfd    0x25    0x16    0xa9
0xa672c0:    0x0a    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0xa672c8:    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0xa672d0:    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00

Set a breakpoint on the line that calls FPDFAPI_FlateGetTotalOut (line 590). The value of total_out is 0x100000000, which is truncated to 0 when assigned to a uint32.

(gdb) b 590
Breakpoint 2 at 0x5de5a1: 
    file core/fxcodec/codec/fx_codec_flate.cpp, line 590.

(gdb) c
Continuing.
Breakpoint 2, (anonymous namespace)::FlateUncompress ...

(gdb) p /x context
$2 = 0xe625f0

(gdb) x /20xw content
No symbol "content" in current context.
(gdb) x /20xw 0xe625f0
0xe625f0:    0x00e62562    0x00000000    0x00000000    0x00000000
0xe62600:    0x003fb2b2    0x00000000    0xf68d5d48    0x00007ffe
0xe62610:    0x0048f82c    0x00000000    [0x00000000    0x00000001]    // total_out
0xe62620:    0x00000000    0x00000000    0x00e62670    0x00000000
0xe62630:    0x005dcca0    0x00000000    0x005dccca    0x00000000

(gdb) s
FPDFAPI_FlateGetTotalOut (context=0xe625f0) at 
    core/fxcodec/codec/fx_codec_flate.cpp:29
29      return ((z_stream*)context)->total_out;

(gdb) p /x ((z_stream*)context)->total_out
$3 = 0x100000000
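
The truncation seen in the debugger can be reproduced in isolation. The sketch below models the assignment as keeping only the low 32 bits of the 64-bit counter, which matches what the session above shows. TruncateToUint32 and LostBytes are illustrative names, not PDFium functions.

```cpp
#include <cstdint>

// Models storing a 64-bit total_out into a 32-bit dest_size:
// only the low 32 bits survive.
constexpr std::uint32_t TruncateToUint32(std::uint64_t total_out) {
  return static_cast<std::uint32_t>(total_out);
}

// Number of bytes that the 32-bit size no longer accounts for.
constexpr std::uint64_t LostBytes(std::uint64_t total_out) {
  return total_out - TruncateToUint32(total_out);
}
```

TruncateToUint32(0x100000000) yields 0, so the buffer is sized as if there were no output at all, while the copy loop still works with the real chunk sizes.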

2.6 Patch analysis

Commit f6d0146 patched the relevant code to prevent the heap overflow, but the new code still cannot handle more than 4GB of data, as shown below:

@@ -594,14 +598,17 @@
     } else {
       uint8_t* result_buf = FX_Alloc(uint8_t, dest_size);
       uint32_t result_pos = 0;
+      uint32_t remaining = dest_size;
       for (size_t i = 0; i < result_tmp_bufs.size(); i++) {
         uint8_t* tmp_buf = result_tmp_bufs[i];
         uint32_t tmp_buf_size = buf_size;
         if (i == result_tmp_bufs.size() - 1) {
           tmp_buf_size = last_buf_size;
         }
-        FXSYS_memcpy(result_buf + result_pos, tmp_buf, tmp_buf_size);
-        result_pos += tmp_buf_size;
+        uint32_t cp_size = std::min(tmp_buf_size, remaining);
+        FXSYS_memcpy(result_buf + result_pos, tmp_buf, cp_size);
+        result_pos += cp_size;
+        remaining -= cp_size;
         FX_Free(result_tmp_bufs[i]);
       }
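
The effect of the patch can be illustrated with a small stand-alone model. ConcatClamped is a made-up name and std::vector replaces FX_Alloc and FX_Free; only the clamping logic (cp_size, remaining) mirrors the diff.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Model of the patched loop (not PDFium code): concatenates the chunks into
// a buffer of dest_size bytes and never copies more than what still fits.
std::vector<std::uint8_t> ConcatClamped(
    const std::vector<std::vector<std::uint8_t>>& chunks,
    std::uint32_t dest_size) {
  std::vector<std::uint8_t> result(dest_size);
  std::uint32_t result_pos = 0;
  std::uint32_t remaining = dest_size;
  for (const auto& chunk : chunks) {
    const std::uint32_t chunk_size = static_cast<std::uint32_t>(chunk.size());
    // Same clamp as the patch: copy at most `remaining` bytes.
    const std::uint32_t cp_size = std::min(chunk_size, remaining);
    if (cp_size != 0) {
      std::memcpy(result.data() + result_pos, chunk.data(), cp_size);
    }
    result_pos += cp_size;
    remaining -= cp_size;
  }
  return result;
}
```

With the chunks {1,2,3}, {4,5,6}, {7} and a dest_size of 4, the result is {1,2,3,4}. Nothing is written out of bounds, but the rest of the data is silently dropped, which is the limitation noted above: a truncated dest_size still yields truncated output.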

Later commits cleaned up this part of the code, for example 7b8e8c and 1e8c39.

Source: Tencent Cloud Developer Community.