On Decompilation (Of SPU Binaries)

First one needs to understand, that an ideal decompiler, that will recover the original source code, is impossible to create. The process of compilation is removing a lot of high level information, such as used data types or loop types. Then optimization could re-arrange code/data flow, eliminate common subexpressions, perform function inlining, etc. Therefore any output of our decompiler will most likely not match the original source code, but if we do it right, we can get a (very) close approximation.

So what is it, that I’m doing at the moment?

Currently the decompiler will need to be fed with an ELF binary, that has its section headers present and from them it will extract information about executable and data regions. For every executable region it will then extract subroutines and split every of them into basic blocks. A basic block is a piece of code, that only contains simple operations and ends with a branch. This basics blocks might get split again, if a branch to any of its operation occurs from inside the subroutine. This way we will collect a list of all the basic blocks, can link them according to their exit branches and end up with the control flow graphs of all the subroutines. For identification of high level statements like ifs and loops (see [REC] for loop detection) we compute DFS and dominator trees and for later use the frontiers of each block (see [DOM] for used algorithms). Next we perform register liveness analysis to identify subroutine arguments and return values (see [LIV] for used algorithm). Now we can translate all the assembler instructions into simplified operations, e.g. translate “ori rt, ra, 0” into “rt <- ra” and identify local stack variables in that process too, following IBM’s SPU ABI specifications (see [ABI]).

More additional steps are then required to produce high level code, as you can see in the following list (entries marked with (DONE) are finished and working already):

  • Extract text/data regions (DONE)
  • Extract subroutines (DONE)
  • Extract basic blocks (DONE)
  • Create control flow graphs (DONE)
  • Compute (reverse) DFS/dominator/frontier information (DONE)
  • Extract high level statements using (reverse) DFS/dominator trees (DONE)
  • Local and global liveness analysis to find subroutine arguments (DONE)
  • Extract and simplify operations from every basic block (DONE)
  • Transform operations into SSA form using frontier information (PARTIAL)
  • Perform constant/expression propagation and dead code elimination (PARTIAL)
  • Transform operations from SSA form
  • Build expression tree from optimized operations
  • Output high level (pseudo-)code from restored expressions

I’m currently looking into methods that don’t need SSA form to perform propagation, but if they don’t work (too well) I’ll have to stick with SSA. Here is the output of a little argument test program and here is the original source (R ~ register, S ~ stackvar, I ~ immediate).

After SPU decompilation is working, I plan to refactor the source code to make it more generic, so it’s easier to port it to other CPU architectures (e.g. PPC).

References:

Advertisements

spud control flow graph

Today i did some more work on my spu decompiler and now it correctly builds control flow graphs from spu binaries. It’s a very simple control flow graph by now, but the current implementation is enough to make the execution path of the binary easier to understand already. I also implemented graphviz output to visualize the generated control flow graph. The generated image along with the disassembly of the binary already makes reverse engineering a lot easier.

Take this nonsense sourcecode for example:

void __attribute__((__noinline__)) g(char *dst, char *src, int cnt)
{
    while(cnt > 0)
    {
        *dst = *src;
        dst++;
        src++;
        cnt--;
    }
}

void _start()
{
    char dst[11], src[11] = "abcd1dcba2";
    g(dst, src, 11);

    int i, j = 0;
    for(i = 0; i < 10; i++)
        j += i * i;
    j++;

    asm("stop");
}

If you then compile this code without optimizations and let objdump generate the disassembly you will get something like this for the g function:

000:    24 fe c0 81     stqd    $1,-80($1)
004:    1c ec 00 81     ai    $1,$1,-80
008:    34 01 00 82     lqd    $2,64($1)    # 40
00c:    3e c0 00 86     cwd    $6,0($1)
010:    b0 40 81 86     shufb    $2,$3,$2,$6
014:    24 01 00 82     stqd    $2,64($1)    # 40
018:    34 00 c0 82     lqd    $2,48($1)    # 30
01c:    3e c0 00 83     cwd    $3,0($1)
020:    b0 40 82 03     shufb    $2,$4,$2,$3
024:    24 00 c0 82     stqd    $2,48($1)    # 30
028:    34 00 80 82     lqd    $2,32($1)    # 20
02c:    3e c0 00 83     cwd    $3,0($1)
030:    b0 40 82 83     shufb    $2,$5,$2,$3
034:    24 00 80 82     stqd    $2,32($1)    # 20
038:    32 00 0e 80     br    0xac    # ac
03c:    34 00 c0 82     lqd    $2,48($1)    # 30
040:    1c 03 41 03     ai    $3,$2,13
044:    34 00 01 02     lqd    $2,0($2)
048:    3b 80 c1 02     rotqby    $2,$2,$3
04c:    04 00 01 04     ori    $4,$2,0
050:    34 01 00 82     lqd    $2,64($1)    # 40
054:    34 00 01 03     lqd    $3,0($2)
058:    3e 80 01 05     cbd    $5,0($2)
05c:    b0 60 c2 05     shufb    $3,$4,$3,$5
060:    24 00 01 03     stqd    $3,0($2)
064:    34 01 00 82     lqd    $2,64($1)    # 40
068:    1c 00 41 03     ai    $3,$2,1
06c:    34 01 00 82     lqd    $2,64($1)    # 40
070:    3e c0 00 84     cwd    $4,0($1)
074:    b0 40 81 84     shufb    $2,$3,$2,$4
078:    24 01 00 82     stqd    $2,64($1)    # 40
07c:    34 00 c0 82     lqd    $2,48($1)    # 30
080:    1c 00 41 03     ai    $3,$2,1
084:    34 00 c0 82     lqd    $2,48($1)    # 30
088:    3e c0 00 84     cwd    $4,0($1)
08c:    b0 40 81 84     shufb    $2,$3,$2,$4
090:    24 00 c0 82     stqd    $2,48($1)    # 30
094:    34 00 80 82     lqd    $2,32($1)    # 20
098:    1c ff c1 03     ai    $3,$2,-1
09c:    34 00 80 82     lqd    $2,32($1)    # 20
0a0:    3e c0 00 84     cwd    $4,0($1)
0a4:    b0 40 81 84     shufb    $2,$3,$2,$4
0a8:    24 00 80 82     stqd    $2,32($1)    # 20
0ac:    34 00 80 82     lqd    $2,32($1)    # 20
0b0:    4c 00 01 02     cgti    $2,$2,0
0b4:    21 7f f1 02     brnz    $2,0x3c    # 3c
0b8:    1c 14 00 81     ai    $1,$1,80    # 50
0bc:    35 00 00 00     bi    $0

And this is the disassembly of the _start function:

0c0:    24 00 40 80     stqd    $0,16($1)
0c4:    24 fe 80 81     stqd    $1,-96($1)
0c8:    1c e8 00 81     ai    $1,$1,-96
0cc:    33 80 20 83     lqr    $3,0x1d0    # 1d0
0d0:    34 00 80 82     lqd    $2,32($1)    # 20
0d4:    3e e0 00 84     cdd    $4,0($1)
0d8:    b0 40 81 84     shufb    $2,$3,$2,$4
0dc:    24 00 80 82     stqd    $2,32($1)    # 20
0e0:    40 b0 99 03     il    $3,24882    # 6132
0e4:    34 00 80 82     lqd    $2,32($1)    # 20
0e8:    3e a2 00 84     chd    $4,8($1)
0ec:    b0 40 81 84     shufb    $2,$3,$2,$4
0f0:    24 00 80 82     stqd    $2,32($1)    # 20
0f4:    40 80 00 03     il    $3,0
0f8:    34 00 80 82     lqd    $2,32($1)    # 20
0fc:    3e 82 80 84     cbd    $4,10($1)
100:    b0 40 81 84     shufb    $2,$3,$2,$4
104:    24 00 80 82     stqd    $2,32($1)    # 20
108:    1c 0c 00 83     ai    $3,$1,48    # 30
10c:    1c 08 00 82     ai    $2,$1,32    # 20
110:    04 00 01 04     ori    $4,$2,0
114:    40 80 05 85     il    $5,11
118:    33 7f dd 00     brsl    $0,0x0
11c:    40 80 00 03     il    $3,0
120:    34 01 00 82     lqd    $2,64($1)    # 40
124:    3e c0 00 84     cwd    $4,0($1)
128:    b0 40 81 84     shufb    $2,$3,$2,$4
12c:    24 01 00 82     stqd    $2,64($1)    # 40
130:    40 80 00 03     il    $3,0
134:    34 01 40 82     lqd    $2,80($1)    # 50
138:    3e c0 00 84     cwd    $4,0($1)
13c:    b0 40 81 84     shufb    $2,$3,$2,$4
140:    24 01 40 82     stqd    $2,80($1)    # 50
144:    32 00 0a 80     br    0x198    # 198
148:    34 01 40 82     lqd    $2,80($1)    # 50
14c:    04 00 01 03     ori    $3,$2,0
150:    34 01 40 82     lqd    $2,80($1)    # 50
154:    78 a0 81 85     mpyh    $5,$3,$2
158:    78 a0 c1 04     mpyh    $4,$2,$3
15c:    79 80 81 82     mpyu    $2,$3,$2
160:    18 01 02 83     a    $3,$5,$4
164:    18 00 81 82     a    $2,$3,$2
168:    34 01 00 83     lqd    $3,64($1)    # 40
16c:    18 00 81 83     a    $3,$3,$2
170:    34 01 00 82     lqd    $2,64($1)    # 40
174:    3e c0 00 84     cwd    $4,0($1)
178:    b0 40 81 84     shufb    $2,$3,$2,$4
17c:    24 01 00 82     stqd    $2,64($1)    # 40
180:    34 01 40 82     lqd    $2,80($1)    # 50
184:    1c 00 41 03     ai    $3,$2,1
188:    34 01 40 82     lqd    $2,80($1)    # 50
18c:    3e c0 00 84     cwd    $4,0($1)
190:    b0 40 81 84     shufb    $2,$3,$2,$4
194:    24 01 40 82     stqd    $2,80($1)    # 50
198:    34 01 40 82     lqd    $2,80($1)    # 50
19c:    4c 02 41 02     cgti    $2,$2,9
1a0:    20 7f f5 02     brz    $2,0x148    # 148
1a4:    34 01 00 82     lqd    $2,64($1)    # 40
1a8:    1c 00 41 03     ai    $3,$2,1
1ac:    34 01 00 82     lqd    $2,64($1)    # 40
1b0:    3e c0 00 84     cwd    $4,0($1)
1b4:    b0 40 81 84     shufb    $2,$3,$2,$4
1b8:    24 01 00 82     stqd    $2,64($1)    # 40
1bc:    00 00 00 00     stop
1c0:    1c 18 00 81     ai    $1,$1,96    # 60
1c4:    34 00 40 80     lqd    $0,16($1)
1c8:    35 00 00 00     bi    $0
1cc:    00 20 00 00     lnop

You can see that the functions end with a bi $0 instruction, which means branch indirect to the address stored in the link register ($0). If a function contains calls to other functions the compiler will have to save the link register somewhere. This is done by storing the link register on the stack. If you look at the disassembly of the _start function, there is a stqd $0, 16($1). This is the instruction that does the saving ($1 is the stack pointer). If you look at the end of the function _start there is a lqd $0, 16($1), this instruction restores the link register from the stack. But if you have a look at the g function there is no saving of the link register, as this function does not call any other functions.

And this is the control flow graph generated from the compiled assembly (image generated with graphviz):

control flow graph

The left graph is the g function and the right one the _start function, it is now easy to identify the conditional jumps and even the loops. The edges have different meanings, NEXT means a simple translation to the next block, BRANCH is an unconditional branch and BRANCH_COND is a conditional one. For example block 000ac and block 0003c form a loop, because there exists an edge with the type NEXT from 0003c to 000ac and a conditional branch from 000ac to 0003c and 000b8. Note that any conditional branch produces two edges, one edge to the block if the branch is taken and one to the other if it is not taken.

So the next big step will be to automatically generate the big structures like if and loop from the control flow graph. But more of this later.