spud control flow graph

Today i did some more work on my spu decompiler and now it correctly builds control flow graphs from spu binaries. It’s a very simple control flow graph by now, but the current implementation is enough to make the execution path of the binary easier to understand already. I also implemented graphviz output to visualize the generated control flow graph. The generated image along with the disassembly of the binary already makes reverse engineering a lot easier.

Take this nonsense sourcecode for example:

void __attribute__((__noinline__)) g(char *dst, char *src, int cnt)
{
    while(cnt > 0)
    {
        *dst = *src;
        dst++;
        src++;
        cnt--;
    }
}

void _start()
{
    char dst[11], src[11] = "abcd1dcba2";
    g(dst, src, 11);

    int i, j = 0;
    for(i = 0; i < 10; i++)
        j += i * i;
    j++;

    asm("stop");
}

If you then compile this code without optimizations and let objdump generate the disassembly you will get something like this for the g function:

000:    24 fe c0 81     stqd    $1,-80($1)
004:    1c ec 00 81     ai    $1,$1,-80
008:    34 01 00 82     lqd    $2,64($1)    # 40
00c:    3e c0 00 86     cwd    $6,0($1)
010:    b0 40 81 86     shufb    $2,$3,$2,$6
014:    24 01 00 82     stqd    $2,64($1)    # 40
018:    34 00 c0 82     lqd    $2,48($1)    # 30
01c:    3e c0 00 83     cwd    $3,0($1)
020:    b0 40 82 03     shufb    $2,$4,$2,$3
024:    24 00 c0 82     stqd    $2,48($1)    # 30
028:    34 00 80 82     lqd    $2,32($1)    # 20
02c:    3e c0 00 83     cwd    $3,0($1)
030:    b0 40 82 83     shufb    $2,$5,$2,$3
034:    24 00 80 82     stqd    $2,32($1)    # 20
038:    32 00 0e 80     br    0xac    # ac
03c:    34 00 c0 82     lqd    $2,48($1)    # 30
040:    1c 03 41 03     ai    $3,$2,13
044:    34 00 01 02     lqd    $2,0($2)
048:    3b 80 c1 02     rotqby    $2,$2,$3
04c:    04 00 01 04     ori    $4,$2,0
050:    34 01 00 82     lqd    $2,64($1)    # 40
054:    34 00 01 03     lqd    $3,0($2)
058:    3e 80 01 05     cbd    $5,0($2)
05c:    b0 60 c2 05     shufb    $3,$4,$3,$5
060:    24 00 01 03     stqd    $3,0($2)
064:    34 01 00 82     lqd    $2,64($1)    # 40
068:    1c 00 41 03     ai    $3,$2,1
06c:    34 01 00 82     lqd    $2,64($1)    # 40
070:    3e c0 00 84     cwd    $4,0($1)
074:    b0 40 81 84     shufb    $2,$3,$2,$4
078:    24 01 00 82     stqd    $2,64($1)    # 40
07c:    34 00 c0 82     lqd    $2,48($1)    # 30
080:    1c 00 41 03     ai    $3,$2,1
084:    34 00 c0 82     lqd    $2,48($1)    # 30
088:    3e c0 00 84     cwd    $4,0($1)
08c:    b0 40 81 84     shufb    $2,$3,$2,$4
090:    24 00 c0 82     stqd    $2,48($1)    # 30
094:    34 00 80 82     lqd    $2,32($1)    # 20
098:    1c ff c1 03     ai    $3,$2,-1
09c:    34 00 80 82     lqd    $2,32($1)    # 20
0a0:    3e c0 00 84     cwd    $4,0($1)
0a4:    b0 40 81 84     shufb    $2,$3,$2,$4
0a8:    24 00 80 82     stqd    $2,32($1)    # 20
0ac:    34 00 80 82     lqd    $2,32($1)    # 20
0b0:    4c 00 01 02     cgti    $2,$2,0
0b4:    21 7f f1 02     brnz    $2,0x3c    # 3c
0b8:    1c 14 00 81     ai    $1,$1,80    # 50
0bc:    35 00 00 00     bi    $0

And this is the disassembly of the _start function:

0c0:    24 00 40 80     stqd    $0,16($1)
0c4:    24 fe 80 81     stqd    $1,-96($1)
0c8:    1c e8 00 81     ai    $1,$1,-96
0cc:    33 80 20 83     lqr    $3,0x1d0    # 1d0
0d0:    34 00 80 82     lqd    $2,32($1)    # 20
0d4:    3e e0 00 84     cdd    $4,0($1)
0d8:    b0 40 81 84     shufb    $2,$3,$2,$4
0dc:    24 00 80 82     stqd    $2,32($1)    # 20
0e0:    40 b0 99 03     il    $3,24882    # 6132
0e4:    34 00 80 82     lqd    $2,32($1)    # 20
0e8:    3e a2 00 84     chd    $4,8($1)
0ec:    b0 40 81 84     shufb    $2,$3,$2,$4
0f0:    24 00 80 82     stqd    $2,32($1)    # 20
0f4:    40 80 00 03     il    $3,0
0f8:    34 00 80 82     lqd    $2,32($1)    # 20
0fc:    3e 82 80 84     cbd    $4,10($1)
100:    b0 40 81 84     shufb    $2,$3,$2,$4
104:    24 00 80 82     stqd    $2,32($1)    # 20
108:    1c 0c 00 83     ai    $3,$1,48    # 30
10c:    1c 08 00 82     ai    $2,$1,32    # 20
110:    04 00 01 04     ori    $4,$2,0
114:    40 80 05 85     il    $5,11
118:    33 7f dd 00     brsl    $0,0x0
11c:    40 80 00 03     il    $3,0
120:    34 01 00 82     lqd    $2,64($1)    # 40
124:    3e c0 00 84     cwd    $4,0($1)
128:    b0 40 81 84     shufb    $2,$3,$2,$4
12c:    24 01 00 82     stqd    $2,64($1)    # 40
130:    40 80 00 03     il    $3,0
134:    34 01 40 82     lqd    $2,80($1)    # 50
138:    3e c0 00 84     cwd    $4,0($1)
13c:    b0 40 81 84     shufb    $2,$3,$2,$4
140:    24 01 40 82     stqd    $2,80($1)    # 50
144:    32 00 0a 80     br    0x198    # 198
148:    34 01 40 82     lqd    $2,80($1)    # 50
14c:    04 00 01 03     ori    $3,$2,0
150:    34 01 40 82     lqd    $2,80($1)    # 50
154:    78 a0 81 85     mpyh    $5,$3,$2
158:    78 a0 c1 04     mpyh    $4,$2,$3
15c:    79 80 81 82     mpyu    $2,$3,$2
160:    18 01 02 83     a    $3,$5,$4
164:    18 00 81 82     a    $2,$3,$2
168:    34 01 00 83     lqd    $3,64($1)    # 40
16c:    18 00 81 83     a    $3,$3,$2
170:    34 01 00 82     lqd    $2,64($1)    # 40
174:    3e c0 00 84     cwd    $4,0($1)
178:    b0 40 81 84     shufb    $2,$3,$2,$4
17c:    24 01 00 82     stqd    $2,64($1)    # 40
180:    34 01 40 82     lqd    $2,80($1)    # 50
184:    1c 00 41 03     ai    $3,$2,1
188:    34 01 40 82     lqd    $2,80($1)    # 50
18c:    3e c0 00 84     cwd    $4,0($1)
190:    b0 40 81 84     shufb    $2,$3,$2,$4
194:    24 01 40 82     stqd    $2,80($1)    # 50
198:    34 01 40 82     lqd    $2,80($1)    # 50
19c:    4c 02 41 02     cgti    $2,$2,9
1a0:    20 7f f5 02     brz    $2,0x148    # 148
1a4:    34 01 00 82     lqd    $2,64($1)    # 40
1a8:    1c 00 41 03     ai    $3,$2,1
1ac:    34 01 00 82     lqd    $2,64($1)    # 40
1b0:    3e c0 00 84     cwd    $4,0($1)
1b4:    b0 40 81 84     shufb    $2,$3,$2,$4
1b8:    24 01 00 82     stqd    $2,64($1)    # 40
1bc:    00 00 00 00     stop
1c0:    1c 18 00 81     ai    $1,$1,96    # 60
1c4:    34 00 40 80     lqd    $0,16($1)
1c8:    35 00 00 00     bi    $0
1cc:    00 20 00 00     lnop

You can see that the functions end with a bi $0 instruction, which means branch indirect to the address stored in the link register ($0). If a function contains calls to other functions the compiler will have to save the link register somewhere. This is done by storing the link register on the stack. If you look at the disassembly of the _start function, there is a stqd $0, 16($1). This is the instruction that does the saving ($1 is the stack pointer). If you look at the end of the function _start there is a lqd $0, 16($1), this instruction restores the link register from the stack. But if you have a look at the g function there is no saving of the link register, as this function does not call any other functions.

And this is the control flow graph generated from the compiled assembly (image generated with graphviz):

control flow graph

The left graph is the g function and the right one the _start function, it is now easy to identify the conditional jumps and even the loops. The edges have different meanings, NEXT means a simple translation to the next block, BRANCH is an unconditional branch and BRANCH_COND is a conditional one. For example block 000ac and block 0003c form a loop, because there exists an edge with the type NEXT from 0003c to 000ac and a conditional branch from 000ac to 0003c and 000b8. Note that any conditional branch produces two edges, one edge to the block if the branch is taken and one to the other if it is not taken.

So the next big step will be to automatically generate the big structures like if and loop from the control flow graph. But more of this later.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s