On Decompilation (Of SPU Binaries)

First one needs to understand, that an ideal decompiler, that will recover the original source code, is impossible to create. The process of compilation is removing a lot of high level information, such as used data types or loop types. Then optimization could re-arrange code/data flow, eliminate common subexpressions, perform function inlining, etc. Therefore any output of our decompiler will most likely not match the original source code, but if we do it right, we can get a (very) close approximation.

So what is it, that I’m doing at the moment?

Currently the decompiler will need to be fed with an ELF binary, that has its section headers present and from them it will extract information about executable and data regions. For every executable region it will then extract subroutines and split every of them into basic blocks. A basic block is a piece of code, that only contains simple operations and ends with a branch. This basics blocks might get split again, if a branch to any of its operation occurs from inside the subroutine. This way we will collect a list of all the basic blocks, can link them according to their exit branches and end up with the control flow graphs of all the subroutines. For identification of high level statements like ifs and loops (see [REC] for loop detection) we compute DFS and dominator trees and for later use the frontiers of each block (see [DOM] for used algorithms). Next we perform register liveness analysis to identify subroutine arguments and return values (see [LIV] for used algorithm). Now we can translate all the assembler instructions into simplified operations, e.g. translate “ori rt, ra, 0” into “rt <- ra” and identify local stack variables in that process too, following IBM’s SPU ABI specifications (see [ABI]).

More additional steps are then required to produce high level code, as you can see in the following list (entries marked with (DONE) are finished and working already):

  • Extract text/data regions (DONE)
  • Extract subroutines (DONE)
  • Extract basic blocks (DONE)
  • Create control flow graphs (DONE)
  • Compute (reverse) DFS/dominator/frontier information (DONE)
  • Extract high level statements using (reverse) DFS/dominator trees (DONE)
  • Local and global liveness analysis to find subroutine arguments (DONE)
  • Extract and simplify operations from every basic block (DONE)
  • Transform operations into SSA form using frontier information (PARTIAL)
  • Perform constant/expression propagation and dead code elimination (PARTIAL)
  • Transform operations from SSA form
  • Build expression tree from optimized operations
  • Output high level (pseudo-)code from restored expressions

I’m currently looking into methods that don’t need SSA form to perform propagation, but if they don’t work (too well) I’ll have to stick with SSA. Here is the output of a little argument test program and here is the original source (R ~ register, S ~ stackvar, I ~ immediate).

After SPU decompilation is working, I plan to refactor the source code to make it more generic, so it’s easier to port it to other CPU architectures (e.g. PPC).

References:

Advertisements

5 thoughts on “On Decompilation (Of SPU Binaries)

  1. Interesting post, most of it went way over my head but i enjoyed reading it nonetheless. I wish you all the luck with your quest! Write more posts!

  2. The testprogram is a bit weird, since the local var j is always dead at the end of the subroutine. A sane compiler would do nothing for each subroutine, right?

    Otherwise, great work 🙂

  3. Great work so far, but some questions:
    Why don’t you want SSA?
    As for porting to other architectures, how do you plan to handle SPU channels and other mmio which SPU code has direct access to?

    • Thank you! To answer your questions: simply because SSA looked overcomplicated, but after I finished implementing it now, I decided to keep it, since it’s really working great 🙂 As for SPU channels, this will be moved into a processor specific fallback handler so that most of the work can be done in a generic way and only the specific parts will be done in per architecture code.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s