Sunday, March 18, 2007

gpSP for GP2X

These days gpSP is becoming less and less tied to PSP specifically, in particular thanks to the efforts of zodttd. He originally ported gpSP to the GP2X some months ago, but it was a pretty straight forward port (mostly based on the framework for the PC version, which is fairly portable via SDL) only utilizing the portable interpreter for CPU emulation. Since then I had intended to work on a dynamic recompiler to bring performance up to speed, but real life got in the way (plus I haven't had a very working GP2X setup). Fortunately, zodttd managed to pull one off and things are looking pretty good these days. The first public version was made available a few days ago, and I hear that it runs a number of games very well. Of course, there's still room for improvement.

I haven't actually been able to test it yet, but I hope to have an Open2X setup for cygwin as soon as possible. I'll also need a means of powering the thing, either with an AC adapter (preferred, but far too overpriced), or battery recharging. Then, once I'm able to contact zodttd (something I should have done yesterday but managed to collapse and not get back up for a while..) we can go over means of cleaning it up and improving it. There are a few things I've been thinking about for dynarec improvements in general that should apply to both the GP2X and the PSP versions (and others if they surface).

Right now the dynarec will benefit the most from two things: native flag calculation and register allocation.

A quick crash course on the ARM CPU, the CPU that both the GBA and the GP2X have: there are 15 "general purpose" registers where temporary values may be freely loaded and stored. Arithmetic/logical operations may optionally set "flags" which reflect the results; this can be used to conditionally execute instructions. Indirectly speaking, the flags are usually used to test if a variable hit zero, or if one is greater/less than another, and so on.

Because the GBA and GP2X have the same CPU the flag generation behaves the same way, and you should be able to use the GP2X's CPU to generate the flag results in hardware, and also keep them stored in its flags register. There are two caveats to this:

- Sometimes you'll need to perform calculations that will modify flags that are not part of the GBA code but are necessary for emulating it, for instance emulating memory or calling external functions (IE, to update the screen) that will probably modify flags. When these cases occur the flags should be saved then restored to a temporary register or on the stack.

- The flags register actually holds other information for determining the current state of the CPU (if it's in ARM or Thumb mode, if interrupts are enabled, what mode the CPU is in, etc). Consequently, the state of the CPU that the GP2X is in will often differ from the state the GBA code should see. What this amounts to is reconstructing the flags register from the native GP2X one and an emualted one whenever it is read. Likewise, writes to this register must be handled with care (but that's a given). This is a minor issue, but ones like this make complete "virtualization" impossible; I'll talk more about this later.

Right now the dynarec is calling C functions to perform operations involving flags, which is fairly costly. This is especially so for "Thumb" code which performs flags modifications often. On GBA Thumb code is being executed a large percentage of the time (varies depending on the game, is typically 50% or more). In the future flags modifying or utilizing operations will instead be inlined to single ARM instructions.

The next big win will be in register allocation. When emulating a platform where you have significantly less CPU registers than the machine you're recompiling to you can use a complete allocation where every register on the emulated machine maps to a register on the native machine. For the PSP version of gpSP, this is exactly what happens. Unfortunately, in the case of GP2X you have exactly the same number of registers. We cannot perform a complete mapping because we need a few additional registers to perform the emulation, and retain the internal state of the GP2X.

There are two approaches that can be taken at this point: dynamic or static register allocation. Dynamic allocation will load registers into memory as necessary, then keep them in registers until another load forces them out, using a replacement strategy such as "least recently used." This is identical to how dynamic caches such as CPU cache and virtual memory work. Compilers will also use a register allocation scheme such as this, but with more effective (and considerably more computationally expensive) algorithms for allocating registers. This is the approach typically taken by recompilers in emulators, especially when there are far fewer registers on the machine you're emulating on, as is the case with MIPS->x86 (as seen in PS1/PS2 and N64 emulators for PC).

However, dynamic allocation isn't easy to use with a recompiler such as gpSP's because translated code blocks are "self re-entrant", meaning that they can branch to within themselves. This runs into a problem which applies to various areas of recompilation; anywhere you have multiple entry points in a code block, you have multiple conflicting states and can no longer carry on prior state. This can be rectified by flushing the registers back to memory at this point, but this adds a lot of overhead, particularly in code size.

Instead, I think it's worthwhile to opt for a global static allocation scheme. Here, a fixed allocation is chosen for the entire execution of the game; that means that certain GBA registers will always be in GP2X registers, and the others will always be in memory. This provides less efficient allocation so registers will probably be in memory more often, however, it means that registers do not need to be "spilled" back to memory inbetween blocks. It also sidesteps the above problems. Since we should be able to store about 67% or more of the GBA's registers in GP2X registers this should result in instruction operands being in GP2X registers at least 67% time. Fortunately registers do not tend to be used uniformly; those top 67% registers may in fact be used more often than the bottom 33%.

To figure out which registers to use some profiling on GBA games will have to be undertaken; this should be done by modifying the PC version's interpreter and running some games to generate percentage usage of registers.