Thursday, March 22, 2007

Status update

Finally seeing something on my GP2X, so I'll report it here.

I was originally going to tweak zodttd's arm_emit.h (the file that contains a bulk of the backend for the dynarec), but some parts were a bit obfuscated beyond what I wanted to work with, making it hard to reach some of the more critical areas I wanted to. Don't get me wrong, zodttd did a good job getting a working dynarec off the ground, but it was a bit different than what I was ready to work with.

So, since it's somewhat straightforward to do so, I decided to start over on this file from scratch. Really it works out as being a port of the x86 version, x86_emit.h. I wrote this version (the recompiler for x86) originally, and converted it to the PSP version as a base to start from.

This version is basically partially a "threaded interpreter", partially a true dynarec. The way threaded interpretation works is that instructions are compiled to calls to C functions; ideally you would then only have to emit actual machine code for loads to memory (for loading register operands), loading immediates into registers, calling functions, and storing to memory (for storing register results). However ARM has a lot of sophisticated functionality that means you'd have to either have a ton of functions to call or several functions called for some instructions to make things work. For that reason I have some things in functions, and some things inlined in assembly code, as follows:

Inlined portions: Barrel shifting, multiply/multiply accumulate instructions, direct branches, memory address calculation
Function calls: ALU operations, memory loads/stores, indirect branches

The inlined portions were originally expressed in a simple assembly language that's mostly a subset of x86, MIPS, and ARM (so it has two address operations, only a few registers, the ability to load/store registers from memory, etc). This meant that when porting the recompiler to other platforms away from x86, all I had to was rewrite the macros to generate the instructions of this subset language as instructions in the native language targeted.

There is however one thing that the subset language can do which the others can't, and that is to load large immediates. Actually, most of the time these are for compiling ARM ALU immediates, and since we're compiling to ARM this time I can retain the information. But I had to modify these primitives to take things in the ARM format (immediate + rotation) instead of just an immediate. That fortunately worked out to be very simple.

The eventual goal is to have the ALU operations inlined. Memory load/stores and indirect branches tend to be complex operations and for that reason don't get inlined, although there is some potential for inlining some memory operations down the road, depending on how things work out there.

So, right now I basically have the ARM translation kinda working; some instructions don't work yet. Thumb doesn't work at all yet (hopefully it'll be easy to resolve). I've just now started seeing results because I've been fighting with the basic GP2X development process this entire time.

This is basically a timeline of what happened:

Sunday night: Got zodttd's latest source building with DevkitGP2X
Monday: Poked around and decided I'd rewrite arm_emit.h. Didn't work on it much this day because I was very tired.
Tuesday: Mostly rewrote arm_emit.h (~1800 lines covered).
Wednesday: Finished rewriting arm_emit.h and modified arm_asm_stub.S so that things would compile. Unfortunately, here is where everything fell apart. After several instances of my binary mysteriously not updating, suddenly all my builds would segfault. I was trying to communicate with the GP2X more effectively but to no avail. And my internet connection was being lousy, so I went to sleep in a total haze, exhausted and going nowhere. Didn't get a lot done again because I was very tired >_>
Thursday: Woke up and managed to get an SSH connection to the GP2X (thanks to help from zodttd the night before). Found out that anything the compiler generated would segfault now; spent a lot of time trying to find the reason for this and got nowhere. Reinstalled the toolchain, got nowhere. Tried reinstalling another distribution of the toolchain, got nowhere. At this point I'm wondering if I've completely lost my mind. Finally decided to try installing Open2x for Cygwin like zodttd originally suggested and it magically works. With SSH working I was able to debug it for real and finally got to the point I'm at now after a few ASM dumps and actually clearing the cache in the right places (btw, some documentation on the cache syscalls would be great).

So that puts me in pretty good shape for the rest of the week. This is what I'd like to do now, roughly in this order:

- Fix the remaining ARM instructions.
- Fix Thumb (it might be just going into it that's breaking for whatever reason) and fix whatever Thumb instructions end up not working.
- Inline non flag-altering ALU instructions. This will put me to roughly where zodttd's is, although currently I do a few things better than his (immediates loaded in ARM form, branches linked directly w/o loading the PC)
- Inline flag-altering ALU instructions, caching flag results in single register.
- Write analysis backend for interpreter and write register analysis to determine most frequently used registers in ARM/Thumb mode. Will be worthwhile to put memory analysis here as well and other things perhaps..
- Run some tests on games for a few minutes each to gather analysis statistics.
- Perform register caching using whatever available registers I have and the statistics for most frequently accessed registers per mode. Being done statically this shouldn't be too bad to implement.
- Refactor mips_stub.S using C preprocessor macros. It's currently ~3500 lines and a mess to alter/maintain and especially port, so I hope to improve upon these things (here's hoping I can cut 1000+ lines from it).
- Port the memory handlers in mips_stub.S to ARM to replace the current ones in arm_asm_stub.S. This is both for performance and accuracy reasons (you'd be surprised what handling strange illegal memory accesses can fix in games)
- Gather motley crew of beta testers to hammer the thing.

That's the rough plan for now. I'm hoping that this won't take too long (well, I only have about 9 days as allotted by zodttd, maybe he'll give me an extension if I need it and have shown good progress though). It's my desire that this will give an improvement in performance that's reasonably noticeable, but unfortunately I can't promise that.. funny thing about results being nothing like you expected.

After I have these things out of the way I hope to go back to working on the core issues in GP2X, and other base optimizations to help all of the versions.

Course, I'd like to thank zodttd for working on things and giving me a hand, and also Sergey Chaban for the emitters from Vincent, which we're using to emit ARM code.

Let's see if tomorrow is productive at all..


starpause said...

sick man, keep hacking! i've done a bit of assembly and coding for embeded micros so i get the jist of what you're talking about, thanks for the fun read.

sehs33 said...

Exophase, thanks alot for all the time you are puttig in gpsp2x, you and zod are making the dream of many GP2Xers come true, keep on the great work man :)

Yod4z said...

An incredible work has you done and you will have to sleep too ;)

Keep on this good job and thanks for the GP2x community ;)

Peter said...

Thank you for all your hard work :)

vitriol said...

Thank you Exophase and Zodttd for the great work you have done with gpsp!


Brett said...

This is awesome exophase :) It is people like you that truly revitalize the gaming experience.


Ari said...

Awesome work, Exophase! Hope you get many donations from this, you deserve it!

Peter said...

Indeed, I believe Zodttds private betas may have made him a small fortune. Maybe atleast $200?

Nice to see this blog getting some exposure aswell.

Troy said...

Way to go! You are awesome man.