I rewrote the expand pass for alpha blending in ARM assembly, which should benefit the GP2X version and any future ARM versions (remember, there are a ton of prospective ARM devices out there). My version is very straightforward, the kind of thing a compiler should have been able to generate, yet it's still much better than what GCC produces with the optimization flags that are on now (I don't think it could do any better with other flags, but I don't know for sure).
The more blending that is actually happening onscreen, the bigger the speedup this version should give. Here are the numbers:
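For anyone unfamiliar with the idea, here is a rough C sketch of what an "expand" style blend can look like. It is not the actual emulator code or the new ARM routine: the function names are made up for illustration, and it assumes a 16-bit RGB565 pixel format with blend coefficients in sixteenths (0 to 16). The trick is to spread the three channels apart inside a 32-bit word so that one multiply scales all of them at once, then saturate and fold the result back to 16 bits.

```c
#include <stdint.h>

/* Hypothetical helper: spread an RGB565 pixel across 32 bits so that each
   channel has spare bits above it. Resulting layout:
   green in bits 21-26, red in bits 11-15, blue in bits 0-4. */
uint32_t expand565(uint16_t pixel)
{
    uint32_t x = pixel;
    return (x | (x << 16)) & 0x07E0F81Fu;
}

/* Hypothetical blend: result = (top * eva + bottom * evb) / 16 per channel,
   saturated to the channel maximum. eva and evb are clamped to 0..16 so the
   per-channel products never spill into a neighbouring channel. */
uint16_t blend565(uint16_t top, uint16_t bottom, uint32_t eva, uint32_t evb)
{
    if (eva > 16) eva = 16;
    if (evb > 16) evb = 16;

    /* One multiply per pixel scales all three channels together. */
    uint32_t sum = (expand565(top) * eva + expand565(bottom) * evb) >> 4;

    /* After the shift, the bit just above each channel is its overflow flag.
       If it is set, force that channel to its maximum value. */
    if (sum & 0x08000000u) sum |= 0x07E00000u;   /* green overflowed */
    if (sum & 0x00010000u) sum |= 0x0000F800u;   /* red overflowed   */
    if (sum & 0x00000020u) sum |= 0x0000001Fu;   /* blue overflowed  */

    /* Drop fractional leftovers and fold green back into the low half. */
    sum &= 0x07E0F81Fu;
    return (uint16_t)(sum | (sum >> 16));
}
```

For example, blending white over black at 8/16 each gives a mid grey, and blending a pixel with itself at 16/16 each saturates instead of wrapping around. The saturation checks are the branchy part that a hand-written ARM version could plausibly handle more cheaply than compiled C.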
Castlevania: Aria of Sorrow
Full test : 6456 ms (21.522274 ms per frame)
No blending : 5379 ms (17.930580 ms per frame)
No video : 2241 ms (7.472070 ms per frame)
No CPU : 4392 ms (14.642917 ms per frame)
No CPU/video: 361 ms (1.204903 ms per frame)
CPU speed : 1880 ms (6.267167 ms per frame)
Video speed : 4215 ms (14.050203 ms per frame)
Alpha cost : 1077 ms (3.591693 ms per frame)
This one is the biggest winner, since it has a ton of blending going on onscreen. The alpha cost has dropped by about 4.13 ms per frame compared to the C version - it's over twice as fast now.
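In case it isn't obvious how the last three rows relate to the others: judging from the numbers themselves, they look like plain differences of the measured totals. The sketch below spells that out. The function names are invented for this example, and the relationships are inferred from the table rather than taken from the benchmark code.

```c
/* Derived rows, as inferred from the table above (all values in ms).
   These helper names are hypothetical. */

/* CPU speed: emulation time once the fixed overhead is removed. */
int cpu_speed_ms(int no_video_ms, int no_cpu_video_ms)
{
    return no_video_ms - no_cpu_video_ms;
}

/* Video speed: what turning video rendering on adds to the total. */
int video_speed_ms(int full_ms, int no_video_ms)
{
    return full_ms - no_video_ms;
}

/* Alpha cost: what turning blending on adds to the total. */
int alpha_cost_ms(int full_ms, int no_blending_ms)
{
    return full_ms - no_blending_ms;
}
```

With the figures above, 2241 - 361 gives the 1880 ms CPU speed, 6456 - 2241 gives the 4215 ms video speed, and 6456 - 5379 gives the 1077 ms alpha cost. The other two result sets match the same pattern to within 1 ms, which is presumably rounding.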
Full test : 8843 ms (29.476669 ms per frame)
No blending : 8129 ms (27.098631 ms per frame)
No video : 4238 ms (14.129470 ms per frame)
No CPU : 3221 ms (10.737390 ms per frame)
No CPU/video: 505 ms (1.684263 ms per frame)
CPU speed : 3733 ms (12.445207 ms per frame)
Video speed : 4604 ms (15.347200 ms per frame)
Alpha cost : 713 ms (2.378040 ms per frame)
Here we see a smaller improvement, because the alpha cost wasn't that large to begin with: alpha is only actually turned on for a small part of the screen. In the last post I believed that brighten was being used instead of alpha. Brighten is what should have been used, but I think the game is using alpha to brighten by something a bit off-white. The effect is very subtle in-game, but if you turn it off (by making it choose the BOTTOM pixel, not the top) it becomes obvious that it's missing.
Still a win, and what's more, it shows that the difference in cost between a small amount of blending and a large amount isn't that big.
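To make the brighten versus alpha distinction concrete, here is a small per-channel sketch on 5-bit values. It assumes the usual description of the GBA effects (coefficients in sixteenths, results truncated, alpha clamped to 31); the function names and the off-white value in the example are made up for illustration. Under those assumptions, alpha blending against pure white with coefficients that sum to 16 gives exactly the same result as brighten, while an off-white second layer comes out slightly darker.

```c
/* Brighten: move the channel toward 31 by evy/16 of the remaining distance. */
unsigned brighten5(unsigned c, unsigned evy)
{
    return c + (((31u - c) * evy) >> 4);
}

/* Alpha: weighted sum of the top and bottom channel values, clamped to 31. */
unsigned alpha5(unsigned top, unsigned bottom, unsigned eva, unsigned evb)
{
    unsigned v = (top * eva + bottom * evb) >> 4;
    return v > 31u ? 31u : v;
}

/* Count the (channel, evy) pairs where brighten differs from alpha-blending
   against pure white (31) with eva = 16 - evy and evb = evy. */
unsigned white_blend_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned c = 0; c <= 31; c++)
        for (unsigned evy = 0; evy <= 16; evy++)
            if (brighten5(c, evy) != alpha5(c, 31u, 16u - evy, evy))
                mismatches++;
    return mismatches;
}
```

For instance, a black channel brightened by 8/16 becomes 15, while blending it 8/16 with a hypothetical off-white value of 28 gives 14. That kind of small difference would fit how subtle the effect looks in-game.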
Full test : 11854 ms (39.515438 ms per frame)
No blending : 10175 ms (33.918694 ms per frame)
No video : 6133 ms (20.443438 ms per frame)
No CPU : 2643 ms (8.810610 ms per frame)
No CPU/video: 1142 ms (3.809923 ms per frame)
CPU speed : 4990 ms (16.633512 ms per frame)
Video speed : 5721 ms (19.072001 ms per frame)
Alpha cost : 1679 ms (5.596743 ms per frame)
Alpha cost is down again, but this time by the smallest percentage. I think other things might be making it slow, like heavy use of windows. This game in particular deserves extra attention. Again, alpha isn't used on very much of the screen - it mainly just provides the gradient effect.
Next time I'll talk about some changes I'd like to try for the C code (which will hopefully impact the PSP version as well).