Wednesday, 4 July 2012

Time to clean those rects up.

My current solution to the rendering in WME is rather slow, capping out at around 30 fps on my i7, which means I get lower frame rates in Dirty Split than I do in Diablo III. The major difference between the two would of course be partially explained by Dirty Split being rendered entirely in software. But that alone shouldn't excuse a non-changing 2D scene not managing to get more than 30 fps.

Now there are a few places that the rendering can be improved: First of all, is there any reason to redraw the screen if NOTHING has changed? Probably not. Is there any reason to produce as many frames per second as possible, maxing the CPU-load? Probably not.

So, I put in a framerate cap, for the time being this is capped at 25 fps, to keep my CPU from maxing (and thus my laptop from getting quite hot). In practice it should probably go a bit higher than that, but since the rendering itself limits it's possibility at the moment, that discussion is rather moot.

To get the rendering to go a bit smoother, I had to detect if the new frame was at all different from the last frame, and preferably also WHAT was different from the previous frame, and then only redraw the parts that were changed. (The "dirty rectangles")

The old renderer

Originally, all surfaces that wanted to be rendered would apply a scale to themselves and then pass the scaled surface to the renderer (the surfaces would also cache the last scale, to avoid reapplying the scale every frame if it was only drawn at one size). Every frame would start off with clearing the screen-buffer, then drawing the surfaces one by one. Thus there was almost no difference between drawing a very different frame, or the same frame again, as all the surfaces would need to be redrawn anyhow.

Replacing this means I have to take care to keep the same behaviour intact, which means:

  • Any area that was drawn last frame, but isn't drawn now needs a redraw
  • Any area that doesn't get anything drawn in it, should get filled with the clear-colour (which was originally drawn into the screen-buffer on clear anyhow)
  • If any element was drawn before element X, it still needs to be redrawn before element X if it needs to be redrawn.
  • If any element was drawn after element X, it still needs to be redrawn before element X f it needs to be redrawn.


The name I chose for my solution is "renderTickets". Where before any draw-call would mean an immediate update to the screen-buffer, in the new system, a renderTicket makes note of the operation that was asked for (a ticket), as well as a copy of the data that was to be used as a source for the operation.

When adding a ticket, a search is done to see if the surface that asked to draw issued any tickets last frame, and if they are the same as the one's this frame. Any tickets that are unchanged will be reused, while any new tickets will trigger an update in their target screen region.

Additionally the renderer keeps track of the order of the tickets, a ticket is only accepted for reuse if it is asked for at the same point it was asked for last frame, with one exception: A new ticket that arrives as ticket N, will increment the expected order of the tickets that should arrive as N+1 and on, since a new ticket will get it's region redrawn completely anyhow, this should still keep the Z-order, but avoid having to redraw uneccesarrily areas that arrive in-order, but with new tickets before them. Any ticket
that triggers a redraw, has it's target-position added as a dirty rect (at this point, that means that the single rect used is scaled to include it's area)

Finally, when the engine asks for a flip from the back-buffer to the screen-buffer, the actual drawing starts:
  1. First, the list of render tickets is purged of all items that were drawn last frame, but did not receive requests for draw this frame.
  2. The dirty rect is filled with the clear-colour
  3. Any tickets that intersect the dirty rect get's that section redrawn
  4. Finally the back-buffer is copied to the screen buffer and onto the screen.


The current implementation has a few issues with Z-order, and uses a bit more memory than the old solution. The memory usage was expected, as every ticket has to keep a copy of the section it wants to draw (as the scale can differ between draws, or the Surface that asked for a draw can be destroyed before screen-flip). Thus the implementation isn't enabled by default at this point. 

There also is no code done yet for Fading/Line-drawing with dirty-rects, and I also might want to use multiple dirty rects, instead of scaling a single update-area. Happily though, the current solution allows the engine to idle without problem, which means that a screen that doesn't change doesn't even trigger a buffer-copy, and puts the CPU-usage down from 100% to less than 10% when nothing is happening.

Other developments

I did a quick test of the engine on my PPC-machine, and fixed an endian-assumption in the scripts, which means that I possibly might be the first person to ever start up J.U.L.I.A. and Dirty Split on a PPC Mac :D That did expose the need for a way more efficient render-solution though, because while the game did RUN fine, it wasn't fast enough to be very playable.

In some free moments the past week I have also done quite a bit of variable and function-renaming to move towards following the ScummVM convention (ScummVM uses "funcName()" and "varName", while WME uses "FuncName()" and "VarName"), thanks to _sev's earlier help, the "m_MemberVar" -> "_memberVar" rename is already done, although a few of those might need a bit of lookthrough to catch cases like _iD (which probably should either be _id or _ID).

Current engine status

While there are no fancy pictures this time around, I thought I'd list my estimates on the engine-status:
  • Graphics: 80-90% Works completely, but is very slow
  • Sound: 50-60% Works, but lacks volume-control and WAV-file support
  • Fonts: 70-80% Works fine, but is a bit darker than in the original, also no solution is in place for replacing system-fonts (that won't necessarily be available on non-Windows-platforms)
  • PPC-support: Can't guess at a number, but the engine starts, and seems to run OK
  • Video: Works, but has the same issues as Broken Sword 2.5 lists, video desyncs from audio, and is very slow
  • Savegames: 80-90% Currently broken by a mis-setting of version-numbers, but that should be a quick fix, otherwise works fine, and has no noticed memory-leaks/issues.
  • Renaming to ScummVM-convention: 40-50% would be my estimate, but I would guess at closer to 40 than 50.
  • Sprite mirroring: 100%

No comments:

Post a comment