Hello everyone! 

My name is Daniel and I'm the graphics and physics programmer of the We Shall Wake project and the Insomnia engine. I'm here to ramble about the tech that the Insomnia engine uses to maintain playable frame rates and get better graphics to make it look like we know what we're doing. =P These posts are going to be pretty tech-intensive, but I hope some of you appreciate this kind of stuff anyway.

One of the most performance-sensitive areas of graphics programming is getting your data to the graphics card. Insomnia uploads a large amount of data each frame, and it's very important that this is done in a timely manner. This data includes position data for 3D models, and also a large amount of skeleton animation data that needs to be updated and streamed to the graphics card each frame.

Let's start with some history. The easiest way to upload data to the GPU is to simply pack the data into a buffer and tell the driver where that buffer is. This also happens to be the slowest way, because of the number of times the data has to be copied:
  1. The engine has to pack up the data.
  2. The engine tells the graphics driver where the data is. The driver creates its own copy of the data to ensure that the data doesn't get modified by the engine before the driver has a chance to upload it to the GPU.
  3. The driver sends the data to the graphics card's video RAM.
The data is essentially copied 3 times! What a waste!
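
To make the three copies concrete, here's a toy model in Python. It is not real driver code: ToyDriver, submit, flush and pack_positions are names made up for this sketch, and plain byte buffers stand in for driver memory and video RAM. It also shows why the driver bothers with its own copy: the engine scribbles over its buffer right after submitting, and the upload is unaffected.

```python
import struct


class ToyDriver:
    """Hypothetical stand-in for a graphics driver; not a real API."""

    def __init__(self):
        self.pending = []   # driver-owned snapshots waiting for upload
        self.vram = b""     # stand-in for the card's video RAM
        self.copies = 0     # copies made on the driver side

    def submit(self, data):
        # Copy 2: the driver snapshots the caller's buffer so later
        # changes by the engine cannot corrupt the upload.
        self.pending.append(bytes(data))
        self.copies += 1

    def flush(self):
        # Copy 3: ship each snapshot to video RAM (modelled as an assignment).
        for chunk in self.pending:
            self.vram = chunk
            self.copies += 1
        self.pending.clear()


def pack_positions(positions):
    # Copy 1: the engine packs its data into one contiguous buffer.
    buf = bytearray()
    for x, y, z in positions:
        buf += struct.pack("<3f", x, y, z)
    return buf


driver = ToyDriver()
staging = pack_positions([(1.0, 2.0, 3.0)])
driver.submit(staging)
staging[0:4] = struct.pack("<f", 99.0)  # engine reuses its buffer early
driver.flush()

uploaded = struct.unpack("<3f", driver.vram)
total_copies = 1 + driver.copies        # engine's pack + two driver copies
print(uploaded, total_copies)
```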

A much more modern way of uploading data to the GPU is to map memory. This gets rid of one copy, but it also introduces some new problems. Buffer mapping essentially works like this: instead of telling the driver where the data is, we ask the driver where it wants the data to be placed. The driver gives us a memory pointer which tells us where to place our data. We can then write directly to this place, eliminating the driver's copy in step 2 of the list above. When we're done writing, we tell the driver we're done and it ships the data off to the graphics card as usual. The problem with buffer mapping is that it requires a large amount of validation and error checking from the driver. It has to ensure that the memory pointer isn't already in use, and that any earlier copy from that memory to the graphics card has finished. This can cause some nasty cases where the CPU has to wait for other operations to finish, which is called a "stall".
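
Here's the same kind of toy model for buffer mapping, again in Python with made-up names (ToyMappedBuffer, map, unmap, finish_upload). In OpenGL the real calls would be along the lines of glMapBufferRange and glUnmapBuffer, but this sketch only imitates the behaviour: the engine writes straight into driver-owned memory, and map has to wait whenever the previous upload hasn't finished yet.

```python
import struct


class ToyMappedBuffer:
    """Hypothetical model of a mappable buffer; not a real driver API."""

    def __init__(self, size):
        self._staging = bytearray(size)  # driver-owned memory in normal RAM
        self.vram = bytes(size)          # stand-in for video RAM
        self._mapped = False
        self._upload_in_flight = False
        self.stalls = 0

    def map(self):
        # Validation the driver performs on every map call.
        if self._mapped:
            raise RuntimeError("buffer is already mapped")
        if self._upload_in_flight:
            # The previous upload still reads this memory: the CPU must wait.
            self.stalls += 1
            self.finish_upload()
        self._mapped = True
        return memoryview(self._staging)

    def unmap(self):
        if not self._mapped:
            raise RuntimeError("buffer is not mapped")
        self._mapped = False
        self._upload_in_flight = True    # driver starts shipping the data

    def finish_upload(self):
        if self._upload_in_flight:
            self.vram = bytes(self._staging)
            self._upload_in_flight = False


buf = ToyMappedBuffer(12)
for frame in range(3):
    view = buf.map()                     # ask the driver where to write
    struct.pack_into("<3f", view, 0, float(frame), 0.0, 0.0)
    view.release()
    buf.unmap()                          # tell the driver we're done
buf.finish_upload()

print(buf.stalls, struct.unpack("<3f", buf.vram))
```

In this toy the engine maps again before the previous upload completes on every frame after the first, so each of those map calls counts as a stall.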

Just a few months ago, a new technique for uploading data to the graphics card was developed, and driver support for it has finally been implemented by AMD, Nvidia and Intel. The technique is called "persistent buffer mapping" and is quite revolutionary. Instead of giving us a pointer to the driver's own memory (which is still in normal RAM), the driver literally hands the engine a pointer directly to the graphics card's video memory. When we write some data, the driver guarantees that it'll immediately be sent to the graphics card, without any additional copies. Even better, the driver now allows us to keep this magic memory pointer forever instead of having us ask for a new one each frame, so the expensive map operation is also gone. The data is therefore sent directly to the GPU without any unnecessary copies. The cost is that we, the game developers, have to take care of the validation that the driver has traditionally done for us, but this is a small price to pay. Since we know exactly how the data will be used, we can skip almost all the validation the driver used to do, so performance is much higher.
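
What does "taking care of the validation ourselves" look like? One common approach, assumed here for illustration and not necessarily what Insomnia does, is to split the persistent buffer into a few regions and guard each with a fence, so the CPU never writes a region the GPU might still be reading. The Python sketch below only simulates this: ToyPersistentRing and its methods are hypothetical names, and boolean flags stand in for GPU fences. In OpenGL the real building blocks would be glBufferStorage with GL_MAP_PERSISTENT_BIT and sync objects such as glFenceSync.

```python
import struct


class ToyPersistentRing:
    """Hypothetical ring of regions inside one persistently mapped buffer."""

    def __init__(self, region_size, regions=3):
        self.region_size = region_size
        self.regions = regions
        # "Mapped" exactly once and kept for the buffer's whole lifetime.
        self.memory = memoryview(bytearray(region_size * regions))
        self.fences = [True] * regions   # True = GPU is done with the region
        self.waits = 0
        self.frame = 0

    def begin_frame(self, wait_for_gpu):
        index = self.frame % self.regions
        if not self.fences[index]:
            # Our own validation: only wait if the GPU still owns the region.
            self.waits += 1
            wait_for_gpu(index)
        start = index * self.region_size
        return index, self.memory[start:start + self.region_size]

    def end_frame(self, index):
        self.fences[index] = False       # GPU owns the region until it signals
        self.frame += 1

    def signal(self, index):
        self.fences[index] = True        # simulated GPU finished reading


ring = ToyPersistentRing(region_size=12, regions=3)
for frame in range(6):
    index, region = ring.begin_frame(wait_for_gpu=ring.signal)
    struct.pack_into("<3f", region, 0, float(frame), 0.0, 0.0)
    ring.end_frame(index)
    if frame >= 1:
        # Simulated GPU running one frame behind the CPU.
        ring.signal((frame - 1) % ring.regions)

print(ring.waits, struct.unpack_from("<3f", ring.memory, 24))
```

With three regions and a simulated GPU that lags one frame behind, the wait path is never taken. If the GPU fell further behind than the number of regions allows, begin_frame would start waiting.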

But enough theory! Let's see some numbers! The following are the test results from a very heavy scene, with hundreds of shadow-casting lights, a bumpy terrain and over 50 skeleton-animated 3D models.

Traditional buffer mapping:
FPS: 46
Frame time: 22.8425 ms
Render time: 20.3548 ms

Persistent buffers:
FPS: 72
Frame time: 13.9157 ms
Render time: 5.5067 ms

The difference is HUGE. We see a large difference in raw FPS (roughly a 56% increase), with the time taken to render a frame dropping from 23 ms to 14 ms. However, the gains are actually far more substantial than just a raw increase in frame rate. The time it took to submit all rendering commands dropped from 20.4 ms to 5.5 ms, barely more than a quarter of what it originally was. We've essentially shifted the bottleneck from the game's rendering code to the driver. This is very beneficial to us, as this leaves a lot more CPU time for the physics engine and AI to play around with.
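
For anyone who wants to check the arithmetic, here's a small Python snippet that recomputes the comparisons from the measurements listed above. The variable names are just for this example.

```python
# Measurements copied from the two test runs above.
fps_mapped, fps_persistent = 46, 72
frame_ms_mapped, frame_ms_persistent = 22.8425, 13.9157
render_ms_mapped, render_ms_persistent = 20.3548, 5.5067

# Relative FPS increase: 72 / 46 - 1, about 0.565.
fps_gain = fps_persistent / fps_mapped - 1.0

# Milliseconds saved per frame.
frame_saved_ms = frame_ms_mapped - frame_ms_persistent

# Render time as a fraction of the old render time, about 0.27.
render_ratio = render_ms_persistent / render_ms_mapped

print(f"FPS gain: {fps_gain:.1%}")
print(f"Frame time saved: {frame_saved_ms:.2f} ms")
print(f"Render time ratio: {render_ratio:.1%}")
```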

Sadly, not all graphics cards that we want to support can utilize this new method of uploading data to the GPU. Therefore the engine can easily fall back to traditional buffer mapping if persistent buffers are not supported. In addition, from the testing all you guys have done for us, we've determined that many of you have old drivers with a buggy implementation of persistent buffers, so the engine even tests them to make sure that they work properly before enabling them. Fighting driver bugs - just another day at the office (we don't have an office).
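
The fallback decision boils down to something like the following Python sketch. All the names here (choose_upload_path, toy_self_test and the helper functions) are hypothetical, and the write-then-read-back check is only an assumption about what such a test could look like, not a description of the engine's actual test.

```python
def choose_upload_path(persistent_supported, self_test):
    """Return "persistent" only if the feature exists and passes a self-test."""
    if not persistent_supported:
        return "mapped"            # old hardware or driver: use the fallback
    try:
        passed = self_test()
    except Exception:
        passed = False             # a crashing test counts as a buggy driver
    return "persistent" if passed else "mapped"


def toy_self_test(write, read_back):
    # Assumed check: write a known pattern and verify it comes back intact.
    pattern = bytes(range(16))
    write(pattern)
    return read_back(len(pattern)) == pattern


# A well-behaved "driver": writes land where they should.
good_memory = bytearray(16)


def good_write(data):
    good_memory[:len(data)] = data


def good_read(n):
    return bytes(good_memory[:n])


healthy = choose_upload_path(True, lambda: toy_self_test(good_write, good_read))
# A buggy "driver": writes are silently dropped, so zeros are read back.
buggy = choose_upload_path(
    True, lambda: toy_self_test(lambda data: None, lambda n: bytes(n))
)
unsupported = choose_upload_path(False, lambda: True)

print(healthy, buggy, unsupported)
```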

That's all for today! Thanks for reading!