Sure are a lot of posts today. With all the Java performance myths out of the way, let's take a look at the threading system of WSW.

TL;DR of this entire post: Insomnia achieves approximately 3.5x scaling on a quad core. It fails to reach 4.0x because the graphics driver's thread is slow and competing with the game's threads. Future versions of OpenGL (pretty much the same as DirectX but cross-platform) should allow us to reach 4.0x scaling.



So what's a thread? To simplify a lot, threads are essentially what allows your (single-core) processor to run multiple programs at the same time. If a processor has 4 threads to run, it'll switch between them extremely fast, so from the user's perspective it looks like all 4 threads are running at the same time, each at 1/4th speed. However, it's also useful for a single program to use multiple threads. This allows the program to do heavy calculations in the background while keeping the user interface responsive.
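
Here's a tiny Java sketch of that idea (Java being what Insomnia is written in); the class and names are mine for illustration, not engine code:

```java
// A background thread grinds through a heavy calculation while the main
// thread stays free to respond (here it just prints, but it could be a UI).
public class BackgroundWork {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            long sum = 0;
            for (long i = 0; i < 2_000_000_000L; i++) sum += i; // heavy work
            System.out.println("worker done: " + sum);
        });
        worker.start();

        while (worker.isAlive()) {
            System.out.println("main thread still responsive...");
            Thread.sleep(500);
        }
    }
}
```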

At some point, hardware developers realized that increasing the clock speed of CPUs was becoming unsustainable. CPUs were getting too hot and using too much power. It turned out to be much cheaper to drop the clock rate a little bit and instead put more cores in, since doubling the clock rate increases power usage (and therefore heat) by roughly a factor of 8. This means that at the same power consumption, you can get:

 - a single-core processor at 1.00 GHz with 1.00x total performance.
 - a dual-core processor at 0.80 GHz with 1.60x total performance.
 - a quad-core processor at 0.63 GHz with 2.52x total performance.
 - an octa-core processor at 0.50 GHz with 4.00x total performance.

These 4 processors all use the same amount of power, but efficiency increases massively as more cores are added.
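
If you want to check those numbers yourself: the rule of thumb I'm using (an approximation, not an exact law) is that dynamic power scales with frequency times voltage squared, and voltage has to rise roughly in step with frequency, so power grows with the cube of the clock rate. At a fixed power budget, with n cores:

```latex
P \propto f V^2 \propto f^3
\quad\Longrightarrow\quad
n f^3 = 1^3
\quad\Longrightarrow\quad
f = n^{-1/3},
\qquad
\text{total performance} = n f = n^{2/3}
```

Plugging in n = 4 gives f ≈ 0.63 and a total performance of 4^(2/3) ≈ 2.52x - exactly the quad core entry above.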

So now, all of a sudden our quad core processors can take those 4 threads our original single-core processor had and actually run all 4 of them at the same time at full speed. It doesn't matter that each core is slower; running 4 of them at more than half speed is still 2.5x faster. This is the theory behind it of course. It's worth noting that in practice, the CPU cores actually share a lot of their resources (particularly the RAM and memory controller), so you're not gonna see a perfect 4x performance boost from utilizing all 4 cores. At best, you might see a 3.5-3.9x increase in performance.



The problem today is that games aren't good at using the resources they have available. Having more cores doesn't mean anything unless you have threads to run on them. Even today, many years after the introduction of multi-core CPUs, most games still don't utilize more than 1 or 2 cores (*cough* Planetside 2 *cough*), but some games do show that it's doable (the recent Battlefield games for example). Insomnia's not going to lose when it comes to threading.

Insomnia's threading system is based on splitting the game's rendering and logic code into distinct tasks. These tasks are organized like a flow chart, with certain tasks requiring other tasks to be completed before they can execute. The tasks are put into a queue, and any number of threads can be created that run these tasks one by one from the queue.
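
To make that concrete, here's a minimal sketch of such a dependency-driven task queue in Java. The names and structure are my own illustration, not Insomnia's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class Task {
    private final Runnable work;
    private final List<Task> dependents = new ArrayList<>();
    private final AtomicInteger pendingDependencies = new AtomicInteger();

    Task(Runnable work) { this.work = work; }

    // Declare that 'other' must complete before this task may run.
    void dependsOn(Task other) {
        other.dependents.add(this);
        pendingDependencies.incrementAndGet();
    }

    void run(BlockingQueue<Task> readyQueue) {
        work.run();
        // Wake up any task whose last dependency just finished.
        for (Task t : dependents) {
            if (t.pendingDependencies.decrementAndGet() == 0) {
                readyQueue.add(t);
            }
        }
    }
}

public class TaskQueueDemo {
    public static void main(String[] args) {
        BlockingQueue<Task> ready = new LinkedBlockingQueue<>();

        Task animate = new Task(() -> System.out.println("animate skeletons"));
        Task cull    = new Task(() -> System.out.println("cull objects"));
        Task submit  = new Task(() -> System.out.println("build draw calls"));
        submit.dependsOn(animate); // submit waits for both...
        submit.dependsOn(cull);    // ...animate and cull
        ready.add(animate);        // tasks with no dependencies start ready
        ready.add(cull);

        // N worker threads, one per core, each draining the ready queue.
        // (They run until the process exits; a real engine has a shutdown path.)
        int n = Runtime.getRuntime().availableProcessors();
        for (int i = 0; i < n; i++) {
            new Thread(() -> {
                try {
                    while (true) ready.take().run(ready);
                } catch (InterruptedException e) { /* shutting down */ }
            }).start();
        }
    }
}
```

In Insomnia's case, the tasks that must talk to the graphics driver would additionally be restricted to the main thread, as described below.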

Insomnia directly or indirectly uses a number of threads:

 - N generic worker threads, where N is the number of cores the CPU has.
 - 1 main thread, which is the only thread allowed to communicate with the graphics driver.
 - The graphics driver has its own internal thread which is beyond Insomnia's control. Insomnia's main thread offloads work to this thread so that the main thread can work on AI and other stuff.

For the graphics and physics code, almost everything can be run on any number of cores. The only tasks that cannot be run on multiple threads are the tasks that require communication with the graphics card. Almost all of these are just small high-fives with the driver to ensure that everything's still correct, but some are pretty large. This is where the graphics driver's thread comes in and splits the work with the main thread automatically. It took a lot of work to avoid stepping on the driver's thread's toes, but I've managed to let the driver thread work completely undisturbed. It's not perfect (as will be evident later), but I'm not sure it's possible to improve this with the current version of OpenGL.

Here's a rather large flow-chart-like visualization of the tasks that the rendering code is split up into. Tasks marked with red are tasks that require communication with the graphics thread, so they must be run on the main thread.

How much does this improve performance though? If I run this on a quad core, do I see 4 times higher FPS? Almost.

Here are some of the results I get on my Intel i7-4770K quad core CPU:

 - The rendering code achieves 3.64x scaling.
 - The physics code achieves 3.19x scaling.
 - The actual increase in frame rate is only 2.82x (which is still a 182% increase).

I blame this on the driver's internal thread, which competes for CPU time with Insomnia's threads. This is evident from the fact that the engine spends around 1/3rd of its time waiting for the driver's thread to finish its work. The next generation of OpenGL should remove the restriction on the red tasks and also remove the internal driver thread, which would allow us to improve this scaling even further, but until then, this is about as good as it gets.

Holy shit, someone read all the way down here. Uh, not sure what to say... Hi, mum?

Hello again, everyone! Double post this week! I thought I'd rant a bit about our choice of Java.



As some of you know, we're using Java to develop WSW and Insomnia. No, we're not using Unreal Engine, but thanks for the compliment. ^^ Now, a lot of people are skeptical of our choice of programming language. Java doesn't exactly have a flawless reputation when it comes to performance (and security, although that only applies to the Java browser plugin, which is not required in any way for Insomnia), but I thought I'd kill the two most common misconceptions about Java here.


a + b is equally fast in Java and C++.

Any basic arithmetic operation is equally fast in Java and C++. The Java Virtual Machine (JVM) compiles those instructions to exactly the same assembly code that C++ is compiled to in the end, although Java needs a few seconds after startup for all the hot code to be compiled, so optimal performance kicks in shortly after the game is first started. There are some special instructions available from C++ that can improve performance in some math-intensive areas (for example matrix math). In our case, we actually take advantage of some of these by using math libraries with native C++ code for the most performance-heavy places like skeleton animation, so even here our Java performance is within 90+% of C++.


Java's garbage collection is not a problem.

Many games written in Java have problems with performance and stuttering due to the Java garbage collector, which automatically frees memory that is no longer in use. A collection pass can suddenly trigger and interfere with the game's smoothness. There are three reasons why this is not a problem for us.
First, garbage collection only happens if you're actually generating garbage. It's not hard to make a completely garbage-free game loop that allocates all its resources once and then reuses them indefinitely, and this is what we're aiming for (see the sketch after these three points).
Secondly, the garbage collection passes are fast and mostly run in parallel with the game, so the actual time the game is paused for a collection is in the range of a few milliseconds, which the CPU easily absorbs without dropping a single frame in almost all cases. The stuttering we get from garbage collection is 1/10th as frequent and intense as the stuttering we get from deep within the graphics driver, far outside any game developer's control.
Thirdly, the idea that allocating and freeing memory is slower in Java than in C++ is a myth in the first place. The fact that memory management is completely left to the JVM is actually an advantage, as it can avoid fragmenting the heap, a common problem for C++ programs that degrades performance over time. Another massive advantage of garbage collection is that it's a lot easier for us developers to work with, so we can spend more time on new features and optimizing our algorithms instead of figuring out where that memory leak that crashes the game after 30 minutes of playing is hiding.
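
To make the first point concrete, here's a minimal sketch of a garbage-free update loop using a preallocated object pool. The names (Particle, ParticlePool) are my own illustration, not actual WSW code:

```java
// Nothing is allocated inside the per-frame loop, so the garbage
// collector never has anything to collect.
class Particle {
    float x, y, vx, vy;
    boolean alive;
}

class ParticlePool {
    private final Particle[] particles;

    ParticlePool(int capacity) {
        // All allocation happens once, up front.
        particles = new Particle[capacity];
        for (int i = 0; i < capacity; i++) {
            particles[i] = new Particle();
        }
    }

    // Reuse a dead particle instead of allocating a new one.
    Particle spawn(float x, float y, float vx, float vy) {
        for (Particle p : particles) {
            if (!p.alive) {
                p.x = x; p.y = y; p.vx = vx; p.vy = vy;
                p.alive = true;
                return p;
            }
        }
        return null; // pool exhausted; caller decides what to do
    }

    // Per-frame update: no 'new', no boxing, no iterator garbage.
    void update(float dt) {
        for (int i = 0; i < particles.length; i++) {
            Particle p = particles[i];
            if (p.alive) {
                p.x += p.vx * dt;
                p.y += p.vy * dt;
            }
        }
    }
}
```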



So where is Java actually slower then? The biggest loss of performance in Java compared to C++ comes from memory layout. In C++ you can use a number of techniques to force memory locality, so that memory that is often used together lies in a contiguous block of RAM. This makes the program more cache-friendly: the CPU always loads memory in relatively large blocks, so it "accidentally" loads and caches all the related data when the first piece of memory is accessed. In Java we have no way of forcing this, as placement in memory is left to the JVM, which may even reorder things later (again, this has other advantages). If you're aware of all this, though, it's not that difficult to minimize the impact. In addition, many Intel CPUs have hardware that pretty much eliminates this difference, which I'll go into detail about in my next post.
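
As an example, here's a hedged sketch of one locality trick that works even in Java: a structure-of-arrays layout. Primitive arrays are contiguous in memory, so the hot loop walks one dense block instead of chasing object references scattered across the heap. Again, the names are mine, not the engine's:

```java
class Bodies {
    // One array per field instead of one object per body.
    final float[] posX, posY, posZ;
    final float[] velX, velY, velZ;
    final int count;

    Bodies(int count) {
        this.count = count;
        posX = new float[count]; posY = new float[count]; posZ = new float[count];
        velX = new float[count]; velY = new float[count]; velZ = new float[count];
    }

    // The hot loop walks memory linearly, which the cache (and the
    // hardware prefetcher) loves.
    void integrate(float dt) {
        for (int i = 0; i < count; i++) {
            posX[i] += velX[i] * dt;
            posY[i] += velY[i] * dt;
            posZ[i] += velZ[i] * dt;
        }
    }
}
```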


Hey, guys.

Due to my lack of time, this week was limited to small optimizations here and there, mostly triggered by the release of a 166-page PowerPoint presentation by the developers of Call of Duty: Advanced Warfare, which focused on the improvements they've made to CoD's post-processing. I have to admit they're pretty damn impressive.



Motion blur has received a brand-new, much improved algorithm based on the one used in Call of Duty: Advanced Warfare. Objects in motion are blurred much more accurately, with more correct blur weighting, eliminating the sharp edge that was visible at times. The new algorithm is also around 94% faster. I fixed a number of visual artifacts introduced by the new algorithm, the most glaring being a clearly visible line that appeared along edges when combined with TSRAA. In addition, I added an optimization that checks which parts of the screen actually need the complete motion blur algorithm that can handle difficult overlapping motion, sharp edges in front of moving objects, and so on. The majority of the scene usually doesn't need this, so for those parts a simpler algorithm is used instead. This resulted in yet another 88% performance increase. Compared to the old motion blur algorithm, the new one is approximately 275% faster (!!!) during fullscreen motion, meaning that motion blur no longer cuts your FPS by a large amount when the camera starts moving.

The TSRAA shader also got some love this week. I identified some rather simple bottlenecks that especially slowed down the shader when using a high number of samples as they tricked the compiler into generating very inefficient code, and reworked those parts. This resulted in a massive 132% performance boost for 8xTSRAA, while 4xTSRAA saw a much smaller 10% boost.

Another post-processing effect that the CoD slides mentioned was bloom. Thanks to some tips and tricks there, I managed to halve the VRAM usage and increase performance by 13% with just a few simple changes. In addition, as I was looking through the bloom shaders, I noticed a typo that was accidentally making the bloom flicker more than it should. I fixed that and also added more anti-flickering countermeasures.

On the CPU side, I worked together with Brayden in an attempt to improve the performance of the game logic. We realized that we were doing some redundant updating in the main logic loop, which turned out to account for around 40% of the time it took to run each update. This change will mostly affect slower computers that are CPU-limited when there are a significant number of AI enemies around, but in those cases it can increase your frame rate by over 60%.

Finally, shadow filtering was also mentioned in the CoD slides. Although their shadow filtering wasn't faster or better looking, it was more flexible, so instead of an on-off switch for shadow filtering, we now have an off-low-medium-high quality setting.



That's all for now. As you can see, these cumulative optimizations had a surprisingly large impact on overall performance. The reduced cost of some of our more advanced graphics effects improves performance a lot on high-end graphics cards while also making it possible for weaker hardware to enable them, and our CPU optimizations mostly reduce the minimum CPU requirement for smooth frame rates.

Hello. I spent this last week working on some new particle effect systems, along with my usual work on NPC and AI interactions. I also did some UI work, but that's so minimal it's not really worth mentioning.

A large part of an action game's feel lies in its particle effects. They're meant to give you a visual representation of each attack, to help it be more unique and give it its own identity. Since we're a small team, I can't really afford to give each attack its very own graphical effect, but I can make a large array of effects and vary their size and color to make them seem unique.

Funnily enough, for the last six demos, I've used our blood particle effect for literally every effect in the game. Our dust, thrusters, explosions, actual blood effects, and sparks were all made with the blood texture. This is due to me not having enough time to actually hook up the other effects we have sitting in the textures folder.

However, we've recently picked up an effects artist who can provide us with an array of effects that are also animated, which has provided us a pretty cool result so far.


So basically, I spent this week writing some custom animation systems to layer over Daniel's implementation. This includes an "AnimatedGeoSystem" and an "AnimatedParticleSystem," which extend the particle and geo systems already in place. These act as a really nice framework for me to layer on a bunch of other effects that work in much the same way.

AI work mostly consists of me bug-fixing as per usual. The AI continually impresses me with stuff I didn't expect it to do, such as walking over and interacting with dead bodies as if to inspect the area, or the pack system I mentioned in the last post. Most of this is a result of its relatively open (yet simple) decision-making algorithms, which apply to a large number of activities. So now it's just a matter of me providing a large number of animations for the AI to have at its disposal to make it even more lifelike.

That's all for this week, have a good one!

Hey, everyone.

Apologies for the late post. It's the last week before my exams, so sadly I haven't had much time to work on Insomnia. Most of the time I did spend on Insomnia went into bug fixing the new anti-aliasing and other parts of the engine.

I've modified how the engine stores motion vectors. Before, the engine stored motion vectors as normalized values, meaning as a fraction of the screen resolution, so a movement of 1 pixel to the right at 1920x1080 was stored as 1/1920. Now it stores them in actual pixels. This has a number of advantages. First of all, precision no longer depends on resolution, which used to cause problems at extremely high resolutions; at 7680x1440 (3 monitors >__>) I noticed a significant degradation in motion blur quality because of it. Secondly, it's actually very slightly faster, as the motion vectors were converted to pixel values in the end anyway, so this saves a few operations later.

TSRAA got a few bug fixes and improvements as well. A few temporal reprojection (= estimating where a triangle was in the previous frame using the previously mentioned motion vectors) bugs that reduced quality in motion have been fixed. I've also added a new ghosting prevention system which disables the temporal component for parts of the scene that are in fast motion. This turned out to provide a significant performance improvement when the scene is in motion, as I can skip quite a bit of work there. When paired with motion blur, any remaining aliasing is hidden by the blur, so this shifts freed-up resources from unnoticeable anti-aliasing over to motion blur, which only activates during fast motion. A win-win, as they say.

In addition, we've been getting some quite sneaky problems when utilizing multiple CPU cores in the engine, and I've been doing quite a bit of investigating into what's causing them. I still haven't figured out the exact cause, but I have rewritten and simplified the threading system to reduce the chance of bugs and hopefully fix the problem. So far the bug has yet to reappear, but these kinds of bugs are hard to debug, as they happen randomly depending on the timing and distribution of tasks over CPU cores, which is up to the OS to decide. Nothing we can't handle though. =3

That's all for this week. Next week I thought I'd write a more in-depth rant about how Insomnia's threading system works and how it allows the engine to utilize any number of CPU cores for almost linear scaling of the parts of the game that Insomnia takes care of.

Brayden here. This week I'm going to talk a little about refining code.

Basically, what generally takes us the longest when working on the game is refining code. We generally add features every other week, and then spend the following week tweaking and fixing everything we made the week before. That's why it's so hard to keep these blog posts frequent: we often only get to add cool new stuff worth mentioning every other week.

This week, I primarily refined the AI. I also did some more work on the input systems and fixes all around the engine, and I'm quite proud of the result. The AI now works in a somewhat pack-like state, which is kind of interesting because I didn't intend it. For example, on what I'm calling the "warzone floor," which is the second floor of the game and part of the tutorial, I noticed that the rebels would fight off guardians and then move forward in a pack to help out their friends who were being pushed back above. Traveling with them and helping turn the tide of the short battle was pretty fun for me, because it felt like I was really part of the conflict.


Ricky has been doing his thing with the texturing; in fact, Novem's new textures are done. I just need Forrest to set up his rig with the new UV coordinates and it'll be in the game. These textures not only add detail, but also establish an artstyle we're going to start aiming for. The model itself received fixes to the normals from Ivan. No screenshots just yet, though; we're saving the new artstyle for the upcoming 7th playable demo and video.

Rafael is also working on his new tracks for the game to go with the tutorial, and he hooked us up with some new sound designers who are going to be doing some cool stuff soon as well. By the way, Rafael did some concept tracks for We Shall Wake a while back, so if you're interested in hearing what the game is going to sound like, check them out!


Anyway, that's all for today! Take care guys, have a great week.

Hey, guys.

This week saw a lot of progress, with much of my time spent on the new anti-aliasing technique that will be featured in Insomnia, but first I'd like to mention a few minor things.



SSAO has been improved even further to smooth out the normal of the surface being tested, so the intense shimmering and aliasing that occurred in some rare cases has been mostly eliminated. This came with a minor performance hit, but the quality improvement allows me to cut back on other parts of SSAO, so performance remains the same.

The UI renderer got a complete rewrite and is now faster and better than ever. It turns out that some of the special effects we applied to the UI did not result in any visible change. They also turned out to be surprisingly expensive on weak hardware. (God damn it, Intel; you can't spend 4 of my precious 16 milliseconds on the user interface!) The revamp both makes these special effects look better AND allows them to be turned off as a desperate last resort to save some GPU power.

Other parts of the engine got some minor optimizations as well, but nothing near as significant as last week. For example, particle rendering performance was improved a little bit.

As a result of our optimization push, we've managed to optimize the game for low-end hardware to an almost ridiculous degree. Insomnia used to run at under 10 FPS on a weak Intel GPU, but thanks to our optimization efforts, we're now getting almost 40 FPS on that same GPU, meaning that as long as the game starts (i.e. the GPU supports OpenGL 3.2), you should be able to tweak it to a playable FPS.



With that out of the way, let's get down to what this post is mainly about: anti-aliasing! Be warned that this is a pretty long text, so if you're not ready for a deep dive you may want to skip to the middle, where there are some screenshot links.

Anti-aliasing is featured in almost every game in some form. I suspect that if you find my blog posts interesting enough to read this far, you already know what anti-aliasing does, but for completeness' sake I'd like to give a brief explanation anyway.

Anti-aliasing techniques attempt to minimize the aliasing in the scene. Aliasing occurs when there are details or movements that are less than a pixel in size. A 3D model is made out of lots of mathematical triangles described by 3 points in 3D space. We project this 3D model to the screen and check which pixels the triangles overlap. However, we do not have an infinite number of pixels, so the result is not entirely accurate. If a thin triangle happens to fall between the pixels, it'll completely disappear, and if a triangle moves less than a pixel, the pixels it covers may change in weird patterns that, simply put, look unnatural and unpleasant; our brains interpret these unnatural patterns as flickering, shimmering and even objects changing shape.

There are two approaches to anti-aliasing today. The first one is to simply generate and gather more information about the triangles. This is what the famous MSAA, also called multisampling, does. Usually the graphics card simply tests whether a pixel's center lies inside the triangle to determine whether the pixel is covered or not, but to better describe the shape of the triangles we can test multiple points inside the pixel. For 4x MSAA, we get 4 times as much information to work with, so we get a much higher quality result.
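
In its simplest form (a generic description, not anything Insomnia-specific), the resolved pixel is just the average of its N coverage samples:

```latex
\text{pixel color} = \frac{1}{N} \sum_{i=1}^{N} c_i
```

So a triangle covering 3 of the 4 sample points contributes 3/4 of the pixel's final color, instead of the all-or-nothing result you get from a single center sample.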

The second one is the filtering approach. This involves looking at the information we already have and attempting to make the most of it. FXAA, another famous anti-aliasing technique, falls into this category. FXAA analyzes the final image and reduces the aliasing in it by detecting edges and smoothing them out. It's often called inferior to MSAA, and for good reason: it cannot restore triangles that never covered a single pixel center, as it's limited to the information at hand.

Another filtering technique, called temporal supersampling, combines the current frame with the previous frame. This essentially grants us more information to work with for free, but introduces a lot of problems, like ghosting and blurry textures. Regardless, techniques that use the previous frame have been used in many successful commercial games, like Crysis 2. It's also used in Nvidia's new anti-aliasing technique, MFAA, but I digress.
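
The core of most temporal techniques (again a generic formulation, not TSRAA's exact math) is a blend between the current frame and the previous frame, reprojected using the motion vector m(p):

```latex
c_{\text{out}}(p) = \alpha \, c_{\text{current}}(p) + (1 - \alpha) \, c_{\text{previous}}\bigl(p - m(p)\bigr)
```

Ghosting is what you see when p - m(p) lands on the wrong surface and stale colors get blended in anyway.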

Up until now, Insomnia has only supported FXAA, and for good reason. FXAA may have low quality, but it's cheap and extremely easy to implement in any engine. MSAA, on the other hand, is quite the opposite: it requires modifying approximately 60% of the graphics engine to implement. This is not something your average indie developer can afford the time to do. We'd essentially have TWO engines to maintain at that point, which means adding more features becomes even more time consuming. In addition, MSAA has problems with high-contrast edges, effectively causing bright edges to bleed over darker ones, which reintroduces aliasing! These are the reasons why Insomnia doesn't, and won't, support MSAA.

I believe that the answer lies in hybrid solutions that both generate additional data and do proper filtering of it. Therefore, Insomnia features a new anti-aliasing technique developed by yours truly which combines all three of the above described techniques. I call it TSRAA, Temporal Subpixel Reconstruction Anti-Aliasing, and as of now it is officially part of the Insomnia source code.



Screenshots
http://screenshotcomparison.com/comparison/96073
http://screenshotcomparison.com/comparison/96074
http://screenshotcomparison.com/comparison/96075

Note: Still images are not enough to fully appreciate the benefits of TSRAA. What's REALLY going to blow your mind is how smooth it looks in motion.



What TSRAA builds on is that these three techniques complement each other very well. Thanks to this, the quality of 4xTSRAA is in almost all cases higher than that of 4xMSAA, and in the rare case where there is clashing information in a pixel, the quality drops to ghosting-free temporal supersampling only (which could arguably still provide higher quality than 4xMSAA). TSRAA also implements a countermeasure for high-contrast edges (also an innovation from me =p), ensuring smooth gradients at all times. Here's a comparison!

Performance and memory usage are both excellent. Working with the data you already have is always fast, and the extra data we do generate is both cheap to generate and requires little memory. 4xTSRAA has around half the performance impact and memory usage of 4xMSAA. Even 8xTSRAA is faster and uses less memory than 4xMSAA, and less than 1/3rd of the memory used by (shiver) 8xMSAA.

Finally, TSRAA was a lot less complex to integrate into our existing engine. While MSAA essentially requires complete rewrites of large parts of the engine, TSRAA integrates and builds on top of our existing code in a much cleaner way.



So there you have it! Congratulations for making it this far! I know that this got a bit long, but I'm a bit of an anti-aliasing fanatic, and it's really exciting for me to actually get the chance to come up with and even implement all this. This is the culmination of over two months of work! Thank you for your time!


PS:
[04:42:25] Mokyu (TheAgentD): Can I write "buttload" on the blog?
[04:42:29] Mokyu (TheAgentD): Will you get mad at me?
[04:42:44] Brayden: no lol
[04:42:46] Brayden: why would I?
[04:42:51] Mokyu (TheAgentD): BUTTLOAD IT IS
[04:42:52] Brayden: Also why did you put buttload in quotes
[04:43:01] Brayden: I'm sitting here giggling like an idiot lol


We Shall Wake's first twenty minutes of gameplay are nearly complete. The tutorial, second floor, and the first encounter with Decem have all been coded.

Along with this, I spent the last week doing major optimization and revision work on the combat engine. Not only have I made the Input systems more responsive, I've also tried to give the combat more weight in general with more hit reactions and flinch animations - along with less floaty knockbacks.

Specifically, I've removed tapping from the combo systems and replaced it with a "press to activate" system. This is more traditional and has proven to be more responsive than what we originally had.

For hit reactions, I've added a dynamic flinching system that blends in with the enemy's current animation. That means if you shoot an enemy while they're walking towards you, their chest will flinch back as they walk. However, if they're stunned, a more traditional flinch animation will interrupt whatever they were doing. This gives you more hit feedback about the current condition of your enemy.

I've reduced the slideback from basic hits so that combat will be less slippery and more stationary. However, it's still possible to move across the map at high speeds during combat, as the combat AI has improved dramatically. In fact, when I first coded Decem, Daniel couldn't tell which MORS was the player (so, technically, I passed the Turing test! Ha).

We also made our pathfinding systems more robust. It's a 3D pathfinder instead of a 2D one now, so AIs react appropriately to ceiling and ledge heights. This solves the old problem of AIs jumping into ceilings and hopping up 30-foot ledges with no strain.

Once I put the finishing touches on your first encounter with Decem, I'll turn my focus back to the general gameplay and dungeon generation systems. Demo 7 will be released when this is done and we've added new graphics to hopefully make things a bit more colorful and interesting - and while we're leaking modelers as usual, the other artists are hard at work on some cool concept work and music. We're lucky and thankful to have them - thank you Ricky, Anthony, Ivan, and Rafael!

We've gotten a lot of questions about Kickstarters and donation boxes, and we're going to try to set up something that works conveniently for both us and you, so that you can contribute cheaply and without hassle. But we'll come back to this later - if you have any suggestions yourself, feel free to email us. We love hearing from people interested in our game, so even if you just want to say hi, feel free.

Our We Shall Wake feedback email address:
contact@weshallwake.com

That's all for this week. I'll see you guys next Saturday!

Hey, everyone!

First of all, sorry for missing the deadline on my first post. x___x Let's hope uni is less intensive in the future...


This week saw the addition of a number of new features. I thought I'd write about the two most visual ones.

The first feature is procedurally generated moss applied to the terrain. It can be used to essentially override certain parts of the terrain's textures with a moss texture. Even cooler, the moss changes the light properties of the terrain, so moss growing on metal will reflect light as moss should, not as the underlying metal.


With proper lighting applied, it all looks decent, but when there is no direct lighting and only ambient lighting is applied, the result is really unpleasant.


Every single detail in the terrain melts together into one incomprehensible mess. I can't even see where the floor ends and the wall starts.

The solution was the (re)implementation of SSAO in the engine. SSAO is short for Screen Space Ambient Occlusion. It's a technique first introduced in Crysis that analyzes the scene and dims pixels that are deemed to be partially occluded by surrounding geometry. Not only does this look pleasant, it also massively improves the depth perception of the scene and makes it easier to understand the shapes of the objects in it, as it more closely mimics how the scene would be shaded in real life.
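
The usual formulation of the idea (a generic sketch; Insomnia's exact shader differs in the details) is to take N sample points in a hemisphere around each pixel's surface point p, check each one against the depth buffer, and darken the pixel by the fraction that turn out to be blocked:

```latex
AO(p) = 1 - \frac{1}{N} \sum_{i=1}^{N} \text{occluded}(p + s_i)
```

where occluded(x) is 1 if the depth buffer says geometry covers the sample point x, and 0 otherwise.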


Notice especially how the details on the floor and the walls are much easier to see. I also spent quite some time optimizing the shaders and packing the data together more efficiently, bringing the cost down to less than 1 ms at 1080p at high quality. The SSAO can also be tweaked to run on significantly weaker hardware than mine at reduced quality.



In addition, for the last month or so we've been doing extensive optimizations of nearly all parts of the engine.

We identified a massive GPU bottleneck when there were lots of AI entities on the screen. Even though they were barely visible, they were made out of tens of thousands of triangles, which slowed things down immensely. To prevent this, I recently added a LOD (Level Of Detail) system to the model renderer, allowing us to switch out high-quality models for lower-quality ones once they're far enough from the camera that the difference is negligible. Our modelers will be providing multiple versions of each model with different triangle counts, which Brayden in turn will incorporate into the game through the LOD system I made, so this was essentially something everyone on the team helped accomplish.
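
For the curious, the selection logic boils down to something like this sketch (the names are my own illustration, not the engine's actual API):

```java
// Pick a lower-triangle-count mesh once the model is far enough away
// that the difference is invisible.
class Mesh { /* vertex buffers, triangle count, etc. */ }

class LodModel {
    private final Mesh[] levels;       // levels[0] = full detail
    private final float[] maxDistance; // switch-over distances, ascending

    LodModel(Mesh[] levels, float[] maxDistance) {
        this.levels = levels;
        this.maxDistance = maxDistance;
    }

    // Return the cheapest mesh whose detail still looks right at this distance.
    Mesh selectLod(float distanceToCamera) {
        for (int i = 0; i < maxDistance.length; i++) {
            if (distanceToCamera < maxDistance[i]) return levels[i];
        }
        return levels[levels.length - 1]; // far away: cheapest version
    }
}
```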

On my end, I've also done a number of optimizations specifically targeting low-end hardware. I've gone through almost all shaders the game uses and optimized them as much as possible. The post-processing pipeline has been restructured to be faster, more modular and enable new features in the future. Many features (for example distortions and shadows) that were permanently enabled before can now be toggled on and off to cram out some more frames per second on the slowest of hardware out there.

Here are some examples of performance wins you'll see in the next version.
 - LOD system: 200-300% increase in FPS with hundreds of AIs on the screen at lower graphics settings.
 - Motion blur: Approximately 10-20% faster depending on hardware.
 - Postprocessing: Several stages have been merged into a single stage, providing a small 5-10% boost.
 - Transparency effects: Small optimizations, plus quality improvements to particle motion blur.
 - Rendering: Reduced VRAM bandwidth usage by 15-30% depending on settings.
 - Lighting: Reduced VRAM bandwidth usage by 15%.
 - VRAM usage: Reduced by approximately 10-15%.



That's all for this time! See you again next week!








Hey!

We're going to be trying something a little different with the blog now that things are picking back up. From now on, Daniel will be posting on Wednesdays, whereas I'll be posting on Saturdays as per usual! So now you can expect updates not only on the game front, but also on the engine front.

So,
Saturdays = Updates on We Shall Wake
Wednesdays = Updates on Insomnia

More for you to read, and more incentive for us to not be lazy and to actually get things done!

Now that the formalities are out of the way, I'd like to talk about what I've done this week, now that Daniel has gone and asserted his programming dominance. Funnily enough, we joke about having a "programming rivalry" to keep us both at the top of our game - so these blog posts will be taking it to a whole new level. Ha!

Anyhow, this week I've mostly done diplomatic director stuff. Our artist Ricky has been doing some new "artstyle" experiments with MORS, which have been extremely successful:


I've been doing AI and combat work - along with my usual bugfixes. We Shall Wake was designed from the beginning to have a huge dynamic world, with equally dynamic AI to inhabit it - so I've been making placeholder models and writing the code needed to place them in the world and have all of these aspects interact with each other.


We also have a lot of factions to consider, such as the miners, rebels, and guardians, plus the relationships between the MORS brothers themselves. Each faction has its own characteristics that impact how the world is built and what these interactions are - for example, miners are not nomads, so the dungeon generator builds a house for each miner present at the very first generation of the world. However, when these miners are killed - they never respawn, unlike their houses.

What exactly do these interactions consist of, you may ask? Some of them are more mechanical, such as how the pathfinder decides how to build its paths based on what static entities are where - along with the general topography of the land. Others are more visible, such as AIs of the same faction walking around together, talking to each other, hanging out at campsites, or visiting the graves of dead allies and friends.

Interactions are also influenced by the personality of each individual AI - each one is given a unique DNA set that is saved permanently. This impacts everything from small things, such as an impatient AI running to its destination rather than walking, to whether a Guardian will challenge you to a battle or ignore you because you're hurt and not fit for a true duel. DNA also has values for stubbornness, skill at combat, social tendencies, honor, pride, bravery - and many more - which all come into play at some point in the AI's logic systems.
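
If you're curious what that could look like in code, here's a toy sketch - my illustration with invented field names and thresholds, not WSW's actual implementation:

```java
import java.util.Random;

// Each AI rolls its DNA once, and it's saved with the world so
// personalities persist between visits to a floor.
class AiDna {
    final float impatience, stubbornness, combatSkill,
                sociability, honor, pride, bravery;

    AiDna(Random rng) {
        impatience   = rng.nextFloat();
        stubbornness = rng.nextFloat();
        combatSkill  = rng.nextFloat();
        sociability  = rng.nextFloat();
        honor        = rng.nextFloat();
        pride        = rng.nextFloat();
        bravery      = rng.nextFloat();
    }

    // Example query: impatient AIs run to their destination rather than
    // walk. The 0.7 threshold is invented for illustration.
    boolean prefersRunning() {
        return impatience > 0.7f;
    }

    // Example query: an honorable Guardian won't duel a wounded player.
    boolean willChallenge(float playerHealthFraction) {
        return playerHealthFraction > 0.5f || honor < 0.3f;
    }
}
```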



Many times you can traverse the floors of Yarib, return, and see familiar faces and personalities - and that will be because they are in fact the same AIs. However, a lot of the time you won't see those faces, as while you were gone, they may have been killed or murdered in a clash between the guardians and rebels.

These battles and deaths are not random either - each floor operates while you are gone, so if you were to destroy all of the guardians on a floor and the floors bordering it, there would be no battles there while you are away, because it's protected by its neighbors. This is what We Shall Wake wants to achieve: a world that you, as the legendary MORS 09, can change and impact.

That's all from me for this week, take care!

P.S. On an unrelated note, Siliconera wrote two articles on us, check them out!

http://www.siliconera.com/2014/09/09/shall-wake-hyper-speed-combat-metal-gear-rising/
http://www.siliconera.com/2014/09/17/shall-wakes-combat-fast-physics-engines-cant-keep/

Hello everyone! 

My name is Daniel and I'm the graphics and physics programmer of the We Shall Wake project and the Insomnia engine. I'm here to ramble about the tech that the Insomnia engine uses to maintain playable frame rates and get better graphics to make it look like we know what we're doing. =P These posts are going to be pretty tech-intensive, but I hope some of you appreciate this kind of stuff anyway.

One of the most performance-sensitive areas of graphics programming is getting your data to the graphics card. Insomnia uploads a large amount of data each frame, and it's very important that this is done in a timely manner. This data includes position data for 3D models, as well as a large amount of skeleton animation data that needs to be updated and streamed to the graphics card each frame.

Let's start with some history. The easiest way to upload data to the GPU is to simply pack the data into a buffer and tell the driver where it is. This also happens to be the slowest way, because of the number of times the data has to be copied:
  1. The engine has to pack up the data.
  2. The engine tells the graphics driver where the data is. The driver creates its own copy of the data to ensure that the data doesn't get modified by the engine before the driver has a chance to upload it to the GPU.
  3. The driver sends the data to the graphics card's video RAM.
The data is essentially copied 3 times! What a waste!

A much more modern way of uploading data to the GPU is to map memory. This gets rid of one copy, but it also introduces some new problems. Buffer mapping essentially works like this: instead of telling the driver where the data is, we ask the driver where it wants the data to be placed. The driver gives us a memory pointer telling us where to write our data. We can then write directly to this place, eliminating copy number 2 in the list above. When we're done writing, we tell the driver, and it ships the data off to the graphics card as usual. The problem with buffer mapping is that it requires a large amount of validation and error checking from the driver. It has to ensure that the mapped memory isn't still in use, and that any previous copy to the graphics card has already completed. This can cause some nasty cases where the CPU has to wait for other operations to finish, which is called a "stall".

Just a few months ago, a new technique for uploading data to the graphics card was developed, and driver support for it has finally been implemented by AMD, Nvidia and Intel. The technique is called "persistent buffer mapping", and it's quite revolutionary. Instead of giving us a pointer into the driver's memory (which is still normal RAM), the driver literally hands the engine a pointer directly into the graphics card's video memory. When we write some data, the driver guarantees that it'll immediately be sent to the graphics card, without any additional copies. Even better, the driver now allows us to keep this magic memory pointer forever instead of having us ask for a new one each frame, so the expensive map operation is also gone. The data is therefore sent directly to the GPU without any unnecessary copies. The cost is that we, the game developers, have to take care of the validation that the driver has traditionally done for us, but this is a small price to pay. Since we know exactly how the data will be used, we can skip almost all the validation the driver used to do, so performance is much higher.
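
Here's roughly what that looks like in code - a minimal sketch with LWJGL-style OpenGL bindings; the triple-buffered layout and the names are my own illustration, not Insomnia's actual streaming code:

```java
import java.nio.ByteBuffer;

import static org.lwjgl.opengl.GL15.*;
import static org.lwjgl.opengl.GL30.*;
import static org.lwjgl.opengl.GL32.*;
import static org.lwjgl.opengl.GL44.*;

class PersistentStream {
    static final int REGIONS = 3;      // triple buffering: CPU writes one
                                       // region while the GPU reads another
    private final int regionSize;
    private final ByteBuffer mapped;   // stays valid for the buffer's lifetime
    private final long[] fences = new long[REGIONS];
    private int region = 0;

    PersistentStream(int regionSize) {
        this.regionSize = regionSize;
        int buffer = glGenBuffers();
        glBindBuffer(GL_ARRAY_BUFFER, buffer);
        int flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
        // Persistent mapping requires immutable storage (GL 4.4 / ARB_buffer_storage).
        glBufferStorage(GL_ARRAY_BUFFER, (long) regionSize * REGIONS, flags);
        // Map once, keep the pointer forever: no more per-frame map calls.
        mapped = glMapBufferRange(GL_ARRAY_BUFFER, 0, (long) regionSize * REGIONS, flags);
    }

    // Gives the engine a slice of GPU-visible memory to fill this frame.
    ByteBuffer beginFrame() {
        // The validation the driver used to do, now on us: wait until the
        // GPU has finished reading this region before we overwrite it.
        if (fences[region] != 0) {
            glClientWaitSync(fences[region], GL_SYNC_FLUSH_COMMANDS_BIT, Long.MAX_VALUE);
            glDeleteSync(fences[region]);
            fences[region] = 0;
        }
        ByteBuffer slice = mapped.duplicate();
        slice.position(region * regionSize);
        slice.limit((region + 1) * regionSize);
        return slice.slice();
    }

    // Call after submitting the draw calls that read from this region.
    void endFrame() {
        fences[region] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
        region = (region + 1) % REGIONS;
    }
}
```

The fence is exactly the validation work mentioned above: since we know our own usage pattern, one cheap sync per region replaces everything the driver used to check on every map call.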

But enough theory! Let's see some numbers! The following are the test results from a very heavy scene, with hundreds of shadow-casting lights, a bumpy terrain and over 50 skeleton-animated 3D models.

Traditional buffer mapping:
FPS: 46
Frame time: 22.8425 ms
Render time: 20.3548 ms

Persistent buffers:
FPS: 72
Frame time: 13.9157 ms
Render time: 5.5067 ms


The difference is HUGE. We see a large difference in raw FPS (a 56% increase), with the time taken to render a frame dropping from 23 ms to 14 ms. However, the gains are actually far more substantial than just a raw increase in frame rate. The time it takes to submit all rendering commands dropped to almost 1/4th of what it originally was. We've essentially shifted the bottleneck from the game's rendering code to the driver. This is very beneficial to us, as it leaves a lot more CPU time for the physics engine and AI to play around with.

Sadly, not all graphics cards that we want to support can utilize this new method of uploading data to the GPU. Therefore the engine can easily fall back to traditional buffer mapping if persistent buffers are not supported. In addition, from the testing all you guys have done for us, we've determined that many of you have old drivers with a buggy implementation of persistent buffers, so the engine even tests them to make sure that they work properly before enabling them. Fighting driver bugs - just another day at the office (we don't have an office).

That's all for today! Thanks for reading!