bitgineering

Engineering, Video Games, and Ranting

Bug of the Day 4/18/2013

Posted by Justin on April 18, 2013
Posted in: Code, Rants.

Generally, I dislike singletons. There are a number of reasons, especially in a multi-threaded world, but that’s another post. The naive implementation of a singleton is an infuriating and dangerous thing when misused. Let me explain.

When we run out of memory on a console, we do what most people do and dump the state of the allocators to the log and crash. This is a bit more complicated, because we use a threaded background logging/debug output mechanism so that logging doesn’t affect performance too much on the consoles. So, we need to shut that down before doing evil things (like telling one of the allocators that it’s ok, all of the memory is free, just give us some so we can print out this report) and before it tries to allocate a new queue entry. In essence, our out-of-memory handling code looks like:

ShutdownBackgroundLogging();
DumpAllocatorState();
Crash();

And then ShutdownBackgroundLogging() looks like:

BackgroundLogger::GetInstance()->Shutdown();
BackgroundNotifier::GetInstance()->Shutdown();

The experienced reader can see where this is going. GetInstance() for both of these looks awfully familiar:

static BackgroundThing* pThing = NULL;
if (!pThing)
    pThing = new BackgroundThing();
return pThing;

This is fine, or so it seems. That is, until you realize that no one has talked to the BackgroundNotifier since we’ve been running. So while we’re shutting down in an emergency, it tries to allocate a new one so that we can return it from GetInstance(), and then promptly shut it down. This is logically silly, but probably wouldn’t be disastrous if we weren’t OUT OF MEMORY at the moment. new() is not our friend at this point.

Every engine I’ve ever worked in has had one of these buggers buried somewhere. Everyone wants some sort of formal pattern around their global facilities, which is admirable. If you must use the singleton pattern, always make sure you initialize the singleton in a controlled manner, preferably in your app startup somewhere. Better yet, don’t use a formal singleton. Have a TLS context or something.
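
Here’s a minimal sketch of what "initialize in a controlled manner" could look like. This is not our engine’s code; BackgroundThing stands in for the logger/notifier, and Create(), ShutdownIfCreated() and IsRunning() are names I made up for illustration. The point is that GetInstance() never allocates, so the emergency path can’t accidentally call new:

```cpp
#include <cassert>

// Hypothetical stand-in for BackgroundLogger / BackgroundNotifier.
class BackgroundThing
{
public:
    // Called exactly once from app startup, while memory is still plentiful.
    static void Create()
    {
        assert(!s_pInstance && "Create() called twice");
        s_pInstance = new BackgroundThing();
    }

    // Never allocates: returns NULL if Create() was never called.
    static BackgroundThing* GetInstance() { return s_pInstance; }

    // Safe to call from the out-of-memory path, even if nothing was ever created.
    static void ShutdownIfCreated()
    {
        if (s_pInstance)
            s_pInstance->Shutdown();
    }

    void Shutdown() { m_bRunning = false; }
    bool IsRunning() const { return m_bRunning; }

private:
    BackgroundThing() : m_bRunning(true) {}

    static BackgroundThing* s_pInstance;
    bool m_bRunning;
};

BackgroundThing* BackgroundThing::s_pInstance = nullptr;
```

With something like this, ShutdownBackgroundLogging() would call ShutdownIfCreated() on each facility, and a facility that nobody ever talked to is simply skipped instead of being constructed just to be shut down.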

Optimization of the Week, 3/28/2013

Posted by Justin on March 28, 2013
Posted in: Code, Rants. Tagged: software, system beep.

We have a tool that encrypts and uploads game configuration data for MP/balance purposes to our web servers. We have it integrated into our build system, so someone with permissions can just push a button and the files will be uploaded. Annoyingly, when run from the build machines, this would take upwards of 10 minutes to upload all of the files. If I ran it locally, it was near instantaneous to upload a single file. That’s weird, but since it’s a rarely done operation and it did work properly, I couldn’t spend too much time figuring out what the deal was.

The time came recently to update the tool to compress the data in addition to encrypting it (should have been doing that all along, sigh). So we needed to update the tool to do this, since we don’t want our web servers burning clock to do the compression or encryption when the data is static. During the process of testing the changes, my colleague ran into the same upload speed problem, having copied the command line for testing from the build process.

It turns out that the only difference between the build machine’s command line and what I (and the guy who wrote the tool originally) ran locally when uploading data by hand was the inclusion of the --verbose flag on the build machine. That’s not unusual; when a process is running non-interactively, I want as much information logged as possible for when something inevitably goes wrong. However, this particular tool, when in verbose mode, will echo EVERYTHING to the console, including the encrypted stream contents. In the case of our data, the encrypted stream contains a rather large number of ASCII BEL characters (0x07). It turns out that when you echo a BEL to the Windows console, it waits for the system beep/bell noise to play completely before resuming the output stream. This means that each BEL in the stream is a 0.5-1s wait. Add up a few hundred/thousand of those, and you have a problem.

It seems that you can, in fact, be a little TOO verbose in your logging.
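
If you really do want binary payloads in a verbose log, one option is to escape them before they hit the console. The following is only a sketch of that idea, not what our tool does; EscapeForConsole is a hypothetical helper name. Printable ASCII passes through untouched, and everything else (BEL included) is written as \xNN:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: render arbitrary bytes so they are safe to echo to a console.
// Printable ASCII (0x20..0x7E) is copied as-is; any other byte becomes the text \xNN.
std::string EscapeForConsole(const std::string& Bytes)
{
    static const char Hex[] = "0123456789ABCDEF";
    std::string Out;
    Out.reserve(Bytes.size());
    for (unsigned char C : Bytes)
    {
        if (C >= 0x20 && C < 0x7F)
        {
            Out += static_cast<char>(C);
        }
        else
        {
            Out += "\\x";
            Out += Hex[C >> 4];    // high nibble
            Out += Hex[C & 0x0F];  // low nibble
        }
    }
    return Out;
}
```

Note that a literal backslash in the input is passed through unchanged, so the output is meant for human eyes in a log, not for decoding back into the original bytes.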

Reflecting Game Data Into C# With Dynamic Type Generation, Part 2

Posted by Justin on April 10, 2012
Posted in: Code.

In the last post, we were emitting a new dynamic type into an assembly. Now it’s time to add some data members to this new type. We’re going to start with Microsoft’s implementation, and then improve it.

First, let’s decide what we want to be able to store. In my implementation, I chose to deal with basic types (int, float, string, bool), object references, enums, and dynamic arrays (represented as List<T>s). I also wanted to support fixed arrays of value types. In order to do this, I made a slight concession for convenience: scalar values are fixed arrays of size 1. This helps to reduce the amount of divergent code needed. So what we’ll need is a class to represent our reflected properties, and we’ll need a method to emit their definitions into the ReflectedClass’s TypeBuilder. Let’s break down how ReflectedProperty.Emit(TypeBuilder Builder) needs to work:

Since we’re going to be adding these data members as properties on our new type (so that PropertyGrid and friends can find them), all we need is a storage data member, a getter, and a setter. Easy enough! First, let’s talk storage:

// CLRType is a property of our ReflectedProperty class which gives us the property type
// If CLRType is Int32, then CLRType.MakeArrayType() is equivalent to typeof(Int32[])
Type StorageType = (Type != ReflectedType.ARRAY) ? CLRType.MakeArrayType() : CLRType;
// We expose the property as a scalar value if it doesn't actually have more than one array element
Type PropertyType = (ArrayDim > 1) ? CLRType.MakeArrayType() : CLRType;

Because all data members are stored in arrays for ease of getting/setting, we just have to convert our basic type into an array for storage, then expose the property that represents it as its real scalar type.

// Add the private field, named _<Name>
FieldBuilder Field = Builder.DefineField(FieldName, StorageType, FieldAttributes.Private);
// Now add a public property to wrap the field, <Name>
PropertyBuilder Prop = Builder.DefineProperty(Name, System.Reflection.PropertyAttributes.HasDefault, PropertyType, null);

And just like that, we have storage. Now we need to be able to access it. To do this, we need to attach get and set methods to the property.

MethodInfo GetValueMethod = null;
MethodInfo SetValueMethod = null;
Type GetReturnType = CLRType;
Type SetArgumentType = CLRType;
BindingFlags MethodSearchFlags = BindingFlags.Instance | BindingFlags.FlattenHierarchy | BindingFlags.Public;
if (Type == ReflectedType.ARRAY)
{
   GetReturnType = SetArgumentType = CLRType.GetGenericArguments()[0];
   GetValueMethod = StorageType.GetMethod("get_Item", MethodSearchFlags, null, new Type[] { typeof(Int32) }, null);
   SetValueMethod = StorageType.GetMethod("set_Item", MethodSearchFlags, null, new Type[] { typeof(Int32), SetArgumentType }, null);
}
else
{
   GetValueMethod = StorageType.GetMethod("Get", MethodSearchFlags, null, new Type[] { typeof(Int32) }, null);
   SetValueMethod = StorageType.GetMethod("Set", MethodSearchFlags, null, new Type[] { typeof(Int32), CLRType }, null);
}

Note that for dynamic arrays, we have to use the special get/set_Item functions (which are basically the operator overloading mechanism for the [] operator).

Now for the fun part. As Larry Wall once said, “Real Programmers can write assembly code in any language”. .NET makes it relatively easy, by exposing the CLR intermediate language instruction set to us. We need to write a thunk function to call our get/set methods on our newly created property. First, let’s deal with array access:

// These attributes will be applied to our get/set functions. We need a public, special, hidden function.
MethodAttributes GetSetAttr = MethodAttributes.Public | MethodAttributes.SpecialName | MethodAttributes.HideBySig;
// Build get method
Debug.Assert(GetValueMethod != null);
MethodBuilder GetMethod = Builder.DefineMethod("get_" + Name, GetSetAttr, GetReturnType, new Type[] { typeof(int) });
ILGenerator GetOpCodes = GetMethod.GetILGenerator();
GetOpCodes.Emit(OpCodes.Ldarg_0);              // push this
GetOpCodes.Emit(OpCodes.Ldfld, Field);         // push field
GetOpCodes.Emit(OpCodes.Ldarg_1);              // push index
GetOpCodes.Emit(OpCodes.Call, GetValueMethod); // call this.GetValue(index)
GetOpCodes.Emit(OpCodes.Ret);                  // return
Prop.SetGetMethod(GetMethod);

// Build set method
Debug.Assert(SetValueMethod != null);
MethodBuilder SetMethod = Builder.DefineMethod("set_" + Name, GetSetAttr, null, new Type[] { typeof(int), SetArgumentType });
ILGenerator SetOpCodes = SetMethod.GetILGenerator();
SetOpCodes.Emit(OpCodes.Ldarg_0);              // push this
SetOpCodes.Emit(OpCodes.Ldfld, Field);         // push field
SetOpCodes.Emit(OpCodes.Ldarg_1);              // push index
SetOpCodes.Emit(OpCodes.Ldarg_2);              // push value
SetOpCodes.Emit(OpCodes.Call, SetValueMethod); // call this.SetValue(index, value)
SetOpCodes.Emit(OpCodes.Ret);                  // return
Prop.SetSetMethod(SetMethod);

Note that we ended up writing the get_MyDataMember and set_MyDataMember functions, which is how C# internally decorates the get/set methods for a property. If you use managed C++, the functions are just named that way in the first place. Also check out how the ILGenerator lets you cheat like crazy with some of the instructions, in this case allowing us to just send in the FieldInfo representing our private data member as an argument (instead of a memory offset).

The scalar value get/set opcodes are identical, except that instead of passing in an index, we use the Ldc_I4 instruction to pass in 0 as the hardcoded index:

// Build get method (no index parameter; the getter always reads element 0)
Debug.Assert(GetValueMethod != null);
MethodBuilder GetMethod = Builder.DefineMethod("get_" + Name, GetSetAttr, GetReturnType, System.Type.EmptyTypes);
ILGenerator GetOpCodes = GetMethod.GetILGenerator();
GetOpCodes.Emit(OpCodes.Ldarg_0);              // push this
GetOpCodes.Emit(OpCodes.Ldfld, Field);         // push field
GetOpCodes.Emit(OpCodes.Ldc_I4, 0);            // push hardcoded index 0
GetOpCodes.Emit(OpCodes.Call, GetValueMethod); // call field.Get(0)
GetOpCodes.Emit(OpCodes.Ret);                  // return
Prop.SetGetMethod(GetMethod);
// Build set method (the only argument is the new value)
Debug.Assert(SetValueMethod != null);
MethodBuilder SetMethod = Builder.DefineMethod("set_" + Name, GetSetAttr, null, new Type[] { SetArgumentType });
ILGenerator SetOpCodes = SetMethod.GetILGenerator();
SetOpCodes.Emit(OpCodes.Ldarg_0);              // push this
SetOpCodes.Emit(OpCodes.Ldfld, Field);         // push field
SetOpCodes.Emit(OpCodes.Ldc_I4, 0);            // push hardcoded index 0
SetOpCodes.Emit(OpCodes.Ldarg_1);              // push value
SetOpCodes.Emit(OpCodes.Call, SetValueMethod); // call field.Set(0, value)
SetOpCodes.Emit(OpCodes.Ret);                  // return
Prop.SetSetMethod(SetMethod);

At this point we just need some convenience functionality for getting and setting values, and some derived properties to make special-casing the different types and conversions cleaner. I’ve hacked up a sample that provides a more complete implementation here, under the MIT License. Feel free to use this to kickstart your own data marshaling, and make tools development easier!

Reflecting Game Data Into C# With Dynamic Type Generation, Part 1

Posted by Justin on April 7, 2012
Posted in: Code. 1 comment

Most developers I talk to much prefer to write their tools in C# these days. WinForms and WPF are rich UI toolkits, and most programmers can whip up a tool with them pretty quickly. The tricky part is marshalling your game data into and out of your C# tools. Some people use Managed C++ wrapper code, some write everything out to XML, and others use sockets. I’ve developed another approach using .NET’s excellent reflection capabilities.

The basic idea here is that we would like to be able to have data structures that look like our game objects in C#, but our engine is written in C or C++, or any other language for that matter. We want type safety, and we want to be able to (ab)use all of .NET’s nifty automagic data editing tools, like PropertyGrid and DataGrid.

Most engines have some sort of reflection in place for at least game data objects, if not the entire engine. This approach assumes you have some way of reflecting your objects in your engine, and can ship them over a socket or through an intermediate format. Interestingly, this method can actually be used as your intermediate format, as the generated assembly can be saved out as a DLL and then loaded by other applications.

The core of this method is .NET’s ability to perform dynamic type generation. A colleague at Firaxis, Eric Jordan, first pointed out this concept to me when he was writing the Civ V world builder (the user data system is built on this principle). I then added support for arrays, nested types, and categories (initially as an exercise, later ended up in production).

In order to dynamically create types, they need somewhere to live. Thus we need to create an assembly:

AssemblyName AsmName = new AssemblyName("ReflectedTypes");
AssemblyBuilder AsmBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(AsmName, AssemblyBuilderAccess.RunAndSave);
// The module has to be created from the AssemblyBuilder we just defined
ModuleBuilder ReflectedModule = AsmBuilder.DefineDynamicModule(AsmName.Name, AsmName.Name + ".dll");

Next, we create a type within that assembly:

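TypeBuilder Builder = ReflectedModule.DefineType("DynamicTypeTest", TypeAttributes.Public);
Builder.DefineDefaultConstructor(MethodAttributes.Public);
// CreateType() bakes the builder into a real runtime Type that we can instantiate
Type DynamicType = Builder.CreateType();
AsmBuilder.Save("ReflectedTypes.dll");
// GetConstructor returns a single ConstructorInfo, not an array
ConstructorInfo Ctor = DynamicType.GetConstructor(System.Type.EmptyTypes);
object MyDynamicInstance = Ctor.Invoke(null);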

Now we have a new type that lives in the ReflectedTypes.dll assembly, and a constructor we can invoke to instantiate it. To verify, let’s load up ReflectedTypes.dll in the Object Browser:

In the next post, we’ll add some properties to this new class.

Saving Your Loading

Posted by Justin on March 8, 2012
Posted in: Code, Rants. 3 comments

Load times are one of those things that every developer figures out eventually for ship, but most people don’t think about them enough, and especially not during the development process. Programmers get all bent out of shape about compile times (and go to great, sometimes insane, lengths to manage them), artists have all sorts of software to deal with lighting bakes and distributed rendering. There’s no tech out there to just make your game load faster.

Long load times are detrimental to any kind of software development. The time it takes to iterate on changes can turn out to be extremely costly, and over time, can reduce what you are able to get done in a product. At Firaxis, we are an EXTREMELY iterative studio; we always have to try out everything, and our designers usually write a lot of their own prototypes or systems. If they can’t try out their ideas quickly enough, that would effectively reduce the number of things they can try, and that could affect what gets into the game.

Hypothetical: Consider a project with a team of 50 people, who each have to fire up the game 30 times a day (which I’d expect to be pretty average). If it takes a minute to get into a level from the time you push start (assuming of course that you’ve already given yourself a quick way into the game), that’s 1500 minutes spent loading, or 25 man-hours. That’s more than half a work week for someone just spent loading…in a single day! If you could cut the time in half (or hopefully more), you’d get back 12.5 man-hours, more than a day and a half of work, every single day! Your producer/project manager is salivating right now.
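
If you want to plug in your own team’s numbers, the back-of-the-envelope math is easy to write down. This is just the calculation from the hypothetical above; the LoadCost struct and DailyLoadHours function are names invented for this sketch:

```cpp
#include <cassert>

// Inputs for the estimate. The values used in the post are 50 people,
// 30 launches per person per day, and 1 minute per load.
struct LoadCost
{
    int TeamSize;        // people launching the game
    int LaunchesPerDay;  // launches per person per day
    double LoadMinutes;  // minutes spent waiting per launch
};

// Total man-hours the whole team spends waiting on loads each day.
double DailyLoadHours(const LoadCost& Cost)
{
    assert(Cost.TeamSize >= 0 && Cost.LaunchesPerDay >= 0 && Cost.LoadMinutes >= 0.0);
    return Cost.TeamSize * Cost.LaunchesPerDay * Cost.LoadMinutes / 60.0;
}
```

With the numbers from the post, DailyLoadHours gives 25 hours for a one-minute load and 12.5 hours for a thirty-second load, so the difference is what you win back per day.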

So what can we do? After all, it takes time to profile these things and fix them, and some of the fixes can be time consuming. It’s also hard to keep up with a team of people throwing content and new systems into the codebase constantly. I will tell you right now, it is always worth it to keep your load times reasonable, especially if you stay on top of them from the beginning of the project. You, the experienced engineer and team member, must remind your team of the importance of keeping iteration times low.

So what causes long load times during development? In my experience, it’s always some combination of the following things (in no particular order). Some of these might seem completely obvious to you, but yet I still see/hear about studios doing this stuff, so SOMEONE must not have gotten the memo yet! Ben Carter posted a similar list on AltDevBlogADay that mostly deals with loading in shipping products (I’m still amazed that games fall victim to vsync destroying load times). I’m going to focus primarily on in-development issues, where layout/final media can’t factor in, and you need to deal with an ever changing dataset.

  • Disk I/O: This one is obvious, but the various practices that cause disk I/O bottlenecks in development builds are often ignored or accepted as normal.
    • Loose file loading: This is a convenient feature for content creators, as they can get an asset into the game quickly and see how it looks/works. However, this is a leading cause of disk seeking, which murders your I/O throughput. But what if you just made a minimal data packing tool that content creators could use to pack their assets on export (or import, if your engine maintains an asset library)? A single seek and read per asset is much better than 20-30 (and sometimes many more). Most engines (especially console engines) pack all of the data for a level or gameplay session into a single file (at least in shipping builds) to avoid just this problem. There’s nothing to say that you couldn’t work from baked data most of the time, but then check the timestamp on any loose files and substitute them in during development. Hell, even repack the data in a background thread and save it out either to a cache or (if it makes sense) over top of the original cooked data file, so next time the load can hit the newly packed single file instead of the loose files.
    • Uncompressed data: Reading 100K is faster than reading 1MB. This is hardly a revelation. However, plenty of engines and tools work off of uncompressed data. Compressing data, especially minimally, doesn’t take that long, and only needs to be performed on import or saving of assets. An extra second taken on one content creator’s machine when an asset is changed is completely forgivable if it saves minutes or hours over time for the rest of the team. This should be part of your loose file packing mechanism from above. Compression shouldn’t only exist in final builds.
    • Synchronous loading: If your engine is still doing synchronous loads, this has to be your first priority. Get your disk I/O into another thread and put your requests in a priority queue, post-haste. Additionally, your I/O thread should not, for any reason, be syncing up with the game thread. Send fulfilled requests back via a queue that the game thread can dig through at its leisure, preferably at a time where it’s safe to be modifying global state.
    • Post-load processing: Separating out I/O into another thread will do wonders for you, but you might not feel the difference as much if you still have to do lots of main thread work once your data is in memory. This can be solved a few different ways:
      • Process your data ahead of time, as mentioned above. 9 times out of 10, the processing you need to do doesn’t require game context, and doesn’t need to be done at load time. Quite often, someone just wrote the feature that way to test it, and no one ever went back and moved it into the tools.
      • If your data must be processed in context with the game (as is often true in procedurally generated content, like Civ), consider doing the processing asynchronously from the loading AND the main thread. If you’ve threaded out your I/O, but then do 100ms of processing afterwards in your I/O thread, you’re going to have an I/O request backup.
  • Memory Allocation: Right now you’re thinking “even the slowest malloc() isn’t as bad as a disk hit”. This is true. However, anything you do enough times (I’ve recently seen a game require 3 million allocations to get to the main menu) will still hurt you. Just about every engine has a highly optimized malloc/free/new/delete facility, especially by the time the product ships. This is (hopefully) not the problem. If you find that you are spending a ton of time in new or malloc() during loading, fix that first! However, the problem is more likely to be your usage patterns, which are trickier to fix, especially when supporting features like loose file loading.
    • If you do a multi-pass object construction scheme (create top-level object fully, then instantiate and link up sub-objects recursively), at least allocate the memory for your sub-objects up front, preferably all in one shot with the original object’s allocation, then parcel out the memory right after your top level object to sub-objects. This has the additional benefit of attempting to keep a game object and all the exclusive stuff that it needs (Components, instance data) in the same contiguous block of memory, which will hopefully reduce cache misses (or page faults, if you are (ab)using virtual memory) later. If your memory tool doesn’t have a live bitmap/heatmap view, seriously consider building one. It can be a truly enlightening experience.
      • As a corollary to this: Do not allocate loading buffers from the same memory space as your persistent object storage, or if you must, don’t interleave them! Too many engines still do this. Allocations should be organized into separate pools (mapped to separate virtual memory regions, if possible) to allow for more useful memory usage reporting and reduced fragmentation. See Christian Gyrling’s writeup on Divide and Specialize. This has the added benefit of allowing your cache and prefetcher to help you out when you loop over groups of objects. CPUs are fast at running over small regions of memory over and over again, but less impressive when they have to jump all over memory to find all the bits you require to do your object’s bidding. I/O (or really, any temporary) buffers should never be interleaved with persistent storage, or you may not be able to coalesce those sections of memory when they free up. If you don’t have a pool-based allocator already, in a pinch you can try allocating a re-usable I/O buffer (or a ring of them) at startup, and then never allocate for I/O again.
    • If your engine is built around Data-Oriented Design (hopefully the important bits are), in addition to worrying about organizing your object bits into vectors of data, you must also think about WHEN you put each object’s data into your data stores. I’ve seen a few implementations that will have excellent runtime performance and data parallelization, but that upon creating a new object, add a single element (or worse, perform an Insert) to each of the engine’s data stores. Whoops. Unless you are using a clever Vector<T> (or other data structure, as the case may be) which allocates chunks in anticipation of many calls to Add(), you’re calling realloc() a bazillion times during loading, and probably creating a fragmented mess, which in turn slows down subsequent memory churn.
    • Are you allocating data from multiple threads through a synchronous allocator? If so, you are wasting tons of time waiting to sync. Check out nedmalloc or other multi-thread friendly malloc implementations. I helped a project a few years ago drop from >1.5 minute load times down to 45 seconds just by replacing new/malloc(). It wasn’t as seamless as it sounds, as they had some assembly that was twiddling with sbrk(), but that’s another article.
  • Loading Data Too Late: Does your game have a bunch of common data that it needs to load a level, but maybe doesn’t need to get into the main menu? Why not start loading assets in the background as soon as the engine is done bootstrapping itself?
    • Example: The in-game UI and (let’s say you’re making a shooter or character action game) main character/weapons aren’t necessary up front, but you will need them by the time the user selects a level or startup configuration, then presses go. Go ahead and get them loaded up while waiting for the user to figure out what they want to do.
      • This becomes a balancing act; make sure you aren’t loading too much just to bootstrap the engine. Don’t, for example, load the bullet impact effects data before the main menu (this HAS happened).
  • Loading Data Too Early: Do you really need all your data the instant you load into a level in a dev build? Normally, someone is just testing out functionality, so why not have the game get underway as fast as possible? It probably doesn’t matter how it looks right away.
    • Textures: Only store the first mip or so of your textures in your level. Stream the rest in later.
    • Units/Animation data: Do you need all the baddies or friendly units right away? Stream them in later, or on demand (only in dev builds). Hook up animations later. T-posed dudes never hurt anyone!
    • There are plenty of other more game-specific things you don’t need right away. The nice thing about adopting a late loading/streaming paradigm for this stuff is that you are forced to build your engine facilities to be tolerant of missing or incomplete data, which will save you from the inevitable invalid data/loading race condition later. It also forces you to get your streaming and background I/O systems up to par and keep them there.
    • If some of your team (usually artists) really does need to evaluate assets quickly, give them a viewer. Short of that, give them a level with as little in it as possible and disable streaming so they can see things at full res.
  • Loading…at all: What if you just simply didn’t have to restart and load? This requires a lot of changes that are likely to create divergent code paths in your engine from what you intend to ship, but it also makes iteration amazingly effortless.
    • Assets: These are the easiest things to hot-load. Give your content creators the ability to hot-load single assets. This is much more feasible the more decoupled your engine is, but it’s certainly doable in most cases.
    • Script: If sections of your gameplay can be written in script, certainly do it, at least while the feature is in development. If you need to change how a spell/ability works, why should you have to restart the engine? Just reload the script and cast again! Most script VMs I’ve used (Lua and Python come to mind) support reloading quite well.
    • Code: It is possible to structure your gameplay code as a re-loadable DLL. This allows you to just compile a chunk of your codebase and hit a key in-game, then test the change. This technique forces you to build your engine -> gameplay interfaces in a very specific way, and requires you to write a lot of framework code to manage the re-loading. I’ve seen this done successfully, but it requires so much diligence from the engineering team that I don’t know how feasible it is for everyone.
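
To make the "allocate your sub-objects up front, in one shot" point from the Memory Allocation section a bit more concrete, here is a rough sketch of a linear (bump) arena. Everything in it is hypothetical: LoadArena, Unit, Transform and Health are made-up names, not types from any engine I’ve worked on. The idea is one malloc() per object group, with the top-level object and its exclusive sub-objects carved out of that block back to back:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical linear arena: one malloc up front, then sub-allocations are
// carved sequentially out of that single contiguous block.
class LoadArena
{
public:
    explicit LoadArena(std::size_t Bytes)
        : m_pBase(static_cast<std::uint8_t*>(std::malloc(Bytes))), m_Size(Bytes), m_Used(0) {}
    ~LoadArena() { std::free(m_pBase); }
    LoadArena(const LoadArena&) = delete;
    LoadArena& operator=(const LoadArena&) = delete;

    // Returns NULL when the block is exhausted instead of falling back to the heap.
    void* Alloc(std::size_t Bytes, std::size_t Align)
    {
        assert(Align != 0 && (Align & (Align - 1)) == 0); // power of two only
        std::size_t Start = (m_Used + Align - 1) & ~(Align - 1);
        if (!m_pBase || Start > m_Size || Bytes > m_Size - Start)
            return nullptr;
        m_Used = Start + Bytes;
        return m_pBase + Start;
    }

    // Value-initializes a T inside the arena. No destructor is ever run,
    // so this sketch is only suitable for trivially destructible types.
    template <typename T>
    T* New()
    {
        void* pMem = Alloc(sizeof(T), alignof(T));
        return pMem ? new (pMem) T() : nullptr;
    }

    std::size_t Used() const { return m_Used; }

private:
    std::uint8_t* m_pBase;
    std::size_t m_Size;
    std::size_t m_Used;
};

// Hypothetical game object with two sub-objects that it owns exclusively.
struct Transform { float Position[3]; float Scale; };
struct Health    { int Current; int Max; };
struct Unit      { Transform* pTransform; Health* pHealth; };

// The unit and everything it needs come out of the same block, in order.
Unit* CreateUnit(LoadArena& Arena)
{
    Unit* pUnit = Arena.New<Unit>();
    if (!pUnit)
        return nullptr;
    pUnit->pTransform = Arena.New<Transform>();
    pUnit->pHealth = Arena.New<Health>();
    if (!pUnit->pTransform || !pUnit->pHealth)
        return nullptr;
    return pUnit;
}
```

A real engine would size the arena from data known at cook time and would need a story for freeing or recycling blocks; this sketch skips both and only shows the allocation pattern.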

I know all this sounds like a lot of work, and any time you have to create systemic code that won’t ship, it can be an unattractive proposition. I hope, however, that the value of more iteration time outweighs the investment of making loading fast, and that more iteration leads to cooler stuff in your game.

Hello World!

Posted by Justin on February 11, 2012
Posted in: Uncategorized. Tagged: software engineering.

I suppose it only makes sense to start a blog that I intend to be about software engineering with a Hello World post.
