Sunday, June 6, 2010

Automated tests for content

Games are generally considered to consist of code and content.
  • The code is the game engine that defines all the rules of the game, input handling, network, graphics rendering, and so on.
  • Content in a game includes a lot of things: textures, sounds, level designs, AI, animations, meshes, dialogues and more.
Another (less formal) way of looking at it is that code is the part of the game produced by the programmers, and content is the part of the game produced by artists, game designers or level designers.

Programmers are used to writing tests for their code (if they're using TDD, they're even writing some, if not all, tests before they write the code), but content creators generally are not. They instead rely on manual testing, both by themselves and by gameplay testers.

Why is code tested and not content?
I think the code is what makes the game robust, coherent and logical. It's easy to detect broken code - broken code either crashes or behaves incorrectly. This makes it feasible to write tests for the code.

Content on the other hand is harder to verify automatically. How can a machine decide that a texture is broken? We could certainly verify that the texture has the correct dimensions, the correct color depth, and other properties needed for the game engine to support it. We should have tests to verify the integrity of the content to as large an extent as possible, and I think we do.
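To make this concrete, here is a minimal sketch of such an integrity check, assuming (purely for illustration) that the engine requires power-of-two texture dimensions and 24- or 32-bit color depth. Texture metadata is a plain dict here; a real pipeline would read it from the asset files with an image library.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

def validate_texture(texture):
    """Return a list of integrity problems; an empty list means the texture passes."""
    problems = []
    if not is_power_of_two(texture["width"]) or not is_power_of_two(texture["height"]):
        problems.append("dimensions must be powers of two")
    if texture["bit_depth"] not in (24, 32):
        problems.append("unsupported color depth: %d" % texture["bit_depth"])
    return problems

# Usage: run the same check over every texture in the asset list.
textures = [
    {"name": "grass", "width": 256, "height": 256, "bit_depth": 32},
    {"name": "menu_bg", "width": 800, "height": 600, "bit_depth": 24},
]
for tex in textures:
    for problem in validate_texture(tex):
        print("%s: %s" % (tex["name"], problem))
```

The check itself is trivial; the point is that it is mechanical, so it can run on every asset on every build.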

The things we can't verify automatically are the fun and the user experience.

The corner case
This was just the setup. Now for one of the corner cases, the parts that can be considered both code and content: AI!

AI is definitely a part of the content. Meeting unique non-player characters (NPCs) with custom AI adds a new user experience, something unknown and therefore possibly exciting. The AI, however, is also expressed in some sort of code: either as a script in some embedded language, or as a behaviour tree or behaviour stack.

The problem with AI as code is that it tends to mutate at a much faster pace than game engine code. AI is driven by the content around it, not by technical demands, and so it must change often.

Writing tests for each individual AI is expensive, since fully testing all the possible code paths can require a lot of manual setup, and verifying correctness is also tricky for a sufficiently complex AI. What's worse, the test is bound to be very fragile, since content makers like to tweak and redo AIs if they are too easy, too difficult or simply not fun enough.

Should content like this be covered by automated tests? To some extent, definitely! I am not convinced that content should be tested individually, but testing the content by type is quite useful for detecting errors.

For AIs this could mean:
  • Running automated smoke tests that run the AI characters for a long time (game time, not real time) to increase the chance of running many code paths (possibly with the AI opponent doing random moves) and simply checking that it doesn't crash. This type of test could apply to most if not all AI. It would be cheap to implement, and probably be fairly stable.
  • Smoke testing the AI with measurements of key values, such as damage output, movement speed, actions per minute, etc., and checking that the values are in a sane range. If some of the values are zero, this would indicate that the AI is blocked somewhere and one of the code paths is never run, and extremely high values would indicate that the AI is stuck in one of the code path loops.
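Both ideas above can be sketched in a few lines. The SimpleAI class below is a hypothetical stand-in for a real behaviour tree or script: the test drives it for many game ticks with random opponent moves, then checks that the key values land in a sane range (not zero, not absurdly high).

```python
import random

class SimpleAI:
    """A toy stand-in for a real AI; a real one would branch far more."""
    def __init__(self):
        self.damage_dealt = 0
        self.actions = 0

    def tick(self, opponent_move):
        self.actions += 1
        if opponent_move == "attack":
            self.damage_dealt += random.randint(1, 5)  # counter-attack

def smoke_test_ai(ai_factory, ticks=10000, seed=42):
    random.seed(seed)  # deterministic "random" moves keep the test stable
    ai = ai_factory()
    for _ in range(ticks):
        ai.tick(random.choice(["attack", "idle", "move"]))
    # Zero suggests a blocked code path; absurdly high suggests a stuck loop.
    assert 0 < ai.damage_dealt < 10 * ticks, "damage output out of sane range"
    assert ai.actions == ticks, "AI stopped acting"
    return {"damage_dealt": ai.damage_dealt, "actions": ai.actions}

print(smoke_test_ai(SimpleAI))
```

Seeding the random source is what keeps this kind of test stable: the "random" opponent exercises many code paths, but every run exercises the same ones.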
You should always consider the cost versus the gain of writing tests. For game engine code, the gain is almost always greater than the cost, but for content it's not as clear.
You should also consider how to test something. A small unit test is suitable for many parts of code development, but it's probably overkill for content, which tends to change a lot.
If you can write one or a few tests that can verify something about a specific content type, and apply the test to all the content, the gain will quickly become large, since you can scale up the size of the content without scaling up the number of tests.
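One way to sketch this "one test per content type" idea: register a single validator per type and run it over every asset of that type, so adding assets never adds tests. The asset lists and the validators below are illustrative, not a real pipeline.

```python
def validate_sound(sound):
    # Assumed engine constraint, for illustration only.
    return [] if sound["sample_rate"] in (22050, 44100) else ["bad sample rate"]

def validate_dialogue(dialogue):
    return [] if dialogue["text"].strip() else ["empty dialogue line"]

VALIDATORS = {"sound": validate_sound, "dialogue": validate_dialogue}

def check_all(assets):
    """Run the per-type validator over every asset; return all failures."""
    failures = []
    for asset in assets:
        for problem in VALIDATORS[asset["type"]](asset):
            failures.append((asset["name"], problem))
    return failures

assets = [
    {"type": "sound", "name": "explosion", "sample_rate": 44100},
    {"type": "dialogue", "name": "intro_line", "text": "Welcome, hero!"},
]
print(check_all(assets))  # an empty list means all content passed
```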
