1. icon for News
  2. icon for TCHOW


Tuesday, July 2, 2013

Rktcr: Hunting a Box2D Glitch

I spent the afternoon hunting down the source of an unfortunate physics glitch in Rktcr. I think I've got it figured out, though I will say that the solution isn't entirely satisfactory. In this post I'll go through what I did to find the buggy code, what I think the problem was, and how I fixed it.

Isolating The Problem

This is what the physics glitch looked like in-game:

The problem seems to be collision between the central block (a dynamic body represented as a union of convex polygons) and the level geometry (represented as a chain of edges). Here's my first version of a trimmed-down test case:

The diamond falls, then glitches to the right suddenly. In this picture, and the following, you'll see lots of debug drawing being done. I think very visually, so having relevant data visualized helps.

At this point, I was thinking that the problem must have to do with my convex decomposition code splitting the colliding point of the polygon. However (thankfully) I made an even simpler test case that disproves that hypothesis:

Again, the blue wedge falls, touches the yellow platform, and glitches to the right. After one more test case revision (simplifying further, to a triangle), I was ready to move on.

It's hard to figure out just what the contact visualization is showing (and Box2D may create/destroy contacts during a solution step), so to get a better understanding of what was going on, I hacked a quick "disable all collisions" flag into the b2ContactListener Rktcr uses to filter contacts.

With collision resolution disabled, the reported contacts still looked decidedly odd. I was actually wondering if my visualization was working -- it really shouldn't be the case that the polygon is considered to be deeply penetrating the bottom edge of the yellow wedge. (The red "X" and edge indicates the colliding primitive.)

With this problem in mind, I dug into Box2D's source code. The relevant file turns out to be b2CollideEdge.cpp, which you might want to open in a new tab if you are code-minded.

Reading through the code, I caught a few things that appear to be bugs (or at least not very clean code), but also aren't responsible for the behavior I'm seeing. For sake of completeness (and to convey a bit of how grueling the slog was) I'll complain about them for a moment.

The first bit of sloppy code I noticed is that struct b2EPCollider contains an enum VertexType which is never used, other than as the type of two members -- m_type1 and m_type2 -- which are also never used. Probably remnants of a different strategy.

The next sloppy bit appears at lines 439-445; the function ComputeEdgeSeparation() is called to return an axis, and the type of the axis is checked. This is odd, because the type of the returned axis is always b2EPAxis::e_edgeA (see the function definition at line 620).

I also noticed that the constant "b2_maxManifoldPoints" is used when the number "2" would be more appropriate (e.g. line 561) -- as one always passes two points to the clipping functions, and they return up to two points. But I digress.

The Actual Problem

After muddling through the code for more time than I'd like to admit, I finally began to get a grasp on the problem. In essence, the logic of the m_front determination (lines 273-410) is flawed.

The picture above illustrates what the code is doing -- it's assigning the outside direction (and valid separating axis directions) of an edge based on the polygon's centroid. (I.e. centroids in the blue region will collide against the edge using the blue normal, and in the red region will use the red normal.)

I drew a picture like the above when reading the code, and I'm relatively sure it's what the person writing the code was thinking about. Unfortunately, it's flawed.

Essentially, when the adjacent edges in the chain turn sharply, the "front" region can encompass the back of the edge. This can become especially problematic when using an edge loop as the outside of a level, as in this test case:

This behaves very, very badly owing to the little jogs in the bottom left and right, which convince the collision algorithm that the polygon is colliding (deeply) with the bottom edge.

The Fix

I'd like to be able to say that I have an elegant fix for this problem, but I don't. Partly this is because I still don't have a handle on the problem the collision normal limiting scheme was attempting to solve. That said, just ignoring the adjacent edges -- that is, commenting out lines 273-410 and leaving the no-adjacent-verts code (lines 411-425) to sort things out -- seems to work just fine:

And, thankfully, this only invalidates a few of the level solutions, so I won't need to replay too many paths.