As I close in on a full public release of Rktcr, one of my tasks is to settle on a set of worlds to ship with the game. I've already talked about how my code generates worlds, but this is only part of the story.
You see, the pool of worlds I generate is skewed -- only candidate worlds that are solvable are maintained, and this tends to bias toward worlds that use zones with many potential paths. But how bad is this skew? To investigate, I generated 1000 3-gem worlds and made a plot of the number of times each zone appears divided by the total number of zones:
Ideally, this curve should be flat -- each zone should appear about the same number of times. But this is not the case.
The zone on the far left (which actually only appears once in the 1000 worlds) doesn't have its claims properly set, so -- as far as the world generator knows -- it is basically impassable. This should change once I re-work the level somewhat, so it is probably safe to ignore.
However, the second-least-frequent zone is still about 8 times less likely to appear than the most frequent zone.
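As a rough illustration of the measurement above, here is a minimal Python sketch. It assumes each world can be represented as a set of zone names (the names and the `zone_frequencies` helper are hypothetical, not taken from Rktcr's code), and it reads "total number of zones" as the total number of zone appearances across all worlds.

```python
from collections import Counter

def zone_frequencies(worlds):
    """Map each zone to (its appearances) / (total zone appearances)."""
    counts = Counter(zone for world in worlds for zone in world)
    total = sum(counts.values())
    return {zone: n / total for zone, n in counts.items()}

# Toy data: each world is a set of made-up zone names.
worlds = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
freq = zone_frequencies(worlds)

# Sort least frequent first, as in the plot, then compare the two ends.
ranked = sorted(freq.items(), key=lambda kv: kv[1])
skew = ranked[-1][1] / ranked[0][1]  # most frequent / least frequent
```

On real data, the zone with broken claims would presumably be dropped from `ranked` before computing the ratio.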
Flattening the Distribution
There is no reason for me to ship this set of 1000 worlds, however. Is there a subset with a better distribution of zones? It appears so:
One trimming method I came up with (the top-half strategy) is to repeatedly pick a world whose zones all appear more frequently than average, and remove that world. This flattens the curve somewhat, but soon enough the method runs out of worlds to remove.
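A minimal sketch of this trimming loop, under some assumptions of mine: worlds are sets of zone names, "average" is the mean appearance count over zones still present (the `center` parameter lets you swap in the median instead), and when several worlds qualify the first one found is removed. The function name is hypothetical.

```python
from collections import Counter
from statistics import mean

def trim_above_average(worlds, center=mean):
    """Repeatedly drop a world whose zones ALL appear more often than average.

    Stops as soon as no remaining world qualifies.
    """
    worlds = list(worlds)
    while True:
        counts = Counter(zone for world in worlds for zone in world)
        if not counts:
            return worlds
        # Assumption: "average" is taken over the zones that still appear.
        threshold = center(list(counts.values()))
        for i, world in enumerate(worlds):
            if world and all(counts[z] > threshold for z in world):
                del worlds[i]  # remove it, then recount from scratch
                break
        else:
            return worlds  # no world qualified: the method has run out

remaining = trim_above_average([{"a"}, {"a"}, {"a", "b"}, {"c"}])
```

In the toy input, the two worlds containing only the over-represented zone `a` are removed, after which every zone appears exactly once and nothing else qualifies.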
The max-min approach looks at the least frequent zone in each world and removes the world for which that zone is most frequent; since this can always find a world to remove, I terminate the search when there are 100 worlds remaining.
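The max-min rule can be sketched as follows. This assumes worlds are non-empty sets of zone names, that counts are recomputed after every removal, and that ties go to the earliest world; `trim_max_min` and its `keep` parameter are names I made up for illustration.

```python
from collections import Counter

def trim_max_min(worlds, keep=100):
    """Remove worlds until `keep` remain, using the max-min rule.

    Each world is scored by the count of its least frequent zone;
    the world with the highest such score is removed.
    """
    worlds = list(worlds)
    while len(worlds) > keep:
        counts = Counter(zone for world in worlds for zone in world)
        victim = max(
            range(len(worlds)),
            key=lambda i: min(counts[z] for z in worlds[i]),
        )
        del worlds[victim]
    return worlds

toy = [{"a", "b"}, {"a", "c"}, {"a"}, {"b", "d"}]
remaining = trim_max_min(toy, keep=2)
```

Because every non-empty world has a well-defined score, there is always a candidate to remove, which is why a fixed stopping size is needed.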
Finally, I tried an approach that repeatedly removes the world whose zones have the highest average frequency. This flattens the three-gem worlds the most of any of the methods, with the second-leftmost and rightmost zones only varying about 5% in frequency. (I'm disregarding the leftmost zone, as I mentioned above.)
Using the average-frequency metric does seem to come at a slight price, however -- the average number of zones per world decreases to 5.2, from an average (over the starting 1000) of 5.5.
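Here is a hedged sketch of the average-frequency rule together with the zones-per-world check. It assumes worlds are non-empty sets of zone names and that trimming stops at a fixed size (the `keep` parameter is my assumption; the stopping rule for this method isn't stated above). Both function names are hypothetical.

```python
from collections import Counter
from statistics import mean

def trim_highest_average(worlds, keep=100):
    """Remove the world whose zones have the highest mean count, repeatedly."""
    worlds = list(worlds)
    while len(worlds) > keep:
        counts = Counter(zone for world in worlds for zone in world)
        victim = max(
            range(len(worlds)),
            key=lambda i: mean(counts[z] for z in worlds[i]),
        )
        del worlds[victim]
    return worlds

def mean_zones_per_world(worlds):
    """Average number of zones used by a world in the pool."""
    return mean(len(world) for world in worlds)

toy = [{"a", "b"}, {"a", "c"}, {"a"}, {"b", "d"}]
before = mean_zones_per_world(toy)
after = mean_zones_per_world(trim_highest_average(toy, keep=2))
```

Comparing `before` and `after` on the real pool is the kind of check that would surface the 5.5 to 5.2 shift; the toy numbers here are not meant to reproduce that trend.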
Three-gem worlds are the smallest that appear in Rktcr. How do these approaches work on the medium (seven-gem) and large (fourteen-gem) worlds? Are they even needed?
Looking at the frequency of each zone in 1000 worlds with seven gems (left) and fourteen gems (right), it appears that some leveling might help. Interestingly, the fourteen-gem worlds already seem more balanced. I conjecture that this is because they each already use (on average) 25 of the 30 possible zones, which makes it hard not to pick any given zone; additionally, the larger world size may provide more flexibility in entering a zone, resulting in more potential for completion.
The top-half strategy fails to work for both seven-gem and fourteen-gem worlds: they tend to contain more than half the zones already, so each world always has at least one zone that is less frequent than average.
Max-min again provides some leveling, and is especially effective at the bottom end (in terms of frequency).
Finally, removing worlds with the highest average frequency works well at the top end, though the fourteen-gem result (right) does have a pronounced jump at the most frequent position, and both results suffer from a roll-off toward the left.
It appears that, for three-gem worlds, the average-frequency heuristic works well, while for larger worlds some combination of average frequency and maximum minimum frequency might be better for leveling the distribution at both the high and low ends.
However, frequency of appearance doesn't tell the whole story. In a follow-up post, I plan to investigate the co-occurrence of zones as well as (possibly) the connections between zones. It will be especially interesting to see if the trimming strategies I've outlined tend to skew these other statistics.