Friday, October 22, 2010

time-delayed feedback in the workplace

The job of buildmaster rotates amongst managers. The buildmaster is primarily responsible for haranguing developers when the automated test failure rates are too high. If the rates stay too high for a while, the buildmaster can "lock the line", meaning that the only permitted checkins are those that ostensibly fix tests. We have some test suites that take several days to complete. Thus a bad checkin may cause test results to plunge days after the fact.

In his classic The Fifth Discipline, Peter Senge talks about the effect of introducing a time delay into a negative feedback system. Whereas negative feedback usually stabilizes a system, negative feedback plus time delay tends to cause ever-more-violent oscillation.

Consider the following actual data:

Test        Current  EOD 10/21  EOD 10/20  EOD 10/19  TARGET
fast_suite  97.55%   98.77%     99.39%     99.39%     98%
slow_suite  86.43%   94.10%     83.61%     95.29%     97.5%


The fast suite returns feedback in a couple of hours; the slow suite takes a few days to catch up to a changelist.

I am assured by various people that it sucks to be the buildmaster. It will continue to suck to be the buildmaster, I think, until we devise a system that is stable rather than oscillatory. A stable system is characterized by damping rather than nonlinear gain; and by feedback that is at least an order of magnitude faster than the forward phase response of the system. (It's possible to stabilize systems other ways, but this is the most general and reliable.)

To speed up the feedback loop, we could have fast suites that predict the behavior of the slow suites. Simply choosing a random subset of the tests in the slow suite, running those first, and providing interim results could achieve that.
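
As a rough illustration of the random-subset idea, here is a minimal Java sketch. The class and method names (InterimSampler, sample, interimPassRate) are made up for this example and do not correspond to any real build system; the runTest predicate stands in for whatever actually executes a test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names come from a real build system.
final class InterimSampler {
    // Pick up to sampleSize tests uniformly at random from the slow suite.
    static <T> List<T> sample(List<T> slowSuite, int sampleSize, Random rng) {
        List<T> shuffled = new ArrayList<>(slowSuite);
        Collections.shuffle(shuffled, rng);
        return shuffled.subList(0, Math.min(sampleSize, shuffled.size()));
    }

    // Run only the sampled tests and report the fraction that passed.
    static <T> double interimPassRate(List<T> sampled, Predicate<T> runTest) {
        if (sampled.isEmpty()) {
            throw new IllegalArgumentException("no tests sampled");
        }
        long passed = sampled.stream().filter(runTest).count();
        return (double) passed / sampled.size();
    }
}
```

The interim number is only an estimate of what the full slow suite will eventually report, and a small sample will be noisy. The point is just that a rough answer in hours is more useful to the feedback loop than an exact one in days.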

To have damping rather than nonlinear gain, we need to remove or highly restrict the buildmaster's ability to lock the line; and instead, we need to increase the amount of pre-testing that is required in order to do a checkin. For instance, if interim results indicate a high failure rate, then new checkins should be subjected to a higher level of testing in the precheckin queue before they are allowed to actually commit.
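
To make that concrete, here is a sketch of a graduated policy in Java. Everything in it is an assumption for illustration: the tier names, the 10% threshold, and the PrecheckinPolicy class are invented. Only the 2.5% figure is taken from the slow suite's 97.5% target above.

```java
// Hypothetical sketch: the tiers and thresholds are illustrative assumptions.
enum PrecheckinLevel { SMOKE, FAST_SUITE, FAST_SUITE_PLUS_SLOW_SAMPLE }

final class PrecheckinPolicy {
    // Map the interim failure rate (0.0 to 1.0) to the testing a new checkin
    // must pass in the precheckin queue before it is allowed to commit.
    static PrecheckinLevel requiredLevel(double interimFailureRate) {
        if (!(interimFailureRate >= 0.0 && interimFailureRate <= 1.0)) {
            throw new IllegalArgumentException("rate must be between 0 and 1");
        }
        if (interimFailureRate <= 0.025) {
            return PrecheckinLevel.SMOKE;       // at or better than a 97.5% pass target
        }
        if (interimFailureRate <= 0.10) {
            return PrecheckinLevel.FAST_SUITE;  // assumed middle tier
        }
        return PrecheckinLevel.FAST_SUITE_PLUS_SLOW_SAMPLE;
    }
}
```

With the slow suite's current 86.43% pass rate (a 13.57% failure rate), this particular policy would put new checkins in the strictest tier. Unlike locking the line, nobody is stopped outright; checkins just get more expensive as the results get worse, and cheaper again as they recover.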

Friday, August 13, 2010

The Compiler Is Not The Audience

Is the code that you write making life easier or harder for the next person who has to work in the same area? Are you creating complexity and fragility that will slow them down, or platforms, patterns, and utilities that will speed them up?

Thursday, August 12, 2010

test modularity

Principle: keep your tests of functionality separate from your tests of business rules.

Example: I have some code that provisions licenses (bundles of permissions) to entities. I should have one suite of tests that verifies that it is possible to provision arbitrary combinations of permissions correctly - that is, the functionality. The clients of my code should have tests that verify that they are choosing to provision the particular combinations they expect - that is, the business rules. My tests should not fail when the business decision of what permissions to grant to which entities gets changed.
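
A sketch of the split, in Java. I can't show the real code, so every name here (Permission, LicenseProvisioner, TrialSignup, the "trial users are read-only" rule) is hypothetical, and the checks are plain methods rather than any particular test framework.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical names throughout; this is not the real provisioning code.
enum Permission { READ, WRITE, ADMIN }

// The library: provisions whatever combination it is handed.
final class LicenseProvisioner {
    private final Map<String, Set<Permission>> granted = new HashMap<>();

    void provision(String entity, Set<Permission> permissions) {
        Set<Permission> copy = EnumSet.noneOf(Permission.class);
        copy.addAll(permissions);
        granted.put(entity, copy);
    }

    Set<Permission> permissionsOf(String entity) {
        return granted.getOrDefault(entity, EnumSet.noneOf(Permission.class));
    }
}

// A client: owns the business decision about who gets what.
final class TrialSignup {
    static Set<Permission> trialPermissions() {
        return EnumSet.of(Permission.READ);
    }
}

final class ModularityTests {
    // Functionality test (the library's suite): every combination round-trips.
    static void everyCombinationCanBeProvisioned() {
        Permission[] all = Permission.values();
        for (int mask = 0; mask < (1 << all.length); mask++) {
            Set<Permission> combo = EnumSet.noneOf(Permission.class);
            for (int i = 0; i < all.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    combo.add(all[i]);
                }
            }
            LicenseProvisioner provisioner = new LicenseProvisioner();
            provisioner.provision("entity-" + mask, combo);
            if (!provisioner.permissionsOf("entity-" + mask).equals(combo)) {
                throw new AssertionError("round trip failed for " + combo);
            }
        }
    }

    // Business-rule test (the client's suite): trial users get read-only access.
    static void trialUsersAreReadOnly() {
        if (!TrialSignup.trialPermissions().equals(EnumSet.of(Permission.READ))) {
            throw new AssertionError("trial policy changed");
        }
    }
}
```

If the business decides trial users should also get WRITE, only trialUsersAreReadOnly needs to change. The provisioner's own test never mentions trials at all.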

Tuesday, August 10, 2010

Code ownership

As an aside: I've not written much lately because I'm working for a non-open-source company, which limits what I can talk about without crossing IP boundaries. I've decided to start posting again but will have to be a bit vague.

My company's roots are in web application development. This seems to contribute to a more horizontal rather than structured architectural style: all groups for themselves, each working on small user-facing features. There is a general principle that everyone is allowed to check code into anyone's area: code "ownership" is discouraged.

I find this a problem. At this point our codebase is quite large. No one, not even the principals of the company, understands it all; I routinely ask them questions and get back answers like "well, it's been a while since I worked on that so I'm not sure." But at the same time, no one can fully understand even the piece they work on, because it has been partied on by a myriad of developers, few of whom were well versed in its design, its test suite, its intentions, its history.

Tuesday, January 12, 2010

Dual dispatch

Computer programming is about describing the behavior of entities in various scenarios. For instance, if you're writing a data entry form, you might need to describe how it behaves when you click a certain button, how it behaves when you type into a field, and so forth. In object-oriented programming, we try to organize the code that describes an object's behavior together with the data that describes its state.

This gets messy when objects interact. Suppose you've got three Animals (Dog, Cat, Monkey), and three kinds of Food (DogChow, CatChow, MonkeyChow). You want your program to ensure that Monkeys can only eat MonkeyChow, Cats can only eat CatChow, but Dogs can eat all three kinds. So, you add a method to each of the three Animal classes, something like "boolean canEat(Food chow)". The implementation for Cat looks like "return chow instanceof CatChow", Dog looks like "return true", and so forth.

What happens when you add an animal? No problem, you just have to implement its canEat() method. What about when you add a new type of food? Well, you have to go through all the existing animals and make sure their implementation is still right. For instance, suppose you add Chocolate. Since chocolate is poisonous to dogs, the Dog implementation of canEat(), which returns true for everything, is now wrong. No compiler error, but your dog might die.
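
Spelled out as a Java sketch (the Food and Animal interfaces are my assumption about how the classes would be declared, and Chocolate is the newly added food):

```java
// Sketch of the animal-side design; the interface declarations are assumed.
interface Food {}
final class DogChow implements Food {}
final class CatChow implements Food {}
final class MonkeyChow implements Food {}
final class Chocolate implements Food {}  // added later

interface Animal {
    boolean canEat(Food chow);
}

final class Cat implements Animal {
    public boolean canEat(Food chow) { return chow instanceof CatChow; }
}

final class Monkey implements Animal {
    public boolean canEat(Food chow) { return chow instanceof MonkeyChow; }
}

final class Dog implements Animal {
    // Written when only the three chows existed. It still compiles after
    // Chocolate is added, and now silently gives the wrong answer.
    public boolean canEat(Food chow) { return true; }
}
```

Here new Dog().canEat(new Chocolate()) returns true, and nothing in the build tells you that Dog needed revisiting.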

Or, you could flip the problem around, and put methods on all the Foods, saying what animals can eat it. Now, when you add a food, it's just a matter of implementing its canBeEatenBy(Animal animal) method. But when you add a new animal, you have to check all the Food implementations.
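
The flipped version, again as a Java sketch with assumed interface declarations. Parrot is a hypothetical animal added later to show the mirror-image problem.

```java
// Sketch of the food-side design; the interface declarations are assumed.
interface Animal {}
final class Dog implements Animal {}
final class Cat implements Animal {}
final class Monkey implements Animal {}
final class Parrot implements Animal {}  // hypothetical, added later

interface Food {
    boolean canBeEatenBy(Animal animal);
}

final class DogChow implements Food {
    public boolean canBeEatenBy(Animal animal) {
        return animal instanceof Dog;
    }
}

final class CatChow implements Food {
    public boolean canBeEatenBy(Animal animal) {
        return animal instanceof Cat || animal instanceof Dog;
    }
}

final class MonkeyChow implements Food {
    public boolean canBeEatenBy(Animal animal) {
        return animal instanceof Monkey || animal instanceof Dog;
    }
}
```

Adding Parrot compiles cleanly, and every existing canBeEatenBy() quietly answers false for it. Whether that is right is something you can only find out by reading each Food class.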

This problem of how to describe the interactions between entities is called "dual dispatch" (more commonly, "double dispatch"). I know of no particularly good general solution.