Ignorance Compounded

Tuesday, January 12, 2010

Dual dispatch

Computer programming is about describing the behavior of entities in various scenarios. For instance, if you're writing a data entry form, you might need to describe how it behaves when you click a certain button, how it behaves when you type into a field, and so forth. In object-oriented programming, we try to organize the code that describes an object's behavior together with the data that describes its state.

This gets messy when objects interact. Suppose you've got three Animals (Dog, Cat, Monkey), and three kinds of Food (DogChow, CatChow, MonkeyChow). You want your program to ensure that Monkeys can only eat Monkey Chow, Cats can only eat Cat Chow, but Dogs can eat all three kinds. So, you add a method to each of the three Animal classes, something like "boolean canEat(Food chow)". The implementation for Cat looks like "return chow instanceof CatChow", Dog looks like "return true", and so forth.

What happens when you add an animal? No problem, you just have to implement its canEat() method. What about when you add a new type of food? Well, you have to go through all the existing animals and make sure their implementation is still right. For instance, since chocolate is poisonous to dogs, the Dog implementation of canEat() is wrong. No compiler error, but your dog might die.

Or, you could flip the problem around, and put methods on all the Foods, saying what animals can eat it. Now, when you add a food, it's just a matter of implementing its canBeEatenBy(Animal animal) method. But when you add a new animal, you have to check all the Food implementations.

This problem of how to describe the interactions between entities is called "dual dispatch." I know of no particularly good general solution.

Thursday, September 3, 2009

Test Automation

Writing good tests is hard: test code tends to be opaque, fragile, and hard to maintain even when written by the same developers who write decent product code. Why?

Test code is inherently introspective: it is about the code it is testing. Product code is about a user-domain problem - for example, managing the relationships between a company and its customers. Test code is about the user-domain problem and about the product code. As a concrete example: writing tests often requires exposing (through package sharing or through "test-only" APIs) methods that don't make sense otherwise, in order to give the test code the ability to inspect the internal state of objects in the product code.

Introspection, or self-reference, increases conceptual complexity. An object that does something is simpler than an object that reasons about an object that does something.

Monday, July 13, 2009

Dogfood

The company where I spent my vacation has a very strong tradition of "eating one's own dogfood," which means using the products they are developing. This is supposed to improve the quality of the product, because one becomes painfully aware of the problems and is motivated to fix them.

My recollection is that when I first heard the phrase, back in the early 1990s, it was "tasting," not "eating," one's own dogfood. There is a significant difference between the two.

The problem with eating only your own dogfood is that you start getting used to the taste of dogfood, and you don't discover that the rest of the world has learned how to cook a decent meal.

Maybe it is better to eat whatever the best food around is, while occasionally being forced to taste one's own dogfood. For example, let development teams use whatever products they want, but have test days where the teams work through predefined customer use cases using their own products.

Tuesday, June 30, 2009

Vacation

I spent the last month on "vacation", working for a different company on a different product in a different programming language. It was actually quite a lot of fun; I'd like to take more such vacations (not least because, unlike the sort where one travels to a different country, I got paid to do it.)

I was working primarily in C#, a language I've not spent much time in till now. I was also working on Windows-specific user interface code, rather than the platform-independent systems-level code I normally write.

C# is a weird language. It feels to me like something of a kitchen-sink language: it's got a bit of practically every language idea I've ever heard of. Generics, function delegates, event handlers, closures, ... there are about five times as many keywords as Java has. Java is to C# as Spanish is to English.

There were a few things that I really liked: I did get pretty friendly with closures, for instance, and the language support for iterators is cool. Java still feels more elegant to me, though. There's just too many ways to skin the same cat in C#. And Java's collection and concurrency libraries are still way better.

The one thing I most disliked about C# isn't really a fault of the language as such: the naming conventions. In Java, by convention, we name types with capital letters; methods, variables, and packages with lower-case. This helps disambiguate some syntax that would otherwise be hard for humans to read (even though it doesn't bother the compiler at all). In C#, by convention, namespaces, types, and methods are all capitalized; fields and variables aren't. Properties, which are fields that have built-in getter and setter methods (so they can be accessed by assignment syntax), are capitalized. To my eye, it's hard to read and ugly.

To a compiler, none of these distinctions really make much difference; the differences between the .NET and Java virtual machines is pretty small. Programming languages, as I've said before, are a way for programmers to express ideas: they're for people, not computers. Esthetics matters.

If I were in St. Louis, I'd be attending this talk on the Fan programming language, which claims to be compilable to both virtual machines and to get rid of a lot of the uglifications of both Java and C#.

Wednesday, May 6, 2009

I Don't Quite Get Dependency Injection

Here's how you write a program that prints out "hello world" in ordinary Java:


public class Hello {
  public static void main(String[] args) {
    System.out.println("Hello world!");
  }
}

I'm reading Spring In Action, and this excellently written book starts out by writing "hello world" in Spring-enabled Java. It's about ten times as long; I won't reproduce it here. The main benefit is that the "Hello world" string is in an XML file, instead of in a Java file, so that it's easier to change.

Or something. Personally, I'd rather edit Java than XML. In my experience as a developer, it's actually much easier to write cryptic bugs in XML (or more broadly, in configuration code of any sort) than in Java. This is because the rules of Java (or other formal languages, such as C, C++, heck, even Perl) are more standardized, better defined, better testable, and better documented than those of proprietary configuration languages such as Spring. If I change a character string in Java, I can predict quite well what will happen, and I can watch it happen in the debugger. If I change a string in an XML configuration file, I have no way to know who is reading that file or what they are doing with it. It's magic.

In the 1970s, when I first learned to program, the field was predominantly procedural: a program was a bunch of instructions to be followed one after another. The instructions might involve reading data from files and acting on those data, but the data did not modify the program itself; in fact we believed at the time that it was very poor form to let a program be self-modifying, because it made it hard to understand how it would behave, so we were very reluctant to consider the data files to be part of the program.

Thirty years later, it feels to me like the software industry has moved very strongly toward configuration-based programming. We want our procedural (Java) code to do less and less, and instead we want to control behavior by way of increasingly complex configuration. A program I'm currently working on has got more configuration files than it does Java files, and they are spread over more directories on the hard drive. The procedural part is all in one language: Java. The configuration part is in EHCache, Hibernate, Spring, Maven, JDBC, and Log4J, each with its own cryptic syntax that is subject to change with every version and that is documented in bits and pieces on web forums and spottily-available books.

As so often, I find myself wondering if the emperor has any clothes. Is it really easier to program this way? Or is it just different?

I wonder whether instead, we should focus on figuring out what is "hard" about procedural programming, and work on making that easier, within the confines of a formally defined, easy to understand and read language. For instance, IDEs could easily help with changing dependencies - in fact, an IDE could look at all the pieces of a program, determine all the external dependencies, and present them as a single view, allowing the programmer to substitute equivalent components.

Monday, April 6, 2009

Improving Maven

I'm still at loggerheads with Maven. There are some specific problems that bother me and that I think could be improved while preserving its basic Maven-ness. Many of the problems center around issues when simultaneously developing more than one component at a time, i.e., when depending on SNAPSHOT versions. In fact a SNAPSHOT dependency is a fairly good indicator of misery; but some small improvements in Maven could reduce that pain point by a lot.

1. Maven knows that source produces artifacts, but it doesn't know that artifacts come from source. When Maven checks dependencies, it looks in local and remote repositories, but it doesn't know enough to build (or re-build) an artifact from source.

If Maven artifacts (at least locally deployed ones) had a backpointer to the location of the source project that produced them, then it would be possible to check dependencies against source. For instance, if project B depends on A, and I touch project A's code and then rebuild B, project A should also get rebuilt and installed. Similarly, if artifact A came from local source, then it almost certainly should NOT get replaced by an "updated" artifact from a remote repository, even if the remote artifact is newer; rather, local source code should always get honored. Extra points for a "mvn svn:update" command or the like, that would transitively sync the version control system to the latest code in all upstream projects.

2. SNAPSHOTs need to be versioned. When you're collaboratively working on two projects, and an API between them changes, the downstream build is broken until the upstream project gets refreshed. But right now that happens in a nondeterministic, asynchronous way: to Maven, all SNAPSHOTs are identical until, around midnight or so, it decides to refresh. Basing refreshes on an update interval is like filling up your car's gas tank every Friday: it's either too soon or too late. This needs to be deterministic. What I really want from SNAPSHOT is the idea of a fine-grained version number, that I will throw away upon release. It could be as simple as letting me say 1.3.0-SNAPSHOT-002, instead of just 1.3.0-SNAPSHOT.

3. Maven assumes that the internet is fast and reliable. It is neither, as anyone who works from coffeeshops and airports knows all too well. When Maven fails to get a network connection, or the network dies midway or times out, it needs to be able to roll back to a known and working state. Among other things, this means that updates need to be atomic across projects, or at least they need to be nondestructive. It also means that basic help should not rely on a network connection. Maven should not attempt to update plug-ins or projects during a 'clean' or 'help' operation.

I've got other problems with Maven - for instance, I think XML is nasty to work with, and I think that "convention over configuration" translates to "doesn't play well with others." Those things are harder to address while still keeping it Maven. But if the above three improvements were made, I think no one who loved Maven would be harmed, and a lot of other folks would be helped.

Thursday, April 2, 2009

Eclipse awards

I spend most of my professional life feeling ignorant of one thing or another, so in the few areas where I do know at least a little bit I try to help out. For that reason I post pretty often to the eclipse.newcomer newsgroup. I'm proud to have been a finalist for the Eclipse Newcomer Evangelist award for the second year in a row: