Friday, January 30, 2009

Maven Maven Maven

My post Maven Continues to Suck drew a number of comments, including a helpful and thoughtful comment from Jason van Zyl, who concluded "I don't think it's so much that Maven continues to suck as much as we need to do more than the free book we already have to get people past these simple setup problems that cause frustration."

I responded briefly in the comments but wanted to expand a bit.

Perhaps I will eventually see the light - I do keep hoping, because there doesn't seem much alternative - but I have not yet. Respectfully, Maven folks, I think it's optimistic to hope that if you "get people past these simple setup problems that cause frustration" you'll be in the clear. I've been working with Maven now for half a year; it continues to be perhaps my primary source of frustration, and to interfere in almost everything I do. I cannot say the same for any of the other tools I use. Neither my IDE (Eclipse), my version control system (SVN), my language (Java) nor its libraries were this problematic this far along.

The difference may in part be in my expectations; but the issues go beyond simple setup problems. To pick two other examples: (1) The Eclipse/Maven integration is rough, causing what should be simple Eclipse operations (saving files, jumping to referenced types, debugging) to be slower and less accurate. (2) No one on our team has yet figured out a way to pipe arbitrary system properties from a Maven command line into a forked app server process in the context of a system test. These may be blamed on third-party software (M2Eclipse, Surefire), or on lack of Maven configuration skillz; but the Maven ecosystem is part of working with Maven, as is the fact that most developers on a team should not be expected to be Maven experts.

It is worth mentioning that invoking "mvn help" at the command line first tries to download stuff from teh internets, and then spits out a cryptic build failure error; mvn --help spits out a command line usage message that says nothing about how to actually get help. Even "mvn clean" tries to first do an update (which is always the wrong thing to do before doing a clean, because you may lose information about what to clean). Have you ever tried to use Maven without an internet connection handy, like on an airplane? Epic fail. I know about mvn -o but have never been able to get it to work, perhaps because the web of dependencies is so fragile and unstable.

By contrast, in our core code base we have a homebrew build script that solves all these problems, is easy for anyone with basic programming knowledge to maintain and modify, and just never seems to get in the way. If Maven requires more domain-specific knowledge, skill, and time to maintain than a hard-coded build script (or Ant script) would, is it buying us anything?

No tool is the right answer for all problems, and most tools are the right answer for some. Almost any tool can be extended, with sufficient skill and time, to do anything. That doesn't make it the right tool.

Moreover, the more smarts that we build into our Maven configuration, the more that we will rely on needing to hire developers with serious Maven chops; I would rather hire developers with serious programming chops. Maven skills do not generalize to other problems; Java, Groovy, Ruby, Perl do.

What I'm saying here is that I think there is a problem in principle with basing a build on a tool that requires deep domain-specific knowledge to use well; that I think Maven is such a tool; that, further, I think even with solid knowledge Maven is based on premises (such as the idea of a SNAPSHOT) that don't model the world well (pre-release output of a CI process is not equivalent to locally-built output of a local change); and that, finally, there is only room in the world for at most one convention-based tool, and we already have more than one.

Put differently, I'm saying that I think even if I knew how to use it well, Maven would not be the right tool.

Javasaurus

Nothing about dinosaurs; apologies to any 6-year-olds I've misled.

There's an interface in the Java libraries called "Runnable", that just packages up the idea of "some code that you might want to run." This is handy when writing algorithms like "do some preparation; then run whatever it is the client wants to run; then do some clean-up." It's a way to hand a series of program steps from one module to another without having to know in advance what those steps are. ("Closures," much debated in Java, are another way of doing this.)

Runnable defines one method, "run". But the "run" method doesn't allow for the possibility of failure. I needed something similar, that was allowed to communicate failure (by throwing an exception). I knew there was something, but what? A search of the likely spots didn't turn up what I was looking for.

It would be really cool if there was a Thesaurus of Java, a tool or a web site that would let me type in "Runnable" and would come back with all the other things that were kind of like "Runnable." In this case the answer, provided by my colleague Hung, is "Callable." Doh.
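For the record, here is a minimal sketch of the "prepare, run the client's code, clean up" pattern using Callable in place of Runnable. Callable's call() method is declared to throw Exception and returns a value, which is exactly the difference I was after. The class name CallableSketch, the helper withCleanup, and the log buffer are invented for illustration.

```java
import java.util.concurrent.Callable;

class CallableSketch {
    // Wraps client code in preparation and clean-up steps.
    // Unlike Runnable.run(), Callable.call() may throw a checked exception,
    // so a failure in the client code propagates to our caller.
    static <T> T withCleanup(Callable<T> body, StringBuilder log) throws Exception {
        log.append("prepare;");
        try {
            return body.call();
        } finally {
            // Runs whether the body succeeded or threw.
            log.append("cleanup;");
        }
    }

    public static void main(String[] args) throws Exception {
        StringBuilder log = new StringBuilder();
        String result = withCleanup(() -> "done", log);
        System.out.println(result + " / " + log);

        try {
            withCleanup(() -> { throw new java.io.IOException("disk full"); }, log);
        } catch (java.io.IOException e) {
            System.out.println("client code failed: " + e.getMessage());
        }
    }
}
```

The clean-up step still runs when the client code throws, which is the part that Runnable makes awkward, since run() cannot declare a checked exception.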

A similar task that comes up for me a lot is that I've got a This, and I know it's possible to convert it to a That, but I'm not sure how. For instance, let's say I've got an Eclipse IFile, and I want to convert it to an ordinary Java File. The mapping isn't perfect, but basically, given an IFile there is a chain of methods you can call that will either get you the corresponding File or tell you it doesn't exist. But what is that chain of methods?

There's a finite number of methods that take an IFile as an argument (or receiver). There's a finite number of methods that produce a File. So there's a finite, although very large, possible graph between them - for instance, you could imagine calling something like IFile.getURL() and then URL.getFileDescriptor() and then FileDescriptor.getFile(). (I just made those names up, that's not the real answer.)

Most of the paths through the graph will be wrong, and some will be long and some will be short. But you could use the same sort of semantic analysis tools that are used for natural language translation, feeding off existing code (such as the Eclipse code base, in this example), to inform the graph of common versus uncommon pairings. I'd enter my start and end types, and I'd see a view of the top ten paths through the graph of possible method calls connecting them, perhaps even with links to existing code examples where something like that path had been used before.
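To make the graph idea concrete, here is a rough sketch that treats each method as an edge from one type to another and searches for a chain connecting two types. It swaps in a plain breadth-first search, so it finds only one shortest chain; the ranking by how often a pairing shows up in existing code is left out. Every method name in the sample graph is made up, including the ones borrowed from my invented example above, and the class name ConversionPaths is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class ConversionPaths {
    // One edge in the graph: calling `method` on a `from` yields a `to`.
    static final class Step {
        final String from, method, to;
        Step(String from, String method, String to) {
            this.from = from; this.method = method; this.to = to;
        }
    }

    // Breadth-first search: returns the methods on one shortest chain from
    // start to goal, or Optional.empty() if no chain exists.
    static Optional<List<String>> shortestPath(List<Step> steps, String start, String goal) {
        Map<String, List<Step>> byFrom = new HashMap<>();
        for (Step s : steps) {
            byFrom.computeIfAbsent(s.from, k -> new ArrayList<>()).add(s);
        }
        Map<String, Step> reachedBy = new HashMap<>(); // type -> edge that first reached it
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String type = queue.poll();
            if (type.equals(goal)) {
                // Walk the recorded edges backwards to rebuild the chain.
                LinkedList<String> path = new LinkedList<>();
                for (String t = goal; !t.equals(start); t = reachedBy.get(t).from) {
                    path.addFirst(reachedBy.get(t).method);
                }
                return Optional.of(path);
            }
            for (Step s : byFrom.getOrDefault(type, new ArrayList<>())) {
                if (seen.add(s.to)) {
                    reachedBy.put(s.to, s);
                    queue.add(s.to);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // All of these method names are invented for the example.
        List<Step> steps = Arrays.asList(
            new Step("IFile", "IFile.getURL()", "URL"),
            new Step("URL", "URL.getFileDescriptor()", "FileDescriptor"),
            new Step("FileDescriptor", "FileDescriptor.getFile()", "File"),
            new Step("IFile", "IFile.getPath()", "Path"),
            new Step("Path", "Path.asFile()", "File"));
        System.out.println(shortestPath(steps, "IFile", "File"));
    }
}
```

A real tool would weight the edges by usage and return the top several chains rather than a single shortest one, but the search underneath would look much like this.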

I tried Googling for "Java Thesaurus" to see if this existed already, but all that comes up is Java code for writing thesauri.

Thursday, January 8, 2009

Build languages

I mentioned a while ago that I thought the world still needed a good language for build tools. One sign that no such language exists is that there is still a 1:1 correspondence between build tools and their languages. There is no such thing as a portable build description, although some tools have a limited ability to import build scripts meant for other tools.

What makes a build language different than other languages? That's worth a long post. But for now just a couple short thoughts:

Builds are special in that they are almost always slow. Building software is, ironically, one of the hardest things that a computer can do. It's common for a build of even a mid-size project to take several hours, and running the acceptance tests can take most of a day.

Also, builds typically have significant side effects. They modify the world. Running a build may cause a public web site to be updated; it may cause gigabytes of new files to be copied to slave machines around the world; it may send emails to thousands of people; it may cause many other dependent projects to become inoperable.

This means that you can't just tweak and re-run if there's a problem. Debugging an intermittent problem can take weeks, instead of hours, and it's typically a very public process.

Most computer languages have no way to express the idea that some steps are slow, or that some steps have side effects, or even that some steps are dangerous. A good build language would do that intrinsically.
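As a rough illustration of what "intrinsically" might mean, here is a sketch in Java where each build step carries declared traits that the tool can inspect before running anything. This is an approximation in a general-purpose language, not a proposal for the build language itself; the names StepTraits, Trait, Step, and needsConfirmation are all invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;

class StepTraits {
    // Properties a build step can declare about itself.
    enum Trait { SLOW, SIDE_EFFECTS, DANGEROUS }

    static final class Step {
        final String name;
        final EnumSet<Trait> traits;
        Step(String name, Trait... traits) {
            this.name = name;
            this.traits = EnumSet.noneOf(Trait.class);
            this.traits.addAll(Arrays.asList(traits));
        }
    }

    // Names of the steps that change the outside world or are marked dangerous,
    // i.e. the ones a tool should ask about before running.
    static List<String> needsConfirmation(List<Step> steps) {
        List<String> names = new ArrayList<>();
        for (Step s : steps) {
            if (s.traits.contains(Trait.SIDE_EFFECTS) || s.traits.contains(Trait.DANGEROUS)) {
                names.add(s.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Step> build = Arrays.asList(
            new Step("compile"),
            new Step("acceptance-tests", Trait.SLOW),
            new Step("publish-site", Trait.SIDE_EFFECTS),
            new Step("notify-mailing-list", Trait.SIDE_EFFECTS, Trait.DANGEROUS));
        System.out.println("ask first: " + needsConfirmation(build));
    }
}
```

The weakness of doing this in an ordinary language is that the traits are only as honest as whoever wrote them; a build language could derive or enforce them.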

Error handling is a difficult part of any program, but it is critical for build programs, because error conditions are very common and can have damaging long-term side effects. So a good build language would make error handling easier. For instance, it should be easy to associate an action with a set of restrictions, like "execute the update step, unless it would result in deleting any existing files." This sort of thing is not impossible in existing languages, but it requires more code than anyone would actually write (or get right).
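Here is a minimal sketch of that "unless it would delete existing files" restriction in Java, assuming each step can report in advance which files it would delete. That assumption carries most of the weight, and it is exactly what ordinary languages don't give you for free. The names GuardedStep, Step, wouldDelete, and runUnlessDeletes are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class GuardedStep {
    // A step that can say, before running, which files it would delete.
    interface Step {
        Set<String> wouldDelete();
        void run();
    }

    // Runs the step only if none of the files it would delete currently exist.
    // Returns true if the step ran, false if the guard refused it.
    static boolean runUnlessDeletes(Step step, Set<String> existingFiles) {
        Set<String> doomed = new TreeSet<>(step.wouldDelete());
        doomed.retainAll(existingFiles); // keep only files that really exist
        if (!doomed.isEmpty()) {
            System.out.println("refused: would delete " + doomed);
            return false;
        }
        step.run();
        return true;
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(Arrays.asList("index.html", "style.css"));
        Step update = new Step() {
            public Set<String> wouldDelete() { return Collections.singleton("style.css"); }
            public void run() { System.out.println("updating"); }
        };
        System.out.println("ran: " + runUnlessDeletes(update, existing));
    }
}
```

Even this toy version needs an interface, a set intersection, and a caller who remembers to use the guard, which is roughly the point: the restriction is a one-line sentence in English and a page of code in Java.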

Whereas most computer programs are designed to be run many times without change - for instance, hundreds of thousands of people will post millions of blog entries before the Blogger.com software has an update - build programs have to change much more often per execution. A build script might be run a few times a day, and updated every few days. So the writing and debugging of the build program, as an activity, are nearly as important as the execution.

So, a good build language should be closely integrated with writing and debugging. Modern computer languages are typically compiled rather than interpreted, meaning that you have to finish writing the program (or at least a complete, self-consistent, self-contained subset of it) before you can begin executing it. The opposite extreme of a compiled language is a command shell, which is an interactive environment that lets a user perform arbitrary commands one at a time; a command shell may support scripting but it does not build up a "program" out of the executed commands.

In between these extremes are interpreted languages, in which there is a program but it runs within an "interpreter" that lets the user run one step of a program before the next step has even been written. In an interpreted language, it's possible to write and execute a program simultaneously: when you finish, you've got a program and you've also got its results. A good build language should be interpreted, not compiled. And the interpreter needs to be able to tell the user what the next step would do, before doing it: sort of like print preview. This is an attribute of the build tool, not the build language, but it places restrictions on the build language.
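The "print preview" idea can be sketched as a tiny step runner with a preview mode: every step carries a description alongside its action, and in preview mode the runner reports what it would do without doing it. This is only an illustration under the assumption that a step's description is accurate; the names PreviewBuild, Step, and run are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PreviewBuild {
    // A build step: what it says it does, plus the code that does it.
    static final class Step {
        final String description;
        final Runnable action;
        Step(String description, Runnable action) {
            this.description = description;
            this.action = action;
        }
    }

    // In preview mode, only reports each step; otherwise runs each step in order.
    // Returns one log line per step.
    static List<String> run(List<Step> steps, boolean previewOnly) {
        List<String> log = new ArrayList<>();
        for (Step step : steps) {
            if (previewOnly) {
                log.add("would: " + step.description);
            } else {
                step.action.run();
                log.add("did: " + step.description);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> world = new ArrayList<>(); // stands in for external side effects
        List<Step> steps = Arrays.asList(
            new Step("compile sources", () -> world.add("classes")),
            new Step("copy site to web server", () -> world.add("site")));

        System.out.println(run(steps, true));  // preview: nothing is touched
        System.out.println("world after preview: " + world);
        System.out.println(run(steps, false)); // real run
        System.out.println("world after run: " + world);
    }
}
```

The restriction this places on the language is visible even here: the preview is only trustworthy if the description and the action cannot drift apart, which a hand-written string does not guarantee.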

Thursday, January 1, 2009

I Want Pictures

With all our interesting weather of late, I've been following Cliff Mass' weather blog. I've been fascinated and impressed by the great data visualizations that meteorologists have to work with. They have a head start because their data naturally maps onto a two-dimensional plot, but they've managed to add many more dimensions in ways that even a non-meteorologist can quickly comprehend. For instance, look at this image, which is the first image from his blog post of 1/1/09:


In addition to the two dimensions of space, and the overlay of geopolitical boundaries, this image shows sea-level pressure, temperature, and the vector of wind speed and direction. And it's just plain pretty to look at, too. I'd love to have a job that involved looking at pictures like that all day.

What would the software equivalent be, I wonder? Could I combine profiler data with a dynamic class diagram from UML? What if I overlaid a metric of function complexity on top of that?

The visualization tools for software are pretty weak, when you consider that all the information is already in the computer (we don't need weather satellites to get our data). It might be because software is an abstraction that doesn't easily lend itself to a 2-D layout like weather data does, but I think it might also be that software engineers are by nature less visually oriented. I think I'm more of a visual thinker than most, but not all, of the developers I've worked with. I'm not really comfortable with something until I can draw a picture of it.