Thursday, January 8, 2009

Build languages

I mentioned a while ago that I thought the world still needed a good language for build tools. One sign that no such language exists is that there is still a 1:1 correspondence between build tools and their languages. There is no such thing as a portable build description, although some tools have a limited ability to import build scripts meant for other tools.

What makes a build language different from other languages? That's worth a long post, but for now, just a couple of short thoughts:

Builds are special in that they are almost always slow. Building software is, ironically, one of the hardest things that a computer can do. It's common for a build of even a mid-size project to take several hours, and running the acceptance tests can take most of a day.

Also, builds typically have significant side effects. They modify the world. Running a build may cause a public web site to be updated; it may cause gigabytes of new files to be copied to slave machines around the world; it may send emails to thousands of people; it may cause many other dependent projects to become inoperable.

This means that you can't just tweak and re-run if there's a problem. Debugging an intermittent problem can take weeks, instead of hours, and it's typically a very public process.

Most computer languages have no way to express the idea that some steps are slow, or that some steps have side effects, or even that some steps are dangerous. A good build language would do that intrinsically.
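To make that concrete, here is a rough sketch of what such annotations might look like if bolted onto an existing language. Everything in it (the `Trait` flags, the `Step` record, `needs_confirmation`) is a hypothetical name invented for illustration, not part of any real build tool.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Callable


class Trait(Flag):
    """Hypothetical step properties a build language could make first-class."""
    NONE = 0
    SLOW = auto()          # takes long enough that re-running is costly
    SIDE_EFFECT = auto()   # changes something outside the build tree
    DANGEROUS = auto()     # hard or impossible to undo


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    traits: Trait = Trait.NONE


def needs_confirmation(step: Step) -> bool:
    """A tool could refuse to run dangerous steps without an explicit go-ahead."""
    return bool(step.traits & Trait.DANGEROUS)


# Placeholder actions; a real build would do actual work here.
steps = [
    Step("compile", lambda: None, Trait.SLOW),
    Step("publish-site", lambda: None, Trait.SIDE_EFFECT | Trait.DANGEROUS),
]

flagged = [s.name for s in steps if needs_confirmation(s)]
print(flagged)
```

In a language that did this intrinsically, the traits would presumably be inferred or enforced by the tool rather than declared by hand, which is exactly what a library like this cannot offer.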

Error handling is a difficult part of any program, but it is critical for build programs, because error conditions are very common and can have damaging long-term side effects. So a good build language would make error handling easier. For instance, it should be easy to associate an action with a set of restrictions, like "execute the update step, unless it would result in deleting any existing files." This sort of thing is not impossible in existing languages, but it requires more code than anyone would actually write (or get right).
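As a rough illustration of the "update unless it would delete files" restriction, here is a minimal sketch. It assumes each step can produce a plan of what it intends to do before doing it; the names `Plan`, `no_deletes`, and `run_guarded` are made up for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Plan:
    """What a step intends to do, computed before anything is changed."""
    writes: List[str] = field(default_factory=list)
    deletes: List[str] = field(default_factory=list)


class RestrictionViolated(Exception):
    pass


def no_deletes(plan: Plan) -> None:
    """Restriction: the step must not delete any existing files."""
    if plan.deletes:
        raise RestrictionViolated(f"would delete: {', '.join(plan.deletes)}")


def run_guarded(plan: Plan,
                execute: Callable[[Plan], None],
                restrictions: List[Callable[[Plan], None]]) -> bool:
    """Check every restriction against the plan; execute only if all pass."""
    for check in restrictions:
        try:
            check(plan)
        except RestrictionViolated as err:
            print(f"skipped: {err}")
            return False
    execute(plan)
    return True


applied: List[Plan] = []   # stands in for the real update action
safe = Plan(writes=["index.html"])
risky = Plan(writes=["index.html"], deletes=["old.css"])

run_guarded(safe, applied.append, [no_deletes])    # executes
run_guarded(risky, applied.append, [no_deletes])   # refused before any change
```

Even this toy version only works if every step can describe its effects ahead of time, which is the hard part that real build scripts rarely bother with.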

Most computer programs are designed to be run many times without change: for instance, hundreds of thousands of people will post millions of blog entries before the Blogger.com software has an update. Build programs, by contrast, change far more often relative to how often they run. A build script might be run a few times a day, and updated every few days. So the writing and debugging of the build program, as an activity, are nearly as important as the execution.

So, a good build language should be closely integrated with writing and debugging. Modern computer languages are typically compiled rather than interpreted, meaning that you have to finish writing the program (or at least a complete, self-consistent, self-contained subset of it) before you can begin executing it. At the opposite extreme from a compiled language is a command shell, an interactive environment that lets a user run arbitrary commands one at a time; a command shell may support scripting, but it does not build up a "program" out of the executed commands.

In between these extremes are interpreted languages, in which there is a program but it runs within an "interpreter" that lets the user run one step of a program before the next step has even been written. In an interpreted language, it's possible to write and execute a program simultaneously: when you finish, you've got a program and you've also got its results. A good build language should be interpreted, not compiled. And the interpreter needs to be able to tell the user what the next step would do, before doing it: sort of like print preview. This is an attribute of the build tool, not the build language, but it places restrictions on the build language.
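The "print preview" idea might look something like the sketch below. It assumes every step carries a description of its effect alongside the effect itself; `Step`, `Interpreter`, `preview`, and `execute` are hypothetical names, not an existing tool's API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str            # what the step would do, in plain words
    run: Callable[[], None]     # the effect itself


class Interpreter:
    """Hypothetical step-at-a-time runner with a preview mode."""

    def __init__(self) -> None:
        # Executed steps accumulate here, so the session becomes the program.
        self.history: List[Step] = []

    def preview(self, step: Step) -> str:
        # Describe the step without running it.
        return f"would: {step.description}"

    def execute(self, step: Step) -> None:
        step.run()
        self.history.append(step)


log: List[str] = []   # stands in for the outside world
interp = Interpreter()
copy = Step("copy build/ to staging", lambda: log.append("copied"))

message = interp.preview(copy)   # nothing has happened yet
interp.execute(copy)             # now the effect occurs and the step is recorded
```

Here the description is just a string the author has to keep honest by hand. The restriction on the language is that a step's effects must be knowable before it runs, so the preview can be derived rather than promised.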
