Wednesday, March 11, 2009

Sharing Data

I recently discovered Joe Armstrong's post "Why OO Sucks". I didn't agree with much of what he said, but I was struck by one of his claims, "functions and data structures belong in totally different worlds." This is of course the antithesis of the OO (object-oriented) programming philosophy, which holds that you should lump data together with the sets of rules and actions that apply to it.

I disagree. Data is only useful if it has integrity, and it only has integrity if there are rules that govern how it is read and changed. "Rules" is just another word for "code", so this argues that code should be tightly associated with data.

An "object" in software jargon is just a way to expose data while still wrapping it in a decent amount of clothing. Within an application, objects make it safer to work with data, because you can ensure that no matter what you do the data is still valid.

The problem with objects is that they're not easily shared between applications. They're very ephemeral; they live only as long as they're contained in a running program, and they're tightly coupled to all the details of that particular program. In Java every object is an instance of a particular class, and every class is associated with the classloader that produced it, and classloaders in turn are associated with a single instance of a single application. If you try to put an object into a different application, it looks around for its classloader and, not finding one it recognizes, gets scared and shy.

The usual solution to this is to convert the object into raw data (often some sort of text representation, like XML) for long enough to transport it to a different application, and then in that application a new object is created by reading in the raw data and associating it with a hopefully compatible class from a hopefully compatible classloader. This is slow, expensive, and inaccurate. For instance, there's no way to guarantee that the classes are truly identical, so if an object moves from application A to B and back again, it might come back in an illegal state. Also, it often requires the programmer to write a lot of code to spell out how to read and write the object.

Terracotta is a way to spread the work of a software application across a large number of computers. We allow objects to move freely from one computer to the next. Under the covers we do still convert the objects to raw data and back, but we do it in a way that is quite efficient and transparent to the application programmer. We're great at moving objects from computer to computer in a single application. Up until the latest release, however, we weren't much good at moving objects between different applications. Now we are.

The basic idea here is that even though the classloaders for two different applications are different, as long as they both contain a definition of the class being shared (and the other classes that it in turn needs to access), that's good enough. The computer has no way of knowing whether that's true; but the programmer does. So, we let the programmer tell Terracotta which applications are allowed to share classes with each other. The configuration feature is called "app-groups", and I'd point to the documentation in this post, but it's not up on the web site quite yet. It's quite simple to use; you just define an app-groups element in the Terracotta configuration file, give it a name, and inside it you list all the applications that you want to be able to share objects with each other.

A typical use case would be if you've got a user-facing application and also an administrative application. Imagine, for instance, a merchant site, that lets users build up a shopping cart. Using Terracotta you might avoid storing that shopping cart in your central database, to reduce database load; instead, you'd keep it as transient data, getting session scalability and server failover from the Terracotta system instead. But suppose you want to let a sales agent view a customer's shopping cart, to make recommendations or fix problems. How can you share the transient shopping cart data between the customer-facing application and the agent-facing administrative application? One idea is to keep the list of shopping carts as a shared root in both applications, and then place both applications in the same app-group with the Terracotta configuration. No database required; transient data is still transient; no custom serialization code or data format definitions required. Just transparent sharing of objects between two otherwise different applications that both happen to include the same Java class definitions.

There are still some caveats, of course. One ugly one is that the different applications have to be running in different Java virtual machines. That is, you can't have a single application server instance and deploy both applications to it. That's for internal technical reasons that we hope to eliminate in a future release. For now, you'd have to put the sales-agent application on a separate app server instance (although it could be running on the same physical computer). Another caveat is that you can't have multiple overlapping groups (like, A can share with B and B with C but not A with C), and you can't restrict sharing to only certain objects or roots, it's application-wide. Caveats notwithstanding, I think it's a powerful new feature, and it'll be interesting to see what new uses of Terracotta this enables.

Monday, March 9, 2009

Circus Contraption

As long as we're talking about things I'm proud of, let me mention the new show that I'm doing sound for, Circus Contraption. I did the sound system design and installation, and the overall sound design, and I'm sharing the night-to-night mixing duties with two other sound guys. If you happen to be in Seattle on a Friday, Saturday, or Sunday in the next couple months, try to see the show! It's the real thing. The sword swallower actually swallows the sword, it's not just stage magic.

Doing live sound is very, very different than writing software. You do not get a chance to fix bugs, and you cannot take things slowly or stop to have a design discussion. Every night is different: the performers sing and play softer or louder, there are more or fewer people (read: sound-absorbing sacks of water) in the audience, equipment that worked the last night breaks this night, someone trips over a wire or forgets to turn on their microphone. You do what you can and move on. Frankly, I'd probably be a better software developer if I treated software more like live sound.

Wednesday, March 4, 2009

Terracotta

I write software for Terracotta, which is an open-source company. I love working on open source code, in part because what I do is not a secret - I can tell my geeky friends about the cool problems that I wrestle with. (I also work on Eclipse, another open source project.)

Paradoxically, though, it seems like in the open source world it is often very hard to talk about who my customers are. Partly that's because we don't always know - anyone can download the product for free. But also it's because our paying customers don't always want their competition to know how they succeed, and we of course need to honor their confidentiality.

So I'm really pleased that Terracotta has lately been getting some great press about one of our important customers, Sabre Holdings. They're perhaps most commonly known for one facet of their business, Travelocity. Sabre is huge - according to one article, "On any given day, Sabre's servers have to be able to handle up to half a billion transactions a day and a peak volume that can go up to 32,000 transactions per second."

How do they get that kind of volume, and the reliability that has to go with it? Answer: they run their mission-critical, high-volume stuff on Terracotta, my software. Yes, I'm proud :-)