Thursday, October 23, 2008

Synchronization defines structure

Consider the following code snippets, from some Aspectwerkz code:

public synchronized ClassInfo[] getInterfaces() {
if (m_interfaces == null) {
m_interfaces = new ClassInfo[interfaces.length];
// etc.
}
return m_interfaces;
}

/**
* Returns the super class.
*
* @return the super class
*/
public ClassInfo getSuperclass() {
if (m_superClass == null) {
m_superClass = m_classInfoRepository.getClassInfo(superclass.getName());
// etc.
}
return m_superClass;
}


Notice that the first method is synchronized, and the second is not. How come? Is this a bug in Aspectwerkz? Both methods seem to require synchronization, because the "check something and then conditionally change it" pattern is otherwise unsafe.

My inclination is to say that it's just a bug. But it might not be; there might be some external reason that the second method does not need to be synchronized here. For instance, it might always be called from within another synchronized block (though the fact that it's got public access scope makes this hard to enforce).

The point here is that synchronization (almost by definition) implies a particular call structure: to correctly synchronize a particular body of data, you need to know how that data will be accessed, by whom, in what possible sequences of events. You can't just put the "synchronized" keyword in front of every method, because over-synchronization leads to deadlock; you can't just synchronize every method that changes state, because you won't get the right visibility guarantees. You have to actually know what the code is doing, to correctly synchronize it.

This is a huge problem for two reasons. First, while you're coding, you're changing structure, so it's hard to keep up; thus, synchronization bugs creep in. In the above example, it's possible that the second method was originally private (and always known to be called from within some other synchronized block), and then someone changed it to be public without updating the synchronization. Second, it makes it much harder to change code locally: you have to understand the overall behavior of the code in more detail than would otherwise be needed.

Which brings me to the main point: unlike a lot of code, synchronization is not self-documenting. It is simply too fragile and opaque. I cannot look at the above code and figure out what pattern the developer had in mind, what assumptions s/he was making. When maintaining code, I want to preserve assumptions or else systematically and thoroughly change them. I can't do that if I can't even discern them.

As a side note, isn't that Javadoc something special? Really, "returns the superclass" is easy to figure out from the method name. What I need to know from this method's documentation are things like "under what circumstances might it return a null?" and "are there initialization requirements before this method can safely be called?".

No comments: