Wednesday, October 8, 2008

Hibernate locking

To continue the last topic: I'm working on a program that lets examinations be taken online. It uses Hibernate to get stuff from a database - for example, we get a list of possible answers to an exam question. The list is implemented by Hibernate, so that it doesn't actually read the database until the first time someone tries to access the contents of the list. But that fact is invisible to our code - to us it just looks like an ordinary Java list.

Now, like the ordinary Java collections, the Hibernate-provided collections don't have any built-in synchronization. If you tried to read the list from two threads at a time, it might try to initialize the collection twice, or not at all, or it might just crash.

This is not a problem for most uses of Hibernate. Generally any given Hibernate collection is only accessed by one execution thread, even though there might be zillions of Hibernate collections all referencing the same table in the database.

But in our application, we use the same collection from a lot of threads - perhaps tens of thousands, distributed over a cluster of machines with Terracotta. We never modify it, we just read it. Except for the very first time it's accessed. But there's no way to tell, from the outside, whether it's the first time - so we have to treat it as if it might be, every time.

The obvious fix does not work well. We could wrap every access to the collection in a "synchronized" block, like the java.util.Collections$SynchronizedWhatever classes do; but then every time any thread tries to read an entry in the list, it has to wait for every other thread to get out of the way first, just because once upon a time one of those threads did the initialization.
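To make the cost concrete, here is a rough sketch of the coarse-grained approach. The class name and its load() method are made up for illustration; this is not Hibernate's code, just a stand-in for a lazily loaded list with one big lock around it:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a lazily loaded collection; not Hibernate's actual class.
final class CoarseLockedAnswers {
    private final Object lock = new Object();
    private List<String> answers;   // null until the first access
    private int loadCount = 0;

    // Stand-in for the database read (assumed data, for illustration only).
    private List<String> load() {
        loadCount++;
        List<String> rows = new ArrayList<String>();
        rows.add("A");
        rows.add("B");
        rows.add("C");
        return rows;
    }

    String get(int index) {
        // Every caller takes the same exclusive lock, even long after
        // initialization is done, so readers always queue behind each other.
        synchronized (lock) {
            if (answers == null) {
                answers = load();
            }
            return answers.get(index);
        }
    }

    int loadCount() {
        synchronized (lock) {
            return loadCount;
        }
    }
}
```

It is correct, in that load() runs exactly once, but the lock is held on every single read for the lifetime of the collection.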

Like I said in the last entry: replacing the implementation without changing the interface is powerful, but it means there's no way to know whether an operation is actually a read or a write. Locking is one reason why a caller cares about implementation.

The solution is to change the Hibernate code so that it does its own locking, using a read/write lock. Within a single method, it can take a read lock to figure out whether initialization is needed; if it is, then it gives up the read lock and takes a write lock to do the initialization. The first time through, a write lock will be taken, but (nearly) every time thereafter, it'll only need a read lock. Any number of threads can hold a read lock at the same time, so once the collection is initialized, readers no longer wait for each other. In practice this is very effective. In one performance test we saw a roughly 200x speedup: latencies went from 4 seconds to 20 milliseconds.
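The property that makes this work is that read locks are shared while the write lock is exclusive. A small experiment with java.util.concurrent's ReentrantReadWriteLock shows it; the demo class and its names are hypothetical, and it is only meant as a sketch:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative demo (hypothetical names); uses only java.util.concurrent.
final class SharedReadLockDemo {

    // What the main thread observed while two readers held the lock.
    static final class Result {
        final int readersInside;
        final boolean writerGotIn;

        Result(int readersInside, boolean writerGotIn) {
            this.readersInside = readersInside;
            this.writerGotIn = writerGotIn;
        }
    }

    static Result observe() {
        final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        final CountDownLatch bothInside = new CountDownLatch(2);
        final CountDownLatch release = new CountDownLatch(1);

        Runnable reader = new Runnable() {
            public void run() {
                rw.readLock().lock();
                try {
                    bothInside.countDown();
                    release.await();   // stay inside until the main thread has looked
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    rw.readLock().unlock();
                }
            }
        };
        Thread t1 = new Thread(reader);
        Thread t2 = new Thread(reader);
        t1.start();
        t2.start();
        try {
            bothInside.await();                              // both readers hold the read lock now
            int readersInside = rw.getReadLockCount();       // number of read holds
            boolean writerGotIn = rw.writeLock().tryLock();  // non-blocking attempt
            if (writerGotIn) {
                rw.writeLock().unlock();
            }
            release.countDown();
            t1.join();
            t2.join();
            return new Result(readersInside, writerGotIn);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();   // never leave the readers parked
        }
    }
}
```

Both readers sit inside the read lock together, while the writer's tryLock() is refused until they leave. That is the behavior we want: contention only on the one occasion when somebody has to initialize.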

The unfortunate part is that the code gets messier. This nice code:

int size() {
    if (!initialized) {
        initialize();
        initialized = true;
    }
    return size;
}

Becomes:

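int size() {
    // Fast path: shared read lock, enough once the collection is initialized.
    readLock.lock();
    try {
        if (initialized) {
            return size;
        }
    } finally {
        readLock.unlock();
    }
    // Slow path: exclusive write lock to do the initialization.
    writeLock.lock();
    try {
        // Check again: another thread may have initialized in between.
        if (!initialized) {
            initialize();
            initialized = true;
        }
        return size;
    } finally {
        writeLock.unlock();
    }
}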
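
The fragment above leaves out the fields and the locks themselves. For anyone who wants to try the pattern, here is a self-contained sketch. The class name, the fake initialize() and the call counter are my own stand-ins for illustration, not the actual Hibernate change:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a lazily initialized collection; not Hibernate's real class.
final class LazySizedCollection {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private final Lock readLock = rw.readLock();
    private final Lock writeLock = rw.writeLock();

    // All three fields are guarded by the read/write lock.
    private boolean initialized = false;
    private int size = 0;
    private int initializeCalls = 0;

    // Stand-in for the database read (the value 4 is an assumption).
    private void initialize() {
        initializeCalls++;
        size = 4;
    }

    int size() {
        readLock.lock();
        try {
            if (initialized) {
                return size;
            }
        } finally {
            readLock.unlock();
        }
        writeLock.lock();
        try {
            // Re-check under the write lock; another thread may have won the race.
            if (!initialized) {
                initialize();
                initialized = true;
            }
            return size;
        } finally {
            writeLock.unlock();
        }
    }

    int initializeCalls() {
        readLock.lock();
        try {
            return initializeCalls;
        } finally {
            readLock.unlock();
        }
    }
}
```

Note that the read lock has to be released before the write lock is taken: ReentrantReadWriteLock does not support upgrading a read lock to a write lock, which is why the second check of the flag is needed.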

Amidst all the locking and unlocking and multiple checking, it's hard to see what's actually being done by the method. Which gets back to my first post.
