
Thursday, August 29, 2013

What is Quality Engineering?

Back in the 1990s, when waterfall was the development methodology, developers wrote code and checked it in; testers tested code, and found bugs; developers fixed the bugs; and then eventually we released the product to a limited group of customers called beta testers, and finally to the general public.

Nobody works that way any more.  Waterfall failed, and Agile took over.  Agile does not have a clear role for testers, because there is not a separate testing phase.  If you have someone different doing your testing, then you must necessarily be spending some time with code that is (at least partly) checked in but that has not been verified.  That is not Agile.

At the company I work for, teams have resolved this dilemma in many ways, most of them lousy.  Some teams don't have testers; some have automation developers; some have mini-waterfall.

We have a job title of "Quality Engineer"; people with this job are not expected to implement customer-facing features.  The absurd implication is that the people who implement customer-facing features are not quality engineers.  A software engineer who is not a quality engineer should be fired.  Quality is not something that can be applied after the coding is done.

But testers are important.  It's really hard to rigorously test your own code.  If you didn't see a gap the first time, you probably won't see it the second time.  And writing code is a creative act that takes emotional investment.  Asking someone to find the flaws in their own code is like asking a painter to critically assess the artistic relevance of their work before the paint dries on the canvas.

Pair programming is one solution; it's a lot easier to see someone else's error, or challenge someone else's shortcut.  Two sets of eyes during coding can greatly improve quality.  But the skill set of a good manual tester is different than that of a coder.  Watching a good manual tester is like watching a good hacker: the feature you thought was solid gold dissolves into a pile of bugs before your eyes.

So there is still a role for manual testing.  QE can understand the product from the customer's perspective, use it, and find out what doesn't work: essentially, act as a high-bandwidth, low-latency customer proxy.  QE in this role should be most tightly aligned with the product owner.

But manual testing is low leverage, compared to some more interesting possibilities.  There are areas where "Quality Engineering" really becomes a meaningful term.  Regrettably few companies invest in these areas.  The common characteristic of all these possibilities is that the work is internal-facing, decoupled from the product release cycle, and aimed at the development process rather than the product as such.

Predictive Fault Detection
There is a wealth of academic work, and some commercial products, dedicated to the premise that it is possible to predict where the bugs will be before any code has been written.  Bugs are not random: certain design patterns, certain APIs and technologies, certain methodological patterns are inherently buggy.  QE should be studying past results to predict future buggy patterns, steering coders away from them where possible, and advising extra attention where necessary.  QE should be like a harbor pilot, who knows where the hidden reefs are better than any ship captain does.

When technologies or patterns that are highly likely to provoke bugs are found, QE should propose eliminating them entirely: for example, if the company has been using a particular messaging framework but coders interacting with the framework tend to use it incorrectly and cause bugs, perhaps it is a sign that it is a bad choice of framework, even if it is otherwise performant and cool.  Or maybe it can be fixed.

Test Curation
Coders should write the majority of their own tests.  But as the codebase grows, so does the body of tests, and the test base becomes redundant and full of low-value tests.  Careful unit testing alleviates this problem because the individual tests continue to run quickly; but unit testing relies on well-modularized code, and in many enterprise situations - including at the company I work for - that is a goal we can work towards, not a place we can start from.

So we have a vast number of slow, highly redundant tests, most of which test features that are not likely to regress.  QE should monitor the overall test base and combine tests that are too redundant, eliminate tests that provide insufficient value, and identify areas of weak coverage.  QE should understand and manage the test base as a whole, where coders tend to interact only with specific tests.

Framework Development
Coders are generally working under time pressure to produce a customer-facing feature.  We tend to do whatever reduces the risk to on-time delivery, even if it results in accumulating technical debt.  It's often hard just to get a coder to take the extra time to refactor the shared code they are building on top of.  Most developers are not in a position where they can tell their boss they're about to spend a few months developing code that will pay off company-wide but that will not directly result in shipping the feature the team is supposed to be working on.  As a dev manager, I get personnel funding apportioned on the basis of feature need, not internal investment.

However, the payoff for having a well-maintained set of test frameworks is huge; all the more so when the maintenance isn't just a series of one-off efforts by coders who need a feature, but a proactive, intentionally designed effort by a dedicated team.  QE can serve as a pool of engineers whose job is to improve the quality and efficiency of the feature-dedicated coders.

In summary:
The term "Quality Engineer" is nothing but a euphemism when it's used to make a tester feel important in a development methodology that doesn't have a place for testing.  Testing is important, and it doesn't need to be called something other than what it is; but it's entirely different from quality engineering.  Quality engineering can be valuable and high leverage, but only if we take it seriously, separate it from testing, and select quality engineers on the basis of relevant skill, training, and experience.

Friday, October 22, 2010

time-delayed feedback in the workplace

The job of buildmaster rotates amongst managers. The buildmaster is primarily responsible for haranguing developers when the automated test failure rates are too high; and if they are too high for a while, the buildmaster can "lock the line", meaning that the only permitted checkins are those that ostensibly fix tests. We have some test suites that take several days to complete. Thus a bad checkin may cause test results to plunge days after the fact.

In Peter Senge's classic The Fifth Discipline, he talks about the effect of introducing a time delay into a negative feedback system. Whereas negative feedback usually stabilizes a system, negative feedback plus time delay tends to cause ever-more-violent oscillation.

Consider the following actual data (suite pass rates):

Test          Current   EOD 10/21   EOD 10/20   EOD 10/19   Target
fast_suite     97.55%     98.77%      99.39%      99.39%        98%
slow_suite     86.43%     94.10%      83.61%      95.29%      97.5%


The fast suite returns feedback in a couple hours; the slow suite takes a few days to catch up to a changelist.

I am assured by various people that it sucks to be the buildmaster. It will continue to suck to be the buildmaster, I think, until we devise a system that is stable rather than oscillatory. A stable system is characterized by damping rather than nonlinear gain; and by feedback that is at least an order of magnitude faster than the forward phase response of the system. (It's possible to stabilize systems other ways, but this is the most general and reliable.)

To speed up the feedback loop, we could have fast suites that predict the behavior of the slow suites. Simply choosing a random subset of the tests in the slow suite, running those first, and providing interim results could achieve that.
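
As a rough sketch (this isn't our actual infrastructure; the TestRunner hook below is a stand-in for whatever really executes a test), the interim estimate could be as simple as running a random sample and reporting its pass rate:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch only: estimate the slow suite's pass rate from a random sample, so
// feedback arrives in hours rather than days.
public class InterimSample {

    interface TestRunner {
        boolean passes(String testId);
    }

    static double estimatePassRate(List<String> allTests, int sampleSize,
                                   TestRunner runner, Random rng) {
        List<String> shuffled = new ArrayList<String>(allTests);
        Collections.shuffle(shuffled, rng);
        int n = Math.min(sampleSize, shuffled.size());

        int passed = 0;
        for (String testId : shuffled.subList(0, n)) {
            if (runner.passes(testId)) {
                passed++;
            }
        }
        return 100.0 * passed / n;
    }
}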

To have damping rather than nonlinear gain, we need to remove or highly restrict the buildmaster's ability to lock the line; and instead, we need to increase the amount of pre-testing that is required in order to do a checkin. For instance, if interim results indicate a high failure rate, then new checkins should be subjected to a higher level of testing in the precheckin queue before they are allowed to actually commit.
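
Something like the following tiering rule is what I have in mind; the thresholds and tier names here are invented for illustration, not taken from any real policy:

// Sketch of a damping policy: as the interim pass-rate estimate worsens, each
// new checkin has to clear a bigger precheckin hurdle.
public class PrecheckinPolicy {

    enum Tier { FAST_SUITE_ONLY, FAST_PLUS_SAMPLED_SLOW, FULL_SLOW_SUITE }

    static Tier tierFor(double interimPassRatePercent) {
        if (interimPassRatePercent >= 97.5) {
            return Tier.FAST_SUITE_ONLY;
        }
        if (interimPassRatePercent >= 95.0) {
            return Tier.FAST_PLUS_SAMPLED_SLOW;
        }
        return Tier.FULL_SLOW_SUITE;
    }
}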

Tuesday, January 12, 2010

Dual dispatch

Computer programming is about describing the behavior of entities in various scenarios. For instance, if you're writing a data entry form, you might need to describe how it behaves when you click a certain button, how it behaves when you type into a field, and so forth. In object-oriented programming, we try to organize the code that describes an object's behavior together with the data that describes its state.

This gets messy when objects interact. Suppose you've got three Animals (Dog, Cat, Monkey), and three kinds of Food (DogChow, CatChow, MonkeyChow). You want your program to ensure that Monkeys can only eat Monkey Chow, Cats can only eat Cat Chow, but Dogs can eat all three kinds. So, you add a method to each of the three Animal classes, something like "boolean canEat(Food chow)". The implementation for Cat looks like "return chow instanceof CatChow", Dog looks like "return true", and so forth.
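
Sketched in Java, that first design looks something like this (the class names are just the ones from the example):

interface Food {}
class DogChow implements Food {}
class CatChow implements Food {}
class MonkeyChow implements Food {}

interface Animal {
    boolean canEat(Food chow);
}

class Cat implements Animal {
    public boolean canEat(Food chow) { return chow instanceof CatChow; }
}

class Monkey implements Animal {
    public boolean canEat(Food chow) { return chow instanceof MonkeyChow; }
}

class Dog implements Animal {
    // Dogs can eat all three kinds of chow.
    public boolean canEat(Food chow) { return true; }
}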

What happens when you add an animal? No problem, you just have to implement its canEat() method. What about when you add a new type of food? Well, you have to go through all the existing animals and make sure their implementation is still right. For instance, since chocolate is poisonous to dogs, the Dog implementation of canEat() is wrong. No compiler error, but your dog might die.

Or, you could flip the problem around, and put methods on all the Foods, saying what animals can eat it. Now, when you add a food, it's just a matter of implementing its canBeEatenBy(Animal animal) method. But when you add a new animal, you have to check all the Food implementations.

This problem of how to describe the interactions between entities is what I've been calling "dual dispatch"; it's more commonly known as "double dispatch" (or, in general, "multiple dispatch"). The Visitor pattern is the textbook workaround in Java, but it mostly just relocates the bookkeeping; I know of no particularly good general solution.

Monday, July 13, 2009

Dogfood

The company where I spent my vacation has a very strong tradition of "eating one's own dogfood," which means using the products they are developing. This is supposed to improve the quality of the product, because one becomes painfully aware of the problems and is motivated to fix them.

My recollection is that when I first heard the phrase, back in the early 1990s, it was "tasting," not "eating," one's own dogfood. There is a significant difference between the two.

The problem with eating only your own dogfood is that you start getting used to the taste of dogfood, and you don't discover that the rest of the world has learned how to cook a decent meal.

Maybe it is better to eat whatever the best food around is, while occasionally being forced to taste one's own dogfood. For example, let development teams use whatever products they want, but have test days where the teams work through predefined customer use cases using their own products.

Thursday, February 12, 2009

What should code comments do?

Below I've posted some code I just had to look at. I've got nothing against this code; it's a nice, clean, simple class, and I'm not aware of any bugs in it.

It's easy to figure out what this code does, just by looking at it. It takes a slash-delimited string ending in "war", like the one in main(), and deletes the third token if it contains only decimal digits.

But WHY? What problem does this class solve? What is Geronimo, and why is the string "war" important?

I can't help but think that someone discovered the need for this code the hard way, after time spent looking at Geronimo code or documentation, talking with peers, perhaps after fixing a bug report from the field. All that information has now been lost.

Perhaps the need for this applied only to a particular version of Geronimo. Perhaps it only turns up in a peculiar use case. Perhaps the original developer's understanding was flawed and this code is never actually needed. There's no way to know, and anyone who encounters this code in the future will have to try to figure out how not to break it. Very likely, it actually does do something important but it's not covered in the test suite, and any breakage will be discovered as a regression in the field, when some user tries to update to the latest product version and their application no longer runs.

It's like a post in the middle of the living room: you figure it's probably supporting some weight above, but how do you know? So you can't remodel the room, because the second floor might collapse. But maybe the builder put it there because they were planning on a hot tub on the floor above, where now you've got a walk-in closet. Now you've got to hire a structural engineer to do the same calculations again, because the original rationale has been lost.

Well-written code shouldn't need to explain what it does. But it should explain why it does it. What other options were considered? In what situations is the code necessary?


public class GeronimoLoaderNaming {

    public static String adjustName(String name) {
        if (name != null && name.endsWith("war")) {
            String[] parts = name.split("/", -1);
            if (parts.length != 4) {
                throw new RuntimeException("unknown format: " + name + ", # parts = " + parts.length);
            }

            if ("war".equals(parts[3]) && parts[2].matches("^\\d+$")) {
                name = name.replaceAll(parts[2], "");
            }
        }

        return name;
    }

    public static void main(String[] args) {
        String name = "Geronimo.default/simplesession/1164587457359/war";
        System.err.println(adjustName(name));
    }
}
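
Purely for illustration - I don't know the real answers, which is exactly the problem - a header comment that preserved the missing rationale might look something like this, with placeholders where the lost knowledge would go:

/**
 * Works around <the observed problem> in Geronimo <which versions?>.
 *
 * Geronimo names web-app loaders with a slash-delimited form like the one in
 * main(); the numeric segment (apparently a timestamp) has to be stripped
 * because <why?>.
 *
 * Alternatives considered: <e.g. fixing it upstream, matching on a different
 * segment> - rejected because <why?>.
 *
 * Safe to delete when <e.g. we drop support for Geronimo version X>.
 * See <bug number, mailing-list thread, or doc link>.
 */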

Friday, January 30, 2009

Javasaurus

Nothing about dinosaurs; apologies to any 6-year-olds I've misled.

There's an interface in the Java libraries called "Runnable" that just packages up the idea of "some code that you might want to run." This is handy when writing algorithms like "do some preparation; then run whatever it is the client wants to run; then do some clean-up." It's a way to hand a series of program steps from one module to another without having to know in advance what those steps are. ("Closures," much debated in Java, are another way of doing this.)

Runnable defines one method, "run". But the "run" method doesn't allow for the possibility of failure. I needed something similar, that was allowed to communicate failure (by throwing an exception). I knew there was something, but what? A search of the likely spots didn't turn up what I was looking for.

It would be really cool if there was a Thesaurus of Java, a tool or a web site that would let me type in "Runnable" and would come back with all the other things that were kind of like "Runnable." In this case the answer, provided by my colleague Hung, is "Callable." Doh.
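
For the record, the difference is just the throws clause (plus a return value); a minimal sketch:

import java.util.concurrent.Callable;

public class CallableDemo {
    public static void main(String[] args) throws Exception {
        // Runnable.run() can't throw a checked exception; Callable.call() can,
        // and it returns a value.
        Callable<Integer> task = new Callable<Integer>() {
            public Integer call() throws Exception {
                if (Math.random() < 0.5) {
                    throw new java.io.IOException("failure is an option");
                }
                return 42;
            }
        };
        System.out.println("callable returned " + task.call());
    }
}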

A similar task that comes up for me a lot is that I've got a This, and I know it's possible to convert it to a That, but I'm not sure how. For instance, let's say I've got an Eclipse IFile, and I want to convert it to an ordinary Java File. The mapping isn't perfect, but basically, given an IFile there is a chain of methods you can call that will either get you the corresponding File or tell you it doesn't exist. But what is that chain of methods?

There's a finite number of methods that take an IFile as an argument (or receiver). There's a finite number of methods that produce a File. So there's a finite, although very large, possible graph between them - for instance, you could imagine calling something like IFile.getURL() and then URL.getFileDescriptor() and then FileDescriptor.getFile(). (I just made those names up, that's not the real answer.)

Most of the paths through the graph will be wrong, and some will be long and some will be short. But you could use the same sort of semantic analysis tools that are used for natural language translation, feeding off existing code (such as the Eclipse code base, in this example), to inform the graph of common versus uncommon pairings. I'd enter my start and end types, and I'd see a view of the top ten paths through the graph of possible method calls connecting them, perhaps even with links to existing code examples where something like that path had been used before.
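
(For what it's worth, the chain I was after in this particular case turns out to be short - roughly the following, from memory of the Eclipse APIs, with the caveat that getLocation() can return null for resources that aren't on the local filesystem:)

import java.io.File;
import org.eclipse.core.resources.IFile;
import org.eclipse.core.runtime.IPath;

public class IFileToFile {
    // Returns null if the resource has no local filesystem location.
    static File toJavaFile(IFile iFile) {
        IPath location = iFile.getLocation();
        return (location == null) ? null : location.toFile();
    }
}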

I tried Googling for "Java Thesaurus" to see if this existed already, but all that comes up is Java code for writing thesauri.

Thursday, January 1, 2009

I Want Pictures

With all our interesting weather of late, I've been following Cliff Mass' weather blog. I've been fascinated and impressed by the great data visualizations that meteorologists have to work with. They have a head start because their data naturally maps onto a two-dimensional plot, but they've managed to add many more dimensions in ways that even a non-meteorologist can quickly comprehend. For instance, look at this image, which is the first image from his blog post of 1/1/09:


In addition to the two dimensions of space, and the overlay of geopolitical boundaries, this image shows sea-level pressure, temperature, and the vector of wind speed and direction. And it's just plain pretty to look at, too. I'd love to have a job that involved looking at pictures like that all day.

What would the software equivalent be, I wonder? Could I combine profiler data with a dynamic class diagram from UML? What if I overlaid a metric of function complexity on top of that?

The visualization tools for software are pretty weak, when you consider that all the information is already in the computer (we don't need weather satellites to get our data). It might be because software is an abstraction that doesn't easily lend itself to a 2-D layout like weather data does, but I think it might also be that software engineers are by nature less visually oriented. I think I'm more of a visual thinker than most, but not all, of the developers I've worked with. I'm not really comfortable with something until I can draw a picture of it.

Sunday, December 7, 2008

Through A Glass, Darkly

Although I get paid to write software, most of my time is spent understanding other people's software. I find that difficult: the available information is usually fragmentary, inconsistent, and more than I can hold in my head at one time anyway. It's like trying to read a mural on the side of a building, through the holes in a construction fence, as I drive by. I get little snapshots of partial information and I have to try to piece them together in my head. Sometimes the big picture I develop is entirely wrong, or missing big chunks.

Example: I've been working with Hibernate for more than a month now, but I still don't really understand exactly how it actually works. I only discovered tonight that it creates proxies to camouflage and control all my objects. This is sort of like wondering why your friends are acting a bit odd and then discovering that everyone on the planet except you has been replaced by space aliens.
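
A small illustration of what I mean (hedged: this assumes some mapped entity and an open Session, with all the mapping configuration omitted, and the Author class here is a made-up stand-in):

import org.hibernate.Session;

public class ProxyPeek {

    // Stand-in for a mapped entity class; the Hibernate mapping is omitted.
    public static class Author {
        private Long id;
        public Long getId() { return id; }
    }

    // session.load() returns immediately with a proxy: the object in hand is an
    // instance of a generated subclass (something like
    // "Author$$EnhancerByCGLIB$$..." in Hibernate 3), not the class I wrote.
    static void show(Session session) {
        Object author = session.load(Author.class, Long.valueOf(1));
        System.out.println(author.getClass().getName());
    }
}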

Java has so dang many layers of obscurity that it is really hard to figure out precisely what code is actually being executed. The application code you write is just the tip of the iceberg. What with Spring, Hibernate, Terracotta, JUnit, the Hotspot compiler, and all the other frameworks and code enhancement tools we use, the program code itself is almost just a hint to the system. Maybe we're getting closer to the holy grail of being able to tell the computer what we want done, rather than how to do it.