This forced me to go 100 percent to a regimen of unit testing. So Iproceeded to construct thorough unit tests for each of my subprocedures.And this turned out to be a very good discipline.
This has influenced the design of Fortress to try to include features thatwould encourage the construction of unit tests. And recording themalongside the program text, rather than in separate files. To that extent weborrowed some ideas of say, Eiffel’s Design by Contract and similar thingswhere you can put preconditions and postconditions on procedures. Thereare places where you can declare test data and unit-test procedures, andthen a test harness will take care of running those whenever you request.
Seibel: Since you just mentioned Design by Contract, how do you useassertions in your own code?
Steele: I have a tendency to drop in assertions, particularly at thebeginnings of procedures and at important points along the way. And whentrying to—maybe “prove” is too strong a word—trying to justify to myselfthe correctness of some code I will often think in terms of an invariant andthen prove that the invariant is maintained. I think that’s a fruitful way tothink about it.
Seibel: How about stepping through code in a debugger? Is that somethingyou’ll do if all else fails?
Steele: It depends on the length of the program. And of course you canhave tools that will help you to skip sections you don’t need to step throughbecause you’re confident that those parts are OK. And of course CommonLisp has this very nice STEP function, which is very helpful. I’ve steppedthrough a lot of Common Lisp code. The ability to skip over particularsubroutines whose details you trust, of course, buys you a lot. Also theability to set traps and say, “I really don’t need to look at this until thisparticular loop has gone around for the seventeenth time.” And there werehardware tools to support that on the PDP-10, which was nice, at least atMIT. They tended to modify their machines in those days, to add features.And there’s a lot to be said for watching the actual execution of code invarious ways.
Seibel: Do you ever try to formally prove your code correct?
Steele: Well, it depends on the code. If I’m writing code with some kind oftricky mathematical invariant I will go for a proof. I wouldn’t dream ofwriting a sorting routine without constructing some kind of invariant andproving it.
Seibel: Peter van der Linden in his book Expert C Programming has a sort ofdismissive chapter about proofs in which he shows a proof of something,but then, ha ha, this proof has a bug in it.
Steele: Yes, indeed. Proofs can also have bugs.
Seibel: Are they at least less likely to have a bug than the code which theyare proving?
Steele: I think so, because you come at it in a different way and you usedifferent tools. You use proofs for the same reason you use data types, orfor the same reason that mountain climbers use ropes. If all is well, youdon’t need them. But they increase the chance of catching it if somethingdoes go wrong.
Seibel: I suppose the really bad case would be if you had a bug in yourprogram and then a compensating bug in your proof. Hopefully that wouldbe rare.
Steele: That can happen. I’m not even sure it’s necessarily rare, becauseyou tend to construct the proof to match the structure of the code. Orconversely, if you’ve got a proof in mind as you’re writing the code, thattends to guide your construction of the program. So you really can’t say thecode and the proof are totally independent, in the probabilistic sense. Butyou can bring different tools and different modes of thought to bear.
In particular, the details of the programming tend to take a local point ofview and the invariants tend to focus on the global point of view. It’s indoing the proof that you cause those two things to interact. You look athow the local steps of the program affect the global invariant you’re tryingto maintain.
One of the most interesting exercises in my career was the time I wasasked to review a paper for CACM by David Gries on proving a garbage collectoralgorithm correct, a parallel garbage collector. Susan Owicki was astudent of Gries’s and she developed some tools for proving parallelprograms correct, and he decided to apply these techniques to a version ofa parallel garbage collector that had been developed by Dijkstra. The wholepiece of code fit on, I think, a half a page. And then the entire rest of thepaper was the proof of correctness.
I cranked through the proof and tried to verify every step for myself. Andwhat makes it tricky is, in effect, every statement in the program has thepotential to violate any invariant because it’s a parallel program. So theOwicki technique involves cross-checking these at all points. And it took meabout 25 hours to go through, and in the process I found a couple of steps Icouldn’t push through. So I reported this and it turned out they didrepresent bugs in the algorithm.
Seibel: So they were bugs in the algorithm which the proof missed sincethe result of the proof was, Q.E.D. this thing works.
Steele: Yes, the proof presented was a faulty proof. Because something hadbeen overlooked somewhere. It was some detail of formula manipulation—theformula was almost right but not quite. And I think it was a matter ofcorrecting the order—the sequence of two statements or something.
Seibel: So it took you 25 hours to analyze the proof. Could you have foundthe bug in the code in just 25 hours if you just had the code?
Steele: I doubt I would have even realized there was a bug. The algorithmwas sufficiently intricate that I would probably have stared at the code andsaid, “Yeah, this makes sense to me” and not have spotted this very obscureinteraction. It was a multistep sequence that was necessary—a highlyunlikely interaction.
Seibel: And so that kind of interaction basically gets abstracted by theprocess of making the proof so you don’t have to come up with thisscenario of what if this happens and then this and then this to realize thatthere’s a problem.
Steele: Exactly. In effect the proof takes the global point of view and coversall the possibilities, summarizing it in a very complicated formula. And youhave to do formula crunching to push it through. So the author resubmittedthe paper and it came back for rereviewing and even though I had done theentire exercise, it took me another 25 hours to reverify the proof. Thistime it all seemed to be sound.
I reported that and the paper was published and nobody’s found a bug in itsince. Is it actually bug-free? I don’t know. But I think having gone throughthe exercise of the proof gives me a lot more confidence that the algorithmis now sound. And I’m hoping that I wasn’t the only reviewer who actuallydid the complete cranking through of the proof.
Seibel: There’s a Dijkstra quote about how you can’t prove by testing thata program is bug-free, you can only prove that you failed to find any bugswith your tests. But it sort of sounds the same way with a proof—you can’tprove a program is bug-free with a proof—you can only prove that, as far asyou understand your own proof, it hasn’t turned up any bugs.
Steele: That’s true. Which is why there is a subspecialty in the disciplinehaving to do with mechanical proof verification. And the hope is that youthen reduce the problem to proving that the proof verifier is correct.Which is—if you can write a small enough verifier—actually a much moretractable problem that verifying the proof of any rather large program.
Seibel: And then the manually proved mechanical verifier would do the 25hours of work you did of grinding out a verification of a specific proof ofsome other piece of code?