It came back and it was about as close to a design document as we gotexcept that it had homonyms that you wouldn’t believe. So I went off andimplemented this file system, strictly on a PDP-7. At some point I decidedthat I had to test it. So I wrote some load-generating stuff. But I was havingtrouble writing programs to drive the file system. You want somethinginteractive.
Seibel: And you just wanted to play around with writing a file system? Atthat point you weren’t planning to write an OS?
Thompson: No, it was just a file system.
Seibel: So you basically wrote an OS so you’d have a better environmentto test your file system.
Thompson: Yes. Halfway through there that I realized it was a real timesharingsystem. I was writing the shell to drive the file system. And then Iwas writing a couple other programs that drove the file system. And rightabout there I said, “All I need is an editor and I’ve got an operating system.”
Seibel: What’s the worst bug you’ve ever had to track down?
Thompson: Basically bugs where memory gets corrupted. It never happensanymore. I don’t know why. But in the early days we were always workingwith experimental hardware of various sorts, and there’d be some hardwarebug.
Seibel: So memory would get corrupted by the hardware screwing up, notby a runaway pointer?
Thompson: It could be pointer. It could be hardware. Or a combination.The one I’m actually thinking of, the worst example, was the on PDP-11. Itdidn’t have multiply but you could buy a multiply unit and plug it in, but itwas an I/O peripheral. You would store a numerator and a denominatorand say go. You’d busy-loop and then pull out the answer, the quotient andthe remainder. And this thing was built for a non-memory-managed PDP-11and we got the first experimental hardware for a memory-managed PDP-11and this multiply unit didn’t fit with the memory management well.
So you’d store into this thing and then you’d busy-test. And during certainaspects of the busy test it would send a physical address down instead of avirtual address and some piece of memory would get clobbered with anumerator of what you were trying to divide by. And it’d be ages beforeyou’d find it, and it’d be different places. That’s by far the hardest one I’dever had to find.
Seibel: How did you track it down?
Thompson: There was a program that I wrote that was going after a worldrecord for the number of digits of e. Previous world records were limitednot by computation—by cycles per second—but by I/O. I came up with anew algorithm where it was computation-bound and I/O became trivial. Itwas monstrously heavy on multiply and divide. And we noticed that themachine just crumbled whenever I put on my program. And therefore wegot the connection.
Seibel: So that gave you the clue that there was a problem with themultiplier; did you ultimately track it down to some root cause?
Thompson: At some point we got it to where you store in the multiplierin the multiply unit, and you pull it back and it wasn’t there. We reportedthat to DEC and DEC couldn’t find it, and they didn’t want to deal with it.Their normal people didn’t want to deal with this hybrid machine. In thosedays you actually got the circuit diagrams of the machines, and we actuallyfound the bug in the circuit diagrams. Then we just called DEC and said,“Connect that wire and that wire.”
Seibel: So, thankfully, hardware mostly doesn’t flake out on us that waythese days.
Thompson: Yeah. That’s why I think they’re rare. Plus things are isolatedfrom each other more—if you go bizarrely wild you’ll get a fault. Also youdid it in assembly language—it’s really easy to have the wrong thing in someregister through some subroutine call. When you have a high-level languagewhere the arguments all have to match up, these things become more andmore rare.
In the early days, in assembly language, you’d find them a lot. If it wassoftware, as opposed to a combination of software/hardware, usually itwould happen in one spot—the same spot would be corrupted. There’d besome correlation of the bug with something. And you could sit there andput a monitor in the operating system. And every so often, or very often,you’d check and see if the error occurred, and stop as quick as you can, andsee what’s going on elsewhere, and chase them down that way. So youcould attack them.
This one you couldn’t attack. It wasn’t until I wrote this intensivemultiply/divide program that it saw the frequency of the error went way,way up. Instead of crashing once every couple of days you’d crash onceevery couple of minutes. And then as soon as you got something that wouldcrash the machine you had a fighting chance to find it.
Seibel: So some folks today would say, “Well, certainly assembly has allthese opportunities to really corrupt memory through software bugs, but Cis also more prone to that than some other languages.” You can getpointers off into la-la land and you can walk past the ends of arrays. Youdon’t find that at all problematic?
Thompson: No, you get around that with idioms in the language. Somepeople write fragile code and some people write very structurally soundcode, and this is a condition of people. I think in almost any language youcan write fragile code. My definition of fragile code is, suppose you want toadd a feature—good code, there’s one place where you add that featureand it fits; fragile code, you’ve got to touch ten places.
Seibel: So when there’s a security breach that turns out to be due to abuffer overflow, what do you say to the criticism that C and C++ are partlyresponsible—that if people would use a language that checked array boundsor had garbage collection, they’d avoid a lot of these kinds of problems?
Thompson: Bugs are bugs. You write code with bugs because you do. Ifit’s a safe language in the sense of run-time-safe, the operating systemcrashes instead of doing a buffer overflow in a way that’s exploitable. Theping of death was the IP stack in the operating system. It seems to me thatthere’d be more pings of death. There wouldn’t be pings of “take over themachine becoming superuser.” There’d be pings of death.
Seibel: But there is a difference between a denial-of-service attack and anexploit where you get root and can then do whatever you want with thebox.
Thompson: But there are two ways to get root—one is to overflow abuffer and the other is to talk the program into doing something it shouldn’tdo. And most of them are the latter, not overflowing a buffer. You canbecome root without overflowing any buffers. So your argument’s just noton. All you’ve got to do is talk su into giving you a shell—the paths are allthere without any run-time errors.
Seibel: OK. Leaving aside whether it results in a crash or an exploit orwhatever else—there is a class of bugs that happen in C, and C++ for thesame reason, that wouldn’t happen in, say, Java. So for certain kinds ofapplications, is the advantage that you get from allowing that class of bugsreally worth the pain that it causes?
Thompson: I think that class is actually a minority of the problems.Certainly every time I’ve written one of these non-compare subroutinecalls, strcpy and stuff like that, I know that I’m writing a bug. And Isomehow take the economic decision of whether the bug is worth the extraarguments. Usually now I routinely write it out. But there’s a semanticproblem that if you truncate a string and you use the truncated string areyou getting into another problem. The bug is still there—it just hasn’toverflown the buffer.
Seibel: When you’re debugging, what tools do you use?
Thompson: Mostly I just print values. When I’m developing a program I doa tremendous amount of printing. And by the time I take out, or commentout, the prints it really is pretty solid. I rarely have to go back.