I should mention two things. First, I did science fairs when I was at Latin School and I made a point of doing projects about computer science. One of the judges one year said, “Have you considered becoming a student member of ACM?” I don’t know his name. But I have been very thankful ever since. That was a good thing for me.
And when I got to Harvard, if I had a spare hour to kill in the morning I would go over to Lamont Library and I would do one of two things: I would read my way backwards through Scientific American and I would read my way forward, from the beginning, in Communications of the ACM. So I was, in particular, trying to pick up all of Martin Gardner’s columns on mathematical games. And I just read whatever interested me out of CACM. In 1972 there was only about 15 years of that journal, so it didn’t take that long to plow through them all.
Seibel: It also must have been easier then than it would today in the sense that the same way you could understand whole systems, one person could hope to understand the whole field.
Steele: Yeah, you could hope to understand the whole field. There were lots of one-page articles. You know: “Here’s a clever new hashing technique.” I read a lot.
Seibel: I often find older papers are hard to get into since they’re tied to the particulars of old hardware or languages.
Steele: Well, necessity is the mother of invention: an idea arises because it’s needed in a particular context. Then a little later it’s recognized that that idea is the important thing. And then you need to strip away the context so the idea can be seen and that takes some years. “Here’s a clever technique for reversing the bits of a word,” and they give something in 7090 assembly language. And there’s an interesting mathematical idea there but they haven’t quite abstracted yet.
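To make the point about stripping away context concrete, here is a minimal sketch of bit reversal written without reference to any particular machine. It is not the technique from any specific CACM article; the function name `reverse_bits` and the `width` parameter are assumptions made for illustration. The default width of 36 is only a nod to the IBM 7090's 36-bit word.

```python
def reverse_bits(value: int, width: int = 36) -> int:
    """Return `value` with its low `width` bits in reverse order.

    Hypothetical helper for illustration; the default width of 36
    matches the IBM 7090 word size but any positive width works.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")

    result = 0
    for _ in range(width):
        # Shift the result left and append the current lowest bit of value.
        result = (result << 1) | (value & 1)
        value >>= 1
    return result
```

Once the idea is expressed this way, the word size is just a parameter, and properties such as "reversing twice gives back the original word" are easy to state and check.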
Seibel: I guess that’s Knuth’s job, right?
Steele: Knuth and people like him, absolutely.
Seibel: Presumably people who study computer science in school get guided through all that stuff. But there are also a lot of programmers who came into it without formal training, learning on the job. Do you have any advice for how to tackle that problem? Where do you start and how do you get to the point where you can actually read these technical papers and understand them? Should you start at the beginning of the ACM and try to get up to the present?
Steele: Well, first of all, let me say that that exercise of reading through CACM from early on wasn’t my plan to become a great computer scientist by reading everything there was in the literature. I read it because I was interested in stuff and felt internally motivated to tackle that particular set of material. So I guess there are two things: one is having the internal motivation to want to read this stuff because you’re interested or because you think it will improve your skills.
Then there is the problem of how do you find the good stuff? And of course the view of what is the good stuff changes from decade to decade. Stuff that was considered the really good stuff this year may be kind of dated in ten years. But I guess you go to a mentor who’s been through it and say, what do you think was the good stuff? For me the good stuff was Knuth; Aho, Hopcroft, and Ullman. Gerald Weinberg on The Psychology of Computer Programming, which I think is still very readable today. Fred Brooks’s Mythical Man-Month gave me some insights.
In those days I haunted the computer-science book section of the MIT bookstore and just made a point of going through there once a month and browsing through the bookshelves. Of course now you walk into a bookstore and there’s a computer section that’s ten times as big, but most of it is about how to do C or Java this year. But there will be a smaller section of books about the theoretical background, algorithms, that kind of thing.
Seibel: There’s another kind of reading, which I know you think is important: reading code. How do you find your way into a big pile of code you didn’t write?
Steele: If it’s a piece of software that I know how to use and just don’t know how the insides work, I will often pick a particular command or interaction and trace it through.
Seibel: The execution path?
Steele: Yes. So if I were walking up to Emacs, I’d say, “OK, let’s take a look at the code that does ‘forward a character’.” And I won’t completely understand it but at least it’ll introduce me to some data structures it uses and how the buffer is represented. And if I’m lucky I can find a place where it adds one. And then once I’ve understood that, then I’ll try “backwards a character.” “Kill a line.” And work my way up through more and more complicated uses and interactions until I feel that I’ve traced my way through some of the more important parts of the code.
Seibel: And would “tracing” mean looking at the text of the source code and mentally executing it, or would you fire it up in a debugger and step through it?
Steele: I’ve done it both ways. I’ve done it with a stepping debugger, mostly on smaller codes back in the ’70s or ’80s. The problem nowadays is that from the time a program first fires up until it begins to do anything interesting can already be a long initialization process. So perhaps one is better off trying to find the main command loop or the central control routine and then tracing from there.
Seibel: And once you find that, would you set a breakpoint and then step from there or just do it by mental execution?
Steele: I’d be inclined to do it by desk-checking, by actually reading the code and thinking about what it does. If I really need to understand the whole code then at some point I might sit down and try to read my way all the way through it. But you can’t do that at first until you’ve got some kind of framework in your head about how the thing is organized. Now, if you’re lucky, the programmer actually left some documentation behind or named things well or left things in the right order in the file so you actually can sort of read it through.
Seibel: So what is the right order in the file?
Steele: That’s a very good question. It strikes me that one of the problems of a programming language like Pascal was that because it was designed for a one-pass compiler, the order of the routines in the file tended to be bottom-up because you had to define routines before you use them. As a result, the best way to read a Pascal program was actually backwards because that would give you the top-down view of the program. Now that things are more free-form, you really can’t count on anything other than the programmer’s good taste in trying to lay things out in a way that might be helpful to you. On the third hand, now that we’ve got good IDEs that can help you with cross-referencing, maybe the linear order of the program doesn’t matter so much.
On the fourth hand, one reason I don’t like IDEs quite so much is that they can make it hard to know when you’ve actually seen everything. Walking around in a graph, it’s hard to know you’ve touched all the parts. Whereas if you’ve got some linear order, it’s guaranteed to take you through everything.
Seibel: So when you write code these days would you present more of a top-down organization with the high-level functions before the lower-level functions on which they depend?
Steele: I’d try to present the high-level ideas. The best way to present that might be to show a central command-and-control routine with the things it dispatches to beneath it. Or, it might be that the important thing is to show the data structures first, or the more important data structures. The point is to present the ideas in an order such that they tell a story rather than just being a pile of code thrown together.
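One possible reading of "a central command-and-control routine with the things it dispatches to beneath it" is sketched below. This is only an illustration, not code from Emacs or from Steele: the names `run`, `DISPATCH`, `forward_char`, `backward_char`, and `kill_line` are hypothetical, the buffer is assumed to be a plain string with an integer cursor position, and `kill_line` is simplified to delete up to (not including) the next newline.

```python
def run(commands, text, point=0):
    """Central control routine: apply each named command in order.

    Placed first in the file so a reader meets the high-level story
    before the details. Returns the final (text, point) pair.
    """
    for name in commands:
        handler = DISPATCH[name]  # looked up at call time, so it may be defined below
        text, point = handler(text, point)
    return text, point


# --- The commands that run() dispatches to -------------------------------

def forward_char(text, point):
    """Move the cursor one character right, stopping at the end of the text."""
    return text, min(point + 1, len(text))


def backward_char(text, point):
    """Move the cursor one character left, stopping at the start of the text."""
    return text, max(point - 1, 0)


def kill_line(text, point):
    """Delete from the cursor up to, but not including, the next newline."""
    end = text.find("\n", point)
    if end == -1:
        end = len(text)
    return text[:point] + text[end:], point


# Dispatch table tying command names to the handlers above.
DISPATCH = {
    "forward-char": forward_char,
    "backward-char": backward_char,
    "kill-line": kill_line,
}
```

Because Python resolves `DISPATCH` and the handlers only when `run` is called, the file can be read top-down: control routine first, then the commands, then the table that connects them. For example, `run(["forward-char", "forward-char", "kill-line"], "hello\nworld")` returns `("he\nworld", 2)`.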