Seibel: So do you ever write a program that you know is correct butsomehow falls outside the bounds of the type checker?
Peyton Jones: This comes up when you’re doing generic programming,where you want to write functions that will take data of any type and justwalk over it and serialize it, say. So that’s a time when types can be a bitawkward and an untyped language is particularly straightforward. It couldn’tbe easier to write a serializer in an untyped language.
Now there’s a small cottage industry of people describing clever typed waysof writing generic programs. I think such things are fascinating. But it’ssomehow just isn’t as simple as writing it in a dynamically typed language.I’m trying to persuade John Hughes to write a paper for the Journal ofFunctional Programming on why static typing is bad. Because I think it wouldbe very interesting to have a paper from John, who’s a mainstream, stronglytyped, very sophisticated functional programmer, who is now doing a lot ofwork in untyped Erlang, saying why static types are bad. I think he wouldwrite a very reflective and interesting paper. I don’t know quite where we’llend up.
I think I would still say, “Where static typing fits, do it every time becauseit has just fantastic maintenance benefits.” It helps you think about yourprogram; it helps you write it, all that kind of stuff. But the fact that wekeep generating more and more sophisticated type systems is an indication thatwe’re trying to keep pushing the boundary out to say more in the world—to covermore programs. So the story hasn’t finished yet.
The dependently typed programming people would say, “Ultimately the typesystem should be able to express absolutely anything.” But types are funnythings—types are like a very compact specification language. They saysomething about the function but not so much that you can’t fit it in yourhead at one time. So an important thing about a type is it’s kind of crisp. If itgoes on for two pages, then it stops conveying all the information it should.
I think the direction I’d like to see things go is to have still crisp andcompact types which are a little bit weak, precisely so that they can becrisp, along with invariants, perhaps stated in a rather richer language thanthe inferable type system, but which are still amenable to static checking.Something I’m working on in another project is to try to do staticverification for pre- and post-conditions and data-type invariants.
Seibel: Similar to Design by Contract in Eiffel?
Peyton Jones: That’s right. You’d like to be able to write a contract for afunction like, “You give me arguments that are bigger than zero and I’ll giveyou a result that is smaller than zero.”
Seibel: How do you go about designing software?
Peyton Jones: I suppose I would say that usually the dominant problemwhen I’m thinking about writing a program—thinking about writing somenew piece of GHC—is not how to get the idea into code. But it’s rather,what is the idea?
To take an example, at the moment we’re in mid-flight for moving GHC’sback end, the code generation part, to refactor it in a new way. At themoment there’s a step in the compiler that takes essentially a functionallanguage and translates it into C--, which is an imperative language. Andthat’s a pretty big step. It’s called C-- because it’s like a subset of C. But it’sreally meant to be a portable assembly language. And it’s not printed out inASCII—it’s just an internal data type. So this step in the compiler is afunction from a data structure representing a functional program to a datastructure representing an imperative program. How do you make that step?
Well, I have a pretty complicated bit of code that does that at the moment.But a couple of days ago I realized that it could be separated into two parts:first transform it into a dialect of C--, which allows procedure calls—inside aprocedure, you can call a procedure. Then translate that into a sub-languagethat has no calls—only has tail calls.
Then the name of the game is figuring out, just what is the data type? ThisC-- stuff, what is it? It’s a data structure representing an imperativeprogram. And as you make the second step, you walk over the program,looking at each bit, one at a time. So your focus of attention moves downthe control flow, or perhaps back up through the control flow. A good datastructure for representing that is called a “zipper”—which is a very usefulpurely functional data structure for moving the focus around a purelyfunctional data structure.
Norman Ramsey at Harvard found a way to use this for walking around datastructures that represent imperative control flow graphs. So he and I andJohn Dias then spent a while reengineering GHC’s back end to adoptessentially this factored technology. And in doing so making it much moregeneral so we can now use this same back end as a back end for otherlanguages.
A lot of our discussion was essentially at the type level. Norman would say,“Here’s the API,”—by giving a type signature—and I would say, “That looksway complicated, why is it like that?” And he’d explain why and I’d say,“Couldn’t it be simpler this way.” So we spent a lot of time to’ing andfro’ing at the level of describing types.
But a lot of the time it wasn’t really about programming as such—it wasabout, what is the idea? What are we trying to do with this dataflowanalysis, anyway? You try to say in a clear way what this step of the programis meant to do. So we spent quite a lot of time just being clear on what theinputs and the outputs are and working on the data types of the inputs andthe outputs. Just getting the data types right, already you’ve said quite a lotabout what your program does. A surprisingly large amount, in fact.
Seibel: How does thinking about the types relate to actually sitting downand coding? Once you sketch out the types can you sit down and write thecode? Or does the act of writing the code feed back into yourunderstanding of the types?
Peyton Jones: Oh, more the latter, yes. I’ll start writing type signaturesinto a file right away. Actually I’ll probably start writing some code thatmanipulates values of those types. Then I’ll go back and change the datatypes. It’s not a two-stage process where we say, “Now I’ve done the types,I can write the code.”
If anything I’m a bit ill-disciplined about this. This comes from not workingas part of a large team. You can do things when you’re working on codethat one person can still get their head around that you probably couldn’t ina much bigger team.
Seibel: You mentioned that in this latest code upheaval in GHC, things gotmuch more general. GHC is a big program that’s evolved over time soyou’ve had the chance to benefit from generality and the chance to pay thecost of over-generality. Have you learned anything about how to strike thebalance between over- and under- generalization?
Peyton Jones: I think my default is not to write something very general tobegin with. So I try to make my programs as beautiful as I can but notnecessarily as general as I can. There’s a difference. I try to write code thatwill do the task at hand in a way that’s as clear and perspicuous as I canmake it. Only when I’ve found myself writing essentially the same codemore than once, then I’ll think, “Oh, let’s just do it once, passing some extraarguments to parameterize it over the bits that are different between thetwo.”
Seibel: What is your actual programming environment? What tools do youuse?
Peyton Jones: Oh, terribly primitive. I just sit there with Emacs andcompile with GHC. That’s just about it. There are profiling tools that comewith our compiler and people often use those for profiling Haskellprograms. And we do that for profiling the compiler itself. GHC dumps a lotof intermediate output so I can see what’s going on.