
10 Technical Papers Every Programmer Should Read (At Least Twice)

Sep 8, 2011

this is the second entry in a series on programmer enrichment

Inspired by a fabulous post by Michael Feathers in a similar vein, I’ve composed this post as a sequel to the original. That is, while I agree almost wholly with Mr. Feathers’s1 choices, I tend to think that his choices are design-oriented2 and/or philosophical. In no way do I disparage that approach; instead I think that there is room for another list that is more technical in nature. But the question remains: where to go next? In this post I will offer some guidance based on my own readings. The papers chosen herein are not intended to act as a C.S. hall of fame, but instead hope to meet the following criteria:

  • All papers are freely available online (i.e. not pay-walled)
  • They are technical (at times highly so)
  • They cover a wide range of topics
  • They form the basis of knowledge that every great programmer should know, and may already

Because of these constraints I will have missed some great papers, but for the most part I think this list is solid. Please feel free to disagree and offer alternatives in the comments.

A Visionary Flood of Alcohol

Fundamental Concepts in Programming Languages (link to paper)

by Christopher Strachey

Quite possibly the most influential set of lecture notes in the history of computer science. L-values and R-values, along with parametric and ad hoc polymorphism, were all defined in this paper. Much of the content may already occupy your mind, but the sheer weight of the heady topics assembled in one place is stunning to observe.

Why Functional Programming Matters (link to paper)

by John Hughes

I found this paper extremely lucid on the advantages of functional programming with the added advantage of showing off examples of beautiful code. There are seemingly an infinite number of papers on the topic of laziness with streams and generators, but I’ve yet to find a better treatment. Finally, I’ve always been partial to Reginald Braithwaite’s “Why Why Functional Programming Matters Matters” as a complement to this paper.

An Axiomatic Basis for Computer Programming (link to paper)

by C. A. R. Hoare

I came to this paper late in my career, but when I finally found it I felt like I had been hit by a bus. At the core of the paper lies the following assertion:

P {Q} R

Taken to mean:

If the assertion P is true before initiation of a program Q, then the assertion R will be true on its completion

Where P is a precondition, Q is the execution of a program, and R is the postcondition describing the result.

In other words, as long as a program/function/method/etc. receives a set of parameters conforming to its preconditions, its execution is guaranteed to produce a well-formed result. This paper inspired me to explore contracts programming in Clojure, but the proof implications reached in Hoare’s paper run much deeper.
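To make P {Q} R concrete, here is a minimal sketch in Python that checks the precondition and postcondition at run time. Note the assumptions: Hoare’s paper is about proving such triples statically, not checking them dynamically, and the names contract and isqrt are hypothetical ones invented for this illustration.

```python
from functools import wraps


def contract(pre, post):
    """Check P {Q} R at run time (hypothetical helper, not from the paper).

    pre  -- P: a predicate over the arguments
    post -- R: a predicate over the result followed by the arguments
    """
    def decorate(fn):
        @wraps(fn)
        def checked(*args):
            if not pre(*args):
                raise ValueError(f"precondition violated: {fn.__name__}{args}")
            result = fn(*args)  # Q: run the program
            if not post(result, *args):
                raise AssertionError(f"postcondition violated: {fn.__name__}{args}")
            return result
        return checked
    return decorate


@contract(
    pre=lambda n: isinstance(n, int) and n >= 0,           # P
    post=lambda r, n: r * r <= n < (r + 1) * (r + 1),      # R
)
def isqrt(n):
    """Q: integer square root by linear search."""
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r
```

Calling isqrt(10) passes both checks, while isqrt(-1) is rejected before Q ever runs. A proof in Hoare’s style would establish R for every input satisfying P, which a run-time check can only sample.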

Time, Clocks, and the Ordering of Events in a Distributed System (link to paper)

by Leslie Lamport (1978)

Lamport has been highly influential in the field of distributed computation for a very long time and almost any of his papers on the subject should impress. However, this particular paper is likely his most influential, and it single-handedly defined two branches of study in distributed computing:

  1. The reasoning of event ordering in distributed systems and protocols
  2. The state machine approach to redundancy

The most amazing aspect of this paper is that after you read it you might think to yourself, “Well, of course that’s how it should work.” Jim Gray once said that this paper was both obvious and brilliant. I would say that there is no higher compliment.
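The event-ordering idea can be sketched as a logical clock. This is a minimal illustration in Python under my own naming (LamportClock, tick, send, and receive are assumptions, not identifiers from the paper): each process advances its counter on every event, and on receipt moves past both its own counter and the timestamp carried by the message.

```python
class LamportClock:
    """A per-process logical clock (illustrative names, not from the paper)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """A local event advances the clock."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """Move past both our own clock and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


# Two processes exchanging messages.
a = LamportClock()
b = LamportClock()

a.tick()                      # local event on a
t1 = a.send()                 # a sends to b
t2 = b.receive(t1)            # b's clock jumps past the send timestamp
t3 = b.send()                 # b replies
t4 = a.receive(t3)            # a's clock jumps past the reply timestamp
```

Every receive is stamped later than the send that caused it, so the timestamps respect the order of cause and effect. Breaking ties between equal timestamps, for example by process identifier, would be needed to get a total order.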

On Understanding Types, Data Abstraction, and Polymorphism (link to paper)

by Luca Cardelli and Peter Wegner

I had originally thought to list Milner’s A Theory of Type Polymorphism in Programming, but thought that a survey paper would be better. I must admit that my own readings have not gone deep into the exploration of type systems, so any additional suggestions would be greatly appreciated.

Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I (link to paper)

by John McCarthy

It’s become a cliche to recommend McCarthy’s seminal paper introducing LISP. I will not count this toward the target of 10, but I would be remiss to exclude it because it’s a great read that is nicely supplemented with the study of a simple implementation of McCarthy’s original specification.3

The Machinery for Change

Predicate Dispatch: A Unified Theory of Dispatch (link to paper)

by Michael Ernst, Craig Kaplan, and Craig Chambers

Describes a method for dispatching functions based not on a static set of rules, but instead as the traversal of a decision tree that could be built at compile-time and extended incrementally at runtime. What this means is that dispatch is controlled and adapted based on an open set of conditions describing the rules of dispatch. This stands opposed to the current popular trend of languages whose dispatch is hard-coded and not open for extension at all.
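A rough sketch of the idea in Python follows. It rests on a simplifying assumption: the paper orders methods by logical implication between their predicates, whereas this sketch uses an explicit priority number. PredicateFunction, when, and describe are hypothetical names for illustration.

```python
class PredicateFunction:
    """A function whose implementation is chosen by testing predicates."""

    def __init__(self, name):
        self.name = name
        self._methods = []  # entries are (priority, predicate, implementation)

    def when(self, predicate, priority=0):
        """Decorator: register an implementation guarded by `predicate`."""
        def register(impl):
            self._methods.append((priority, predicate, impl))
            # Stable sort: higher priority first, ties keep registration order.
            self._methods.sort(key=lambda m: -m[0])
            return impl
        return register

    def __call__(self, *args):
        for _, predicate, impl in self._methods:
            if predicate(*args):
                return impl(*args)
        raise TypeError(f"no applicable method for {self.name}{args}")


describe = PredicateFunction("describe")


@describe.when(lambda x: isinstance(x, int))
def _(x):
    return "an integer"


@describe.when(lambda x: isinstance(x, int) and x < 0, priority=1)
def _(x):
    return "a negative integer"


# Added later, at run time, without touching the earlier definitions.
@describe.when(lambda x: x == 0, priority=2)
def _(x):
    return "zero"
```

Here describe(5) falls through to the general integer case, describe(-3) picks the more specific method, and describe(0) picks the one registered last. The set of conditions stays open: any caller can register a new predicate without editing the dispatcher.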

Equal Rights for Functional Objects or, The More Things Change, The More They Are the Same (link to paper)

by Henry G. Baker

At the heart of Clojure and ClojureScript’s implementation is #equiv, which is in turn based on Henry Baker’s egal operator introduced in this paper. Briefly, equality in Clojure is defined by equality of value, which is facilitated by pervasive immutability. Equality in the presence of mutability has no meaning.
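As a sketch of Baker’s idea (not of Clojure’s actual implementation), the following Python function treats immutable values as equal when their types and contents match, and mutable objects as equal only to themselves. The set of types handled is an assumption made for this illustration.

```python
# Immutable atoms compared by value (floats are left out to avoid NaN and
# signed-zero questions; that omission is a simplification of this sketch).
ATOMS = (int, str, bytes, bool, type(None))


def egal(a, b):
    """Return True when a and b are operationally indistinguishable."""
    if a is b:
        return True                      # the same object is always egal
    if type(a) is not type(b):
        return False                     # different types are never egal
    if isinstance(a, ATOMS):
        return a == b                    # immutable atoms: compare by value
    if isinstance(a, tuple):
        # Immutable aggregate: egal when every component is egal.
        return len(a) == len(b) and all(egal(x, y) for x, y in zip(a, b))
    return False                         # mutable or unknown: identity only


shared = [1, 2]
same_structure = egal((1, "a"), (1, "a"))          # immutable, equal contents
same_list = egal(shared, shared)                   # one mutable object
lookalike_lists = egal([1, 2], [1, 2])             # two distinct mutable objects
tuples_sharing = egal((0, shared), (0, shared))    # components are identical
tuples_lookalike = egal((0, [1, 2]), (0, [1, 2]))  # components only look alike
```

Two lists with the same contents are not egal because a later mutation of one would tell them apart; two tuples of immutable values can never be told apart, so they are.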

Organizing Programs Without Classes (link to paper)

by David Ungar, Craig Chambers, Bay-wei Chang, and Urs Hölzle

The greatest crime perpetrated in the name of JavaScript is the propensity of every framework, library, and trifle to use the prototypal inheritance capabilities of the language to implement class-based inheritance. I propose that this behavior stunts the power of JavaScript. However, the class-based mentality is pervasive, and is only likely to grow stronger as JavaScript moves toward “modernized” data-modeling techniques. Having said that, I love the prototypal model. Its flexibility and simplicity are astounding, and this paper4 will show how it can be leveraged for practical purposes. While a design-oriented paper, I think that the knowledge is contrary enough to pop-programming to warrant inclusion. Self is a fascinating language on its own merit, but especially in that its influence5 on modern dynamic languages is growing ever more pervasive.

I’ve Seen the Future, Brother: It is Murder

Dynamo: Amazon’s Highly Available Key-value Store6 (link to paper)

by Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels

It’s rare for a paper describing a system in active production to influence the state of research in any industry, and especially so in computing. Papers describing thought-stuff are pure and elegant while “real-world” systems tend to be ugly, hackish, and brutish, even if they are rock-solid otherwise. The case of Dynamo is quite different. That is, the system itself is based on simple principles and solves a hard problem, highly available and fault-tolerant online database storage, in an elegant way. Dynamo was not a new idea, but this paper is a necessity as we move forward into the age of Big Data.

Out of the Tar Pit (link to paper)

by Ben Moseley and Peter Marks

Now we reach my favorite paper of the bunch, one that I try to read and absorb every 6 months (give or take). The gist is that the primary source of complexity in our programs is mutable state. With that as the premise, the authors build the idea of “functional relational programming,” which espouses minimizing mutable state, moving whatever remains into relations, and then manipulating said relations using a declarative programming language. Simple, right? Well, yes, it is simple; and that’s what makes it so difficult.
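A loose sketch of that shape in Python, with the caveat that the paper works in relational algebra rather than in a general-purpose language: the essential state lives in immutable relations, changes produce new relations, and everything else is derived by pure functions. The Customer and Order relations and the function names are hypothetical examples.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    city: str


class Order(NamedTuple):
    order_id: int
    customer: str
    amount: int


# Essential state: two relations, held as immutable sets of rows.
customers = frozenset({Customer("ada", "London"), Customer("alan", "Manchester")})
orders = frozenset({Order(1, "ada", 30), Order(2, "ada", 12), Order(3, "alan", 7)})


def add_order(orders, order):
    """A state change yields a new relation; the old one is untouched."""
    return orders | {order}


def total_by_city(orders, customers):
    """Derived data: join orders to customers, then sum amounts per city."""
    totals = {}
    for o in orders:
        for c in customers:
            if o.customer == c.name:
                totals[c.city] = totals.get(c.city, 0) + o.amount
    return totals


before = total_by_city(orders, customers)
later_orders = add_order(orders, Order(4, "alan", 3))
after = total_by_city(later_orders, customers)
```

Because the totals are recomputed from the relations rather than stored and patched, there is no cached figure that can drift out of step with the orders.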

This list should be a good start, but where to go next? My personal approach is summarized simply as: follow the bibliographies. If you like any of these papers then look at their bibliographies for other papers that sound interesting and read those too. Likewise, you can use services like Citeseer and the ACM Digital Library to backtrace citations.

Happy reading.

:F


  1. Apart from a spectacular ear for music, Mr. Feathers is also a fount of wisdom, including this gem from the linked post:

    When I first started writing, one of the pieces of advice that I heard was that you should always imagine that you are writing to a particular person.

  2. Design is a vastly overloaded term, but it’s the best word that I can think of. Suggestions for something better? 

  3. I have some ideas for a Lisp-centric essential papers list also, but have not yet formalized the content. 

  4. It was difficult picking a paper from the treasure-trove that is the comprehensive list of Self publications. These papers represent the vanguard of performance in dynamic languages. 

  5. Self does not hold a monopoly on dynamic performance revolutions, though. Smalltalk implementations have also driven innovation in said space, and a taste of this influence is found in Efficient Implementation of the Smalltalk-80 System by Peter Deutsch and The Design and Evaluation of a High Performance Smalltalk System by David Ungar. 

  6. The Dynamo paper is probably the most controversial choice for this list, so if it bothers you then perhaps Software Transactional Memory by Nir Shavit and Dan Touitou would suffice as an alternative. I was bouncing back and forth between these choices and only settled on Dynamo in the spirit of controversy. ;-) 

33 Comments

  1. Bootvis

    The Cohen references make this list even better ;)

  2. Chris

    Please please change your ipad theme. It’s horribly unusable. Slows the device to a crawl and breaks navigation.

  3. dml

    Perhaps the first time I’ve ever seen Leonard Cohen invoked in a computer science context.

  4. Jon

    As a programmer and devoted Leonard Cohen fan, I approve of this article!

  5. Joseph

    Thanks for finding good free links for these!

  6. The greatest crime perpetrated in the name of JavaScript is the propensity for every framework, library, and trifle uses the prototypal inheritance capabilities of the language to implement class-based inheritance.

    I believe that this occurred primarily because Object.prototype.clone in the Self/Io sense wasn’t in the language cloned by MS in IE. Object.create could have worked if it was usable in 2005 but as part of v5 it’s too little too late.

  7. The Cohen references made me chuckle. If you’re a fan checkout San Francisco’s Conspiracy of Beards, a LC a Capella group.

  8. Luis

    I can’t help to notice your Leonard Cohen citation, good articles and I agree that every programmer should at least be aware of those topics.

  9. @Karl

    Thanks for the info. I look forward to exploring that further.

  10. Thank you for a great list! For those interested, I have created an archive containing all the papers above as PDF. You can find it here:

    http://c.wunki.org/A2s0

  11. Good …. keep posting such stuffs

  12. Nice list. But 10? Me counts 11…

  13. Pencilpenpen

    Wunki, thanks for putting them all together!

  14. Su-Shee

    I suggest adding the original “Model View Controller” paper to your list (considering how often it gets thrown around in all kinds of web-related articles..)

    http://st-www.cs.illinois.edu/users/smarch/st-docs/mvc.html (updated version)

  15. @Su-Shee

    Thank you for the link. It’s been a very long time since I’ve read that paper, so I can’t wait to read the updated version.

  16. Den

    Best compilation of non-practical papers that pragmatic programmers should avoid!

  17. @Den

    That’s the spirit — you tell me!

  18. SI Hayakawa

    Re: “It’s flexibility and simplicity is astounding, and this paper will show how it can be leveraged for practical purposes.”

    Should be “Its flexibility and simplicity are astounding…” — “Its” (possessive) and “are” (plural).

  19. Re Lisp-based papers, the Lambda the Ultimate series is pretty important and well worth reading.

    Also Richard Gabriel’s Worse is Better paper, and Alan Kay’s History of Smalltalk (both maybe more historical than belongs on the list, but well worth reading anyway).

  20. Darren Grant

    The link to Time, Clocks, and the Ordering of Events in a Distributed System should be http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf

  21. Bob Foster

    Nonsense. A programmer should read everything Robert Tarjan ever published and disregard the rest.

  22. Scott Hunter

    Gries has articles, as well as entire book, on the Hoare work. I’ve always found that the idea of loop invariants to be quite powerful and useful (if only to make explicit the idea that SOMETHING about a loop has to be unchanging), and the way this technique can be used to tease out algorithmic efficiencies always makes for a good demo.

  23. cristian

    I never would have thought Cohen references would find their way in a programming article. As an avid fan I am envious I didn’t think of this first! I shall try to make all of my code comments reference Cohen’s work, because code is poetry, right?

  24. Two other good papers from Google research:

    The Chubby Lock Service for Loosely-Coupled Distributed Systems http://research.google.com/archive/chubby.html

    Bigtable: A Distributed Storage System for Structured Data http://research.google.com/archive/bigtable.html

  25. Alia Henderson

    Although one programmer has the necessary skills and knowledge to work competently on a problem or even create a program, he or she can only do so much. Creating the source code for an operating system, for example, will require thousands of manhours from a single programmer and most probably, he or she will only be halfway through. There just isn’t enough time for one or even two programmers to work effectively to produce a usable program.”..

  26. model aircraft

    Having read this I thought it was extremely enlightening.

    I appreciate you finding the time and energy to put this information together. I once again find myself personally spending way too much time both reading and leaving comments. But so what, it was still worth it!

  27. I recommend you replace Cardelli-Wegner with Cardelli’s tech report on Typeful Programming. Nobody reads CW anymore in the types/PL community.

  28. Christopher D. Walborn

    It looks like your link to Michael Feathers’s list needs to be updated. I found this: http://michaelfeathers.tumblr.com/post/81489281/10-papers-every-programmer-should-read-at-least-twice

  29. Don

    Nice list. I have some reading to do now. A couple of surprises here!

    I totally expected Goldberg’s “What Every Computer Scientist Should Know About Floating-Point Arithmetic”. Or is that too cliche? :-)

    And the Dynamo one looks OK, though I was a little surprised it wasn’t the paper about HP’s Dynamo project, which was more innovative. But maybe that’s just because I think virtual machines are more interesting than key-value stores.

  30. Keith Fullerton

    In my 35+ years of real time programming, I found that recursion was fun to work with but better relegated to students and hobbyists. Due to its complicated structure, it is not a good choice for production code. In fact, I found that in general, simpler code worked faster and better (fewer bugs) and was far easier to maintain.

  31. recursion was fun to work with but better relegated to students and hobbyists.

    … and working Erlang programmers.

  32. @KeithFullerton

    I’m not sure where recursion fits into this post, but that’s OK. I’ll just say that I disagree.

    BTW, I love your ambient music albums!

  33. mcwumbly

    Link to the 2nd paper is broken because it used to have a typo. New link is here:

    https://github.com/papers-we-love/papers-we-love/blob/master/functional_programming/why-functional-programming-matters.pdf
