We all have our favorite classic comp-sci paper, usually one we find so fascinating that we want to share it with everyone.
Not everyone agrees that reading the original papers is a good idea; some prefer a modern textbook exposition instead. Nevertheless, a closer look at our past can be helpful when trying to understand the future, and leaves us with a more polished understanding.
Below is a list of “classic” papers that have shaped computing history, plus a few that will, I believe, eventually become classics (e.g. the Bitcoin paper). Most were perceived as radical at the time, but went on to shape our terminology and became pillars of computer science.
If you find something missing, please post it as a comment (including reason for why you think it’s special and should be read). Thanks!
What do all these papers have in common?
- They laid the foundation for different areas and influenced our thinking about design, architecture and how we solve problems. Heck, some of them were written long before computers existed, yet are still relevant today.
- The people who wrote them can certainly be considered pioneers in their field.
How can you become a pioneer today? Is there still an unbeaten path in comp-sci? I have thought a lot about this recently, and I think that in science, books may not always be the best source of creativity. What… did I really say that out loud?
Don’t get me wrong. I love a good book as much as the rest of us, but I believe that to be unique and creative we have to derive our own ideas and come to conclusions on our own. A book is perfect for getting an additional viewpoint (of “some author”) on top of our established knowledge. But books can also kill the inventor inside us: they tend to bias us, or nudge us onto some fast path to success (you know, the path everyone else who chose this subject as a career also takes). Another fallacy is that books put us into a mindset of acceptance; we assume the writer did their research. After all, we paid more than 20 bucks for it, so it has to be better than a paper which you can download for free and which seems so dated, right?
To be fair, whether studying papers instead of books will actually make you think more critically, and hence more creatively, is debatable too: the number of citations a scientific paper has is about as reliable an indicator of quality as reviews on Amazon (if not worse). So picking the right paper is even harder, IMO (hence I created this list).
Before we start, let me point you to a 2-page paper, “Hanson 1999: Efficient Reading of Papers in Science and Technology”, which shows you how to assess whether a particular text is worth your time – without having to read it all!
Turing’s 1936 classic, “On Computable Numbers, with an Application to the Entscheidungsproblem” [36 pages], is unarguably the paper that began the field of computer science as we understand it today. Here we have the first descriptions of universal computing devices, Turing machines, which eventually led to the idea of universal stored-program digital computers. The paper even seems to describe, in what is arguably the first ever conceptual programming language, a form of continuation-passing style, in the form of the “skeleton tables” Turing used to abbreviate his Turing machine designs.
“A Mathematical Theory of Communication” [55 pages], by Claude Shannon in 1948, was one of the founding works of the field of information theory, laying out the basic elements of communication: information source, transmitter, channel, receiver and destination.
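The paper’s central quantity, entropy, measures the information content of a source in bits per symbol. A minimal sketch (the `entropy` helper is my own, not from the paper):

```python
import math

def entropy(probabilities):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin carries exactly one bit per toss; a biased coin carries less,
# which is why its output can be compressed.
print(entropy([0.5, 0.5]))  # 1.0
print(entropy([0.9, 0.1]))  # ~0.47
```

This single formula underpins everything from data compression limits to channel capacity.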
Simon, in his 1962 paper “The Architecture of Complexity” [32 pages], was the first to analyze the architecture of complexity and to propose a preferential attachment mechanism to explain power-law distributions. This is not strictly comp-sci but more philosophical. You might also be interested in his later works, which expand on the topic.
“The UNIX Time-Sharing System” [11 pages], by Dennis Ritchie & Ken Thompson (1974), is one of the best-written papers ever. The elegance of thought and economy of description set a standard we should all aspire to.
Lampson & Sturgis’s 1976 “Crash Recovery in a Distributed System” [28 pages] is a beautiful piece of writing and logic on stable storage and layered abstractions.
Often seen as an apology for having invented the FORTRAN language, Backus’s Turing Award lecture [29 pages] was one of the most influential, and is now one of the most-often cited, papers advocating the functional programming paradigm. Backus coined the term “word-at-a-time programming” to capture the essence of imperative languages, showed how such languages were inextricably tied to the von Neumann machine, and gave a convincing argument why such languages were not going to meet the demands of modern software development. That this argument was being made by the person who is given the most credit for designing FORTRAN, and who also had significant influence on the development of ALGOL, lent substantial weight to the functional thesis. The exposure given to Backus’s paper was one of the best things that could have happened to the field of functional programming, which at the time was certainly not considered mainstream.
Paul Baran’s 1962 “On Distributed Communications Networks” [41 pages]: it is from this paper that the rumor started that the Internet was created by the military to withstand nuclear war. This is totally false. Even though this RAND work was based on that premise, the ARPANET and the Internet stemmed from the MIT work of Licklider, Kleinrock and Roberts, and had no relation to Baran’s work.
Brian Foote and Joseph Yoder wrote an often quoted paper about the undocumented “Big Ball of Mud” [41 pages] architectural design pattern. A Big Ball of Mud is a haphazardly structured, sprawling, sloppy, duct-tape-and-baling-wire, spaghetti-code jungle. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. The overall structure of the system may never have been well defined. If it was, it may have eroded beyond recognition. Programmers with a shred of architectural sensibility shun these quagmires. Only those who are unconcerned about architecture, and, perhaps, are comfortable with the inertia of the day-to-day chore of patching the holes in these failing dikes, are content to work on such systems.
“The Mythical Man-Month” [212 pages] from 1975, on software engineering and project management, popularized Brooks’s Law: “Adding manpower to a late software project makes it later”, and was the first book to advocate prototyping. It is as relevant today as it was when it was written.
The 1981 paper by David Chaum, “Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms” [8 pages], laid the groundwork for the field of anonymous communications research. The ideas described are the technical roots of the Cypherpunk movement that began in the late 1980s. In later work Chaum also proposed untraceable digital cash, allowing users to obtain digital currency from a bank and spend it in a manner that is untraceable by the bank or any other party.
In his 1971 paper “The Complexity of Theorem-Proving Procedures” [7 pages], Cook formalized the notions of polynomial-time reduction (a.k.a. Cook reduction) and NP-completeness, and proved the existence of an NP-complete problem by showing that the Boolean satisfiability problem (usually known as SAT) is NP-complete. The theorem was proven independently by Leonid Levin in the Soviet Union, and has thus been given the name the Cook–Levin theorem. The paper also formulated the most famous problem in computer science, the P vs. NP problem. Informally, the “P vs. NP” question asks whether every problem whose answers can be efficiently verified for correctness can also be solved with an efficient algorithm.
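The verification/solving asymmetry at the heart of P vs. NP is easy to see concretely with SAT itself. A toy sketch (the encoding and helper names are my own): checking a candidate assignment takes time linear in the formula, while the naive solver must try up to 2^n assignments.

```python
from itertools import product

# A CNF formula as a list of clauses; literal k means variable |k|,
# negated if k < 0. Here: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
formula = [[1, 2], [-1, 3], [-2, -3]]

def verify(formula, assignment):
    """Checking a candidate assignment is fast: linear in the formula size."""
    return all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in formula)

def solve(formula, n_vars):
    """Finding an assignment naively takes up to 2^n tries -- the P vs. NP gap."""
    for values in product([False, True], repeat=n_vars):
        assignment = dict(enumerate(values, start=1))
        if verify(formula, assignment):
            return assignment
    return None

print(solve(formula, 3))
```

Whether the exponential search can always be avoided for problems like this is exactly the open question.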
David Garlan’s 1994 paper (with Mary Shaw), “An Introduction to Software Architecture” [42 pages], was one of the first to provide an introduction to the then-emerging field.
David Parnas’s 1971 paper, “Information Distribution Aspects of Design Methodology” [6 pages], was fundamental to modular software design and information hiding.
In their 2006 paper, “Anonymity Loves Company: Usability and the Network Effect” [12 pages], Dingledine and Mathewson focus on the network effects of usability on privacy and security. This paper has heavily influenced the design of the Tor onion router.
The 1968 paper, “Go To Statement Considered Harmful” [4 pages], by Edsger Dijkstra, criticized the excessive use of the GOTO statement in the programming languages of the day and advocated structured programming instead.
“A Syntactic Approach to Type Soundness” [49 pages], presented in 1992 by Wright & Felleisen, introduced a technique for proving type soundness for Hindley/Milner-style polymorphic type systems. The keys to their approach are an adaptation of subject-reduction theorems from combinatory logic to programming languages, and the use of rewriting techniques for the specification of the language semantics. The paper has heavily influenced programming-language design in C#, Java and C++.
“Harvest, Yield and Scalable Tolerant Systems” [5 pages], by Fox & Brewer, is one of a series of papers in which they introduced the CAP theorem, also known as Brewer’s theorem.
“No Silver Bullet — Essence and Accidents of Software Engineering” [16 pages], is a widely discussed paper on software engineering written by Fred Brooks in 1986. Brooks argues that “there is no single development, in either technology or management technique, which by itself promises even one order of magnitude [tenfold] improvement within a decade in productivity, in reliability, in simplicity.” He also states that “we cannot expect ever to see two-fold gains every two years” in software development, like there is in hardware development (Moore’s law).
The 1982 paper by Fredkin & Toffoli, “Conservative Logic” [26 pages], presents a comprehensive model of computation which explicitly reflects a number of fundamental principles of physics, such as the reversibility of the dynamical laws and the conservation of certain additive quantities (among which energy plays a distinguished role). Because it more closely mirrors physics than traditional models of computation, conservative logic is in a better position to provide indications concerning the realization of high-performance computing systems, i.e., of systems that make very efficient use of the “computing resources” actually offered by nature. In particular, conservative logic shows that it is ideally possible to build sequential circuits with zero internal power dissipation.
C. A. R. Hoare, in 1978, first introduced the idea of “Communicating Sequential Processes (CSP)” [260 pages], a formal language for describing patterns of interaction in concurrent systems. CSP is a member of the family of mathematical theories of concurrency known as process algebras, or process calculi, based on message passing via channels. CSP was highly influential in the design of the occam programming language, and also influenced the design of programming languages such as Limbo and Go.
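CSP’s core idea, processes that share nothing and interact only by message passing over channels, survives almost unchanged in Go’s goroutines and channels. A minimal sketch in Python (using a bounded `queue.Queue` as a stand-in for a CSP channel; the sentinel convention is my own):

```python
import threading
import queue

# A channel in the CSP sense: the two processes share no state and
# interact only by passing messages through it.
channel = queue.Queue(maxsize=1)

def producer():
    for i in range(3):
        channel.put(i)   # blocks until the consumer is ready to take it
    channel.put(None)    # sentinel: no more messages

def consumer(results):
    while (msg := channel.get()) is not None:
        results.append(msg * 2)

results = []
threads = [threading.Thread(target=producer),
           threading.Thread(target=consumer, args=(results,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 2, 4]
```

With `maxsize=1` each `put` rendezvouses closely with a `get`, which approximates CSP’s synchronous communication.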
Not a paper per se but a very complete technical standard, the “IEEE Recommended Practice for Architectural Description of Software-Intensive Systems” [23 pages] describes the fundamental organization of a system, embodied in its components, their relationships to each other and the environment, and the principles governing its design and evolution.
The 1994 classic, “A Note on Distributed Computing” [14 pages], by Waldo, Wyant, Wollrath and Kendall, argues that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are required because distributed systems require that the programmer be aware of latency, have a different model of memory access, and take into account issues of concurrency and partial failure. It looks at a number of distributed systems that have attempted to paper over the distinction between local and remote objects, and shows that such systems fail to support basic requirements of robustness and reliability. These failures have been masked in the past by the small size of the distributed systems that have been built. In the enterprise-wide distributed systems foreseen in the near future, however, such masking will be impossible. It concludes by discussing what is required of both systems-level and application-level programmers and designers if one is to take distribution seriously.
“Reflections on Trusting Trust” [8 pages], Ken Thompson’s respected 1984 paper, is nominally about compilers but teaches us a lot about security architecture. Thompson described how he could insert a backdoor into the compiler so that even if your code is safe, after being compiled it will be back-doored. While his paper is about compilers, the real concept is trust. How far can you trust anything? How far can what you trust, in turn, trust anything further down the line? If you write your own programs, then you can be reasonably sure they have no backdoor. Do you also write your own compiler? How about the operating system? The motherboard? The CPU? There’s no end to trust. No matter how paranoid you are, eventually you have to take a leap of faith.
Leslie Lamport is probably best known for the LaTeX typesetting system and its accompanying book, but arguably his most important contributions are in the domain of distributed systems, starting with his 1978 paper “Time, Clocks, and the Ordering of Events in a Distributed System” [8 pages]. The paper suggests thinking about distributed systems in terms of state machines – an idea that most people who read the paper seem not to notice. Almost any system can be described as a state machine, so an easy way to implement a system is to describe it as a state machine and use a general algorithm for implementing state machines.
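The paper’s best-known device is the logical clock: a counter each process increments on every event, and fast-forwards past any timestamp it receives, so that causally related events are consistently ordered. A minimal sketch (class and method names are my own, not Lamport’s notation):

```python
class Process:
    """A process carrying a Lamport logical clock."""

    def __init__(self, name):
        self.name, self.clock = name, 0

    def local_event(self):
        self.clock += 1
        return self.clock

    def send(self):
        self.clock += 1
        return self.clock  # the timestamp travels with the message

    def receive(self, msg_timestamp):
        # On receipt, advance past the message's timestamp.
        self.clock = max(self.clock, msg_timestamp) + 1
        return self.clock

a, b = Process("A"), Process("B")
a.local_event()          # A's clock: 1
t = a.send()             # A's clock: 2; message carries timestamp 2
b.receive(t)             # B's clock: max(0, 2) + 1 = 3
print(a.clock, b.clock)  # 2 3
```

The guarantee is one-directional: if event x causally precedes event y, then clock(x) < clock(y), which is exactly what a replicated state machine needs to apply commands in a consistent order.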
The “Byzantine Generals Problem” [20 pages] describes an agreement problem built around an imaginary General who makes a decision to attack or retreat, and who must communicate his decision to his lieutenants. Lamport framed the paper around a story problem after observing what he felt was an inordinate amount of attention received by Dijkstra’s Dining Philosophers problem. A given number of the actors are traitors (possibly including the General). Traitors cannot be relied upon to properly communicate orders; worse yet, they may actively alter messages in an attempt to subvert the process.
The 1972 paper by Barbara Liskov, “A Design Methodology for Reliable Software Systems” [10 pages], argues that an entire system should be partitioned: no global state, with each partition owning some part of the state. A partition exposes operations, and the only way of interacting with its state is by calling operations on its modules.
Martin Fowler proposes in “Who Needs an Architect?” [3 pages] that software is not limited by physics, like buildings are. It is limited by imagination, by design, by organization; in short, by properties of people, not by properties of the world. “We have met the enemy, and he is us.” As Fowler mentions in his article, once you have designed a building it is very difficult to change the basement; on the other hand, if you have layered your software well and applied the relevant design patterns, you can still shift your foundations without impacting the entire software stack! A must-read, not just for software architects.
Merkle’s 1979 thesis, “Security, Authentication and Public Key Systems” [193 pages], is a must-read for anyone working in the security field. Merkle co-invented the Merkle–Hellman knapsack cryptosystem, invented cryptographic hashing (now called the Merkle–Damgård construction, after a pair of articles published 10 years later that established the security of the scheme), and invented Merkle trees. While at Xerox PARC, Merkle designed the Khufu and Khafre block ciphers and the Snefru hash function.
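A Merkle tree hashes each leaf, then repeatedly hashes adjacent pairs until a single root remains; changing any leaf changes the root, so one small value authenticates an arbitrarily large data set. A toy sketch (the odd-level duplication rule here is one common convention, e.g. Bitcoin’s, not the only one):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash the leaves, then hash adjacent pairs upward until one root remains."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:          # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"tx1", b"tx2", b"tx3", b"tx4"])
print(root.hex())
```

This is the structure Bitcoin later used to commit to all transactions in a block with a single 32-byte value.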
The 2006 paper “Out of the Tar Pit” [66 pages], by Moseley and Marks, proposes the Functional Relational Programming (FRP) approach to address out-of-control complexity in software. You can witness many of its ideas actually implemented (at least partly) in the Clojure language and some of its libraries. [additional review]
The term “Software Engineering” was first used in 1968 as the title of the world’s first conference on software engineering, sponsored and facilitated by NATO. The conference was attended by international experts on software, who agreed on defining best practices for software grounded in the application of engineering. The result of the conference is a report [136 pages] that defines how software should be developed [i.e., software-engineering foundations]. The original report is publicly available. (Naur and Randell, 1968)
Niklaus Wirth’s 1971 paper, “Program Development by Stepwise Refinement” [13 pages], about the teaching of programming, is considered a classic text in software engineering. Wirth is the inventor of the programming language Pascal. In this paper he argues for decomposing a task into subtasks, in a top-down fashion.
Raymond’s 1998 book, “The Cathedral and the Bazaar” [35 pages], is considered the Bible of the open-source movement. It attempts to explain what open source is and to define its motives and mechanics.
Bertrand Meyer is generally credited with originating the term “Open/Closed Principle” [14 pages]: in object-oriented programming, software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification.
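A minimal sketch of what that means in practice (the `Formatter`/`report` names are my own invention for illustration): the `report` function is closed, it never changes, while new output formats are added by extension, as new subclasses.

```python
import json
from abc import ABC, abstractmethod

class Formatter(ABC):
    @abstractmethod
    def render(self, data) -> str: ...

class TextFormatter(Formatter):
    def render(self, data) -> str:
        return ", ".join(map(str, data))

# Open for extension: supporting JSON means adding a class,
# not editing report() or any existing class.
class JsonFormatter(Formatter):
    def render(self, data) -> str:
        return json.dumps(data)

# Closed for modification: this function never changes as formats appear.
def report(data, formatter: Formatter) -> str:
    return formatter.render(data)

print(report([1, 2, 3], TextFormatter()))  # 1, 2, 3
print(report([1, 2, 3], JsonFormatter()))  # [1, 2, 3]
```

Meyer originally framed the principle in terms of inheritance; modern formulations also use interfaces and composition to the same end.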
Satoshi Nakamoto makes the case for Bitcoin in his 2008 paper, “Bitcoin: A Peer-to-Peer Electronic Cash System” [9 pages]. Satoshi Nakamoto, the pseudonym of the individual or group behind the creation of Bitcoin, took many ideas from the Cypherpunk movement (see David Chaum above, “Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms”). Bitcoin addresses the Byzantine Generals Problem described by Leslie Lamport by adding a cost to sending messages, which slows down the rate of message passing to ensure only one camp will be able to broadcast at a time. The cost it adds is called “proof of work”, and is based on searching by brute force for hash values that satisfy a constraint. Bitcoin promises the separation of banking and state: it is controlled by neither states, banks nor corporations, the first international currency governed by mathematics.
But the paper Satoshi published is more than just a description of an e-currency; money is just the first application. The blockchain underlying Bitcoin is essentially a network layer, similar to TCP/IP, that carries transactions between a sender and a recipient while supporting higher layers built on top of it. The scenario of John paying Sarah 1 bitcoin is expressed in a stack-based scripting language evaluated in Reverse Polish Notation. This makes the protocol a highly versatile framework for a variety of transactions that are completely unrelated to e-currency and built on top of the blockchain protocol. Already we see a host of other ideas coming forward, such as trust models, escrow or timelocks, and many more, leveraging the protocol.
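The proof-of-work asymmetry is easy to demonstrate: finding a valid nonce takes brute force, but verifying one takes a single hash. A toy sketch (real Bitcoin uses double SHA-256 over a binary block header and a numeric target; the hex-prefix difficulty here is a simplification):

```python
import hashlib

def proof_of_work(block_data: bytes, difficulty: int):
    """Find a nonce whose SHA-256 digest starts with `difficulty` zero hex digits.

    The only known way to find such a nonce is trial and error, but anyone
    can verify a claimed nonce with a single hash computation.
    """
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

nonce, digest = proof_of_work(b"block header", 4)
print(nonce, digest)
```

Each extra zero digit multiplies the expected search work by 16, which is how the network tunes block production to roughly ten minutes.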
In his 2002 paper “The Roots of Lisp” [13 pages], Paul Graham (who proposed the idea of Bayesian anti-spam filtering) provides a refreshing analysis of McCarthy’s classic “Recursive Functions of Symbolic Expressions and Their Computation by Machine (Part I)”.
Steele and Sussman’s 1978 paper, “The Art of the Interpreter” [75 pages], examines the effects of various language design decisions on the programming styles available to a user of the language, with particular emphasis on the ability to incrementally construct modular systems. This influential paper/essay starts with a description of the history of the Lisp programming language, then continues by describing a series of language features and how they are motivated by the need to support modular programming. What made it a classic is the fact that each of these features is described through a small interpreter implementing it, a so-called metacircular evaluator. The elegant manner in which this is done is a wonderful demonstration of Steele and Sussman’s master-hackerdom – more than two decades later, the essay is still a very worthwhile read.
“Fundamental Concepts in Programming Languages” [39 pages], is an influential set of lecture notes written by Christopher Strachey for the International Summer School in Computer Programming at Copenhagen in August, 1967. It introduced much programming language terminology still in use today, including R-values, L-values, parametric polymorphism, and ad hoc polymorphism.
Turner’s 2004 paper, “Total Functional Programming” [18 pages], advances the idea that functional programming should make programming more closely related to mathematics. A program in a functional language such as Haskell or Miranda consists of equations which are both computation rules and a basis for simple algebraic reasoning about the functions and data structures they define. The existing model of functional programming, although elegant and powerful, is compromised to a greater extent than is commonly recognized by the presence of partial functions. Turner proposes a simple discipline of total functional programming designed to exclude the possibility of non-termination. Among other things, this requires a type distinction between data, which is finite, and codata, which is potentially infinite.
“End-to-End Arguments in System Design” [10 pages], by Saltzer, Reed and Clark, states that the functionality of a network system should be pushed as close to the endpoint as possible, i.e. to the application that uses it, instead of placing much functionality in the lower levels of the system. If functions are placed at lower levels of the network, all applications that use the network pay for those functions redundantly, even if they don’t need them. While this paper argues convincingly for the basics of the network systems used today, it also has its critics. It is interesting to read from the primary source why and how these principles were conceived and became widely adopted.
Locality is among the oldest systems principles in computer science. It was discovered in 1967 during efforts to make early virtual memory systems work well. It is a package of three ideas: (1) computational processes pass through a sequence of locality sets and reference only within them, (2) the locality sets can be inferred by applying a distance function to a program’s address trace observed during a backward window, and (3) memory management is optimal when it guarantees each program that its locality sets will be present in high-speed memory. Peter J. Denning proposed the working set as a dynamic measure of memory demand and explained why it worked using the locality idea introduced by Les Belady of IBM. His working-set paper became a classic, receiving an ACM Best Paper Award in 1968 and a SIGOPS Hall of Fame Award in 2005; Denning recounts the whole story in “The Locality Principle”.
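Idea (2) above, inferring locality from a backward window over the address trace, is exactly what the working set computes. A minimal sketch (the function name and trace are my own illustration): W(t, T) is the set of distinct pages referenced in the last T time units.

```python
def working_set(trace, t, window):
    """W(t, T): distinct pages referenced in the backward window ending at t."""
    start = max(0, t - window + 1)
    return set(trace[start:t + 1])

# A page-reference trace, one page number per time step.
trace = [1, 2, 1, 3, 2, 2, 4, 1]
print(working_set(trace, t=5, window=4))  # references at times 2..5 -> {1, 2, 3}
```

Denning’s policy keeps each program’s current working set resident; the size of W(t, T) over time is the program’s memory demand.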
The next “paper” (thanks, Mark Fiammante) is actually a book: “Compilers: Principles, Techniques, and Tools, Second Edition”, a computer science textbook by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman about compiler construction. First published in 1986, it is widely regarded as the classic, definitive compiler technology text. It is affectionately known as the Dragon Book to a generation of computer scientists, as its cover depicts a knight and a dragon in battle, a metaphor for conquering complexity. The name can also refer to Aho and Ullman’s older Principles of Compiler Design.
No such list would be complete (thank you, Krishnakumar S.! :-)) without a mention of Edgar F. Codd. An Oxford-educated mathematician working at the IBM San Jose Research Lab, Codd published a paper, “A Relational Model of Data for Large Shared Data Banks” (1970), showing how information stored in large databases could be accessed without knowing how the information was structured or where it resided in the database. Dr. Codd, known as “Ted” to his colleagues, was honored as an IBM Fellow in 1976, and in 1981 the Association for Computing Machinery gave him the Turing Award for contributions of major importance to the field of computing.
Finally, there is one last paper, by Doug Zongker, which you should not miss under any circumstances. It has been described as a “string theory for comp-sci”; it convinces with elegant simplicity, and is hence referred to by some as a “theory of everything for developers”, bringing together all the different camps.
No list can ever be ultimate or complete, but if you feel we forgot your favorite paper, please let me know in the comments or drop me a line. Thanks!
Last updated: 11th May 2016
Valbonne Consulting provides Research & Consulting for emerging technologies in Internet/Web of Things (WoT/IoT/M2M) and Emerging-Tech. We specialise in decentralisation, security and privacy. We work across a variety of traditional industry verticals (Telecommunications, Automotive, Energy, ...). We support Open Source and technologies built on open standards.