Sometimes people ask me which computer science papers they should read and I can't really answer that question, but I can list the papers I've enjoyed reading over the past years.
— Pedro Tavareλ (@ordepdev) August 14, 2021
Following the tweet above, I’ve decided to do a thread dump of my favorite computer science papers.
This is not a “you should read these papers” kind of post; it’s a curated list of great computer science papers that I’ve enjoyed reading and re-reading over the past years.
(I think you should read them as well!)
📃 The Design and Implementation of a Log-Structured File System
💡 You’ll learn about a technique called a log-structured file system that writes all modifications to disk sequentially, thereby speeding up both file writing and crash recovery.
📃 The Ubiquitous B-Tree
💡 You’ll learn about a disk-based index structure called B-Tree and its different variations. The paper does quite a good job of explaining why they have been so successful over the years.
📃 The Log-Structured Merge-Tree
💡 You’ll continue to learn about low-cost indexing for a file experiencing a high rate of record inserts over an extended period. The paper also provides a nice comparison of LSM-tree and B-tree I/O costs.
📃 Kafka: a Distributed Messaging System for Log Processing
💡 You’ll learn about log processing, Kafka’s architecture, and design principles including producers, brokers, and consumers.
📃 ZooKeeper: Wait-free coordination for Internet-scale systems
💡 You’ll learn about the ZooKeeper wait-free coordination kernel and a lot of distributed systems concepts that are nicely described in the paper.
📃 A Certified Digital Signature
💡 You’ll learn about one-way functions, the Lamport-Diffie one-time signature, and a new “tree signature” scheme, now better known as the Merkle tree.
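To make the tree idea concrete, here is a minimal sketch of computing a Merkle root over a list of byte strings. It is an illustration and not the paper's exact construction: SHA-256 as the one-way function, the helper names `sha256` and `merkle_root`, and the rule of duplicating the last node on odd-sized levels are all assumptions made for this example.

```python
from __future__ import annotations

import hashlib


def sha256(data: bytes) -> bytes:
    """One-way function used for every node (an assumption; any secure hash works)."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash the leaves, then hash pairs of nodes upward until one root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed padding rule: duplicate the last node on odd-sized levels.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Changing any single leaf changes the root, which is why publishing only the root is enough to commit to every leaf.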
📃 Time, Clocks, and the Ordering of Events in a Distributed System
💡 Leslie Lamport’s most cited paper. You’ll learn about logical clocks, real-time synchronization, and concepts such as “total ordering” and “happened-before”.
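The logical clock rules can be sketched in a few lines. This is a minimal illustration, assuming one integer counter per process; the class and method names (`LamportClock`, `tick`, `send`, `receive`) are hypothetical and chosen for this example.

```python
class LamportClock:
    """A per-process logical clock (illustrative sketch)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Advance the clock for a local event.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Move past both the local clock and the timestamp carried by the message.
        self.time = max(self.time, msg_time) + 1
        return self.time
```

If process A calls `send()` and process B passes the result to `receive()`, B's clock ends up strictly greater than the message timestamp, so the send is ordered before the receive, consistent with “happened-before”.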
📃 Harvest, Yield, and Scalable Tolerant Systems
💡 You’ll learn about strategies for improving a system’s overall availability while tolerating some kind of graceful degradation.
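The two metrics in the paper's title are simple ratios: yield is the fraction of queries completed, and harvest is the fraction of the complete data reflected in an answer. A small sketch, with hypothetical function and parameter names:

```python
def yield_fraction(queries_completed: int, queries_offered: int) -> float:
    """Yield: fraction of offered queries that were completed."""
    if queries_offered <= 0:
        raise ValueError("queries_offered must be positive")
    return queries_completed / queries_offered


def harvest_fraction(data_available: int, data_total: int) -> float:
    """Harvest: fraction of the complete data reflected in a response."""
    if data_total <= 0:
        raise ValueError("data_total must be positive")
    return data_available / data_total
```

For example, a search service with 100 shards that loses one shard but still answers every query keeps its yield at 1.0 while its harvest drops to 0.99.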
📃 The Byzantine Generals Problem
💡 You’ll learn how a reliable computer system must cope with the failure of one or more of its components, including components that send conflicting information to different parts of the system.
📃 Linearizability: A Correctness Condition for Concurrent Objects
💡 You’ll learn about a strong correctness condition for concurrent objects that guarantees a strict time ordering of read and write operations in a multi-threaded environment.
📃 Conflict-free Replicated Data Types
💡 You’ll learn about a family of data types that makes the eventual consistency of a distributed object possible without coordination between replicas.
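A grow-only counter (G-Counter) is one of the simplest state-based CRDTs and shows why no coordination is needed: each replica only increments its own entry, and merging takes the per-replica maximum. The sketch below is a minimal illustration; the class name and method names are assumptions for this example.

```python
class GCounter:
    """Grow-only counter: a state-based CRDT (illustrative sketch)."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts = {}  # replica id -> number of increments seen from it

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-replica max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)
```

Two replicas that increment independently and then merge in either order, any number of times, end up with the same value.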
📃 Delta State Replicated Data Types
💡 You’ll learn about an optimization made to state-based CRDTs that ensures convergence by disseminating only recently applied changes, instead of the entire (possibly large) state.
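The delta idea can be sketched with a grow-only counter stored as a plain dictionary: instead of shipping the whole state, an increment produces a small delta containing only the changed entry, and the same join merges both deltas and full states. The function names `join` and `increment_delta` are hypothetical, and real delta-CRDT protocols also need rules for buffering and reliably delivering deltas, which this sketch leaves out.

```python
from __future__ import annotations


def join(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Merge two counter states (or deltas) by taking the per-replica maximum."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def increment_delta(state: dict[str, int], replica_id: str) -> dict[str, int]:
    """Return only the entry that an increment changes, not the full state."""
    return {replica_id: state.get(replica_id, 0) + 1}


# Replica "a" increments locally and ships just the delta to replica "b".
state_a = {"a": 4, "b": 7}
state_b = {"a": 4, "b": 7}
delta = increment_delta(state_a, "a")  # one entry, however large the state is
state_a = join(state_a, delta)
state_b = join(state_b, delta)
```

After both joins the two replicas hold the same state, even though only a single entry crossed the network.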
📃 Making reliable distributed systems in the presence of software errors
💡 You’ll learn about Erlang, concurrent programming, message passing, fault-tolerance, and the concept of “let it crash”.
Looking for more papers?
These are my favorites.
I’m sure I’m missing a few papers.
You can still find a lot of curated papers for you to read at @papers_we_love, @intensivedata, and @therealdatabass.