Following the tweet above, I’ve decided to do a thread dump of my favorite computer science papers.

This is not a “you should read these papers” kind of post. It’s a curated list of great computer science papers that I’ve enjoyed reading and re-reading over the past few years.

(I think you should read them as well!)

📃 The Design and Implementation of a Log-Structured File System

💡 You’ll learn about a technique called a log-structured file system that writes all modifications to disk sequentially, thereby speeding up both file writing and crash recovery.

📃 The Ubiquitous B-Tree

💡 You’ll learn about a disk-based index structure called the B-tree and its variations. The paper does a good job of explaining why B-trees have been so successful over the years.

📃 The Log-Structured Merge-Tree

💡 You’ll continue to learn about low-cost indexing for a file experiencing a high rate of record inserts over an extended period. The paper also provides a nice comparison of LSM-tree and B-tree I/O costs.

📃 Kafka: a Distributed Messaging System for Log Processing

💡 You’ll learn about log processing, Kafka’s architecture, and design principles including producers, brokers, and consumers.

📃 ZooKeeper: Wait-free coordination for Internet-scale systems

💡 You’ll learn about the ZooKeeper wait-free coordination kernel and a lot of distributed systems concepts that are nicely described in the paper.

📃 A Certified Digital Signature

💡 You’ll learn about one-way functions, the Lamport-Diffie one-time signature, and a new “tree signature” scheme, the structure now known as a Merkle tree.
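
To make the tree idea concrete, here is a minimal sketch of computing a Merkle root over a list of byte strings. The names `sha256` and `merkle_root` are mine, SHA-256 stands in for the paper's generic one-way function, and duplicating the last node on odd-sized levels is one common convention that I'm assuming here, not something taken from the paper.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """One-way function used for every node (an assumption: any secure hash works)."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash the leaves, then hash pairs of nodes upward until one root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed convention: pair an unpaired last node with itself.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


root = merkle_root([b"key-1", b"key-2", b"key-3", b"key-4"])
print(root.hex())
```

Changing any single leaf changes the root, which is what lets one short value authenticate many items.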

📃 Time, Clocks and the Ordering of Events in a Distributed System

💡 Leslie Lamport’s most cited paper. You’ll learn about logical clocks, real-time synchronization, and concepts such as “total ordering” and “happened-before”.
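
As a rough illustration of logical clocks, here is a small sketch of the two clock rules from the paper: increment on every local event, and on receipt move the clock past the message's timestamp. The class and method names are hypothetical, and using `max(...) + 1` on receive is one simple way to satisfy the rule, not the only one.

```python
class LamportClock:
    """A per-process logical clock (illustrative sketch)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """A local event happened: advance the clock."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """Advance past both the local clock and the message's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()               # a's clock is now 1
stamp = a.send()       # a's clock is now 2, and the message carries 2
b.receive(stamp)       # b's clock jumps from 0 to 3
assert stamp < b.time  # the send "happened before" the receive
```

If one event happened before another, its timestamp is smaller. The converse does not hold, which is why the paper breaks ties with process identifiers to get a total order.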

📃 Harvest, Yield, and Scalable Tolerant Systems

💡 You’ll learn about strategies for improving a system’s overall availability while tolerating some kind of graceful degradation.

📃 The Byzantine Generals Problem

💡 You’ll learn how a reliable computer system can cope with the failure of one or more of its components, including components that send conflicting information to different parts of the system.

📃 Linearizability: A Correctness Condition for Concurrent Objects

💡 You’ll learn about a strong correctness condition for concurrent objects: each operation appears to take effect instantaneously at some point between its invocation and its response, which gives concurrent reads and writes an ordering consistent with real time.

📃 Conflict-free Replicated Data Types

💡 You’ll learn about a data structure that makes the eventual consistency of a distributed object possible without coordination between replicas.
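
One of the simplest examples is a grow-only counter, usually called a G-Counter. The sketch below is my own illustrative version with hypothetical names, not code from the paper: each replica only increments its own slot, and merging takes the per-replica maximum, so merges can arrive in any order and any number of times.

```python
class GCounter:
    """State-based grow-only counter (illustrative sketch)."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        """Only this replica's own slot is ever incremented locally."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        """The counter's value is the sum over all replicas' slots."""
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Element-wise max: commutative, associative, and idempotent."""
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state twice changes nothing
assert a.value() == b.value() == 5
```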

📃 Delta State Replicated Data Types

💡 You’ll learn about an optimization to state-based CRDTs that ensures convergence by disseminating only recently applied changes, instead of the entire (possibly large) state.

📃 Making reliable distributed systems in the presence of software errors

💡 You’ll learn about Erlang, concurrent programming, message passing, fault-tolerance, and the concept of “let it crash”.

Looking for more papers?

These are my favorites.

I’m surely missing a few papers.

You can find plenty more curated papers to read at @papers_we_love, @intensivedata, and @therealdatabass.