Sometimes people ask me which computer science papers they should read and I can't really answer that question, but I can list the papers I've enjoyed reading over the past years.
— Pedro Tavareλ (@ordepdev) August 14, 2021
Following the tweet above, I’ve decided to do a thread dump of my favorite computer science papers.
This is not a “you should read these papers” kind of post; it’s a curated list of great computer science papers that I’ve enjoyed reading and re-reading over the past years.
(I think you should read them as well!)
💡 You’ll learn about a technique called a log-structured file system that writes all modifications to disk sequentially, thereby speeding up both file writing and crash recovery.
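To make the idea a bit more concrete, here is a minimal Python sketch of an append-only store. It is not the design from the paper: the `AppendOnlyLog` class, its record format, and the in-memory `bytearray` standing in for the disk are all assumptions made for illustration. Writes only ever append, and recovery is a single sequential scan that rebuilds the index.

```python
import struct


class AppendOnlyLog:
    """Toy log-structured store: data is appended, never updated in place."""

    # Hypothetical record header: key length and value length, 4 bytes each.
    HEADER = struct.Struct(">II")

    def __init__(self, data=b""):
        self.data = bytearray(data)  # stands in for the sequential on-disk log
        self.index = {}              # key -> (offset, length) of the latest value
        self._recover()

    def _recover(self):
        """Rebuild the index by scanning the log once, front to back."""
        pos = 0
        while pos < len(self.data):
            klen, vlen = self.HEADER.unpack_from(self.data, pos)
            pos += self.HEADER.size
            key = bytes(self.data[pos:pos + klen])
            pos += klen
            self.index[key] = (pos, vlen)  # later records win over earlier ones
            pos += vlen

    def put(self, key: bytes, value: bytes) -> None:
        """Append a new record; older versions of the key stay in the log."""
        self.data += self.HEADER.pack(len(key), len(value)) + key
        self.index[key] = (len(self.data), len(value))
        self.data += value

    def get(self, key: bytes) -> bytes:
        offset, length = self.index[key]
        return bytes(self.data[offset:offset + length])
```

Overwriting a key simply appends a newer record, and constructing a fresh `AppendOnlyLog` from the same bytes recovers the latest values. A real log-structured file system also needs checkpoints and segment cleaning to reclaim stale records, which this sketch leaves out.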
💡 You’ll learn about a disk-based index structure called the B-tree and its different variations. The paper does quite a good job of explaining why they have been so successful over the years.
💡 You’ll continue to learn about low-cost indexing for a file experiencing a high rate of record inserts over an extended period. The paper also provides a nice comparison of LSM-tree and B-tree I/O costs.
💡 You’ll learn about log processing, Kafka’s architecture, and design principles including producers, brokers, and consumers.
💡 You’ll learn about the ZooKeeper wait-free coordination kernel and a lot of distributed systems concepts that are nicely described in the paper.
💡 You’ll learn about one-way functions, the Lamport-Diffie one-time signature, and a new “tree-signature”, also known as a Merkle tree.
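As a rough illustration of the tree idea, here is a small Python sketch that computes a Merkle root over a list of byte strings. SHA-256 and the rule of duplicating the last node on odd-sized levels are my assumptions (one common convention), not details taken from the paper, and `merkle_root` is a hypothetical helper name.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Hash the leaves, then hash pairs upward until one root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: pad odd levels by repeating the last node.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Because every parent is a one-way hash of its children, changing any single leaf changes the root, which is what lets one short value authenticate many items.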
💡 Leslie Lamport’s most cited paper. You’ll learn about logical clocks, real-time synchronization, and concepts such as “total ordering” and “happened-before”.
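The logical clock rules are small enough to sketch in a few lines of Python. The `LamportClock` class and its method names are my own naming for illustration; the rules themselves (increment on every local event, and on receipt move past the larger of the local time and the message timestamp) are the ones the paper describes.

```python
class LamportClock:
    """A per-process logical clock."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Local event: advance the clock by one."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """Receiving jumps past both the local clock and the message timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time
```

If process A sends a message at time 3 and process B, still at time 0, receives it, B moves to time 4, so the send is ordered before the receive. Breaking timestamp ties with a process id, for example by comparing `(time, process_id)` tuples, is one way to turn this partial order into a total order.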
💡 You’ll learn about strategies for improving a system’s overall availability while tolerating some kind of graceful degradation.
💡 You’ll learn about how a computer system can remain reliable when it has to cope with the failure of one or more of its components.
💡 You’ll learn about a strong correctness condition for concurrent objects that guarantees a strict time ordering of read and write operations in a multi-threaded environment.
💡 You’ll learn about a data structure that makes the eventual consistency of a distributed object possible without coordination between replicas.
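A grow-only counter is probably the simplest such data structure, so here is a minimal Python sketch of one. The `GCounter` class is a hypothetical illustration, not code from the paper: each replica increments only its own slot, and merging takes the per-replica maximum, which makes merges commutative, associative, and idempotent.

```python
class GCounter:
    """State-based grow-only counter: one slot per replica."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> highest count seen for that replica

    def increment(self, n=1):
        """A replica only ever bumps its own slot."""
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Join two states by taking the per-replica maximum."""
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)
```

Two replicas can accept increments independently, exchange states in any order and any number of times, and still end up with the same value.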
💡 You’ll learn about an optimization made to state-based CRDTs that ensures convergence by disseminating only recently applied changes, instead of the entire (possibly large) state.
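Here is a small Python sketch of that optimization applied to a grow-only counter. The `DeltaGCounter` class is a hypothetical illustration under my own assumptions: `increment` returns a delta containing only the entry that changed, and a delta is merged with the same per-replica maximum used for full states.

```python
class DeltaGCounter:
    """Grow-only counter whose mutator returns a small delta to ship."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> highest count seen for that replica

    def increment(self, n=1):
        """Apply locally and return a delta holding only the changed entry."""
        new_count = self.counts.get(self.replica_id, 0) + n
        self.counts[self.replica_id] = new_count
        return {self.replica_id: new_count}

    def merge(self, delta):
        """Join a delta (or a full state) using the per-replica maximum."""
        for rid, count in delta.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    def value(self):
        return sum(self.counts.values())
```

However many replicas the full state tracks, the delta for one increment stays a single entry, and receiving it twice is harmless because the merge is idempotent. A real implementation also has to decide how deltas are buffered and retransmitted, which this sketch ignores.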
💡 You’ll learn about Erlang, concurrent programming, message passing, fault-tolerance, and the concept of “let it crash”.
These are my favorites.
I’m surely missing a few papers.
You can find many more curated papers to read at @papers_we_love, @intensivedata, and @therealdatabass.