The following topics will be presented over the course of the semester. Each topic will be covered in (roughly) one week of lectures. Lecture notes are linked as they become available.

  1. Course introduction
  2. Distributed systems primer
    • challenges and goals of distributed systems
    • example architectures
  3. Communication models
    • remote procedure calls (RPC)
    • RPC libraries
    • failure models
    • semantics
  4. Time and coordination
    • challenges
    • physical and logical clocks
    • distributed mutual exclusion
  5. Agreement in distributed systems
    • the atomic commitment problem
    • the consensus problem
    • use cases for each
  6. The transaction abstraction
    • ACID semantics
    • concurrency control mechanisms
    • recovery mechanisms
  7. Atomic commitment protocols
    • two-phase commit (2PC)
    • its blocking behavior and how to mitigate it
  8. Consensus protocols
    • Paxos overview, key ideas, basic algorithm
    • examples of normal operation and operation under failures
    • liveness failure mode
    • FLP impossibility result
  9. Replication architectures
    • fault-tolerant architectures: primary/secondary
    • the design of Chubby, Google’s fault-tolerant lock service
    • the 2PC+Paxos approach to both scalability and fault tolerance
  10. Case study: Google’s Spanner
    • design of TrueTime
    • design of Spanner and its linearizable, distributed transactions
  11. Consistency models
    • sequential, causal, and eventual consistency
    • mechanisms to achieve each
    • tradeoffs
  12. Distributed computation
    • MapReduce design
    • TensorFlow design
  13. Distributed systems security primer
    • security challenges and opportunities in distributed systems
    • authentication protocols: Needham-Schroeder, Kerberos
    • Byzantine fault tolerance (brief overview)

Assignments/Go Lecture

Supplemental code for lecture