Tentative schedule:
Week | Topic | Notes | Instructor |
Week 1 | Introduction | Why not single machine, big-data challenges, datacenter structure, typical use cases and their requirements, course overview. | All |
Week 2 | Batch Processing | MapReduce: architecture, components, programming model. | Sahu |
Week 3 | Batch Processing (cont) | Storage side: HDFS, HBase. | Sahu |
Week 4 | Distributed Systems Primer (notes) | Challenges and principles, failure modes, inherent tradeoffs. | Geambasu |
Week 5 | Communication and Synchronization Building Blocks (RPC notes, clocks notes, mutual exclusion example (slides by Dave Andersen)) | Remote procedure calls, clock synchronization, logical clocks -- all building blocks for distributed algorithms. | Geambasu |
Week 6 | Hard Problems in Distributed Systems (consensus problem notes, 2PC notes, Paxos notes (we skipped 3PC)) | Consistency, consensus, known impossibility results, approaches to navigate the challenges. | Geambasu |
Week 7 | Google's Storage Stack | How core problems in distributed systems are solved in the real world. Design of Chubby and Bigtable, two fundamental components of a Google cluster. High-level architecture of a Google cluster. | Geambasu |
Week 8 | Data Models and Cleaning | Why the relational data model? Why schemas? The ins and outs. Reading: | Wu |
Week 9 | Cleaning and Integration | Readings | Wu |
Week 10 | Iterative Processing | Spark and its RDD abstraction for in-memory computation; Tachyon, a memory-centric distributed file system. | Sahu |
Week 11 | Iterative Processing (cont) + Stream Processing | | Sahu |
Week 12 | Machine Learning Systems & Examples | Introduction to MLlib and/or other ML processing systems. | Industry guest |
Week 13 | Classic Query Processing and Fast Query Processing | Classic, analytic, and transaction-oriented query execution. Readings | Wu |
Week 14 | Potpourri | A mixture of ideas: graph analysis, scalable visualization, distributed transactions. | Wu |