Lectures

Tentative schedule:

Week	Topic	Notes	Instructor
Week 1	Introduction	Why not a single machine, big-data challenges, datacenter structure, typical use cases and their requirements, course overview.	All
Week 2	Batch Processing	MapReduce: architecture, components, programming model.	Sahu
Week 3	Batch Processing (cont.)	Storage side: HDFS, HBase.	Sahu
Week 4	Distributed Systems Primer (notes)	Challenges and principles, failure modes, inherent tradeoffs.	Geambasu
Week 5	Communication and Synchronization Building Blocks (RPC notes, clocks notes, mutual exclusion example (slides by Dave Andersen))	Remote procedure calls, clock synchronization, logical clocks -- all building blocks for distributed algorithms.	Geambasu
Week 6	Hard Problems in Distributed Systems (consensus problem notes, 2PC notes, Paxos notes (we skipped 3PC))	Consistency, consensus, known impossibility results, approaches to navigate the challenges.	Geambasu
Week 7	Google's Storage Stack	How core problems in distributed systems are solved in the real world. Design of Chubby and Bigtable, two fundamental components of a Google cluster. High-level architecture of a Google cluster.	Geambasu
Week 8	Data Models and Cleaning	Why the relational data model? Why schemas? The ins and outs. Readings: What Goes Around Comes Around; Unified Logging@Twitter.	Wu
Week 9	Cleaning and Integration	Readings: Truth Finding on the Deep Web; Data Wrangler.	Wu
Week 10	Iterative Processing	Spark: RDD abstractions for in-memory computation. Tachyon: a memory-centric distributed file system.	Sahu
Week 11	Iterative (continued) + Stream Processing		Sahu
Week 12	Machine Learning Systems & Examples	Introduction to MLlib and/or other ML processing systems.	Industry guest
Week 13	Classic Query Processing and Fast Query Processing	Classic, analytic, and transaction-oriented query execution. Readings: C-Store: A Column-oriented DBMS; Column-Stores vs. Row-Stores; OLTP Through the Looking Glass, and What We Found There.	Wu
Week 14	Potpourri	A mixture of ideas: graph analysis, scalable visualization, distributed transactions.	Wu

Computer Systems for Big Data
