Last class we discussed the core challenges of building distributed systems (incremental scalability is hard, at scale failures are inevitable, constant attacks, etc.). We've said that the core approach to building distributed systems that addresses these challenges is to construct layers upon layers of systems and services that progressively raise the level of abstraction through well-defined APIs. Google's storage stack, as well as Spark and other systems, are perfect examples of that layered design for distributed systems.

Today we'll start looking at one of these abstractions: RPC, which enables communication. To cooperate, machines need to first communicate. How should they do that? RPC is the predominant way of communicating in a DS. Interestingly enough, RPC -- the simplest possible DS abstraction -- reflects most of the challenges in distributed systems, showing you how fundamental those things are.

REMOTE PROCEDURE CALLS (RPCs)

// Notes taken largely from:
// http://courses.cs.washington.edu/courses/csep552/13sp/lectures/1/rpc.txt

Overall goal: to simplify the programmer's job when building distributed systems
-------------
- many complexities when building a distributed program
- argue that some are fundamental and must be dealt with by the programmer, while
  others are mechanical and can be solved by good language and runtime support
- pre-RPC, all of the gory details were exposed to the programmer
- post-RPC, a remote interaction looks (nearly) identical to a local procedure call,
  with the system hiding (nearly) all of the differences

How RPC works
-------------
RPC ideally makes net communication look just like a function call:

  Client:   z = fn(x, y)
  Server:   fn(x, y) { compute; return z }

RPC aims for this level of transparency.
Motivation: even novice programmers can use a function call!

RPC message diagram:

  Client                     Server
     ------ request ----->
     <----- response -----

RPC software structure:

  Client   Client      RPC                    RPC    Server                Handler
  App      Stub        lib         Net        lib    Stub
  call     marshall    send   --request-->    recv   dispatch unmarshall  fn(args)
  wait     wait                                                            work...
  return   unmarshall  recv   <--response--   send   marshall             return

Procedural model, and places where distribution of RPC peeks through the LPC illusion
-----------------
- a program is structured as a series of function calls
- pass in arguments with a certain type; the language ensures type-safety
  ==> vs. blob-of-bytes messages passed between processes
  ==> **marshalling** -- encoding of types/values into messages (see the stub sketch
      after this list)
  ==> **fragmentation and ordering** -- large values split into many messages
  ==> several argument-passing disciplines; which can we implement using RPC?
      -- pass-by-value -- pass in copies. message marshalling. simple data types.
      -- pass-by-reference -- pass in a pointer. multiple address spaces!
- control flow of the program transitions from caller to callee; the caller is
  effectively blocked until the callee returns. No need for separate "send" and
  "receive", no possibility of the function call failing (distinguish this from the
  function call returning an error value)
  ==> **coordination of sender/receiver, integration into program control flow**
  ==> performance difference between LPC and RPC
  ==> RPC can fail because of server failure (partial failure) or network failure (timeout)
- callee is invoked exactly once
  ==> **dealing with message loss**
  ==> if not careful, in a distributed system with timeouts and retransmissions,
      could see 0, 1, 2, ... invocations
  ==> strongest that is pragmatically possible: at-most-once semantics
- no authentication or argument validation necessary between caller and callee
  ==> **might receive an RPC from anybody on the network**
  ==> care about who sent the RPC, and whether they are authorized to invoke the
      operation named by the RPC
  ==> care about validating the integrity of arguments ("rm *")
- a debugger can produce a stack trace to understand the execution path of the program
  ==> much harder to do in a distributed system; distributed stack traces??
- destination (callee) is named by a symbolic function name
- compiler / runtime maps the symbolic name to a code block, arranges for transfer of
  the program counter to the appropriate code block through "call"
  ==> **naming/addressing** -- with whom should I communicate?
  ==> could have multiple RPC servers exporting the same interface
  ==> caller needs to name and bind to one instance
  ==> name translated to an address for the transport/messaging layer
  ==> caller should detect if the binding breaks
  ==> server fails, restarts on the same address -- different instance -- why?
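To make the stub/marshalling structure above concrete, here is a minimal sketch of
one RPC round trip in Python. It is not any real RPC library's API: the JSON-over-UDP
wire format, the call() stub, the HANDLERS dispatch table, and the example add()
handler are all illustrative assumptions.

    # Minimal RPC sketch: the client stub marshalls arguments, sends a request, and
    # blocks for the response; the server unmarshalls, dispatches, runs the handler,
    # and marshalls the return value. JSON over UDP is an assumed wire format.
    import json, socket, threading

    SERVER_ADDR = ("127.0.0.1", 9000)

    # ---- server side: recv -> unmarshall -> dispatch -> handler -> marshall -> send
    def add(x, y):                                    # the handler the server exports
        return x + y

    HANDLERS = {"add": add}

    def serve_one(sock):
        data, client = sock.recvfrom(64 * 1024)       # recv request
        req = json.loads(data)                        # unmarshall
        result = HANDLERS[req["fn"]](*req["args"])    # dispatch + invoke fn(args)
        sock.sendto(json.dumps({"result": result}).encode(), client)  # marshall + send

    # ---- client side: the stub the application calls like a local function
    def call(fn, *args, timeout=1.0):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        sock.sendto(json.dumps({"fn": fn, "args": args}).encode(), SERVER_ADDR)
        data, _ = sock.recvfrom(64 * 1024)            # wait (block) for the response
        return json.loads(data)["result"]             # unmarshall the return value

    if __name__ == "__main__":
        srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        srv.bind(SERVER_ADDR)
        threading.Thread(target=serve_one, args=(srv,), daemon=True).start()
        print(call("add", 1, 2))                      # to the caller: z = add(1, 2)

Note how much the stub hides: even this toy version has to pick a timeout and silently
assumes the request and response both arrive, which is exactly where the LPC illusion
starts to leak.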
Fundamental issues that RPC cannot hide
--------------------
- single {process, address space} vs. multiple {processes, address spaces}
- partial failure where caller/callee go down
- pointers/references -- either impossible, or severe coherence issues
- network failures
  - require timeouts vs. block-forever-until-reply (why?)
  - therefore, have to deal with premature timeouts
- performance differences (debatable)
  - 3-6 orders of magnitude difference between LPC and RPC performance; I claim this
    is too big to ignore, and it has implications for program structure

Engineering issues that are important to get right
-------------------
- choosing between many vs. at-most-once vs. exactly-once semantics
  - why does this come up? network failures imply timeouts and retransmissions
  - implies clients will end up retransmitting requests, and the server will see duplicates
- many: easiest to build
  - fine for idempotent requests
  - might be problematic for many kinds ("purchase an item")
  - the Web provides this; what happens when you press the "checkout" button and get a timeout?
- at-most-once: next easiest, has complications (a code sketch follows this item)
  - server has to remember requests that it has already seen and processed, to detect
    duplicates to squelch
    -- clients/servers need to embed unique IDs (XIDs) in messages
    -- server maintains s[xid] (state), r[xid] (return value)

         if s[xid] == DONE:
             return r[xid]
         x = handler(args)
         s[xid] = DONE
         r[xid] = x
         return x

  - issue: what happens if the server crashes/restarts? server will lose the s, r tables
    -- possibility: store them on disk [slow, expensive]
  - issue: even if stored on disk, what happens if the server crashes just before or
    after the call to handler()?
    --> after restart, the server can't tell if handler() was called
    --> server must reply with "CRASH" when the client resends the request
    -- Birrell paper -- server has a "server nonce" that identifies the restart generation
       - client obtains the nonce when it binds to the server
       - client transmits the nonce in every RPC request
       - if the nonce doesn't match the server's state, return CRASH
  - issue: when is it safe for the server to collect s[xid], r[xid]?
    --> once it sees an ACK from the client that the response was received
    --> Birrell: piggyback the ACK on the next request to minimize # of messages
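Putting these pieces together, here is a compact sketch of the at-most-once server
logic: duplicate suppression via the s/r tables, a Birrell-style server nonce, and
garbage collection of saved replies driven by a piggybacked ACK. The field names
(xid, nonce, acked_xid) and the dict-based request format are assumptions for
illustration, not any specific system's protocol.

    # At-most-once server sketch: s[xid]/r[xid] duplicate suppression, a server
    # nonce to detect restarts, and GC of saved replies on piggybacked ACKs.
    # Request/response field names are illustrative assumptions.
    import os

    SERVER_NONCE = os.urandom(8).hex()   # fresh on every restart ("restart generation")

    s = {}   # s[xid] -> "DONE" once handler(args) has run
    r = {}   # r[xid] -> saved return value, replayed for duplicate requests

    def handle_request(req, handlers):
        # the client learned our nonce at bind time; a mismatch means we restarted
        # (and may have lost s/r), so we cannot tell whether handler() already ran
        if req["nonce"] != SERVER_NONCE:
            return {"status": "CRASH"}

        # piggybacked ACK: the client saw the reply for acked_xid and will never
        # retransmit that request, so its saved reply can be garbage-collected
        acked = req.get("acked_xid")
        if acked is not None:
            s.pop(acked, None)
            r.pop(acked, None)

        xid = req["xid"]
        if s.get(xid) == "DONE":                      # duplicate: replay saved reply
            return {"status": "OK", "result": r[xid]}

        result = handlers[req["fn"]](*req["args"])    # first time: run the handler
        s[xid] = "DONE"
        r[xid] = result
        return {"status": "OK", "result": result}

Note that s and r live only in memory here; as the notes point out, surviving a crash
would require writing them to disk (or accepting CRASH replies after a restart).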
- exactly once: incredibly hard to build
  - server failure can stymie progress -- the client blocks forever
  - need to implement handler() as a transaction that includes the s[xid], r[xid]
    commit on the server
- optimizing performance
  - minimize # of messages (piggybacking)
  - minimize # of RTTs (piggybacking)
    ==> both on binding and on RPC
  - minimize the amount of state stored on the server (connectionless)
  - minimize # of context switches on server and client
  - optimizing for small messages vs. large data transport
    -- often a tradeoff between latency and throughput in transport design
    -- e.g., how often you ACK (see the client-side sketch below)
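To round out the picture, here is a sketch of the client side of the same at-most-once
protocol: retransmit on timeout reusing the same xid (so the server recognizes
duplicates), carry the server nonce obtained at bind time, and piggyback the ACK for
the previous reply on the next request so the server can reclaim state. The transport
(JSON over UDP), the field names, and the retry policy are illustrative assumptions.

    # Client-side sketch matching the at-most-once server above. Retries reuse the
    # same xid so retransmissions are squelched as duplicates; the ACK for the last
    # received reply is piggybacked onto the next request.
    import itertools, json, socket

    class RpcClient:
        def __init__(self, addr, nonce):
            self.addr = addr               # server address obtained from binding
            self.nonce = nonce             # server nonce obtained at bind time
            self.xids = itertools.count(1)
            self.last_acked = None         # xid of the last reply we received
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            self.sock.settimeout(0.5)      # premature timeouts are possible!

        def call(self, fn, *args, retries=5):
            xid = next(self.xids)
            req = json.dumps({"xid": xid, "nonce": self.nonce, "fn": fn,
                              "args": args, "acked_xid": self.last_acked}).encode()
            for _ in range(retries):
                self.sock.sendto(req, self.addr)      # (re)transmit, same xid
                try:
                    data, _ = self.sock.recvfrom(64 * 1024)
                except socket.timeout:
                    continue                          # no reply yet: retransmit
                resp = json.loads(data)               # (a real stub would also
                                                      #  match replies to xids)
                if resp["status"] == "CRASH":         # server restarted mid-call
                    raise RuntimeError("server restarted; call may or may not have run")
                self.last_acked = xid                 # piggyback ACK on next call
                return resp["result"]
            raise TimeoutError("no response after retries")

Even this small stub makes the engineering trade-offs visible: how long the timeout is,
how many retries to attempt, and when to ACK all trade latency against message count
and server state.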