Last class we discussed the core challenges of building distributed systems (incremental scalability is hard, at scale failures are inevitable, constant attacks, etc.). We've said that the core approach to building distributed systems that addresses these challenges is to construct layers upon layers of systems and services that progressively raise the level of abstraction through well-defined APIs. Google's storage stack, as well as Spark and other systems, are perfect examples of that layered design for distributed systems.

Today we'll start looking at one of these abstractions: RPC, which enables communication. To cooperate, machines need to first communicate. How should they do that? RPC is the predominant way of communicating in a DS. Interestingly enough, RPC -- the simplest possible DS abstraction -- reflects most of the challenges in distributed systems, showing you how fundamental those things are.

REMOTE PROCEDURE CALLS (RPCs)

// Notes taken largely from:
// http://courses.cs.washington.edu/courses/csep552/13sp/lectures/1/rpc.txt

Overall goal: to simplify the programmer's job when building distributed systems
-------------
- many complexities when building a distributed program
- argue that some are fundamental and must be dealt with by the programmer, while
  others are mechanical and can be solved by good language and runtime support
- pre-RPC, all of the gory details were exposed to the programmer
- post-RPC, a remote interaction looks (nearly) identical to a local procedure call,
  with the system hiding (nearly) all of the differences

How RPC works
-------------
RPC ideally makes net communication look just like a function call:

  Client:   z = fn(x, y)
  Server:   fn(x, y) { compute; return z }

RPC aims for this level of transparency.
Motivation: even novice programmers can use a function call!

RPC message diagram:

  Client                     Server
     ------ request ----->
     <----- response -----

RPC software structure:

  Client   Client      RPC                    RPC    Server                Handler
  App      Stub        lib         Net        lib    Stub
  call     marshall    send   --request-->    recv   dispatch unmarshall  fn(args)
  wait     wait                                                            work...
  return   unmarshall  recv   <--response--   send   marshall             return

Procedural model, and places where distribution of RPC peeks through the LPC illusion
-----------------
- a program is structured as a series of function calls
- pass in arguments with a certain type; the language ensures type-safety
  ==> vs. blob-of-bytes messages passed between processes
  ==> **marshalling** -- encoding of types/values into messages (see the stub sketch
      after this list)
  ==> **fragmentation and ordering** -- large values split into many messages
  ==> several argument-passing disciplines; which can we implement using RPC?
      -- pass-by-value -- pass in copies. message marshalling. simple data types.
      -- pass-by-reference -- pass in a pointer. multiple address spaces!
- control flow of the program transitions from caller to callee; the caller is
  effectively blocked until the callee returns. No need for separate "send" and
  "receive", no possibility of the function call failing (distinguish this from the
  function call returning an error value)
  ==> **coordination of sender/receiver, integration into program control flow**
  ==> performance difference between LPC and RPC
  ==> RPC can fail because of server failure (partial failure) or network failure (timeout)
- callee is invoked exactly once
  ==> **dealing with message loss**
  ==> if not careful, in a distributed system with timeouts and retransmissions,
      could see 0, 1, 2, ... invocations
  ==> strongest that is pragmatically possible: at-most-once semantics
- no authentication or argument validation necessary between caller and callee
  ==> **might receive an RPC from anybody on the network**
  ==> care about who sent the RPC, and whether they are authorized to invoke the
      operation named by the RPC
  ==> care about validating the integrity of arguments ("rm *")
- a debugger can produce a stack trace to understand the execution path of the program
  ==> much harder to do in a distributed system; distributed stack traces??
- destination (callee) is named by a symbolic function name
- compiler / runtime maps the symbolic name to a code block, arranges for transfer of
  the program counter to the appropriate code block through "call"
  ==> **naming/addressing** -- with whom should I communicate?
  ==> could have multiple RPC servers exporting the same interface
  ==> caller needs to name and bind to one instance
  ==> name translated to an address for the transport/messaging layer
  ==> caller should detect if the binding breaks
  ==> server fails, restarts on the same address -- different instance -- why?
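To make the stub/marshalling structure above concrete, here is a minimal sketch of
one RPC round trip in Python. It is not any real RPC library's API: the JSON-over-UDP
wire format, the call() stub, the HANDLERS dispatch table, and the example add()
handler are all illustrative assumptions.

    # Minimal RPC sketch: the client stub marshalls arguments, sends a request, and
    # blocks for the response; the server unmarshalls, dispatches, runs the handler,
    # and marshalls the return value. JSON over UDP is an assumed wire format.
    import json, socket, threading

    SERVER_ADDR = ("127.0.0.1", 9000)

    # ---- server side: recv -> unmarshall -> dispatch -> handler -> marshall -> send
    def add(x, y):                                    # the handler the server exports
        return x + y

    HANDLERS = {"add": add}

    def serve_one(sock):
        data, client = sock.recvfrom(64 * 1024)       # recv request
        req = json.loads(data)                        # unmarshall
        result = HANDLERS[req["fn"]](*req["args"])    # dispatch + invoke fn(args)
        sock.sendto(json.dumps({"result": result}).encode(), client)  # marshall + send

    # ---- client side: the stub the application calls like a local function
    def call(fn, *args, timeout=1.0):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        sock.sendto(json.dumps({"fn": fn, "args": args}).encode(), SERVER_ADDR)
        data, _ = sock.recvfrom(64 * 1024)            # wait (block) for the response
        return json.loads(data)["result"]             # unmarshall the return value

    if __name__ == "__main__":
        srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        srv.bind(SERVER_ADDR)
        threading.Thread(target=serve_one, args=(srv,), daemon=True).start()
        print(call("add", 1, 2))                      # to the caller: z = add(1, 2)

Note how much the stub hides: even this toy version has to pick a timeout and silently
assumes the request and response both arrive, which is exactly where the LPC illusion
starts to leak.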
Fundamental issues that RPC cannot hide
--------------------
- single {process, address space} vs. multiple {processes, address spaces}
- partial failure where caller/callee go down
- pointers/references -- either impossible, or severe coherence issues
- network failures
  - require timeouts vs. block-forever-until-reply (why?)
  - therefore, have to deal with premature timeouts
- performance differences (debatable)
  - 3-6 orders of magnitude difference between LPC and RPC performance; I claim this
    is too big to ignore, and it has implications for program structure

Engineering issues that are important to get right
-------------------
- choosing between many vs. at-most-once vs. exactly-once semantics
  - why does this come up? network failures imply timeouts and retransmissions
  - implies clients will end up retransmitting requests, and the server will see duplicates
- many: easiest to build
  - fine for idempotent requests
  - might be problematic for many kinds ("purchase an item")
  - the Web provides this; what happens when you press the "checkout" button and get a timeout?
- at-most-once: next easiest, has complications (a code sketch follows this item)
  - server has to remember requests that it has already seen and processed, to detect
    duplicates to squelch
    -- clients/servers need to embed unique IDs (XIDs) in messages
    -- server maintains s[xid] (state), r[xid] (return value)

         if s[xid] == DONE:
             return r[xid]
         x = handler(args)
         s[xid] = DONE
         r[xid] = x
         return x

  - issue: what happens if the server crashes/restarts? server will lose the s, r tables
    -- possibility: store them on disk [slow, expensive]
  - issue: even if stored on disk, what happens if the server crashes just before or
    after the call to handler()?
    --> after restart, the server can't tell if handler() was called
    --> server must reply with "CRASH" when the client resends the request
    -- Birrell paper -- server has a "server nonce" that identifies the restart generation
       - client obtains the nonce when it binds to the server
       - client transmits the nonce in every RPC request
       - if the nonce doesn't match the server's state, return CRASH
  - issue: when is it safe for the server to collect s[xid], r[xid]?
    --> once it sees an ACK from the client that the response was received
    --> Birrell: piggyback the ACK on the next request to minimize # of messages
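Putting these pieces together, here is a compact sketch of the at-most-once server
logic: duplicate suppression via the s/r tables, a Birrell-style server nonce, and
garbage collection of saved replies driven by a piggybacked ACK. The field names
(xid, nonce, acked_xid) and the dict-based request format are assumptions for
illustration, not any specific system's protocol.

    # At-most-once server sketch: s[xid]/r[xid] duplicate suppression, a server
    # nonce to detect restarts, and GC of saved replies on piggybacked ACKs.
    # Request/response field names are illustrative assumptions.
    import os

    SERVER_NONCE = os.urandom(8).hex()   # fresh on every restart ("restart generation")

    s = {}   # s[xid] -> "DONE" once handler(args) has run
    r = {}   # r[xid] -> saved return value, replayed for duplicate requests

    def handle_request(req, handlers):
        # the client learned our nonce at bind time; a mismatch means we restarted
        # (and may have lost s/r), so we cannot tell whether handler() already ran
        if req["nonce"] != SERVER_NONCE:
            return {"status": "CRASH"}

        # piggybacked ACK: the client saw the reply for acked_xid and will never
        # retransmit that request, so its saved reply can be garbage-collected
        acked = req.get("acked_xid")
        if acked is not None:
            s.pop(acked, None)
            r.pop(acked, None)

        xid = req["xid"]
        if s.get(xid) == "DONE":                      # duplicate: replay saved reply
            return {"status": "OK", "result": r[xid]}

        result = handlers[req["fn"]](*req["args"])    # first time: run the handler
        s[xid] = "DONE"
        r[xid] = result
        return {"status": "OK", "result": result}

Note that s and r live only in memory here; as the notes point out, surviving a crash
would require writing them to disk (or accepting CRASH replies after a restart).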
- exactly once: incredibly hard to build
  - server failure can stymie progress -- the client blocks forever
  - need to implement handler() as a transaction that includes the s[xid], r[xid]
    commit on the server
- optimizing performance
  - minimize # of messages (piggybacking)
  - minimize # of RTTs (piggybacking)
    ==> both on binding and on RPC
  - minimize the amount of state stored on the server (connectionless)
  - minimize # of context switches on server and client
  - optimizing for small messages vs. large data transport
    -- often a tradeoff between latency and throughput in transport design
    -- e.g., how often you ACK (see the client-side sketch below)
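To round out the picture, here is a sketch of the client side of the same at-most-once
protocol: retransmit on timeout reusing the same xid (so the server recognizes
duplicates), carry the server nonce obtained at bind time, and piggyback the ACK for
the previous reply on the next request so the server can reclaim state. The transport
(JSON over UDP), the field names, and the retry policy are illustrative assumptions.

    # Client-side sketch matching the at-most-once server above. Retries reuse the
    # same xid so retransmissions are squelched as duplicates; the ACK for the last
    # received reply is piggybacked onto the next request.
    import itertools, json, socket

    class RpcClient:
        def __init__(self, addr, nonce):
            self.addr = addr               # server address obtained from binding
            self.nonce = nonce             # server nonce obtained at bind time
            self.xids = itertools.count(1)
            self.last_acked = None         # xid of the last reply we received
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            self.sock.settimeout(0.5)      # premature timeouts are possible!

        def call(self, fn, *args, retries=5):
            xid = next(self.xids)
            req = json.dumps({"xid": xid, "nonce": self.nonce, "fn": fn,
                              "args": args, "acked_xid": self.last_acked}).encode()
            for _ in range(retries):
                self.sock.sendto(req, self.addr)      # (re)transmit, same xid
                try:
                    data, _ = self.sock.recvfrom(64 * 1024)
                except socket.timeout:
                    continue                          # no reply yet: retransmit
                resp = json.loads(data)               # (a real stub would also
                                                      #  match replies to xids)
                if resp["status"] == "CRASH":         # server restarted mid-call
                    raise RuntimeError("server restarted; call may or may not have run")
                self.last_acked = xid                 # piggyback ACK on next call
                return resp["result"]
            raise TimeoutError("no response after retries")

Even this small stub makes the engineering trade-offs visible: how long the timeout is,
how many retries to attempt, and when to ACK all trade latency against message count
and server state.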