Lecture 21: Replication Control

Concurrency deals with multiple clients executing operations on a server at the same time. Replication deals with handling operations on an object that is stored at multiple servers.

🥚 Replication 101

Why replicate? Fault tolerance, load balancing, higher availability.

Magic of replication:

With no replicas, probability that the single copy is up = $(1 - f)$, where $f$ is the probability that a server is down
With $k$ replicas, probability that at least one of those copies is up = $(1 - f^k)$, which is much higher (assuming replicas fail independently)
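To make the formula concrete, here is a minimal sketch that evaluates $1 - f^k$ for a few replica counts. It assumes each replica fails independently with the same probability $f$; the value `f = 0.1` and the function name `availability` are illustrative, not from the lecture.

```python
def availability(f: float, k: int) -> float:
    """Probability that at least one of k replicas is up.

    Assumes each replica is down independently with probability f.
    """
    return 1 - f ** k


# Illustrative failure probability (an assumption, not a measured value).
f = 0.1
for k in (1, 2, 3, 5):
    print(f"k={k}: availability={availability(f, k):.5f}")
```

Each extra replica multiplies the chance that everything is down by another factor of $f$, so the gains shrink quickly while the maintenance overhead keeps growing.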

Table shows that more replicas = more availability, but over-replication is bad because it becomes

challenging to maintain transparency, i.e. the client should not be able to tell that the object is replicated or notice when things change on the server side
challenging to stay consistent, i.e. all clients should see only one version of the object

Num. replicas is decided by balancing the increased availability against the increased overhead of maintaining all these replicas

😋 Replication Flavors

To maintain consistency, we have two methods of forwarding updates from the front-end to the group of servers that hold the replicas.

Both use the concept of an RSM (replicated state machine), whose fundamental principle is that all replicas of an object must apply writes in the same order, regardless of which clients issued them.
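The RSM principle can be sketched with a toy key-value replica: replicas that apply the same writes in the same order end in the same state, while a replica that applies them in a different order can diverge. The class `KVReplica` and the sample log are hypothetical and only for illustration.

```python
class KVReplica:
    """Toy deterministic state machine: a key-value store."""

    def __init__(self):
        self.state = {}

    def apply(self, op):
        # Each operation is a (key, value) write.
        key, value = op
        self.state[key] = value


# Hypothetical sequence of writes issued by clients.
log = [("x", 1), ("x", 2), ("y", 5)]

a, b, c = KVReplica(), KVReplica(), KVReplica()

# Replicas a and b apply the writes in the same order.
for op in log:
    a.apply(op)
    b.apply(op)

# Replica c applies the same writes in a different order.
for op in reversed(log):
    c.apply(op)

print(a.state == b.state)  # same order, same state
print(a.state == c.state)  # different order, states diverge on "x"
```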

💤 Passive Replication

Only the primary replica (aka leader) is informed directly by the front-end; the leader forwards each update to the other replicas
Num. ACKs the leader waits for before replying depends on the consistency level
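A minimal sketch of the passive flow, assuming a synchronous, in-process model: the primary forwards each write to its backups and reports success only if enough ACKs come back. The names `Primary`, `Backup`, and `acks_required` are hypothetical; a real system would send these messages over the network and handle timeouts.

```python
class Backup:
    """Backup replica that ACKs a write only if it is up."""

    def __init__(self, up=True):
        self.up = up
        self.log = []

    def replicate(self, op):
        if not self.up:
            return False  # no ACK from a crashed backup
        self.log.append(op)
        return True  # ACK


class Primary:
    """Leader that forwards writes and counts ACKs."""

    def __init__(self, backups, acks_required):
        self.backups = backups
        # How many ACKs to wait for; set by the consistency level.
        self.acks_required = acks_required
        self.log = []

    def write(self, op):
        self.log.append(op)
        acks = sum(1 for b in self.backups if b.replicate(op))
        # The write is acknowledged to the client only with enough ACKs.
        return acks >= self.acks_required


backups = [Backup(), Backup(up=False), Backup()]
print(Primary(backups, acks_required=2).write(("x", 1)))  # two backups are up
print(Primary(backups, acks_required=3).write(("x", 2)))  # one backup is down
```

Requiring more ACKs gives stronger consistency but makes writes slower and more likely to stall when a backup is down.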

🏃🏽‍♀️ Active Replication

All replicas are informed at once. Red arrow is multicast within group.
Total, FIFO-Total, or Causal-Total ordering is used to maintain order of writes across replicas
Along with total ordering, use virtual synchrony to handle failures, so that all replicas see all failures/leaves and all multicasts in the same order
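One common way to get total ordering is a sequencer that stamps every multicast with a global sequence number; this is an assumed technique for illustration, since the notes do not say which algorithm is used. In the sketch below, each replica buffers out-of-order messages and applies them strictly by sequence number. `Sequencer` and `Replica` are hypothetical names, and failures are not modeled.

```python
import itertools


class Sequencer:
    """Assigns a global sequence number to each multicast."""

    def __init__(self):
        self._counter = itertools.count()

    def stamp(self, op):
        return (next(self._counter), op)


class Replica:
    """Applies stamped operations strictly in sequence-number order."""

    def __init__(self):
        self.next_seq = 0
        self.pending = {}  # seq -> op, buffered until its turn
        self.applied = []

    def deliver(self, stamped):
        seq, op = stamped
        self.pending[seq] = op
        # Apply every buffered operation that is now in order.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


seq = Sequencer()
msgs = [seq.stamp(op) for op in ["w1", "w2", "w3"]]

r1, r2 = Replica(), Replica()
for m in msgs:            # network delivers in order to r1
    r1.deliver(m)
for m in reversed(msgs):  # network delivers out of order to r2
    r2.deliver(m)

print(r1.applied == r2.applied)  # both apply w1, w2, w3 in the same order
```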

🏦 Transactions & Distributed Servers

Correctness in replication means one-copy serializability, i.e. the result of executing transactions on the replicated, distributed system must be the same as executing them on a single machine holding one copy of each object. So how do we ensure that all servers make the same decision to commit or abort a transaction? This is a problem of consensus.

👥 Consensus

Getting all servers to agree on COMMIT or ABORT is called the Atomic Commit problem. Paxos is a solution, but it is complicated. Is there a cheaper solution?

1️⃣ One-Phase Commit

Coordinator server passes along operations AND informs all servers of the final COMMIT or ABORT decision

Downsides? 👎🏽 👎🏽 👎🏽

Actual servers have no say in the COMMIT or ABORT decision, only the coordinator does
If data was corrupted or a server crashed, but the global decision was COMMIT, then the partial commit at that server violates correctness
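The downside can be seen in a minimal sketch of one-phase commit, assuming an in-process model where the coordinator simply pushes its decision to every server. `Server`, `healthy`, and `one_phase_commit` are hypothetical names; the point is only that no server is asked whether it is able to commit.

```python
class Server:
    """Participant that receives the coordinator's decision."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy  # False models a crash or corrupted data

    def finish(self, decision):
        if decision == "COMMIT" and not self.healthy:
            return "FAILED"  # cannot commit, but nobody asked first
        return "COMMITTED" if decision == "COMMIT" else "ABORTED"


def one_phase_commit(servers, decision):
    # The coordinator decides alone and informs every server.
    return {s.name: s.finish(decision) for s in servers}


servers = [Server("A"), Server("B", healthy=False)]
result = one_phase_commit(servers, "COMMIT")
print(result)  # A committed while B could not: a partial commit
```

Because the decision is made before any server can object, A commits while B does not, which is exactly the partial commit described above.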