šŸ“ā€ā˜ ļøĀ Failure Detection

Possible through 2 subprotocols

Desired properties

Impossible to have both completeness and accuracy in any lossy network, never know if it was a server crash vs. slow network vs. etc… [Chandra and Toueg]

MTTF (mean time to failure) is proportional to scaling datacenter

Frequency increases linearly to scaling datacenter


šŸ«€Ā Heartbeating

Heartbeating is a mechanism for failure detection where nodes increment their own heartbeat and send them to neighbors → ping means ā€œI am alive, here is my heartbeatā€