- The underlying datacenter network has become so fast (RDMA, CXL, etc.) that the latency of accessing a remote server (Server A → Server B) is close to the latency of accessing resources on the local server
- The idea is to split server resources into specialized racks and connect them with a high-performance network:
- CPU-heavy rack
- Memory-heavy rack
- Persistent storage rack
- etc.
- What should OS do? What should orchestrating services like Kubernetes do?
- Pros of disaggregated datacenters
- Easy to scale out/in, and easier to scale individual resources independently: e.g., if you have enough memory but need more compute, just add more CPU servers
- Latency increases with memory size! When you access a memory address, translation takes longer because it has to walk through more levels of translation structures (page tables) as memory capacity grows (see the sketch below)
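
A rough back-of-the-envelope sketch of that last point (not from the lecture; it assumes x86-64-style paging with 4 KiB pages and 512-entry page tables): each extra page-table level needed to cover a larger address space is one more memory access on every TLB miss, so translation cost grows as memory scales.

```python
# Illustration (assumed x86-64-style paging, not from the lecture):
# each page-table level resolves 9 bits of the virtual address
# (512 entries per table) on top of a 12-bit page offset (4 KiB pages),
# so larger address spaces need more levels, and each extra level is
# another memory access on a TLB miss.
import math

PAGE_OFFSET_BITS = 12   # 4 KiB pages
BITS_PER_LEVEL = 9      # 512 entries per page table

def levels_needed(address_bits: int) -> int:
    """Page-table levels required to cover `address_bits` of virtual address."""
    return math.ceil((address_bits - PAGE_OFFSET_BITS) / BITS_PER_LEVEL)

for bits in (32, 48, 57):
    print(f"{bits}-bit virtual address space: {levels_needed(bits)} page-table levels per TLB miss")
# 48-bit (256 TiB) -> 4 levels; 57-bit (128 PiB) -> 5 levels
```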
Erasure Coding
- Comes from communications and has been adopted by distributed systems
- An alternative to replication for fault tolerance is to store parity blocks and XOR them to recover lost data
- Erasure coding builds on this
- A packet is split into D packets in such a way that any K-subset of the D packets is sufficient to decode and retrieve the original packet
- Ex: if a packet is split into D = 8 packets, we only need K = 5 servers to respond to reconstruct the original packet, so the first 5 responses are sufficient (a minimal sketch follows below)
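
A minimal sketch of the idea, assuming the simplest possible code: K data blocks plus one XOR parity block, so D = K + 1 and any K of the D blocks reconstruct the data. Production systems (and the D = 8, K = 5 example above) use codes such as Reed-Solomon that tolerate more losses; the `encode`/`decode` helpers here are hypothetical illustrations, not any real library's API.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k blocks and append one XOR parity block (D = k + 1)."""
    block_len = -(-len(data) // k)                 # ceil division
    padded = data.ljust(k * block_len, b"\0")      # pad to a multiple of k
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(k)]
    return blocks + [xor_blocks(blocks)]           # D = k + 1 blocks total

def decode(received: dict[int, bytes], k: int, original_len: int) -> bytes:
    """Reconstruct from any k of the D = k + 1 blocks.

    `received` maps block index (0..k-1 data, k = parity) to block contents.
    """
    assert len(received) >= k, "need at least k blocks to decode"
    missing = [i for i in range(k) if i not in received]
    if missing:                                    # at most one data block can be missing
        (lost,) = missing
        others = [received[i] for i in range(k + 1) if i != lost and i in received]
        received[lost] = xor_blocks(others)        # parity XOR surviving data = lost block
    data = b"".join(received[i] for i in range(k))
    return data[:original_len]

# Usage: lose any one block, still recover the original
msg = b"erasure coding beats plain replication"
blocks = encode(msg, k=4)                                    # D = 5 blocks total
survivors = {i: b for i, b in enumerate(blocks) if i != 2}   # block 2 is lost
assert decode(survivors, k=4, original_len=len(msg)) == msg
```

The storage overhead here is (K + 1)/K of the original data, versus 2x or 3x for plain replication, which is the main appeal of erasure coding.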
SWARM and ACESO (read in class) are both specialized for KV-stores, but consensus-based replication for transactions is still an open problem; Indy says we can expect papers in this area soon!