Scheduling: policies (not mechanisms) for allocating limited resources (processors, memory, disk, network) to jobs efficiently


Scheduling on Single Processor

🥇 FIFO (First In First Out)
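A minimal sketch of FIFO on one processor, assuming every task arrives at time 0 and runs to completion without preemption. The function name `fifo_schedule` and the example tasks are hypothetical, chosen only for illustration.

```python
def fifo_schedule(tasks):
    """Run tasks in arrival order.

    tasks: list of (name, run_time) tuples, already in arrival order.
    Assumption: all tasks arrive at time 0 and there is no preemption.
    Returns a dict mapping task name -> completion time.
    """
    clock = 0
    completion = {}
    for name, run_time in tasks:
        clock += run_time          # the task holds the processor until it finishes
        completion[name] = clock
    return completion


tasks = [("A", 10), ("B", 5), ("C", 3)]   # hypothetical workload
done = fifo_schedule(tasks)
print(done)                               # {'A': 10, 'B': 15, 'C': 18}
print(sum(done.values()) / len(done))     # average completion time
```

Note how the long task A at the head of the queue delays everything behind it.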


🩳 STF (Shortest Task First)
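A minimal sketch of STF on one processor, assuming every task arrives at time 0, run times are known in advance, and there is no preemption. The function name `stf_schedule` and the example tasks are hypothetical.

```python
def stf_schedule(tasks):
    """Run the shortest task first.

    tasks: list of (name, run_time) tuples.
    Assumptions: all tasks arrive at time 0, run times are known up front,
    and a running task is never preempted.
    Returns a dict mapping task name -> completion time.
    """
    clock = 0
    completion = {}
    # Sort by run time; ties keep their original order (sorted() is stable).
    for name, run_time in sorted(tasks, key=lambda t: t[1]):
        clock += run_time
        completion[name] = clock
    return completion


tasks = [("A", 10), ("B", 5), ("C", 3)]   # hypothetical workload
done = stf_schedule(tasks)
print(done)                               # {'C': 3, 'B': 8, 'A': 18}
print(sum(done.values()) / len(done))     # average completion time
```

Running the short tasks first lowers the average completion time for this workload, since fewer tasks wait behind the long one.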


🔃 Round-Robin
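A minimal sketch of Round-Robin on one processor, assuming every task arrives at time 0, context switches are free, and each task gets a fixed time slice (quantum) per turn. The function name `round_robin_schedule`, the quantum of 2, and the example tasks are hypothetical.

```python
from collections import deque


def round_robin_schedule(tasks, quantum):
    """Give each task a fixed time slice in turn until all finish.

    tasks: list of (name, run_time) tuples with run_time > 0.
    quantum: length of one time slice (assumed > 0).
    Assumptions: all tasks arrive at time 0; context switches cost nothing.
    Returns a dict mapping task name -> completion time.
    """
    queue = deque(tasks)
    clock = 0
    completion = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)   # run one slice, or less if the task ends sooner
        clock += run
        remaining -= run
        if remaining > 0:
            queue.append((name, remaining))   # unfinished: back of the queue
        else:
            completion[name] = clock
    return completion


tasks = [("A", 10), ("B", 5), ("C", 3)]   # hypothetical workload
print(round_robin_schedule(tasks, quantum=2))   # {'C': 11, 'B': 14, 'A': 18}
```

Every task makes progress early, which helps interactive workloads, at the cost of later completion times for the short tasks compared with running them first.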


But what happens in a distributed setup… like in the cloud…?

☁️ Hadoop Scheduling

Use case: multiple customers want to run their own MapReduce jobs on a shared cluster. How do we stay fair across all the tenants?

Hadoop’s YARN offers two schedulers for this:

🍵 Hadoop Capacity Scheduler - HCS

⚖️ Hadoop Fair Scheduler - HFS