Policies (not mechanisms) for using limited resources (processors, memory, disk, network) to schedule jobs efficiently
Scheduling on a Single Processor
🥇 FIFO (First In First Out)
- Maintain queue, schedule tasks in order of arrival
- PRO
- simple to implement
- preferred for batch processing jobs (no need to preempt)
- CON
- average completion time is high, not optimal
- a really big job that arrives first gets scheduled, and tasks that arrive later that could have finished much quicker are stuck waiting behind it (the "convoy effect"; see the sketch after this list)
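
A minimal Python sketch of the idea (illustrative only, not tied to any real system; `fifo_avg_completion` is a made-up helper): run tasks strictly in arrival order and measure the average completion time.

```python
# Toy FIFO scheduler on one processor: run tasks in arrival order,
# no preemption, and report the average completion time.
from collections import deque

def fifo_avg_completion(runtimes):
    queue = deque(runtimes)        # tasks in arrival order
    clock, completions = 0, []
    while queue:
        clock += queue.popleft()   # run the next task to completion
        completions.append(clock)
    return sum(completions) / len(completions)

# A 100s job arriving first makes both 1s jobs wait behind it.
print(fifo_avg_completion([100, 1, 1]))  # (100 + 101 + 102) / 3 = 101.0
```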

🩳 STF (Shortest Task First)
- Maintain tasks in queue, in increasing order of runtime
- Special case of 🚨 priority scheduling where a task's priority is its expected runtime (shorter runtime = higher priority)
- PRO
- optimal for average completion time: no other non-preemptive schedule has a shorter average
- preferred for batch processing jobs (no need to preempt)
- CON
- for this to work, we need to know each task's runtime ahead of time, which we often cannot
- longer tasks can be starved forever if short tasks keep arriving (see the sketch after this list)
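
The same toy workload under Shortest Task First (again just a sketch; note it assumes the runtimes are known up front, which is exactly the CON above):

```python
# Toy STF scheduler: always pick the shortest known task next.
import heapq

def stf_avg_completion(runtimes):
    heap = list(runtimes)
    heapq.heapify(heap)               # min-heap keyed on runtime
    clock, completions = 0, []
    while heap:
        clock += heapq.heappop(heap)  # shortest remaining task first
        completions.append(clock)
    return sum(completions) / len(completions)

# Same workload as the FIFO example: the average drops from 101 to 35.
print(stf_avg_completion([100, 1, 1]))  # (1 + 2 + 102) / 3 = 35.0
```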

🔃 Round-Robin
- Give each task a fixed time quantum; cycle through the queue, preempting a task when its quantum expires and sending it to the back (sketched below)
- PRO
- no starvation: every task keeps making progress
- good responsiveness, preferred for interactive workloads
- CON
- average completion time is worse than STF
- frequent preemption adds context-switch overhead
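
A quick sketch of the quantum loop (plain Python, illustrative only; the quantum of 2 is an arbitrary choice):

```python
# Toy round-robin: each task runs for at most one quantum, then goes
# to the back of the queue; short tasks finish early, nothing starves.
from collections import deque

def round_robin(runtimes, quantum=2):
    queue = deque(enumerate(runtimes))   # (task id, remaining time)
    clock, completion = 0, {}
    while queue:
        tid, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run                     # run for one quantum (or less)
        if remaining > run:
            queue.append((tid, remaining - run))  # preempt, requeue
        else:
            completion[tid] = clock
    return completion

print(round_robin([100, 1, 1]))  # {1: 3, 2: 4, 0: 102}
```
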
But what happens in a distributed setup… like in the cloud…?
☁️ Hadoop Scheduling
Use case: multiple customers want to run their own map-reduce jobs, how do we stay fair across all the tenants?
Hadoop’s YARN uses two variants:
🍵 Hadoop Capacity Scheduler - HCS
- Each queue is guaranteed a portion of the cluster capacity; jobs within the same queue typically run FIFO
- Admin sets a soft limit (the queue is guaranteed at least this capacity) and an optional hard limit (the queue may use at most this capacity, even if the rest of the cluster is idle); a toy sketch follows this list
- NO preemption
- 💡 preemption is not really possible in Hadoop: a Map task's output is written only to local disk (not to HDFS), so preempting the task would lose the work entirely because it is not saved anywhere!
- Queues can be hierarchical (child sub-queues) to share resources equally
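
A toy sketch of the soft/hard-limit idea (plain Python, not YARN's actual code; the `admit` helper, queue names, and numbers are all invented for illustration). A queue can borrow idle capacity beyond its soft limit but never past its hard limit, and because there is no preemption, a guarantee is only honored once borrowed capacity frees up:

```python
# Toy capacity-style admission: capacities are fractions of the cluster.
def admit(queues, name, demand):
    """Grant up to `demand` of the cluster to queue `name`."""
    free = 1.0 - sum(q["used"] for q in queues.values())
    q = queues[name]
    hard = q.get("hard", 1.0)                # no hard limit => whole cluster
    grant = max(0.0, min(demand, free, hard - q["used"]))
    q["used"] += grant
    return grant

queues = {
    "prod": {"soft": 0.70, "used": 0.0},               # guaranteed >= 70%
    "dev":  {"soft": 0.30, "hard": 0.50, "used": 0.0}, # >= 30%, capped at 50%
}
print(admit(queues, "dev", 0.60))   # 0.5 -- dev borrows idle capacity up to its hard limit
print(admit(queues, "prod", 0.70))  # 0.5 -- without preemption, prod's full
                                    # guarantee waits for dev's tasks to finish
```
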
⚖️ Hadoop Fair Scheduler - HFS
- Goal is to give all jobs an equal share of resources
- Cluster resources are divided into containers, each sized to run 1 task; the scheduler then splits containers fairly across jobs (see the sketch below)
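
A sketch of the fair-share idea (max-min fairness in plain Python, not HFS internals; the demands and container count are made up): split containers evenly, and let a job that needs less than its share donate the leftover to the others.

```python
# Toy max-min fair split of containers among jobs with known demands.
def fair_shares(demands, containers):
    shares = {job: 0 for job in demands}
    active = set(demands)                    # jobs still wanting containers
    while containers > 0 and active:
        per_job = containers // len(active) or 1
        for job in sorted(active):
            give = min(per_job, demands[job] - shares[job], containers)
            shares[job] += give
            containers -= give
            if shares[job] == demands[job]:
                active.discard(job)          # fully satisfied, drops out
    return shares

# 10 containers, three jobs: C only needs 2, so A and B split the rest.
print(fair_shares({"A": 8, "B": 8, "C": 2}, 10))  # {'A': 4, 'B': 4, 'C': 2}
```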