
Process Scheduling

We maintain a list of processes to run, but since our CPU can only run one thing at a time, the job of the scheduler is to decide which of these waiting processes should run next.

Concurrent Execution

Concurrent processes are a logical concept covering multitasked processes (the system doing multiple things at once). Regardless of how many cores/CPUs we have, there will be multiple jobs run on a single CPU.

It could be virtual parallelism (a single CPU doing many things, splitting CPU time among different jobs) or physical parallelism (multiple CPUs doing multiple things concurrently, actually at the same time).

Here, we have processes A and B. Over time, A executes for a certain amount of time, followed by B, then A again, and so on.

timeslicing

We are switching between A and B -- this is called timeslicing. In the 1st timeslice A runs, in the 2nd B runs, and so on. The reason the CPU can run multiple things although it can only do one thing at a time is that we split time up into slices and distribute them to A and B. Doing this fast enough gives the illusion that A and B run simultaneously.

Interleaved Execution (context switching)

That was just a simplified idea of what happens. One of the very important things is to perform context switching.

When A runs, being made of machine instructions, it makes use of registers (same for B). Since there is only one set of registers, after A runs, B will overwrite the register values. When we hand control back to A, the register values would be incorrect.

To solve this problem, the OS is invoked at the end of each timeslice: the OS saves the register contents for A, then restores the register contents for B before B executes. When B's timeslice ends, the OS saves B's register contents, restores A's, and hands control back to A. This is context switching.

contextSwtch

Apart from register contents, there are other things to be saved as well, such as the frame pointer, stack pointer, and status register (which records the outcome of the last instruction executed). All of these are saved as part of the hardware context.
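The save/restore dance above can be sketched as a toy model in Python. This is not real kernel code (a real OS does this in assembly, storing the snapshot in the PCB); `save_context`, `restore_context` and the register names here are hypothetical:

```python
# Toy model of context switching: the hardware context is modelled as a
# dict of register values. Names and values are illustrative only.

def save_context(cpu_registers):
    """Snapshot the running process's registers (as if into its PCB)."""
    return dict(cpu_registers)

def restore_context(saved):
    """Reload a previously saved snapshot onto the CPU."""
    return dict(saved)

# CPU registers while process A runs
cpu = {"pc": 100, "sp": 0x7000, "fp": 0x7010, "status": 0}

pcb_a = save_context(cpu)                                    # end of A's timeslice: save A
cpu = {"pc": 500, "sp": 0x9000, "fp": 0x9010, "status": 1}   # B runs and clobbers the registers

cpu = restore_context(pcb_a)                                 # A's turn again: restore A
assert cpu["pc"] == 100                                      # A resumes exactly where it left off
```

The key point the sketch shows: without the save/restore step, A would resume with B's register values and misbehave.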

If a process is running, the CPU is executing that process, so it cannot be executing the OS. So how does the OS start up to do all this? There are 2 situations here.

  • At the expiry of the time quantum, the OS starts up. The timer, which triggers regular interrupts, lets us know that the time quantum has expired.

If the time quantum is 20ms, when the countdown reaches 0 (the time quantum for the process runs out), the OS scheduler is invoked; it chooses the next process to run and performs a context switch. The service routine for the timer automatically triggers the OS scheduler.

  • A process does not use its full time quantum but is blocked.

A3 doesn't use its full time quantum (maybe it tries to read the keyboard and needs to wait for someone to press a key), so it cannot do anything. When A3 calls scanf, scanf calls an OS routine called read. The OS code, when executed, sets up the read for A3, moves A3 into the blocked state, goes to the ready queue to find B, saves the context for A3, restores the context for B, and hands control to B.

So the way the OS can switch between processes, even though the OS cannot run while a process runs, is that we have regular timer interrupts: when the time quantum for a process expires, the interrupt automatically invokes the process scheduler to choose the next process and perform a context switch.

Otherwise, when the process itself makes an OS call that causes it to be blocked, the OS hands control to another process.

Multitasking OS

We can have a single CPU with timesliced execution of tasks, or multiple CPUs (multiprocessor) with timeslicing of n CPUs.

singleCPU

multiCPU

Multicore

Multicore is when a single CPU chip contains multiple mini CPUs. Each is a core with its own individual functional units (integer pipeline etc.) and possibly its own L1 cache. The cores share common logic such as access to the external bus and a shared L2/L3 cache. Essentially, each core is like a mini CPU that shares resources with the rest of the cores.

Scheduling in OS

If we have multiple processes ready to run and only 1 CPU, how do we decide which to run next? (This is similar to thread-level scheduling.) This is known as the scheduling problem. First, let's look at some terminology.

Scheduler: Part of the OS that makes scheduling decisions

Scheduling algorithm: used by scheduler to make these decisions

Here we have a series of processes to run. The scheduler looks at the processes, selects one, and allocates the CPU to run it.

scheduling

Every process has a different CPU requirement (in CPU time) -- this is the process behaviour. There are also multiple ways to allocate the CPU, influenced by the processing environment (the environment we are operating in); each way is a scheduling algorithm. We have a few criteria to evaluate whether a scheduling algorithm is good or bad.

Process Behaviour

A typical process goes through a couple of phases such as:

CPU activity: where a lot of computation is done, e.g. recalculating a spreadsheet. A compute-bound process spends the majority of its time here.

I/O activity: requesting and receiving service from I/O devices, e.g. Microsoft Word waiting for a keypress. An I/O-bound process spends the majority of its time here.

In a game, the computation for the physics engine is usually done on the CPU, while the computation for occlusion (what is hidden behind an object) is done on the GPU (basically, rendering graphics). But transferring data from CPU to GPU takes time. Since the GPU is part of the I/O system, its work does not consume CPU time.

Processing Environment

There are 2 main types of processing environment: batch processing and interactive (multiprogramming).

Batch processing requires no interaction and no responsiveness. For example, we simply compile records to be updated to a database, load it up, and it runs through the records and updates the database.

For interactive, there are often active users interacting with multiple things on the system. It is important that scheduling is responsive so the user doesn't feel lag.

There is also real-time processing, where we have deadlines to meet. It involves periodic processes that run at fixed intervals, e.g. GPS updates every 10ms, so the update process gets invoked every 10ms. More critically, when you press the brakes, you won't want the system to take a long time to react!

Criteria for Scheduling Algorithms

There are many criteria to evaluate scheduling algorithms. Some that apply to all processing environments include:

  • Fairness
    • All processes should get a fair share of CPU time
    • On single-user systems this is on a per-process basis; on shared systems (Linux, Unix) it is per user
    • There should be no starvation (one process takes all the CPU time and the others don't get to run)
  • Balance
    • All parts of the computing system should be utilized
    • Between CPU, I/O and other resources, usage should be about the same
    • This ensures computing resources are used as efficiently as possible

But when do we perform scheduling? There are 2 broad types of schedulers:

  • Non-preemptive (cooperative)
    • Process remains running (stays scheduled) until it performs a blocking call or gives up the CPU voluntarily
    • OS cannot stop a currently running process to run another process
  • Preemptive
    • Process is given a fixed time quota to run. When the time quota ends, another process is scheduled.
    • The process may still block or give up the CPU early by making a blocking call or voluntarily yielding
    • Depends on the timer configured to interrupt every x ms. Once the remaining quantum is decremented to 0, the timer ISR invokes the OS scheduler, which picks a new process.
Windows

In Windows 3.1 and before, if you make a system call, even if it doesn't block, the OS takes the opportunity to hand control to another process (schedules a new process), since a system call invokes the OS.

Scheduling Process

  1. The scheduler is triggered (through the timer ISR, a syscall, etc.). Control is handed to the OS, which triggers the scheduler.
  2. If a context switch is needed, the context of the currently running process is saved in its PCB.
    • This includes the memory context and OS context. Note we are saving information about the memory, not the memory contents themselves.
  3. The scheduler picks a process from the ready queue, restores its context, and hands over control for the process to run.

Batch Processing

You take a job, run it on a computer, and don't interact with it at all, like training a machine learning model. You compile all the data to train on and run the algorithm, which looks at all the data, adjusts the parameters, and repeats. Everything is done in one shot without user interaction.

The key thing here is that there is no user interaction. The predominant approach on such systems is non-preemptive scheduling.

Scheduling algorithms are generally easier to understand and implement and we can make variants/improvements to these algorithms for other types of systems. Some we will look at include First-Come First Served (FCFS), Shortest Job First (SJF) and a preemptive version of SJF, Shortest Remaining Time Next (SRT).

Criteria

To evaluate how well a scheduling algorithm works, we can have a few evaluation criteria.

  • Turnaround time
    • Total time taken (i.e. finish time - arrival time)
    • Not the same as running time. E.g. if jobs A, B and C take 5, 2 and 1 units of CPU time and we run FCFS, A runs first and uses 5 time units, etc., so the total time taken for C is 5 + 2 + 1 = 8, not 1.
    • Related to waiting time (includes the time spent waiting for the job to run)
  • Throughput
    • Number of tasks finished per unit time
    • If each job takes 0.1s to run and we have a sufficient number of jobs, throughput is 10 jobs/s.
  • CPU utilization
    • Percentage of time the CPU is working on a task
    • If a program is waiting for I/O, it cannot be running on the CPU. The CPU is doing nothing, so utilization is very low.
    • When a process is waiting for I/O, it needs the data to continue; transferring the data takes time and the process cannot carry on.
    • The idea of CPU utilization is to schedule processes so that we keep it as high as possible.

FCFS

The general idea is that the job that arrives first is served first. We are guaranteed no starvation as long as every process is bounded (no process executes while(1) or an infinite loop, which without preemption would prevent other processes from running), so every process gets a chance to run.

There is no starvation because the number of tasks in front of x in the FIFO queue is always decreasing, so x will eventually get to run.

illustration

fcfs1

A runs for 8 TU and is taken off the queue

fcfs2

B runs for 5 TU and is taken off the queue

fcfs3

Finally, C runs for 3 TU

Average Waiting Time

Assuming A, B and C are ready to run at time 0, A's waiting time is 0, B's is 8 and C's is 13. Thus the average is (0+8+13)/3 = 7 TU.
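The FCFS numbers above can be checked with a short sketch (the helper `fcfs_waiting_times` is hypothetical; A=8, B=5, C=3 TU, all ready at time 0):

```python
# FCFS waiting times: each job waits for the total burst of everyone ahead.

def fcfs_waiting_times(burst_times):
    waits, elapsed = [], 0
    for burst in burst_times:
        waits.append(elapsed)   # time spent waiting in the queue
        elapsed += burst        # job then runs to completion
    return waits

waits = fcfs_waiting_times([8, 5, 3])   # A, B, C in arrival order
print(waits)                            # [0, 8, 13]
print(sum(waits) / len(waits))          # 7.0 TU average
```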

Shortcomings

There are some shortcomings with FCFS. Firstly, a simple reordering of jobs can reduce the average waiting time.

There is also the convoy effect, where the first task A is CPU-bound (it consumes a lot of CPU time) and runs all the way to completion, while a number of I/O-bound tasks wait behind it. While waiting for A, the I/O-bound processes, although ready, sit idle and the I/O devices go unused.

When A finally does I/O, the CPU is free and B runs. But because B is fast, it uses up the CPU quickly and is soon waiting for I/O. Next C runs, finishes, and waits for I/O. So initially we had a phase where the I/O devices were idle for a long time, and now the CPU is idle for a long time.

Shortest Job First

To help reduce the average waiting time seen in FCFS, we have SJF. The general idea: assuming all the jobs are available and ready to run, and we know how long each job takes, we can sort the jobs by length (i.e. select the task with the smallest total CPU time).

Of course, this requires us to know in advance how much CPU time each job takes, which is not easy. Another problem is that a computer generally would not know how many transactions there are until it goes through all of them. However, given a fixed set of tasks, SJF minimises the average waiting time.

illustration

sfj1

Since C is shortest, C runs first. C's waiting time is 0; after C finishes, B and A have each waited 3 TU.

sfj2

Next, B with 5 TU will run. A's waiting time is 8 TU.

sfj3

A with longest TU runs last.

Average Waiting Time

Now, our average waiting time is (0+3+8)/3 ≈ 3.67 TU, which is < 7, so it has decreased!
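SJF is just FCFS after sorting by burst time. Reusing the same example (A=8, B=5, C=3) in a small sketch (the helper `sjf_waiting_times` is hypothetical) shows the average drop from 7 to about 3.67 TU:

```python
# SJF waiting times: sort jobs by burst time, then accumulate as in FCFS.

def sjf_waiting_times(jobs):
    """jobs: dict name -> burst time, all ready at time 0."""
    waits, elapsed = {}, 0
    for name, burst in sorted(jobs.items(), key=lambda kv: kv[1]):
        waits[name] = elapsed   # wait = total burst of all shorter jobs
        elapsed += burst
    return waits

waits = sjf_waiting_times({"A": 8, "B": 5, "C": 3})
print(waits)                       # {'C': 0, 'B': 3, 'A': 8}
print(sum(waits.values()) / 3)     # ≈ 3.67 TU
```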

Predicting CPU Time

How do we know how much CPU time a job is going to take?

One way is to state in advance how many transactions/things there are to do, but this is not always possible.

Another way works if the job is one done frequently, so we can estimate based on past experience. This means we predict the future CPU time from previous CPU-bound phases. A common approach is to take the exponential average.

\[Predicted_{n + 1} = \alpha Actual_{n} + (1 - \alpha ) Predicted_{n}\]
  • \(Actual_n\) : Most recent CPU time consumed
  • \(Predicted_n\): Past history of CPU time consumed
  • \(\alpha\): Weight placed on the most recent burst versus past history
  • \(Predicted_{n + 1}\): Latest prediction. With \(\alpha = 1\), the prediction is based entirely on the most recent burst; with \(\alpha = 0\), entirely on past history.

Of course, the jobs may not all take the same amount of time; the actual time depends on the situation. Although the prediction will not be 100% accurate, it can be used as an estimate. As long as \(\alpha\) is strictly between 0 and 1, the prediction takes into account not just the immediate past but also the bursts before that.
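The exponential average formula above can be sketched directly (the function name and the sample burst values are illustrative):

```python
# Exponential averaging of CPU bursts: alpha weights the most recent
# burst; older bursts decay geometrically with each update.

def exponential_average(actual_bursts, alpha=0.5, initial_guess=10.0):
    predicted = initial_guess
    for actual in actual_bursts:
        predicted = alpha * actual + (1 - alpha) * predicted
    return predicted

# Starting from a guess of 10, the prediction converges toward the
# recent burst lengths (here bursts of 6, 4, 6, 4 units).
print(exponential_average([6, 4, 6, 4], alpha=0.5, initial_guess=10.0))  # 5.0
```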

Shortest Remaining Time

Now, we move on to SRT. Let's say not all jobs arrive at the same time. If at TU 1 a job with a shorter CPU time arrives, we preempt the current process if the new job's time is less than the current process's remaining time.

illustration

srt1

At TU 0, we only have job A which will run. Job C with a shorter TU of 3 arrives at TU 1.

srt2

After A runs for 1 TU, it has 7 TU left; then C arrives with 3 TU. We preempt A since 3 < 7, so C gets to run.

srt3

Now, B enters at TU 2 with 5 TU. Since B's time is greater than C's remaining time, it won't preempt C, and C continues to run.

srt4

Once C finishes, since B's 5 TU < A's remaining 7 TU, B runs before A resumes.

srt5

Once B finishes, A, with the longest remaining time, gets to resume.

The idea here is similar to SJF, just preemptive: if a shorter job comes later, it gets to run. For SJF, a shorter job that comes later must still wait. So rather than looking at the overall job length, we select the job with the shortest remaining time.

A new job can preempt currently running jobs, providing good service for short jobs even when it arrives late.

How can we identify job lengths? Our computer often operates as part of a workflow, so jobs are repeated and their lengths can be estimated.
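The SRT example above (A arrives at TU 0 with 8 TU, C at TU 1 with 3 TU, B at TU 2 with 5 TU) can be simulated tick by tick with a short sketch (the `srt` helper is hypothetical):

```python
# SRT simulation: at every tick, the arrived job with the smallest
# remaining time runs for one time unit.

def srt(jobs):
    """jobs: list of (name, arrival, burst). Returns the per-tick schedule."""
    remaining = {name: burst for name, _, burst in jobs}
    timeline, t = [], 0
    while any(r > 0 for r in remaining.values()):
        ready = [(remaining[n], n) for n, arr, _ in jobs
                 if arr <= t and remaining[n] > 0]
        _, current = min(ready)        # shortest remaining time wins
        remaining[current] -= 1
        timeline.append(current)
        t += 1
    return timeline

order = srt([("A", 0, 8), ("C", 1, 3), ("B", 2, 5)])
print("".join(order))   # ACCCBBBBBAAAAAAA
```

The output matches the illustration: A runs 1 TU, C preempts and finishes, then B, then A resumes last.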

Interactive Systems

We have a few criteria when scheduling for interactive systems.

  • Response time
    • Time between a request and the response by the system
    • e.g. time between pressing a key and the letter appearing on the screen
  • Predictability
    • Variation in response time; less variation = more predictable
    • e.g. responses that are usually fast but randomly slow 20% of the time are bad

Preemptive scheduling algorithms are used to ensure good response time and predictability. So the scheduler needs to run periodically.

Periodic Scheduler

We already know that we are assuming a single CPU that completes one instruction at a time, yet we are running multiple things. And if a process is running, our OS is not running. So who does the job of switching from one process to another?

The OS actually does it -- but if it isn't running, how? This is done with the timer interrupt, which is configured to trigger an interrupt at regular intervals (e.g. 1ms, based on the hardware clock). No matter what you are running, control is handed over to the service routine for the timer interrupt.

The process is interrupted, the service routine hands control to the OS, which does what it needs to do, and then the ISR hands control back to a process (which could be a different one). The OS could have preempted the original process to run another instead.

The interrupt service routine code can only be set by the OS, which ensures the timer interrupt cannot be intercepted by any other program. User processes generally cannot change this code.

The interval of the timer interrupt can also be thought of as the inter-tick interval (ITI), if we treat timer interrupts as ticks. The process scheduler is triggered on timer interrupts (typically every 1 - 10ms). In Linux, the 1ms interval is called a jiffy.

The timer is configured to trigger an interrupt every millisecond. Every process is given a fixed number of jiffies to run, known as the time quantum (the execution duration given to a process).

The duration can be constant, but is usually variable (if a process has run for a long time, it has consumed a lot of CPU time; assuming the CPU has a lot to do, decreasing its time quantum gives a fairer distribution of CPU time to other processes).

So the remaining time quantum is decremented tick by tick to ensure a fair distribution of CPU time among processes. This is another important criterion for schedulers -- fairness.

The time quantum has to be a multiple of the ITI, and can take a large range of values (5 - 100ms).

ITI vs Time Quantum

After process A runs for 10ms, a timer interrupt occurs, which hands control over to the timer ISR. At this point, the timer ISR invokes the OS scheduler. Since the time quantum is 20ms, the OS scheduler updates A's remaining quantum from 20ms to 10ms, hands control back to the timer ISR, and then back to process A.

iti

After another 10ms, the interrupt triggers again, the timer ISR runs again, and the scheduler is triggered again. The remaining quantum decrements to 0. Now, instead of handing control back to process A, control is handed to process B.

iti

iti3

A's remaining quantum gets decremented 10ms after the first timer interrupt.

iti4

A has used up its time quantum, so B runs and gets its quantum decremented 10ms after the first interrupt of its slice.

iti5

This continues for rest of the jobs

When the time quantum for a process expires, the OS has to save the context for that process, find a new job to run, and restore the context for the new job before handing control back to the ISR and then to the new process.

The scheduling algorithms we will look at are round robin, priority based, multi-level feedback queue (MLFQ) and lottery scheduling.

Round Robin

Here, tasks are stored in a FIFO queue. This is similar to FCFS: we pick the first task in the queue and run it until the fixed time slice (time quantum) has elapsed, the task gives up the CPU voluntarily, or it blocks.

The task is then moved to the back of the queue to wait for another turn. A blocked task is moved to another queue (the block queue) to wait for its request, since it may not be ready to run.

When does a task become unblocked? When its I/O arrives. But how does the task (or rather the OS) know its data is ready? Interrupts. E.g. MS Word waits for a keypress, so the process sits in the block queue. When you press a key, it triggers an interrupt, which triggers the keyboard's ISR, which reads the character, puts it somewhere, and notifies the OS. The OS sees who has been waiting for a keypress and moves that process back into the ready queue.

illustration

rr

A runs for 2 TU. Since it is the only job, it continues for the next time quantum and runs for another 2 TU.

rr2

B and C then take turns to run. A is pushed to the back of the queue.

B and C both run for 2 TU each.

rr3

A runs for only 1 TU before B and C run. This could be caused by A blocking.

The time used can vary due to blocking/sleeping -- when a process sleeps, control is handed over to another process, so it may not use up the entire time quantum.

This is essentially a preemptive version of FCFS. We have a response time guarantee: given n tasks and quantum q, the time before a task gets the CPU is bounded by \((n - 1)q\). A timer interrupt is needed for the scheduler to check whether the time quantum has expired. The choice of time quantum is important:

  • Big quantum
    • CPU utilization during the quantum is essentially 100% (assuming no I/O), but poorer response for processes since they wait longer
    • Better CPU utilization, longer waiting time (waiting time is proportional to the quantum)
  • Small quantum
    • Overhead of context switching: shorter intervals mean more frequent context switches, so more overhead
    • Bigger overhead (worse CPU utilization), shorter waiting time
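Round robin can be sketched with a FIFO queue, reusing the earlier job set (A=8, B=5, C=3 TU) with a quantum of 2; the `round_robin` helper is hypothetical and ignores blocking for simplicity:

```python
# Round robin: pop the head of the FIFO queue, run for up to one quantum,
# and re-queue the job at the back if it still has work left.

from collections import deque

def round_robin(jobs, quantum=2):
    queue = deque(jobs.items())          # (name, remaining burst) in FIFO order
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)    # a job may finish mid-quantum
        timeline.extend([name] * run)
        if remaining > run:
            queue.append((name, remaining - run))  # back of the queue
    return "".join(timeline)

print(round_robin({"A": 8, "B": 5, "C": 3}, quantum=2))
# AABBCCAABBCAABAA: jobs interleave every 2 TU until each finishes
```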

Priority scheduling

This is a very important class of scheduling algorithms, and most OSes in the world have some form of priority scheduling. Processes in the foreground tend to have higher priority than those in the background (since you are actively using them) -- in Windows and macOS, the priority is to increase user satisfaction.

Background processes, on the other hand, we don't normally care about, like server processes, aka daemons.

Secure shell

For macOS and Linux, you can remotely access your computer with Secure Shell, and the OS needs to be able to receive connections using sshd (the SSH daemon).

The general idea of priority scheduling is that some processes are more important than others; we cannot treat all processes as equal. We assign a priority value to every task, and when choosing a process to run, we select the task with the highest priority. As usual, there are 2 variants.

  • Preemptive
    • A higher priority process can always preempt a lower priority one once it is ready.
    • If B is blocked and A is running, when B (with higher priority) becomes ready, the OS will immediately stop A and run B
  • Non-preemptive
    • A high priority process that arrives late must wait for the current process to finish (or block for I/O)
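The preemptive variant boils down to: whenever the scheduler runs, pick the ready task with the best (smallest) priority number, so a newly ready high-priority task wins immediately. A minimal sketch (the `pick_next` helper is hypothetical):

```python
# Preemptive priority pick: the scheduler always selects the ready task
# with the smallest priority number (smaller = more important).

def pick_next(ready_tasks):
    """ready_tasks: dict name -> priority. Returns the task to run."""
    return min(ready_tasks, key=ready_tasks.get)

ready = {"A": 4}
print(pick_next(ready))        # A runs alone

ready["C"] = 1                 # C becomes ready with a higher priority
print(pick_next(ready))        # the scheduler now picks C, preempting A
```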

illustration

ps1

A is the only job at time 0, and runs for 1 TU.

ps2

After A runs for 1 TU, C and B arrive. C has the highest priority, so the OS preempts A and C runs for 3 TU.

ps3

Now C is done. Since A has the higher priority, A gets to continue.

ps4

When A is done, B finally gets to run.

Note that the reason high priority is given a small number is that it makes it easy to see which task has the highest priority: the largest unsigned int depends on the word size, but the smallest value is always 0. Makes life easier :)

Shortcomings

Lower priority processes can starve when high priority processes keep hogging the CPU, or when a long series of high priority processes keeps arriving. This is worse in the preemptive variant (since low priority processes keep getting preempted).

A possible solution is to decrease the priority of a process after some time quantum, or to give the currently running process a fixed time quantum: no matter the priority, once the quantum is up, control is handed to the next process.

Another problem is that it is difficult to guarantee/control the exact amount of CPU time given to a process with priority based scheduling. High priority processes can hog the CPU.

Priority Inversion

Let's consider a scenario where we have processes with priorities {A:1, B:3, C:5} (where 1 is highest). C arrives first, runs, and locks a resource. Task B arrives and preempts C, so C stops running and cannot unlock the resource. Now task A arrives and needs the same resource as C, but it is locked! Task B continues to execute even though A has a higher priority.

Essentially, a low priority process locks a resource, and a medium priority process prevents a high priority process from running.

watchdog

A watchdog is a timer that, if you don't reset it, reboots the entire system to prevent hangs.

How can we solve this problem? Well, we can try to promote tasks temporarily to a higher priority!

When the OS notices that task A is competing for a resource with a lower priority process, that process (C) is temporarily promoted to the same priority as A. C can now preempt B, run to completion, and release the resource, allowing A to start. This is known as priority inheritance.
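The promotion step can be modelled with a toy sketch of the {A:1, B:3, C:5} scenario (the `Task` class and `block_on_lock` helper are hypothetical; a real kernel does this inside its mutex implementation):

```python
# Toy model of priority inheritance: when a high-priority waiter blocks
# on a lock, the holder temporarily inherits the waiter's priority.

class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base = priority       # the task's normal priority
        self.priority = priority   # the effective priority (may be boosted)

a, b, c = Task("A", 1), Task("B", 3), Task("C", 5)
lock_holder = c                    # C locked the resource first

def block_on_lock(waiter, holder):
    # promote the holder if the waiter is more important (smaller number)
    holder.priority = min(holder.priority, waiter.priority)

block_on_lock(a, lock_holder)
print(c.priority)                  # 1: C now outranks B and can finish

c.priority = c.base                # on unlock, C drops back to 5
```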

sempahor

Multi-Level Feedback Queue

This helps solve one big problem we faced: how do we schedule jobs without perfect knowledge? Most algorithms require some information, like CPU time, I/O time, running time, etc.

MLFQ, rather than guessing, adapts. As processes run, MLFQ adjusts its treatment of them based on their observed behaviour. It tries to minimise response time for I/O-bound processes and turnaround time for CPU-bound processes.

Rules

The basic rule works as follows:

  1. If priority(A) > priority(B) -> A runs
  2. If priority(A) == priority(B) -> A and B runs in RR

However, it also has some special priority setting/changing rules:

  1. When a job first arrives (a new job), it always runs at the highest priority.
  2. When it uses up its full time quantum, it is demoted (priority reduced).
  3. If a job gives up the CPU or blocks before using its full time quantum, its priority is retained.
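The demote/retain rules above can be sketched with a minimal 3-level model (the `mlfq_tick` helper is hypothetical; level 0 is the highest priority):

```python
# Minimal MLFQ re-queueing rule: using the full quantum demotes a job
# one level (down to the bottom queue); blocking early keeps its level.

def mlfq_tick(queues, job, used_full_quantum, level):
    """Re-queue `job` after one quantum at `level` (0 = highest)."""
    if used_full_quantum:
        level = min(level + 1, len(queues) - 1)   # rule 2: demote
    queues[level].append(job)                     # rule 3: otherwise keep level
    return level

queues = [[], [], []]
level = 0                                         # rule 1: new jobs start on top
level = mlfq_tick(queues, "long_job", True, level)
print(level)   # 1: used its whole quantum, demoted one level

level = mlfq_tick(queues, "long_job", False, level)
print(level)   # 1: blocked early, priority retained
```

Running this repeatedly with `used_full_quantum=True` drives a CPU-bound job to the bottom queue, while an I/O-bound job that always blocks early stays at the top, matching the scenarios below.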

illustration

Let's look at different scenarios with 3 queues (3 priority level)

  1. single long running job.

    mlfq1

    The job first runs at highest priority queue for 1 time quantum

    mlfq12

    The job gets demoted to 2nd priority level until it finishes its quantum.

    mlfq13

    The job gets demoted to the lowest level where it can run as long as it wants (to a certain limit)

  2. 1 long + 1 short job in the middle

    mlfq2

    The long job runs for 1 time quantum, gets demoted, runs for another time quantum, gets demoted to the lowest level

    mlfq22

    A new job comes in. Since it always comes at highest priority, it preempts the first job and consumes 1 quantum.

    mlfq23

    New job gets demoted and consumes another time quantum.

    mlfq24

    If the new job finishes here, the 1st job gets to continue.

  3. 1 CPU bound (already in system for some time) + 1 I/O bound

    mlfq3

    A starts off at the bottom since it has been around for some time.

    mlfq32

    B then starts up, preempting A. It then makes an I/O call before its quantum finishes. Since it did not use up its quantum, it retains its priority

    mlfq33

    Control is handed back to A

    mlfq34

    As long as I/O processes do not use the full quantum, it always remains at the highest level, giving good response.

    In this case, there is a potential for abusing the system. As long as we design our process to block for a short amount of time (even a sleep for 1 microsecond), we can make it always retain highest priority.

If all of our jobs are I/O bound, none gets demoted, and all stay at high priority, so it essentially becomes round robin. So when we have a homogeneous set of jobs, MLFQ may not give good performance.

Lottery Scheduling

The idea is that every process is given a certain number of tickets from a fixed pool of \(n\) tickets. When a scheduling decision is needed, a ticket is chosen at random among the eligible tickets, and the winner is given the resource (CPU time).

This works because, although each draw is random, in the long run a process holding x% of the tickets wins roughly x% of the lotteries held and so uses x% of the resource (if the randomness is sufficiently uniform).

The advantage of this is responsiveness: any new process can take part in the very next lottery. It also provides a good level of control: we can control how much time a process uses by controlling the number of tickets given to it.

This is important if a process has many child processes, as it can distribute its y tickets among its children. In total, the parent and its children will still consume the same share of CPU time, even with 1000 child processes, so the children will not hog the CPU.

An important process can be given more tickets (control proportion of usage) and each resource can have its own set of tickets (different proportion of usage per resource per task)
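A minimal lottery scheduling sketch shows the proportional-share property (the `hold_lottery` helper and the 75/25 ticket split are illustrative):

```python
# Lottery scheduling: draw a ticket uniformly at random; a process's
# long-run CPU share tracks its share of the ticket pool.

import random
from collections import Counter

def hold_lottery(tickets):
    """tickets: dict name -> ticket count. Returns the winning process."""
    draw = random.randrange(sum(tickets.values()))
    for name, count in tickets.items():
        if draw < count:
            return name
        draw -= count

random.seed(0)                          # fixed seed for a repeatable demo
tickets = {"A": 75, "B": 25}            # A holds 75% of the tickets
wins = Counter(hold_lottery(tickets) for _ in range(10_000))
print(wins["A"] / 10_000)               # close to 0.75, as expected
```

Giving an important process more tickets directly raises its expected share, which is the control knob described above.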