Process Alternative - Threads
Processes are one way of making use of CPU and I/O resources. However, there is also an alternative to processes known as threads.
Motivation
Processes are very expensive: when you create a process with fork(), it duplicates the memory space (not really, but ignore that for now) and the process context, which consists of the hardware context (registers), the memory context (information about which parts of memory the process uses) and the OS context (information the OS requires to run the process).
With multiple processes, a context switch is needed to save and restore the process information (all three contexts). So every time we switch, all of this has to be saved and restored.
Communication between processes is also essentially non-existent. Since each process has its own memory space, global variables in each process are separate. Even if a and b are parent and child, their global variables do not share data. This is a problem, so we need to make use of Inter-Process Communication (IPC).
Thus, threads were invented to solve these problems with the process model. The basic idea: a traditional process has a single thread of control, so each process executes only one instruction stream (i.e. only one instruction of the whole program is executing at any time). For example, if p calls fork() to create q, each of p and q is executing just one stream of instructions.
With threads, rather than one instruction stream per process, a single process can contain multiple instruction streams. We add more threads of control to the same process, so that multiple parts of the program are, conceptually, executing at the same time.
You may wonder: why not just call fork() multiple times? With fork(), we do get multiple threads of control, but as multiple processes, each with a single thread of execution, each doing one thing at a time (a single-threaded process steps through its functions sequentially). With threads, one process can do several things at a time.
Process and Thread
A single process can have multiple threads, making it a multithreaded process. But what can be shared between threads? Remember that a process is a running program that has been abstracted through the use of contexts. Threads in the same process share the memory context (text, data, heap) and the OS context (PID, open files and other resources), but not the hardware context, since each thread needs its own unique information: a thread ID (to communicate between threads), registers, and a (conceptual) stack.
Context Recap
- hardware: GPRs, PC, SP, FP and other registers
- memory: information about the process's memory requirements and usage
- OS: information about scheduling, PID, CPU usage, open files (if any are being accessed)

As seen from the diagram, a single-threaded process has just one thread of execution making use of the code, data, etc. In a multithreaded process, the threads share the code, data and files, but the registers and stack have to be separate. Note that this separation of stacks is conceptual: in practice the threads share the same stretch of stack memory that the process has.
Process Context Switch vs Thread Switch

Here, when we spawn a new process, the child has to be an exact copy of the parent. This means duplicating the code, data and files as well as the hardware context. (Conceptually, at least: in practice we only duplicate variables that are altered. Parent and child share the same memory space until one of them tries to modify a variable, at which point a duplicate of that variable is made. This is copy-on-write.) This is very inefficient.
When we context switch between processes, we still have to switch the hardware, OS (as it is a different process) and memory contexts (there will be some differences in memory usage between parent and child).

On the other hand, with multithreading, since threads share the OS and memory contexts, context switching between threads only involves the hardware context: the registers and the stack, switched by manipulating the SP and FP.
Context switching usually involves a massive amount of reading and writing, and shuffling that data back and forth is slow, so the less we need to switch, the better. Threads are therefore "lighter" than processes, hence the name lightweight processes.
Benefits
Some benefits of threads include:
- Economy
- Multiple threads in the same process require much fewer resources to manage than multiple processes
- Even with copy-on-write, multi-process programs that write to a large number of variables end up duplicating those variables; in a multithreaded program the variables are shared, so no duplication is needed
- Resource sharing
- Threads share most of the resources of a process (files, variables)
- No need for additional mechanism for passing information around
- Can pass information between threads conveniently
- Responsiveness
- Multithreaded programs can appear much more responsive
- Context-switching is lighter
- Scalability
- Multithreaded programs can take advantage of multiple CPUs
- The OS can schedule the threads on different CPUs, so the process is doing multiple things at a time
Problems
Of course, with benefits also comes some problems:
- System call concurrency
- Parallel execution of multiple threads means parallel system calls are possible
State machine example (the original listing was lost; this is a reconstruction consistent with the discussion that follows: f() keeps a static counter x and switches on it):

```c
int f() {
    static int x = 0;   /* one copy, shared by every thread that calls f() */
    x = x + 1;
    switch (x) {
        case 1:  return 100;   /* the path the first caller expects        */
        case 2:  return 200;   /* taken instead after an ill-timed switch  */
        default: return -1;
    }
}
```
If thread1 calls f(), it increments x to 1. But before executing the switch, the OS switches context to thread2. When thread2 starts executing, x is incremented to 2 (it runs the exact same copy of f with the same static variable x). When the OS switches context back to thread1, it resumes at the wrong switch arm (it executes case 2 instead of case 1).
This is known as the re-entrancy problem, common in multithreaded code. Code that does not suffer from the re-entrancy problem is known as thread-safe.
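One common way to make such code thread-safe is to remove the static state, so each caller works on its own copy. A hedged sketch (f()'s exact body in the notes is unknown; names here are illustrative):

```c
/* Thread-safe variant: x lives on each caller's own stack,
   so a context switch between threads cannot corrupt it. */
int f_safe(int x) {
    x = x + 1;
    switch (x) {
        case 1:  return 100;
        case 2:  return 200;
        default: return -1;
    }
}
```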
- Process behaviour
- Impact on process operations:
  - When we fork() a process, do we duplicate the threads? Do they begin executing at the same place?
  - If a single thread executes exit(), does the entire process exit? (According to the semantics of exit, yes: if one thread exits, it kills the other threads.)
  - If a single thread calls exec(), the process image is replaced by the new program, but will the other threads be replaced?
Thread Models
There are 2 major ways of implementing threads: user threads and kernel threads.
User Threads
The thread is implemented as a user library where a runtime system (in the process) will handle thread related operations. The kernel is not aware of the threads in the process.

Recall that our CPU is divided into user and kernel mode. In the case of user threads, all of the threads, including the management runtime, run within the process. Thus thread operations require no system calls, making them much more efficient. Within the kernel, the only structure we have is the process table.
Advantages
Since user threads are a user library, it can run on any OS, including those that do not support threading i.e. we can have multithreaded programs on any OS.
Thread operations are also just library calls (function calls), so each one only involves some code to build the call frame and a jal, followed by a jr and code to demolish the stack frame.
User threads are generally more configurable and flexible. Since threading is done on a per-process basis, we can optimise the thread library for the requirements of each process. If we don't need thread priority, we can configure the library to remove it, making the thread library smaller (and we can have a customised thread scheduling policy).
Disadvantages
Since the OS is unaware of the threads, it schedules at the process level, not at the thread level. This means that once one thread blocks, the whole process blocks, and with it all of its threads.
For example, if a thread tries to read from a very big file (a system call), the OS receives the request, and since it is unaware that there are multiple threads, it assumes the read comes from the process as a whole. Since the general assumption is that a process cannot continue until it has the file data, the OS blocks the entire process; it does not realise the other threads could continue.
Additionally, since the OS is unaware of the threads, it schedules the process on one CPU and cannot schedule individual threads on different CPUs, so we cannot exploit multiple CPUs.
Kernel Threads
Kernel threads are implemented by the OS, where thread operations are handled as system calls. Here, thread-level scheduling is possible (since the kernel is aware of the threads), so the kernel can schedule by threads instead of by processes. The kernel can also make use of threads for its own execution, i.e. the kernel itself becomes multithreaded.

In the kernel-thread model, the threads run in the processes but are managed by the kernel. The kernel not only has a process table to manage individual processes, but also a thread table to manage individual threads.
Advantages
With the thread table, the kernel can schedule individual threads. This means more than one thread in the same process can run simultaneously on multiple CPUs (the OS schedules each thread on a different CPU), achieving true parallelism.
Disadvantages
Now, every single thread operation is a system call rather than a function call, making it slower and more resource intensive: a system call traps into the kernel and can take hundreds to thousands of cycles, far more than a plain function call.
Kernel threads are also less flexible, since they cannot be customised much (they have to serve a huge range of processes).
Hybrid Thread Model
In hybrid thread model, there are both user and kernel level threads. OS schedules on kernel threads only while user threads bind to kernel threads (rather than to processes).

This gives us a lot of flexibility, as we keep the benefits of kernel-level threads (the kernel is aware of them and can schedule threads individually). At the same time, binding multiple user threads to one kernel thread lets us switch between those user threads efficiently in user space (more efficient than a full context switch).
We can also limit the concurrency of a process: when creating a process, we can give it a maximum of, say, 2 kernel threads, which it can use within the process however it wishes. So at the OS level we can limit the concurrency of each process by limiting its number of kernel threads.
Solaris

This is the Solaris model. User-level threads are bound to kernel-level threads through the thread library. The kernel-level threads in turn are scheduled by the kernel and can be bound to different CPUs.
Modern Processors
Threads started off as a software mechanism, a user-space library. As they became more popular and powerful, OS developers made them an OS feature. Today there is even hardware support on modern processors: within a single CPU core and datapath, we have multiple sets of registers, allowing threads to run natively and in parallel on the same core. This is known as simultaneous multithreading (SMT).
POSIX Threads
The POSIX thread library is called pthread; it is a standard defined by IEEE and supported by most Unix variants. It specifies the API and behaviour but not the implementation, making it possible to implement pthreads as user or kernel threads.
Syntax
To use pthread, add #include <pthread.h> to your source file and compile with gcc file_name.c -lpthread, which links the code against the pthread library.
Whether it is user/kernel/hybrid depends on the OS used.
2 useful datatypes include:
- pthread_t: data type representing a thread ID (TID)
- pthread_attr_t: data type representing the attributes of a thread (used when configuring pthreads)
Creation
```c
int pthread_create(pthread_t* tidCreated,
                   const pthread_attr_t* threadAttributes,
                   void* (*startRoutine)(void*),
                   void* argForStartRoutine);
```
If the thread is created successfully, pthread_create() returns 0; a non-zero value indicates an error. The parameters are as follows:
- tidCreated: thread ID for the created thread
- threadAttributes: controls the behaviour of the new thread; NULL for defaults
- startRoutine: function pointer to the function to be executed by the thread (a pointer to code)
- argForStartRoutine: argument passed to the startRoutine function
In C, if you don't know what type the pointer should have, you can use a void* (an arbitrary pointer) and cast it afterwards.
Termination
We can terminate a thread by calling void pthread_exit(void* exitValue), where exitValue is the value to be returned to whoever synchronizes with this thread. If pthread_exit() is not used, a pthread terminates automatically when the end of startRoutine is reached, but then there is no way to return an exit value.
Example (the original listing was lost; this is a reconstructed sketch using the API above):

```c
#include <pthread.h>
#include <stdio.h>

/* startRoutine: prints a message, then returns an exit value */
void* startRoutine(void* argForStartRoutine) {
    int arg = *(int*)argForStartRoutine;
    printf("Hello from thread, arg = %d\n", arg);
    pthread_exit((void*)42);   /* exit value for whoever joins this thread */
}

int main() {
    pthread_t tidCreated;
    int arg = 1;
    pthread_create(&tidCreated, NULL, startRoutine, &arg);

    void* status;
    pthread_join(tidCreated, &status);   /* wait for the thread to finish */
    printf("Thread exited with value %ld\n", (long)status);
    return 0;
}
```
Sharing memory space (the original listing was lost; this sketch is reconstructed from the discussion that follows: 5 threads run doSum, each incrementing the shared globalVar 1000 times):

```c
#include <pthread.h>
#include <stdio.h>

#define N_THREADS 5

int globalVar = 0;               /* shared by all threads in the process */

void* doSum(void* argForStartRoutine) {
    for (int i = 0; i < 1000; i++)
        globalVar++;             /* unsynchronised access: race condition */
    pthread_exit(NULL);
}

int main() {
    pthread_t tids[N_THREADS];
    for (int i = 0; i < N_THREADS; i++)
        pthread_create(&tids[i], NULL, doSum, NULL);
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tids[i], NULL);        /* wait for every thread */
    printf("globalVar = %d\n", globalVar);  /* expected 5000, not guaranteed */
    return 0;
}
```
Although we pass a pointer to a single function doSum, we actually create 5 separate threads of execution, each of which runs the same function. The function runs as 5 different threads, and the machine code reads and writes registers that are separate per thread (each thread has its own hardware context).
From the program, you would expect globalVar to end up with a value of 5000, but in reality we cannot know, due to a race condition.
pthread_join allows us to do thread synchronization: it waits for the termination of another thread. The signature is int pthread_join(pthread_t threadID, void** status), where status is a variable that receives the result when the pthread exits. Note that waiting for the threads does not mean globalVar will be 5000: the race condition is caused by individual threads accessing the variable at the same time.