I would assume that users who are familiar with asynchronous programming models, such as those used in windowing environments (X, Motif), will find it easier to grasp the concepts of multi-threaded programming.
When talking about POSIX threads, one cannot avoid the question "Which
draft of the POSIX threads standard shall be used?". As this threads standard
has been revised over a period of several years, one will find that implementations
adhering to different drafts of the standard have a different set of functions,
different default values, and different nuances. Since this tutorial was
written using a Linux system with the kernel-level LinuxThreads library,
v0.5, programmers with access to other systems, using different versions
of pthreads, should refer to their system's manuals in case of incompatibilities.
Also, since some of the example programs are using blocking system calls,
they won't work with user-level threading libraries (refer to our parallel
programming theory tutorial for more information).
Having said that, I'd try to check the example programs on other systems
as well (Solaris 2.5 comes to mind), to make them more "cross-platform".
The advantage of using a thread group instead of a normal serial program is that several operations may be carried out in parallel, and thus events can be handled immediately as they arrive (for example, if we have one thread handling a user interface, and another thread handling database queries, we can execute a heavy query requested by the user, and still respond to user input while the query is executed).
The advantage of using a thread group over using a process group is that context switching between threads is much faster than context switching between processes (context switching means that the system switches from running one thread or process, to running another thread or process). Also, communication between two threads is usually faster and easier to implement than communication between two processes.
On the other hand, because threads in a group all use the same memory
space, if one of them corrupts the contents of its memory, other threads
might suffer as well. With processes, the operating system normally protects
processes from one another, and thus if one corrupts its own memory space,
other processes won't suffer. Another advantage of using processes is that
they can run on different machines, while all the threads have to run on
the same machine (at least normally).
#include <stdio.h> /* standard I/O routines */
#include <pthread.h> /* pthread functions and data structures */
/* function to be executed by the new thread */
void*
do_loop(void* data)
{
int i; /* counter, to print numbers */
int j; /* counter, for delay */
int me = *((int*)data); /* thread identifying number */
for (i=0; i<10; i++) {
for (j=0; j<500000; j++) /* delay loop */
;
printf("'%d' - Got '%d'\n", me, i);
}
/* terminate the thread */
pthread_exit(NULL);
}
/* like any C program, program's execution begins in main */
int
main(int argc, char* argv[])
{
int thr_id; /* return value of pthread_create() (0 on success) */
pthread_t p_thread; /* thread's structure */
int a = 1; /* thread 1 identifying number */
int b = 2; /* thread 2 identifying number */
/* create a new thread that will execute 'do_loop()' */
thr_id = pthread_create(&p_thread, NULL, do_loop, (void*)&a);
/* run 'do_loop()' in the main thread as well */
do_loop((void*)&b);
/* NOT REACHED */
return 0;
}
A few notes should be mentioned about this program:
gcc pthread_create.c -o pthread_create -lpthread
The source code for this program may be found in the pthread_create.c
file.
For instance, consider the case where two threads try to update two
variables. One tries to set both to 0, and the other tries to set both
to 1. If both threads try to do that at the same time, we might end up
with a situation where one variable contains 1, and one contains 0. This
is because a context-switch (we already know what this is by now, right?)
might occur after the first thread zeroed out the first variable, then the
second thread would set both variables to 1, and when the first thread
resumes operation, it will zero out the second variable, thus getting the
first variable set to '1', and the second set to '0'.
lock mutex 'X1'. set first variable to '0'. set second variable to '0'. unlock mutex 'X1'.
Meanwhile, the second thread will do something like this:
lock mutex 'X1'. set first variable to '1'. set second variable to '1'. unlock mutex 'X1'.
Assuming both threads use the same mutex, we are assured that after they both ran through this code, either both variables are set to '0', or both are set to '1'. Note that this requires some work from the programmer - if a third thread were to access these variables via some code that does not use this mutex, it still might mess up the variables' contents. Thus, it is important to enclose all the code that accesses these variables in a small set of functions, and always use only these functions to access these variables.
pthread_mutex_t a_mutex = PTHREAD_MUTEX_INITIALIZER;
One note should be made here: This type of initialization creates a mutex called 'fast mutex'. This means that if a thread locks the mutex and then tries to lock it again, it'll get stuck - it will be in a deadlock.
There is another type of mutex, called 'recursive mutex', which allows
the thread that locked it, to lock it several more times, without getting
blocked (but other threads that try to lock the mutex now will get blocked).
If the thread then unlocks the mutex, it'll still be locked, until it is
unlocked the same amount of times as it was locked. This is similar to
the way modern door locks work - if you turned it twice clockwise to lock
it, you need to turn it twice counter-clockwise to unlock it. This kind
of mutex can be created by assigning the constant PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP
to a mutex variable.
int rc = pthread_mutex_lock(&a_mutex);
if (rc) { /* an error has occurred */
fprintf(stderr, "pthread_mutex_lock: %s\n", strerror(rc)); /* pthread functions return the error code, they do not set errno */
pthread_exit(NULL);
}
/* mutex is now locked - do your stuff. */
.
.
After the thread did what it had to (change variables or data structures, handle file, or whatever it intended to do), it should free the mutex, using the pthread_mutex_unlock() function, like this:
rc = pthread_mutex_unlock(&a_mutex);
if (rc) {
fprintf(stderr, "pthread_mutex_unlock: %s\n", strerror(rc)); /* pthread functions return the error code, they do not set errno */
pthread_exit(NULL);
}
rc = pthread_mutex_destroy(&a_mutex);
After this call, this variable (a_mutex) may not be used as a mutex any more, unless it is initialized again. Thus, if one destroys a mutex too early, and another thread tries to lock or unlock it, that thread may get an EINVAL error code from the lock or unlock function.
The programs themselves are in the files accompanying this tutorial.
The one that uses a mutex is employee-with-mutex.c.
The one that does not use a mutex is employee-without-mutex.c.
Read the comments inside the source files to get a better understanding
of how they work.
The pthread library might, however, detect a "deadlock". A deadlock
is a situation in which a set of threads are all waiting for resources
taken by other threads, all in the same set. Naturally, if all threads
are blocked waiting for a mutex, none of them will ever come back to life
again. Some pthread implementations can detect certain such situations,
and then fail the thread calling pthread_mutex_lock() with
an error of type EDEADLK. The programmer should check for such
a value, and take steps to solve the deadlock somehow.
Note that a condition variable does not provide locking. Thus, a mutex
is used along with the condition variable, to provide the necessary locking
when accessing this condition variable.
pthread_cond_t got_request = PTHREAD_COND_INITIALIZER;
This defines a condition variable named 'got_request', and initializes it.
Note: since the PTHREAD_COND_INITIALIZER is actually a structure,
it may be used to initialize a condition variable only when it is declared.
In order to initialize it during runtime, one must use the pthread_cond_init()
function.
int rc = pthread_cond_signal(&got_request);
Or by using the broadcast function:
int rc = pthread_cond_broadcast(&got_request);
When either function returns, 'rc' is set to 0 on success, and to a non-zero value on failure. In case of failure, the return value denotes the error that occurred (EINVAL denotes that the given parameter is not a condition variable; ENOMEM denotes that the system has run out of memory).
Note: success of a signaling operation does not mean any thread was
awakened - it might be that no thread was waiting on the condition variable,
and thus the signaling does nothing (i.e. the signal is lost).
It is also not remembered for future use - if after the signaling
function returns another thread starts waiting on this condition variable,
a further signal is required to wake it up.
Here is how to use these two functions. We make the assumption that 'got_request' is a properly initialized condition variable, and that 'request_mutex' is a properly initialized mutex. First, we try the pthread_cond_wait() function:
/* first, lock the mutex */
int rc = pthread_mutex_lock(&request_mutex);
if (rc) { /* an error has occurred */
fprintf(stderr, "pthread_mutex_lock: %s\n", strerror(rc));
pthread_exit(NULL);
}
/* mutex is now locked - wait on the condition variable. */
/* During the execution of pthread_cond_wait, the mutex is unlocked. */
rc = pthread_cond_wait(&got_request, &request_mutex);
if (rc == 0) { /* we were awakened due to the cond. variable being signaled */
/* The mutex is now locked again by pthread_cond_wait() */
/* do your stuff... */
.
}
/* finally, unlock the mutex */
pthread_mutex_unlock(&request_mutex);
Now an example using the pthread_cond_timedwait() function:
#include <sys/time.h> /* struct timeval definition, gettimeofday() */
#include <errno.h> /* ETIMEDOUT */
#include <string.h> /* strerror() */
struct timeval now; /* time when we started waiting */
struct timespec timeout; /* timeout value for the wait function */
int done; /* are we done waiting? */
/* first, lock the mutex */
int rc = pthread_mutex_lock(&request_mutex);
if (rc) { /* an error has occurred */
fprintf(stderr, "pthread_mutex_lock: %s\n", strerror(rc));
pthread_exit(NULL);
}
/* mutex is now locked */
/* get current time */
gettimeofday(&now, NULL);
/* prepare timeout value */
timeout.tv_sec = now.tv_sec + 5;
timeout.tv_nsec = now.tv_usec * 1000; /* timeval uses microseconds. */
/* timespec uses nanoseconds. */
/* 1 microsecond = 1000 nanoseconds. */
/* wait on the condition variable. */
/* we use a loop, since a Unix signal might stop the wait before the timeout */
done = 0;
while (!done) {
/* remember that pthread_cond_timedwait() unlocks the mutex on entrance */
rc = pthread_cond_timedwait(&got_request, &request_mutex, &timeout);
switch(rc) {
case 0: /* we were awakened due to the cond. variable being signaled */
/* the mutex was now locked again by pthread_cond_timedwait. */
/* do your stuff here... */
.
.
done = 1;
break;
case ETIMEDOUT: /* our time is up */
done = 1;
break;
default: /* some error occurred (e.g. we got a Unix signal) */
break; /* break this switch, but re-do the while loop. */
}
}
/* finally, unlock the mutex */
pthread_mutex_unlock(&request_mutex);
As you can see, the timed wait version is considerably more complex, and is thus better wrapped up in some function, rather than being re-coded in every necessary location.
Note: it might be that a condition variable that has 2 or more threads waiting on it is signaled many times, and yet one of the threads waiting on it is never awakened. This is because we are not guaranteed which of the waiting threads is awakened when the variable is signaled. It might be that the awakened thread quickly comes back to waiting on the condition variable, and gets awakened again when the variable is signaled again, and so on. The situation for the un-awakened thread is called 'starvation'. It is up to the programmer to make sure this situation does not occur if it implies bad behavior. Yet, in our server example from before, this situation might indicate requests are coming in at a very slow pace, and thus perhaps we have too many threads waiting to service requests. In this case, this situation is actually good, as it means every request is handled immediately when it arrives.
Note 2: when the condition variable is being broadcast (using pthread_cond_broadcast),
this does not mean all threads are running together. Each of them tries
to lock the mutex again before returning from their wait function, and
thus they'll start running one by one, each one locking the mutex, doing
their work, and freeing the mutex before the next thread gets its chance
to run.
int rc = pthread_cond_destroy(&got_request);
if (rc == EBUSY) { /* some thread is still waiting on this condition variable */
/* handle this case here... */
.
.
}
What if some thread is still waiting on this variable? Depending on the case, it might imply some flaw in the usage of this variable, or just lack of proper thread cleanup code. It is probably good to alert the programmer, at least during the debug phase of the program, of such a case. It might mean nothing, but it might be significant.
However, what if all threads are busy handling previous requests, when a new one arrives? The signaling of the condition variable will do nothing (since all threads are busy doing other things, NOT waiting on the condition variable now), and after all threads finish handling their current request, they come back to wait on the variable, which won't necessarily be signaled again (for example, if no new requests arrive). Thus, there is at least one request pending, while all handling threads are blocked, waiting for a signal.
In order to overcome this problem, we may set some integer variable
to denote the number of pending requests, and have each thread check the
value of this variable before waiting on the condition variable. If this
variable's value is positive, some request is pending, and the thread should
go and handle it, instead of going to sleep. Furthermore, a thread that handled
a request should reduce the value of this variable by one, to make the
count correct.
Let's see how this affects the waiting code we have seen above.
/* number of pending requests, initially none */
int num_requests = 0;
.
.
/* first, lock the mutex */
int rc = pthread_mutex_lock(&request_mutex);
if (rc) { /* an error has occurred */
fprintf(stderr, "pthread_mutex_lock: %s\n", strerror(rc));
pthread_exit(NULL);
}
/* mutex is now locked - wait on the condition variable */
/* if there are no requests to be handled. */
rc = 0;
if (num_requests == 0)
rc = pthread_cond_wait(&got_request, &request_mutex);
if (num_requests > 0 && rc == 0) { /* we have a request pending */
/* do your stuff... */
.
.
/* decrease count of pending requests */
num_requests--;
}
/* finally, unlock the mutex */
pthread_mutex_unlock(&request_mutex);
The program source is available in the file thread-pool-server.c, and contains many comments. Please read the source file first, and then read the following clarifying notes.
In multi-threaded programs, we also might find a need for such variables. We should note, however, that the same variable is accessible from all the threads, so we need to protect access to it using a mutex, which is extra overhead. Furthermore, we sometimes need to have a variable that is 'global', but only for a specific thread. Or the same 'global' variable should have different values in different threads. For example, consider a program that needs to have one globally accessible linked list in each thread, but not the same list. Further, we want the same code to be executed by all threads. In this case, the global pointer to the start of the list should point to a different address in each thread.
In order to have such a pointer, we need a mechanism that enables
the same global variable to have a different location in memory. This is
what the thread-specific data mechanism is used for.
/* rc is used to contain return values of pthread functions */
int rc;
/* define a variable to hold the key, once created. */
pthread_key_t list_key;
/* cleanup_list is a function that can clean up some data */
/* it is specific to our program, not to TSD */
extern void cleanup_list(void*);
/* create the key, supplying a function that'll be invoked at thread exit */
/* for each thread that stored a non-NULL value under this key. */
rc = pthread_key_create(&list_key, cleanup_list);
Some notes:
/* this variable will be used to store return codes of pthread functions */
int rc;
/* define a variable into which we'll store some data */
/* for example, an integer. */
int* p_num = (int*)malloc(sizeof(int));
if (!p_num) {
fprintf(stderr, "malloc: out of memory\n");
exit(1);
}
/* initialize our variable to some value */
(*p_num) = 4;
/* now let's store this value in our TSD key. */
/* note that the key stores the pointer 'p_num' itself, not a copy of */
/* the integer, so the memory must stay valid while the key refers to it. */
rc = pthread_setspecific(a_key, (void*)p_num);
.
.
/* and somewhere later in our code... */
.
.
/* get the value of key 'a_key' and print it. */
{
int* p_keyval = (int*)pthread_getspecific(a_key);
if (p_keyval != NULL) {
printf("value of 'a_key' is: %d\n", *p_keyval);
}
}
Note that if we set the value of the key in one thread, and try to get it in another thread, we will get a NULL, since this value is distinct for each thread.
Note also that there are two cases where pthread_getspecific() might return NULL:
Using this function is simple. Assuming list_key is a pthread_key_t variable pointing to a properly created key, use this function like this:
int rc = pthread_key_delete(list_key);
The function will return 0 on success, or EINVAL if the supplied
variable does not point to a valid TSD key.
pthread_cancel(thr_id);
The pthread_cancel() function returns 0 once the cancellation request has been sent, so its return value does not tell us whether the thread was actually canceled.
int old_cancel_state;
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_cancel_state);
This will disable canceling this thread. We can also enable canceling the thread like this:
int old_cancel_state;
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old_cancel_state);
Note that you may supply a NULL pointer as the second parameter, and then you won't get the old cancel state.
A similar function, named pthread_setcanceltype() is used to define how a thread responds to a cancellation request, assuming it is in the 'ENABLED' cancel state. One option is to handle the request immediately (asynchronously). The other is to defer the request until a cancellation point. To set the first option (asynchronous cancellation), do something like:
int old_cancel_type;
pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_cancel_type);
And to set the second option (deferred cancellation):
int old_cancel_type;
pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_cancel_type);
Note that you may supply a NULL pointer as the second parameter, and then you won't get the old cancel type.
You might wonder - "What if I never set the cancellation state or
type of a thread?". Well, in such a case, the pthread_create() function
automatically sets the thread to enabled deferred cancellation, that is,
PTHREAD_CANCEL_ENABLE for the cancel state, and PTHREAD_CANCEL_DEFERRED
for the cancel type.
In general, any function that might suspend the execution of a thread for a long time, should be a cancellation point. In practice, this depends on the specific implementation, and how conformant it is to the relevant POSIX standard (and which version of the standard it conforms to...). The following set of pthread functions serve as cancellation points:
Note: In real conformant implementations of the pthreads standard,
normal system calls that cause the process to block, such as read(), select(),
wait() and so on, are also cancellation points. The same goes for standard
C library functions that use these system calls (the various printf functions,
for example).
Two functions are supplied for this purpose. The pthread_cleanup_push() function is used to add a cleanup function to the set of cleanup functions for the current thread. The pthread_cleanup_pop() function removes the last function added with pthread_cleanup_push(). When the thread terminates, its cleanup functions are called in the reverse order of their registration. So the last one to be registered is the first one to be called.
When the cleanup functions are called, each one is supplied with
one parameter, that was supplied as the second parameter to the pthread_cleanup_push()
function call. Let's see how these functions may be used. In our example
we'll see how these functions may be used to clean up some memory that
our thread allocates when it starts running.
/* first, here is the cleanup function we want to register. */
/* it gets a pointer to the allocated memory, and simply frees it. */
void
cleanup_after_malloc(void* allocated_memory)
{
if (allocated_memory)
free(allocated_memory);
}
/* and here is our thread's function. */
/* we use the same function we used in our */
/* thread-pool server. */
void*
handle_requests_loop(void* data)
{
.
.
/* this variable will be used later. please read on... */
int old_cancel_type;
/* allocate some memory to hold the start time of this thread. */
/* assume MAX_TIME_LEN is a previously defined macro. */
char* start_time = (char*)malloc(MAX_TIME_LEN);
/* push our cleanup handler. */
pthread_cleanup_push(cleanup_after_malloc, (void*)start_time);
.
.
/* here we start the thread's main loop, and do whatever is desired.. */
.
.
.
/* and finally, we unregister the cleanup handler. our method may seem */
/* awkward, but please read the comments below for an explanation. */
/* put the thread in deferred cancellation mode. */
pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_cancel_type);
/* supplying '1' means to execute the cleanup handler */
/* prior to unregistering it. supplying '0' would */
/* have meant not to execute it. */
pthread_cleanup_pop(1);
/* restore the thread's previous cancellation mode. */
pthread_setcanceltype(old_cancel_type, NULL);
}
As we can see, we allocated some memory here, and registered a cleanup handler that will free this memory when our thread exits. After the execution of the main loop of our thread, we unregistered the cleanup handler. This must be done in the same function that registered the cleanup handler, and in the same nesting level, since both the pthread_cleanup_push() and pthread_cleanup_pop() functions are actually macros that add a '{' symbol and a '}' symbol, respectively.
As to the reason that we used that complex piece of code to unregister
the cleanup handler, this is done to assure that our thread won't get canceled
in the middle of the execution of our cleanup handler. This could have
happened if our thread was in asynchronous cancellation mode. Thus, we
made sure it was in deferred cancellation mode, then unregistered the cleanup
handler, and finally restored whatever cancellation mode our thread was
in previously. Note that we still assume the thread cannot be canceled
in the execution of pthread_cleanup_pop() itself - this is true, since
pthread_cleanup_pop() is not a cancellation point.
For example, consider our earlier thread pool server. Looking back at the code, you'll see that we used an odd sleep() call before terminating the process. We did this since the main thread had no idea when the other threads finished processing all pending requests. We could have solved it by making the main thread run a loop of checking if no more requests are pending, but that would be a busy loop.
A cleaner way of implementing this, is by adding three changes to the code:
The last change is done using a pthread_join() loop: call pthread_join() once for each handler thread. This way, we know that only after all handler threads have exited, this loop is finished, and then we may safely terminate the process. If we didn't use this loop, we might terminate the process while one of the handler threads is still handling a request.
The modified program is available in the file named thread-pool-server-with-join.c.
Look for the word 'CHANGE' (in capital letters) to see the locations of
the three changes.
If we have a thread that we wish would exit whenever it wants without the need to join it, we should put it in the detached state. This can be done either with appropriate flags to the pthread_create() function, or by using the pthread_detach() function. We'll consider the second option in our tutorial.
The pthread_detach() function gets one parameter, of type pthread_t, that denotes the thread we wish to put in the detached state. For example, we can create a thread and immediately detach it with code similar to this:
pthread_t a_thread; /* store the thread's structure here */
int rc; /* return value for pthread functions. */
extern void* thread_loop(void*); /* declare the thread's main function. */
/* create the new thread. */
rc = pthread_create(&a_thread, NULL, thread_loop, NULL);
/* and if that succeeded, detach the newly created thread. */
if (rc == 0) {
rc = pthread_detach(a_thread);
}
Of course, if we wish to have a thread in the detached state immediately, using the first option (setting the detached state directly when calling pthread_create()) is more efficient.
Second, we fix up the termination of the server when there are no more new requests to handle. Instead of the ugly sleep we used in our first example, this time the main thread waits for all threads to finish handling their last requests, by joining each of them using pthread_join().
The code is now being split to 4 separate files, as follows:
Exercise: our last program contains some possible race condition during its termination process. Can you see what this race is all about? Can you offer a complete solution to this problem? (hint - think of what happens to threads deleted using 'delete_handler_thread()').
Exercise 2: the way we implement the water-marks algorithm might
come up too slow on creation of new threads. Try thinking of a different
algorithm that will shorten the average time a request stays on the queue
until it gets handled. Add some code to measure this time, and experiment
until you find your "optimal pool algorithm". Note - Time should be measured
in very small units (using the getrusage system call), and several runs
of each algorithm should be made, to get more accurate measurements.
In graphical programs the problem is more severe, since the application should always be ready for a message from the windowing system telling it to repaint part of its window. If it's too busy executing some other task, its window will remain blank, which is rather ugly. In such a case, it is a good idea to have one thread handle the message loop of the windowing system, always ready to get such repaint requests (as well as user input). Whenever this thread sees a need to do an operation that might take a long time to complete (say, more than 0.2 seconds in the worst case), it will delegate the job to a separate thread.
In order to structure things better, we may use a third thread,
to control and synchronize the user-input and task-performing threads.
If the user-input thread gets any user input, it will ask the controlling
thread to handle the operation. If the task-performing thread finishes
its operation, it will ask the controlling thread to show the results to
the user.
Our main thread will launch one thread to perform the line counting, and a second thread to check for user input. After that, the main thread waits on a condition variable. When any of the threads finishes its operation, it signals this condition variable, in order to let the main thread check what happened. A global variable is used to flag whether or not a cancel request was made by the user. It is initialized to '0', but if the user-input thread receives a cancellation request (the user pressing 'e'), it sets this flag to '1', signals the condition variable, and terminates. The line-counting thread will signal the condition variable only after it finished its computation.
Before you go read the program, we should explain the use of the system() function and the 'stty' Unix command. The system() function spawns a shell in which it executes the Unix command given as a parameter. The stty Unix command is used to change terminal mode settings. We use it to switch the terminal from its default, line-buffered mode, to a character mode (also known as raw mode), so the call to getchar() in the user-input thread will return immediately after the user presses any key. If we hadn't done so, the system would buffer all input to the program until the user presses the ENTER key. Finally, since this raw mode is not very useful (to say the least) once the program terminates and we get the shell prompt again, the user-input thread registers a cleanup function that restores the normal terminal mode, i.e. line-buffered. For more info, please refer to stty's manual page.
The program's source can be found in the file line-count.c. The name of the file whose lines it reads is hardcoded to 'very_large_data_file'. You should create a file with this name in the program's directory (large enough for the operation to take enough time). Alternatively, you may un-compress the file 'very_large_data_file.Z' found in this directory, using the command:
uncompress very_large_data_file.Z
Note that this will create a 5MB(!) file named 'very_large_data_file',
so make sure you have enough free disk-space before performing this operation.
It may be possible to use a non-MT-safe library in a multi-threaded program in two ways: