Here we will look at the various features the system supplies to help
us answer these questions. Note that dealing with a multi-process system
takes a slightly different approach than dealing with a single-process
program: events occur in parallel, debugging is more complicated, and
there is always the risk that a bug will cause endless process creation
that brings your system to a halt.
Of course, with multi-process code we don't have conflicts over variables,
because normally the data section of each process is separate from that
of other processes (so process A that runs program P and process B that
runs the same program P have distinct copies of the global variable 'i'
of that program). However, there might be other resources that would
cause a piece of code to be non-reentrant.
#include <stdio.h> /* defines printf() and perror(). */
#include <stdlib.h> /* defines exit(). */
#include <unistd.h> /* defines fork(), and pid_t. */
#include <sys/wait.h> /* defines the wait() system call. */
/* storage place for the pid of the child process, and its exit status. */
pid_t child_pid;
int child_status;
/* let's fork off a child process... */
child_pid = fork();
/* check what the fork() call actually did */
switch (child_pid) {
case -1: /* fork() failed */
perror("fork"); /* print a system-defined error message */
exit(1);
case 0: /* fork() succeeded, we're inside the child process */
printf("hello world\n");
exit(0); /* here the CHILD process exits, not the parent. */
default: /* fork() succeeded, we're inside the parent process */
wait(&child_status); /* wait till the child process exits */
}
/* parent's process code may continue here... */
Notes:
#include <stdio.h> /* basic I/O routines. */
#include <unistd.h> /* define fork(), etc. */
#include <sys/types.h> /* define pid_t, etc. */
#include <sys/wait.h> /* define wait(), etc. */
#include <signal.h> /* define signal(), etc. */
/* first, here is the code for the signal handler */
void catch_child(int sig_num)
{
/* when we get here, we know there's a zombie child waiting */
int child_status;
wait(&child_status);
printf("child exited.\n");
}
.
.
/* and somewhere in the main() function ... */
.
.
/* define the signal handler for the CHLD signal */
signal(SIGCHLD, catch_child);
/* and the child process forking code... */
{
int child_pid;
int i;
child_pid = fork();
switch (child_pid) {
case -1: /* fork() failed */
perror("fork");
exit(1);
case 0: /* inside child process */
printf("hello world\n");
sleep(5); /* sleep a little, so we'll have */
/* time to see what is going on */
exit(0);
default: /* inside parent process */
break;
}
/* parent process goes on, minding its own business... */
/* for example, some output... */
for (i=0; i<10; i++) {
printf("%d\n", i);
sleep(1); /* sleep for a second, so we'll have time to see the mix */
}
}
Let's examine the flow of this program a little:
The system assures us of one thing: The order in which data is written
to the pipe, is the same order as that in which data is read from the pipe.
/* first, define an array to store the two file descriptors */
int pipes[2];
/* now, create the pipe */
int rc = pipe(pipes);
if (rc == -1) { /* pipe() failed */
perror("pipe");
exit(1);
}
If the call to pipe() succeeded, a pipe will be created, pipes[0] will contain the number of its read file descriptor, and pipes[1] will contain the number of its write file descriptor.
Now that a pipe has been created, it should be put to some real use. To do this, we first call fork() to create a child process. Since the memory image of the child process is identical to that of the parent process, the pipes[] array holds the same values in both of them, and thus they both have the file descriptors of the pipe. Furthermore, since the file descriptor table is also copied during the fork, the file descriptors are still valid inside the child process.
Let's see an example of a two-process system in which one (the parent
process) reads input from the user, and sends it to the other (the child),
which then prints the data to the screen. The sending of the data is done
using the pipe, and the protocol simply states that every byte passed via
the pipe represents a single character typed by the user.
#include <stdio.h> /* standard I/O routines. */
#include <stdlib.h> /* defines exit(). */
#include <unistd.h> /* defines pipe(), amongst other things. */
/* this routine handles the work of the child process. */
void do_child(int data_pipe[]) {
char c; /* a single byte of data received from the parent. */
int rc; /* return status of read(). */
/* first, close the un-needed write-part of the pipe. */
close(data_pipe[1]);
/* now enter a loop of reading data from the pipe, and printing it */
while ((rc = read(data_pipe[0], &c, 1)) > 0) {
putchar(c);
}
/* probably pipe was broken, or got EOF via the pipe. */
exit(0);
}
/* this routine handles the work of the parent process. */
void do_parent(int data_pipe[])
{
int c; /* data received from the user, as returned by getchar(). */
char ch; /* the single byte actually written to the pipe. */
int rc; /* return status of write(). */
/* first, close the un-needed read-part of the pipe. */
close(data_pipe[0]);
/* now enter a loop of reading user input, and writing it to the pipe. */
while ((c = getchar()) != EOF) {
/* write the character to the pipe. */
ch = (char)c;
rc = write(data_pipe[1], &ch, 1);
if (rc == -1) { /* write failed - notify the user and exit */
perror("Parent: write");
close(data_pipe[1]);
exit(1);
}
}
/* probably got EOF from the user. */
close(data_pipe[1]); /* close the pipe, to let the child know we're done. */
exit(0);
}
/* and the main function. */
int main(int argc, char* argv[])
{
int data_pipe[2]; /* an array to store the file descriptors of the pipe. */
int pid; /* pid of child process, or 0, as returned via fork. */
int rc; /* stores return values of various routines. */
/* first, create a pipe. */
rc = pipe(data_pipe);
if (rc == -1) {
perror("pipe");
exit(1);
}
/* now fork off a child process, and set their handling routines. */
pid = fork();
switch (pid) {
case -1: /* fork failed. */
perror("fork");
exit(1);
case 0: /* inside child process. */
do_child(data_pipe);
/* NOT REACHED */
default: /* inside parent process. */
do_parent(data_pipe);
/* NOT REACHED */
}
return 0; /* NOT REACHED */
}
As we can see, the child process closed the write-end of the pipe (since it only needs to read from the pipe), while the parent process closed the read-end of the pipe (since it only needs to write to the pipe). Closing the un-needed file descriptor frees an entry in the file descriptors table of the process; this table is limited in size, so we shouldn't waste entries. In the child it matters for a second reason: read() reports end-of-file on a pipe only once every write descriptor for that pipe has been closed, so a child that kept its own copy of the write-end open would never see the EOF that ends its loop.
The complete source code for this example may be found in the file one-way-pipe.c.
#include <stdio.h> /* standard I/O routines. */
#include <stdlib.h> /* defines exit(). */
#include <unistd.h> /* defines pipe(), amongst other things. */
#include <ctype.h> /* defines isascii(), toupper(), and other */
/* character manipulation routines. */
/* function executed by the user-interacting process. */
void user_handler(int input_pipe[], int output_pipe[])
{
int c; /* user input, as returned by getchar(). */
char ch; /* the single byte passed through the pipes. */
int rc; /* return values of functions. */
/* first, close unnecessary file descriptors */
close(input_pipe[1]); /* we don't need to write to this pipe. */
close(output_pipe[0]); /* we don't need to read from this pipe. */
/* loop: read input, send via one pipe, read via other */
/* pipe, and write to stdout. exit on EOF from user. */
while ((c = getchar()) != EOF) {
ch = (char)c;
/* write to translator */
rc = write(output_pipe[1], &ch, 1);
if (rc == -1) { /* write failed - notify the user and exit. */
perror("user_handler: write");
close(input_pipe[0]);
close(output_pipe[1]);
exit(1);
}
/* read back from translator */
rc = read(input_pipe[0], &ch, 1);
if (rc <= 0) { /* read failed - notify user and exit. */
perror("user_handler: read");
close(input_pipe[0]);
close(output_pipe[1]);
exit(1);
}
/* print translated character to stdout. */
putchar(ch);
}
/* close pipes and exit. */
close(input_pipe[0]);
close(output_pipe[1]);
exit(0);
}
/* now comes the function executed by the translator process. */
void translator(int input_pipe[], int output_pipe[])
{
char c; /* a single byte of user input */
int rc; /* return values of functions. */
/* first, close unnecessary file descriptors */
close(input_pipe[1]); /* we don't need to write to this pipe. */
close(output_pipe[0]); /* we don't need to read from this pipe. */
/* enter a loop of reading from the user_handler's pipe, translating */
/* the character, and writing back to the user handler. */
while (read(input_pipe[0], &c, 1) > 0) {
/* translate any upper-case letter to lower-case. */
if (isascii(c) && isupper(c))
c = tolower(c);
/* write translated character back to user_handler. */
rc = write(output_pipe[1], &c, 1);
if (rc == -1) { /* write failed - notify user and exit. */
perror("translator: write");
close(input_pipe[0]);
close(output_pipe[1]);
exit(1);
}
}
/* close pipes and exit. */
close(input_pipe[0]);
close(output_pipe[1]);
exit(0);
}
/* and finally, the main function: spawn off two processes, */
/* and let each of them execute its function. */
int main(int argc, char* argv[])
{
/* 2 arrays to contain file descriptors, for two pipes. */
int user_to_translator[2];
int translator_to_user[2];
int pid; /* pid of child process, or 0, as returned via fork. */
int rc; /* stores return values of various routines. */
/* first, create one pipe. */
rc = pipe(user_to_translator);
if (rc == -1) {
perror("main: pipe user_to_translator");
exit(1);
}
/* then, create another pipe. */
rc = pipe(translator_to_user);
if (rc == -1) {
perror("main: pipe translator_to_user");
exit(1);
}
/* now fork off a child process, and set their handling routines. */
pid = fork();
switch (pid) {
case -1: /* fork failed. */
perror("main: fork");
exit(1);
case 0: /* inside child process. */
translator(user_to_translator, translator_to_user); /* line 'A' */
/* NOT REACHED */
default: /* inside parent process. */
user_handler(translator_to_user, user_to_translator); /* line 'B' */
/* NOT REACHED */
}
return 0; /* NOT REACHED */
}
A few notes:
translator(user_to_translator, user_to_translator); /* line 'A' */
and the code of line 'B' above to:
user_handler(translator_to_user, translator_to_user); /* line 'B' */
mknod prog_pipe p
We could also provide a full path to where we want the named pipe created. If we then type 'ls -l prog_pipe', we will see something like this:
prw-rw-r-- 1 choo choo 0 Nov 7 01:59 prog_pipe
The 'p' in the first column denotes that this is a named pipe. Just like any file in the system, it has access permissions that define which users may open the named pipe, and whether for reading, writing or both.
[choo@simey1 ~]$ finger choo
Login: choo Name: guy keren
Directory: /home/choo Shell: /bin/tcsh
On since Fri Nov 6 15:46 (IDT) on tty6
No mail.
Plan:
- Breed a new type of dogs.
- Water the plants during all seasons.
- Finish the next tutorial on time.
As you can see, the contents of the '.plan' file have been printed out.
This feature of the finger daemon may be used to create a program that tells the client how many times I was fingered. For that to work, we first create a named pipe, where the '.plan' file resides:
mknod /home/choo/.plan p
If I now try to finger myself, the output will stop before showing the 'plan' file. How so? This is because of the blocking nature of a named pipe. When the finger daemon opens my '.plan' file, there is no writing process, and thus the finger daemon blocks. Thus, don't run this on a system where you expect other users to finger you often.
The second part of the trick is compiling the named-pipe-plan.c program and running it. Note that it contains the full path to the '.plan' file, so change that to the appropriate value for your account before compiling it. When you run the program, it enters an endless loop of opening the named pipe in writing mode, writing a message to the named pipe, closing it, and sleeping for a second. Look at the program's source code for more information. A sample of its output looks like this:
[choo@simey1 ~]$ finger choo
Login: choo Name: guy keren
Directory: /home/choo Shell: /bin/tcsh
On since Fri Nov 6 15:46 (IDT) on tty6
No mail.
Plan:
I have been fingered 8 times today
When you're done playing, stop the program, and don't forget to remove the named pipe from the file system.
The fact that these resources are global to the system has two contradicting implications. On one hand, it means that if a process exits, the data it sent through a message queue, or placed in shared memory is still there, and can be collected by other processes. On the other hand, this also means that the programmer has to take care of freeing these resources, or they occupy system resources until the next reboot, or until being removed by hand.
I am going to make a statement here about these communications mechanisms that might annoy some readers: System V IPC mechanisms are evil regarding their implementation, and should not be used unless there is a very good reason. One of the problems with these mechanisms is that one cannot use select() (or its replacement, poll()) with them, and thus a process waiting for a message to be placed in a message queue cannot be notified about messages coming via other resources (e.g. other message queues, pipes or sockets). In my opinion, this limitation is an oversight by the designers of these mechanisms. Had they used file descriptors to denote IPC resources (as is done for pipes, sockets and files), life would be easier.
Another problem with System V IPC is its system-global nature. The total number of message queues that may live in the system, for example, is shared by all processes. Worse than that, the number of messages waiting in all message queues is also limited globally. One process spewing many such messages will break all processes using message queues. The same goes for other such resources. There are various other limitations imposed by the API (Application Programming Interface). For example, one may wait on only a limited set of semaphores at the same time. If you want more than this, you have to split the waiting task, or re-design your application.
Having said that, there are still various applications where using System
V IPC (we'll call it SysV IPC, for short) will save you a large amount
of time. In these cases, you should go ahead and use these mechanisms
- just handle with care.
struct ipc_perm
{
key_t key; /* key identifying the resource */
ushort uid; /* owner effective user ID and effective group ID */
ushort gid;
ushort cuid; /* creator effective user ID and effective group ID */
ushort cgid;
ushort mode; /* access modes */
ushort seq; /* sequence number */
};
These fields have the following meanings:
Running 'ipcs' will show us statistics separately for each of the three resource types (shared memory segments, semaphore arrays and message queues). For each resource type, the command will show us some statistics for each resource that exists in the system. It will show its identifier, owner, size of resources it occupies in the system, and permission flags. We may give 'ipcs' a flag to ask it to show only resources of one type ('-m' for shared Memory segments, -q for message Queues and '-s' for Semaphore arrays). We may also use 'ipcs' with the '-l' flag to see the system enforced limits on these resources, or the '-u' flag to show us usage summary. Refer to the manual page of 'ipcs' for more information.
The 'ipcrm' command accepts a resource type ('shm', 'msg' or 'sem')
and a resource ID, and removes the given resource from the system. We need
to have the proper permissions in order to delete a resource.
Let's see an example of code that creates a private message queue:
#include <stdio.h> /* standard I/O routines. */
#include <stdlib.h> /* defines exit(). */
#include <sys/types.h> /* standard system data types. */
#include <sys/ipc.h> /* common system V IPC structures. */
#include <sys/msg.h> /* message-queue specific functions. */
/* create a private message queue, with access only to the owner. */
int queue_id = msgget(IPC_PRIVATE, 0600); /* <-- this is an octal number. */
if (queue_id == -1) {
perror("msgget");
exit(1);
}
A few notes about this code:
struct msgbuf {
long mtype; /* message type, a positive number (cannot be zero). */
char mtext[1]; /* message body array. usually larger than one byte. */
};
The message type part is rather obvious. But how do we deal with a message
text that is only 1 byte long? Well, we actually may place a much larger
text inside a message. For this, we allocate more memory for a msgbuf structure
than sizeof(struct msgbuf). Let's see how we create a "hello world"
message:
/* first, define the message string */
char* msg_text = "hello world";
/* allocate a message with enough space for length of string and */
/* one extra byte for the terminating null character. */
struct msgbuf* msg =
(struct msgbuf*)malloc(sizeof(struct msgbuf) + strlen(msg_text));
/* set the message type. for example - set it to '1'. */
msg->mtype = 1;
/* finally, place the "hello world" string inside the message. */
strcpy(msg->mtext, msg_text);
Few notes:
int rc = msgsnd(queue_id, msg, strlen(msg_text)+1, 0);
if (rc == -1) {
perror("msgsnd");
exit(1);
}
Note that we used a message size one larger than the length of the string,
since we're also sending the null character. msgsnd() treats the data in
the message as an arbitrary sequence of bytes, so it cannot know we've got
the null character there too, unless we state that explicitly.
/* prepare a message structure large enough to read our "hello world". */
struct msgbuf* recv_msg =
(struct msgbuf*)malloc(sizeof(struct msgbuf)+strlen("hello world"));
/* use msgrcv() to read the message. We agree to get any type, and thus */
/* use '0' in the message type parameter, and use no flags (0). */
int rc = msgrcv(queue_id, recv_msg, strlen("hello world")+1, 0, 0);
if (rc == -1) {
perror("msgrcv");
exit(1);
}
A few notes:
P and V semaphores
The commonly used names are P semaphore (acquire resource) and V semaphore
(release resource) (Dijkstra, 1965).
P semaphore: request a resource and, if it is not available, wait for it.
V semaphore: signal to the operating system that the resource is now free
for other users.
WakeUp can be a V semaphore.
Block can be a P semaphore.
One of the basic examples of semaphore use is the so-called "producer/consumer problem".
Let's suppose that we have N units of free space for new products and M units
of occupied space (some product occupies the space).
If N > 0, then an arriving producer can proceed.
If M > 0, then an arriving consumer can proceed.
Only one producer or consumer is allowed into the critical area at a time.
#define N 100 /* Number of slots in the buffer */
#define TRUE 1
typedef int semaphore;
semaphore mutex = 1; /* Controls access to the critical section */
semaphore empty = N; /* Counts empty buffer slots */
semaphore full = 0; /* Counts full buffer slots */
producer()
{
int item;
while (TRUE) {
produce_item(&item);
P(&empty); /* Decrement empty count */
P(&mutex); /* Enter critical region */
enter_item(item); /* Put new item in buffer */
V(&mutex); /* Leave critical region */
V(&full); /* Increment count of full slots */
}
}
consumer()
{
int item;
while (TRUE) {
P(&full); /* Decrement full count */
P(&mutex); /* Enter critical region */
remove_item(&item); /* Take item from buffer */
V(&mutex); /* Leave critical region */
V(&empty); /* Increment count of empty slots */
consume_item(item);
}
}
Only one writer is allowed to be "inside".
N readers are allowed to be "inside".
For example updating and reading a FILE.
typedef int semaphore;
semaphore mutex = 1; /* Controls access to RC */
semaphore db = 1; /* Controls access to database */
int RC = 0; /* # of processes reading or wanting to read */
/* If you implement the program directly from this pseudo code, you must
understand that the counter RC must be in a shared memory segment!
There are also other possible implementations. */
reader()
{
while (TRUE) {
P(&mutex); /* Get access to RC */
RC = RC + 1; /* One reader more */
if (RC == 1) P(&db); /* If this is the first reader... */
V(&mutex); /* Release RC */
read_data_base();
P(&mutex); /* Get access to RC */
RC = RC - 1; /* One reader fewer */
if (RC == 0) V(&db); /* If this is the last reader... */
V(&mutex); /* Release RC */
use_read_data();
}
}
writer()
{
while (TRUE) {
create_data();
P(&db); /* Get access to db */
write_data_base();
V(&db); /* Release db */
}
}
(Figure: a state diagram of virtual processors.)
Two types of operations can be carried out on a semaphore: wait and signal. A wait operation first checks if the semaphore's value equals some number. If it does, it decreases its value and returns. If it does not, the operation blocks the calling process until the semaphore's value reaches the desired value. A signal operation increments the value of the semaphore, possibly awakening one or more processes that are waiting on the semaphore. How this mechanism can be put to practical use will be explained soon.
A semaphore set is a structure that stores a group of semaphores
together, and possibly allows the process to commit a transaction on part
or all of the semaphores in the set together. In here, a transaction means
that we are guaranteed that either all operations are done successfully,
or none is done at all. Note that a semaphore set is not a general parallel
programming concept, it's just an extra mechanism supplied by SysV IPC.
/* ID of the semaphore set. */
int sem_set_id_1;
int sem_set_id_2;
/* create a private semaphore set with one semaphore in it, */
/* with access only to the owner. */
sem_set_id_1 = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
if (sem_set_id_1 == -1) {
perror("main: semget");
exit(1);
}
/* create a semaphore set with ID 250, three semaphores */
/* in the set, with access only to the owner. */
sem_set_id_2 = semget(250, 3, IPC_CREAT | 0600);
if (sem_set_id_2 == -1) {
perror("main: semget");
exit(1);
}
Note that in the second case, if a semaphore set with ID 250 already existed,
we would get access to the existing set, rather than a new set being created.
This works just like it worked with message queues.
/* use this to store return values of system calls. */
int rc;
/* initialize the first semaphore in our set to '3'. */
rc = semctl(sem_set_id_2, 0, SETVAL, 3);
if (rc == -1) {
perror("main: semctl");
exit(1);
}
/* initialize the second semaphore in our set to '6'. */
rc = semctl(sem_set_id_2, 1, SETVAL, 6);
if (rc == -1) {
perror("main: semctl");
exit(1);
}
/* initialize the third semaphore in our set to '0'. */
rc = semctl(sem_set_id_2, 2, SETVAL, 0);
if (rc == -1) {
perror("main: semctl");
exit(1);
}
There is one comment to be made about the way we used semctl() here. According to the manual, the last parameter for this system call should be a union of type union semun. However, since the SETVAL (set value) operation only uses the int val part of the union, we simply passed an integer to the function. The proper way to use this system call is to define a variable of this union type, and set its value appropriately, like this:
/* use this variable to pass the value to the semctl() call */
union semun sem_val;
/* initialize the third semaphore in our set to '0'. */
sem_val.val = 0;
rc = semctl(sem_set_id_2, 2, SETVAL, sem_val);
if (rc == -1) {
perror("main: semctl");
exit(1);
}
We used the first form just for simplicity. From now on, we will only use
the second form.
/* this function updates the contents of the file with the given path name. */
void update_file(char* file_path, int number)
{
/* structure for semaphore operations. */
struct sembuf sem_op;
FILE* file;
/* wait on the semaphore, unless its value is positive. */
sem_op.sem_num = 0;
sem_op.sem_op = -1; /* <-- Comment 1 */
sem_op.sem_flg = 0;
semop(sem_set_id, &sem_op, 1);
/* Comment 2 */
/* we "locked" the semaphore, and are assured exclusive access to file. */
/* manipulate the file in some way. for example, write a number into it. */
file = fopen(file_path, "w");
if (file) {
fprintf(file, "%d\n", number);
fclose(file);
}
/* finally, signal the semaphore - increase its value by one. */
sem_op.sem_num = 0;
sem_op.sem_op = 1; /* <-- Comment 3 */
sem_op.sem_flg = 0;
semop(sem_set_id, &sem_op, 1);
}
This code needs some explanations, especially regarding the semantics of
the semop() calls.
To control such a printing system, we need the producers to maintain a count of the number of files waiting in the spool directory, incrementing it for every new file placed there. The consumers check this counter, and whenever it gets above zero, one of them grabs a file from the spool, and sends it to the printer. If there are no files in the spool (i.e. the counter value is zero), all consumer processes get blocked. The behavior of this counter sounds very familiar: it is exactly the behavior of a counting semaphore.
Let's see how we can use a semaphore as a counter. We still use the same two operations on the semaphore, namely "signal" and "wait".
/* this variable will contain the semaphore set. */
int sem_set_id;
/* semaphore value, for semctl(). */
union semun sem_val;
/* structure for semaphore operations. */
struct sembuf sem_op;
/* first we create a semaphore set with a single semaphore, */
/* whose counter is initialized to '0'. */
sem_set_id = semget(IPC_PRIVATE, 1, 0600);
if (sem_set_id == -1) {
perror("semget");
exit(1);
}
sem_val.val = 0;
semctl(sem_set_id, 0, SETVAL, sem_val);
/* we now do some producing function, and then signal the */
/* semaphore, increasing its counter by one. */
.
.
sem_op.sem_num = 0;
sem_op.sem_op = 1;
sem_op.sem_flg = 0;
semop(sem_set_id, &sem_op, 1);
.
.
.
/* meanwhile, in a different process, we try to consume the */
/* resource protected (and counted) by the semaphore. */
/* we block on the semaphore, unless its value is positive. */
sem_op.sem_num = 0;
sem_op.sem_op = -1;
sem_op.sem_flg = 0;
semop(sem_set_id, &sem_op, 1);
/* when we get here, it means that the semaphore's value is '0' */
/* or more, so there's something to consume. */
.
.
Note that our "wait" and "signal" operations here are just like the ones we
used when using the semaphore as a mutex. The only difference is in who
is doing the "wait" and the "signal". With a mutex, the same process did
both the "wait" and the "signal" (in that order). In the producer-consumer
example, one process is doing the "signal" operation, while the other is
doing the "wait" operation.
The full source code for a simple program that implements a producer-consumer
system with two processes, is found in the file sem-producer-consumer.c.
One problem might be that copying a file takes a lot of time, and thus
locking the spool directory for a long while. In order to avoid that, 3
directories will be used. One serves as a temporary place for tiny-lpr
to copy files into. One will be used as the common spool directory, and
one will be used as a temporary directory into which tiny-lpd will move
the files before printing them. By putting all 3 directories on the same
disk, we assure that files can be moved between them using the rename()
system call, in one fast operation (regardless of the file size).
With shared memory, we declare a given section in the memory as one
that will be used simultaneously by several processes. This means that
the data found in this memory section (or memory segment) will be seen
by several processes. This also means that several processes might try
to alter this memory area at the same time, and thus some method should
be used to synchronize their access to this memory area (did anyone
say "apply mutual exclusion using a semaphore" ?).
In order to achieve virtual memory, the system divides memory into small pages each of the same size. For each process, a table mapping virtual memory pages into physical memory pages is kept. When the process is scheduled for running, its memory table is loaded by the operating system, and each memory access causes a mapping (by the CPU) to a physical memory page. If the virtual memory page is not found in memory, it is looked up in swap space, and loaded from there (this operation is also called 'page in').
When the process is started, it is allocated a memory segment to hold the runtime stack, a memory segment to hold the program's code (the code segment), and a memory area for data (the data segment). Each such segment might be composed of many memory pages. Whenever the process needs to allocate more memory, new pages are allocated for it, to enlarge its data segment.
When a process is forked off from another process, the memory page table of the parent process is copied to the child process, but not the pages themselves. If the child process tries to update any of these pages, that specific page is copied, and only the child's copy is modified. This behavior is very efficient for processes that call fork() and immediately use the exec() system call to replace the program they run.
What we see from all of this is that all we need in order to support
shared memory is to mark some memory pages as shared, and to provide a way
to identify them. This way, one process will create a shared memory segment,
and other processes will attach to it (by placing its physical address
in their memory page tables). From then on, all these processes will
access the same physical memory when accessing these pages, thus sharing
this memory area.
/* this variable is used to hold the returned segment identifier. */
int shm_id;
/* allocate a shared memory segment with size of 2048 bytes, */
/* accessible only to the current user. */
shm_id = shmget(100, 2048, IPC_CREAT | IPC_EXCL | 0600);
if (shm_id == -1) {
perror("shmget");
exit(1);
}
If several processes try to allocate a segment using the same ID, they will all get an identifier for the same page, unless they defined IPC_EXCL in the flags to shmget(). In that case, the call will succeed only if the page did not exist before.
/* these variables are used to specify where the page is attached. */
char* shm_addr;
char* shm_addr_ro;
/* attach the given shared memory segment, at some free position */
/* that will be allocated by the system. */
shm_addr = shmat(shm_id, NULL, 0);
if (shm_addr == (char*)-1) { /* operation failed: shmat() returns -1, not NULL. */
perror("shmat: ");
exit(1);
}
/* attach the same shared memory segment again, this time in */
/* read-only mode. Any write operation to this page using this */
/* address will cause a segmentation violation (SIGSEGV) signal. */
shm_addr_ro = shmat(shm_id, NULL, SHM_RDONLY);
if (shm_addr_ro == (char*)-1) { /* operation failed: shmat() returns -1, not NULL. */
perror("shmat: ");
exit(1);
}
As you can see, a page may be attached in read-only mode, or in read-write
mode. The same page may be attached several times by the same process,
and then all the given addresses will refer to the same data. In the example
above, we can use 'shm_addr' to access the segment both for reading and
for writing, while 'shm_addr_ro' can be used for read-only access to this
page. Attaching a segment in read-only mode makes sense if our process is
not supposed to alter this memory page, and is recommended in such cases.
The reason is that if a bug in our process causes it to corrupt its memory
image, it might corrupt the contents of the shared segment, possibly causing
all other processes using this segment to crash. By using a read-only
attachment, we protect the rest of the processes from a bug in our process.
Here is an example of placing data in a shared memory segment, and later on reading this data. We assume that 'shm_addr' is a character pointer, containing an address returned by a call to shmat().
/* define a structure to be used in the given shared memory segment. */
struct country {
char name[30];
char capital_city[30];
char currency[30];
int population;
};
/* define a countries array variable. */
int* countries_num;
struct country* countries;
int i;
/* create a countries index on the shared memory segment. */
countries_num = (int*) shm_addr;
*countries_num = 0;
countries = (struct country*) (shm_addr + sizeof(int));
strcpy(countries[0].name, "U.S.A");
strcpy(countries[0].capital_city, "Washington");
strcpy(countries[0].currency, "U.S. Dollar");
countries[0].population = 250000000;
(*countries_num)++;
strcpy(countries[1].name, "Israel");
strcpy(countries[1].capital_city, "Jerusalem");
strcpy(countries[1].currency, "New Israeli Shekel");
countries[1].population = 6000000;
(*countries_num)++;
strcpy(countries[2].name, "France");
strcpy(countries[2].capital_city, "Paris");
strcpy(countries[2].currency, "Franc");
countries[2].population = 60000000;
(*countries_num)++;
/* now, print out the countries data. */
for (i=0; i < (*countries_num); i++) {
printf("Country %d:\n", i+1);
printf(" name: %s:\n", countries[i].name);
printf(" capital city: %s:\n", countries[i].capital_city);
printf(" currency: %s:\n", countries[i].currency);
printf(" population: %d:\n", countries[i].population);
}
A few notes and 'gotchas' about this code:
Since the memory page was already allocated when we called shmget(), there is no need to use malloc() when placing data in that segment. Instead, we do all memory management ourselves, by simple pointer arithmetic operations. We also need to make sure the shared segment was allocated enough memory to accommodate future growth of our data - there are no means for enlarging the size of the segment once allocated (unlike when using normal memory management - we can always move data to a new memory location using the realloc() function).
In the example above, we assumed that the page's address is aligned properly for an integer to be placed in it. If it was not, any attempt to alter the contents of 'countries_num' would trigger a bus error (SIGBUS) signal. Further, we assumed that the alignment of our structure is the same as that needed for an integer (when we placed the structures array right after the integer variable).
/* this structure is used by the shmctl() system call. */
struct shmid_ds shm_desc;
/* destroy the shared memory segment. */
if (shmctl(shm_id, IPC_RMID, &shm_desc) == -1) {
perror("main: shmctl: ");
}
Note that any process may destroy the shared memory segment, not only the
one that created it, as long as its effective user ID matches the owner or
the creator of the segment (or it is privileged).
To help with that, the ftok() system call was introduced. This system call accepts two parameters, a path to a file and a character, and generates a more-or-less unique identifier. It does that by finding the "i-node" number of the file (more or less the number of the disk sector containing this file's information) and combining it with the second parameter, thus generating an identifier that can later be fed to semget(), shmget() or msgget(). Here is how to use ftok():
/* identifier returned by ftok() */
key_t set_key;
/* generate a "unique" key for our set, using the */
/* directory "/usr/local/lib/ourprojectdir". */
set_key = ftok("/usr/local/lib/ourprojectdir", 'a');
if (set_key == -1) {
perror("ftok: ");
exit(1);
}
/* now we can use 'set_key' to generate a set id, for example. */
sem_set_id = semget(set_key, 1, IPC_CREAT | 0600);
.
.
One note should be taken: if we remove the file and then re-create it,
the system is very likely to allocate a new disk sector for this file,
and thus calling ftok() again with this file will generate a different
key. Thus, the file used should be a stable file, and not one that is
likely to be moved to a different disk or erased and re-created.