#include <sys/types.h>
#include <unistd.h>

pid_t fork(void);
The fork() function is used to create a new process from an
existing process. The new process is called the child process, and the existing
process is called the parent. You can tell which is which by checking the return
value from fork(). The parent gets the child's pid as the return value,
but the child gets 0. The following simple code illustrates the
basics.
pid_t pid;

switch (pid = fork())
{
case -1:
    /* Here pid is -1, the fork failed */
    /* Some possible reasons are that you're */
    /* out of process slots or virtual memory */
    perror("The fork failed!");
    break;
case 0:
    /* pid of zero is the child */
    /* Here we're the child...what should we do? */
    /* ... */
    /* but after doing it, we should do something like: */
    _exit(0);
default:
    /* pid greater than zero is parent getting the child's pid */
    /* pid_t has no printf conversion of its own, so cast it */
    printf("Child's pid is %ld\n", (long)pid);
}
Of course, one can use if()... else... instead of
switch(), but the above form is a useful idiom.
Of help when doing this is knowing just what is and is not inherited by the child. This list can vary depending on Unix implementation, so take it with a grain of salt. Note that the child gets copies of these things, not the real thing.
Inherited by the child from the parent:
Unique to the child:
Some systems have a system call vfork(), which was originally
designed as a lower-overhead version of fork(). Since
fork() involved copying the entire address space of the process,
and was therefore quite expensive, the vfork() function was
introduced (in 3.0BSD).
However, since vfork() was introduced, the
implementation of fork() has improved drastically, most notably
with the introduction of `copy-on-write', where the copying of the process
address space is transparently faked by allowing both processes to refer to the
same physical memory until either of them modify it. This largely removes the
justification for vfork(); indeed, a large proportion of systems
now lack the original functionality of vfork() completely. For
compatibility, though, there may still be a vfork() call present,
that simply calls fork() without attempting to emulate all of the
vfork() semantics.
As a result, it is very unwise to actually make use of any of the
differences between fork() and vfork(). Indeed, it is
probably unwise to use vfork() at all, unless you know exactly
why you want to.
The basic difference between the two is that when a new process is created
with vfork(), the parent process is temporarily suspended, and the
child process might borrow the parent's address space. This strange state of
affairs continues until the child process either exits, or calls
execve(), at which point the parent process continues.
This means that the child process of a vfork() must be careful
to avoid unexpectedly modifying variables of the parent process. In particular,
the child process must not return from the function containing
the vfork() call, and it must not call
exit() (if it needs to exit, it should use _exit();
actually, this is also true for the child of a normal fork()).
There are a few differences between exit() and
_exit() that become significant when fork(), and
especially vfork(), is used.
The basic difference between exit() and _exit() is
that the former performs clean-up related to user-mode constructs in the
library, and calls user-supplied cleanup functions, whereas the latter performs
only the kernel cleanup for the process.
In the child branch of a fork(), it is normally incorrect to use
exit(), because that can lead to stdio buffers being flushed twice,
and temporary files being unexpectedly removed. In C++ code the situation is
worse, because destructors for static objects may be run incorrectly. (There are
some unusual cases, like daemons, where the parent should call
_exit() rather than the child; the basic rule, applicable in the
overwhelming majority of cases, is that exit() should be called
only once for each entry into main.)
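The double flush is easy to reproduce. The following is only a sketch: the function name bytes_after_fork() is made up for this illustration, and it assumes that tmpfile() hands back a fully buffered stream, which is the usual behaviour. One byte is written into the stdio buffer before the fork; if the child then leaves with exit(), both processes flush their copy of the buffer and the file typically ends up holding two bytes instead of one.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of bytes that end up in a
 * scratch file after one buffered one-byte write followed by a fork,
 * or -1 on error. */
long bytes_after_fork(int child_calls_exit)
{
    FILE *fp = tmpfile();
    struct stat st;
    pid_t pid;
    long size;

    if (fp == NULL)
        return -1;
    fputc('x', fp);              /* stays in the stdio buffer for now */

    pid = fork();
    if (pid == -1) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {
        if (child_calls_exit)
            exit(0);             /* flushes the child's copy of the buffer */
        _exit(0);                /* leaves the stdio buffer alone */
    }

    waitpid(pid, NULL, 0);
    fflush(fp);                  /* the parent's own, legitimate flush */
    size = (fstat(fileno(fp), &st) == -1) ? -1 : (long)st.st_size;
    fclose(fp);
    return size;
}
```

With _exit() in the child only the parent's flush reaches the file; with exit() the child's copy of the buffer is written as well.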
In the child branch of a vfork(), the use of exit()
is even more dangerous, since it will affect the state of the parent
process.
Getting the value of an environment variable is done by using
getenv().
#include <stdlib.h>

char *getenv(const char *name);
Setting the value of an environment variable is done by using
putenv().
#include <stdlib.h>

int putenv(char *string);
The string passed to putenv must not be freed or made invalid, since
a pointer to it is kept by putenv(). This means that it must either
be a static buffer or allocated off the heap. The string can be freed if the
environment variable is redefined or deleted via another call to
putenv().
Remember that environment variables are inherited; each process has a separate copy of the environment. As a result, you can't change the value of an environment variable in another process, such as the shell.
Suppose you wanted to get the value for the TERM environment
variable. You would use this code:
char *envvar;

envvar = getenv("TERM");

printf("The value for the environment variable TERM is ");
if (envvar)
{
    printf("%s\n", envvar);
}
else
{
    printf("not set.\n");
}
Now suppose you wanted to create a new environment variable called
MYVAR, with a value of MYVAL. This is how you'd do it.
static char envbuf[256];

sprintf(envbuf, "MYVAR=%s", "MYVAL");

if (putenv(envbuf))
{
    printf("Sorry, putenv() couldn't find the memory for %s\n", envbuf);
    /* Might exit() or something here if you can't live without it */
}
If you don't know the names of the environment variables, then the
getenv() function isn't much use. In this case, you have to dig
deeper into how the environment is stored.
A global variable, environ, holds a pointer to an array of
pointers to environment strings, each string in the form
"NAME=value". A NULL pointer is used to mark the end
of the array. Here's a trivial program to print the current environment (like
printenv):
#include <stdio.h>

extern char **environ;

int main(void)
{
    char **ep = environ;
    char *p;

    while ((p = *ep++))
        printf("%s\n", p);
    return 0;
}
In general, the environ variable is also passed as the third,
optional, parameter to main(); that is, the above could have been
written:
#include <stdio.h>

int main(int argc, char **argv, char **envp)
{
    char *p;

    while ((p = *envp++))
        printf("%s\n", p);
    return 0;
}
However, while pretty universally supported, this method isn't actually defined by the POSIX standards. (It's also less useful, in general.)
The sleep() function, which is available on all Unixes, only
allows for a duration specified in seconds. If you want finer granularity, then
you need to look for alternatives:
  - usleep(), which many systems provide
  - select() or poll(), specifying no file descriptors to test; a common
    technique is to write a usleep() function based on either of these
    (see the comp.unix.questions FAQ for some examples)
  - itimers, if your system has them; you can roll your own
    usleep() using them (see the BSD sources for
    usleep() for how to do this)
  - the nanosleep() function, if you have POSIX realtime
Of the above, select() is probably the most portable (and
strangely, it is often much more efficient than usleep() or an
itimer-based method). However, the behaviour may be different if signals are
caught while asleep; this may or may not be an issue depending on the
application.
Whichever route you choose, it is important to realise that you may be
constrained by the timer resolution of the system (some systems allow very short
time intervals to be specified, others have a resolution of, say, 10ms and will
round all timings to that). Also, as for sleep(), the delay you
specify is only a minimum value; after the specified period elapses,
there will be an indeterminate delay before your process next gets scheduled.
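As a rough sketch of the select() approach, the function below sleeps for a number of milliseconds by calling select() with no file descriptors and only a timeout. The name sleep_ms() is invented for this example. If a signal is caught during the call, select() fails with EINTR and this version simply reports the failure instead of retrying.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: sleep for about `ms` milliseconds using select()
 * with no descriptors to test.  Returns 0 when the timeout expired,
 * -1 with errno set otherwise (for example EINTR after a signal). */
int sleep_ms(long ms)
{
    struct timeval tv;

    if (ms < 0) {
        errno = EINVAL;
        return -1;
    }
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;

    /* nfds == 0 and three NULL sets: only the timeout matters */
    return select(0, NULL, NULL, NULL, &tv) == 0 ? 0 : -1;
}
```

A call such as sleep_ms(250) then asks for a quarter-second pause, subject to the resolution and scheduling caveats described above.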
Modern Unixes tend to implement alarms using the setitimer()
function, which has a higher resolution and more options than the simple
alarm() function. One should generally assume that
alarm() and setitimer(ITIMER_REAL) may be the same
underlying timer, and accessing it both ways may cause confusion.
Itimers can be used to implement either one-shot or repeating signals; also, there are generally 3 separate timers available:
ITIMER_REAL
    counts real (wall clock) time, and sends the SIGALRM signal
ITIMER_VIRTUAL
    counts process virtual (user CPU) time, and sends the
    SIGVTALRM signal
ITIMER_PROF
    counts user and system CPU time, and sends the SIGPROF
    signal; it is intended for interpreters to use for profiling.

Itimers, however, are not part of many of the standards, despite having been present since 4.2BSD. The POSIX realtime extensions define some similar, but different, functions.
A parent and child can communicate through any of the normal inter-process communication schemes (pipes, sockets, message queues, shared memory), but also have some special ways to communicate that take advantage of their relationship as a parent and child.
One of the most obvious is that the parent can get the exit status of the child.
Since the child inherits file descriptors from its parent, the parent can
open both ends of a pipe and fork; then the parent closes one end and the child
closes the other end of the pipe. This is what happens when you call the
popen() routine to run another program from within yours, i.e. you
can write to the file descriptor returned from popen() and the
child process sees it as its stdin, or you can read from the file descriptor and
see what the program wrote to its stdout. (The mode parameter to
popen() defines which; if you want to do both, then you can do the
plumbing yourself without too much difficulty.)
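Doing the plumbing by hand looks roughly like this. It is a sketch only: read_from_child() is a name invented here, and the child just writes a fixed message where a real program would normally exec something. The parent keeps the read end, the child keeps the write end.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes a fixed message into a
 * pipe, and read that message into buf (NUL-terminated) in the parent.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_from_child(char *buf, size_t len)
{
    static const char msg[] = "hello from the child\n";
    int fds[2];
    pid_t pid;
    ssize_t n;
    size_t total = 0;

    if (len == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);                    /* child only writes */
        if (write(fds[1], msg, strlen(msg)) == -1)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);                        /* parent only reads */
    while (total < len - 1 &&
           (n = read(fds[0], buf + total, len - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';

    close(fds[0]);
    waitpid(pid, NULL, 0);                /* don't leave a zombie */
    return (ssize_t)total;
}
```

Closing the unused end in each process matters: the parent's read() only sees end-of-file once every copy of the write end has been closed.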
Also, the child process inherits memory segments mmapped anonymously (or by mmapping the special file `/dev/zero') by the parent; these shared memory segments are not accessible from unrelated processes.
When a program forks and the child finishes before the parent, the kernel
still keeps some of its information about the child in case the parent might
need it -- for example, the parent may need to check the child's exit status. To
be able to get this information, the parent calls wait(); when this
happens, the kernel can discard the information.
In the interval between the child terminating and the parent calling
wait(), the child is said to be a `zombie'. (If you do `ps', the
child will have a `Z' in its status field to indicate this.) Even though it's
not running, it's still taking up an entry in the process table. (It consumes no
other resources, but some utilities may show bogus figures for e.g. CPU usage;
this is because some parts of the process table entry have been overlaid by
accounting info to save space.)
This is not good, as the process table has a fixed number of entries and it
is possible for the system to run out of them. Even if the system doesn't run
out, there is a limit on the number of processes each user can run, which is
usually smaller than the system's limit. This is one of the reasons why you
should always check if fork() failed, by the way!
If the parent terminates without calling wait(), the child is `adopted' by
init, which handles the work necessary to clean up after the child.
(This is a special system program with process ID 1 -- it's actually the first
program to run after the system boots up).
You need to ensure that your parent process calls wait() (or
waitpid(), wait3(), etc.) for every child process that
terminates; or, on some systems, you can instruct the system that you are
uninterested in child exit states.
Another approach is to fork() twice, and have the
immediate child process exit straight away. This causes the grandchild process
to be orphaned, so the init process is responsible for cleaning it up. For code
to do this, see the function fork2() in the examples section.
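For orientation, a stripped-down sketch of the double-fork idea is shown here. It is not the fork2() from the examples section; the name spawn_orphan() and its return convention are assumptions made for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 in the grandchild, 1 in the original
 * process, and -1 in the original process if something failed. */
int spawn_orphan(void)
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* intermediate child: fork again and leave immediately */
        switch (fork()) {
        case -1:
            _exit(1);
        case 0:
            return 0;           /* grandchild carries on with the work */
        default:
            _exit(0);           /* orphans the grandchild at once */
        }
    }

    /* original process: reap the short-lived intermediate child */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : -1;
}
```

The original process still calls waitpid() once, but only for the intermediate child, which exits almost immediately.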
To ignore child exit states, you need to do the following (check your system's manpages to see if this works):
struct sigaction sa;
sa.sa_handler = SIG_IGN;
#ifdef SA_NOCLDWAIT
sa.sa_flags = SA_NOCLDWAIT;
#else
sa.sa_flags = 0;
#endif
sigemptyset(&sa.sa_mask);
sigaction(SIGCHLD, &sa, NULL);
If this is successful, then the wait() functions are prevented
from working; if any of them are called, they will wait until all child
processes have terminated, then return failure with errno ==
ECHILD.
The other technique is to catch the SIGCHLD signal, and have the signal
handler call waitpid() or wait3(). See the examples
section for a complete program.
A daemon process is usually defined as a background process that does not belong to a terminal session. Many system services are performed by daemons; network services, printing etc.
Simply invoking a program in the background isn't really adequate for these long-running programs; that does not correctly detach the process from the terminal session that started it. Also, the conventional way of starting daemons is simply to issue the command manually or from an rc script; the daemon is expected to put itself into the background.
Here are the steps to become a daemon:
  1. fork() so the parent can exit; this returns control to the
     command line or shell invoking your program. This step is required so that
     the new process is guaranteed not to be a process group leader. The next
     step, setsid(), fails if you're a process group leader.
  2. setsid() to become a process group and session group leader.
     Since a controlling terminal is associated with a session, and this new
     session has not yet acquired a controlling terminal, our process now has no
     controlling terminal, which is a Good Thing for daemons.
  3. fork() again so the parent (the session group leader) can
     exit. This means that we, as a non-session group leader, can never regain a
     controlling terminal.
  4. chdir("/") to ensure that our process doesn't keep any
     directory in use. Failure to do this could make it so that an administrator
     couldn't unmount a filesystem, because it was our current directory.
     [Equivalently, we could change to any directory containing files important
     to the daemon's operation.]
  5. umask(0) so that we have complete control over the
     permissions of anything we write. We don't know what umask we may have
     inherited. [This step is optional]
  6. close() fds 0, 1, and 2. This releases the standard in, out,
     and error we inherited from our parent process. We have no way of knowing
     where these fds might have been redirected to. Note that many daemons use
     sysconf() to determine the limit _SC_OPEN_MAX.
     _SC_OPEN_MAX tells you the maximum number of open files per process. Then,
     in a loop, the daemon can close all possible file descriptors. You have to
     decide if you need to do this or not. If you think that there might be
     file descriptors open, you should close them, since there's a limit on the
     number of concurrent file descriptors.
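Put together, the steps above come out roughly as follows. This is a bare sketch: daemonize() is just a name chosen for the example, error handling is minimal, and only fds 0, 1 and 2 are closed rather than everything up to _SC_OPEN_MAX.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 in the resulting daemon process and
 * -1 on error.  The two intermediate parents never return; they leave
 * with _exit(). */
int daemonize(void)
{
    pid_t pid;

    pid = fork();               /* first fork: the parent exits */
    if (pid == -1)
        return -1;
    if (pid > 0)
        _exit(0);

    if (setsid() == -1)         /* new session, no controlling terminal */
        return -1;

    pid = fork();               /* second fork: session leader exits */
    if (pid == -1)
        return -1;
    if (pid > 0)
        _exit(0);

    if (chdir("/") == -1)       /* don't keep any directory in use */
        return -1;
    umask(0);                   /* full control over file permissions */

    close(0);                   /* release inherited stdin, stdout, */
    close(1);                   /* and stderr */
    close(2);
    return 0;
}
```

A daemon's main() would call this once, early on, before opening the files and sockets it intends to keep.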
Almost none of this is necessary (or advisable) if your daemon is being
started by inetd. In that case, stdin, stdout and stderr are all
set up for you to refer to the network connection, and the fork()s
and session manipulation should not be done (to avoid confusing
inetd). Only the chdir() and umask()
steps remain as useful.
You really don't want to do this.
The most portable way, by far, is to do popen(pscmd, "r") and
parse the output. (pscmd should be something like `"ps -ef"' on
SysV systems; on BSD systems there are many possible display options: choose
one.)
In the examples section, there are two complete versions of this; one for SunOS 4, which requires root permission to run and uses the `kvm_*' routines to read the information from kernel data structures; and another for SVR4 systems (including SunOS 5), which uses the `/proc' filesystem.
It's even easier on systems with an SVR4.2-style `/proc'; just read a psinfo_t structure from the file `/proc/PID/psinfo' for each PID of interest. However, this method, while probably the cleanest, is also perhaps the least well-supported. (On FreeBSD's `/proc', you read a semi-undocumented printable string from `/proc/PID/status'; Linux has something similar.)
Use kill() with 0 for the signal number.
There are four possible results from this call:
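  - kill() returns 0: a process with that PID exists, and you are
    allowed to send signals to it
  - kill() returns -1, errno == ESRCH: no process with that PID exists
  - kill() returns -1, errno == EPERM: the process exists, but you
    are not allowed to send signals to it
  - kill() returns -1, with some other value of
    errno: something unexpected went wrong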
The most-used technique is to assume that success or failure with
EPERM implies that the process exists, and any other error implies
that it doesn't.
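That rule of thumb fits in a few lines. In this sketch the function name process_exists() is made up, and non-positive PIDs are rejected up front because kill() gives them a different meaning (they address process groups).

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if a process with this PID appears to
 * exist, 0 otherwise. */
int process_exists(pid_t pid)
{
    if (pid <= 0)
        return 0;               /* 0 and negatives mean process groups */
    if (kill(pid, 0) == 0)
        return 1;               /* exists, and we may signal it */
    return errno == EPERM;      /* exists, but belongs to someone else */
}
```

Bear in mind that the answer is only a snapshot: the process may exit, and its PID may even be reused, right after the call returns.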
An alternative exists, if you are writing specifically for a system (or all those systems) that provide a `/proc' filesystem: checking for the existence of `/proc/PID' may work.
The return value of system(), pclose(), or waitpid() doesn't seem to be the exit value of my process... or the exit value is shifted left 8 bits... what's the deal?
The man page is right, and so are you! If you read the man page for
waitpid() you'll find that the return code for the process is
encoded. The value returned by the process is normally in the upper 8 bits of
a 16-bit status word (hence the shift by 8 bits), and
the rest is used for other things. You can't rely on this though, not if you
want to be portable, so the suggestion is that you use the macros provided.
These are usually documented under wait() or wstat.
Macros defined for the purpose (in `<sys/wait.h>') include
(stat is the value returned by waitpid()):
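WIFEXITED(stat)
    Non-zero if the child exited normally.
WEXITSTATUS(stat)
    The exit code returned by the child (meaningful only if
    WIFEXITED(stat) is non-zero).
WIFSIGNALED(stat)
    Non-zero if the child was terminated by a signal.
WTERMSIG(stat)
    The number of the signal that terminated the child.
WIFSTOPPED(stat)
    Non-zero if the child is stopped.
WSTOPSIG(stat)
    The number of the signal that stopped the child.
WIFCONTINUED(stat)
    Non-zero if the status was reported for a child that has been continued.
WCOREDUMP(stat)
    If WIFSIGNALED(stat) is non-zero, this is non-zero if the
    process left behind a core dump.

Look at getrusage(), if available.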
When you free memory back to the heap with free(), on almost all
systems that doesn't reduce the memory usage of your program. The
memory free()d is still part of the process' address space, and
will be used to satisfy future malloc() requests.
If you really need to free memory back to the system, look at using
mmap() to allocate private anonymous mappings. When these are
unmapped, the memory really is released back to the system. Certain
implementations of malloc() (e.g. in the GNU C Library)
automatically use mmap() where available to perform large
allocations; these blocks are then returned to the system on
free().
Of course, if your program increases in size when you think it shouldn't, you may have a `memory leak' -- a bug in your program that results in unused memory not being freed.
On BSDish systems, the ps program actually looks into the
address space of the running process to find the current argv[],
and displays that. That enables a program to change its `name' simply by
modifying argv[].
On SysVish systems, the command name and usually the first 80 bytes of the
parameters are stored in the process' u-area, and so can't be directly modified.
There may be a system call to change this (unlikely), but otherwise the only way
is to perform an exec(), or write into kernel memory (dangerous,
and only possible if running as root).
Some systems (notably Solaris) may have two separate versions of
ps, one in `/usr/bin/ps' with SysV behaviour, and one in
`/usr/ucb/ps' with BSD behaviour. On these systems, if you change
argv[], then the BSD version of ps will reflect the
change, and the SysV version won't.
Check to see if your system has a function setproctitle().
This would be a good candidate for a list of `Frequently Unanswered Questions', because the fact of asking the question usually means that the design of the program is flawed. :-)
You can make a `best guess' by looking at the value of argv[0].
If this contains a `/', then it is probably the absolute or
relative (to the current directory at program start) path of the executable. If
it does not, then you can mimic the shell's search of the PATH
variable, looking for the program. However, success is not guaranteed, since it
is possible to invoke programs with arbitrary values of argv[0],
and in any case the executable may have been renamed or deleted since it was
started.
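The `best guess' described above might be coded along these lines. This is a sketch with an invented name, guess_executable(); it checks only that a candidate is a regular file we may execute, and it inherits every caveat just mentioned.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: guess the path of the program invoked as argv0.
 * Returns 0 with the guess in out, or -1 if nothing suitable was found
 * (out is then unspecified). */
int guess_executable(const char *argv0, char *out, size_t len)
{
    const char *path, *end;
    struct stat st;
    size_t dlen;
    int n;

    if (strchr(argv0, '/') != NULL) {
        /* contains a slash: absolute or relative path, take it as is */
        n = snprintf(out, len, "%s", argv0);
        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

    path = getenv("PATH");
    if (path == NULL)
        return -1;

    for (;; path = end + 1) {
        end = strchr(path, ':');
        dlen = end ? (size_t)(end - path) : strlen(path);
        if (dlen == 0)          /* empty entry means current directory */
            n = snprintf(out, len, "./%s", argv0);
        else
            n = snprintf(out, len, "%.*s/%s", (int)dlen, path, argv0);

        if (n > 0 && (size_t)n < len &&
            stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
            access(out, X_OK) == 0)
            return 0;
        if (end == NULL)
            return -1;          /* that was the last PATH entry */
    }
}
```

A relative result is relative to the current directory at program start, so it should be resolved before the program calls chdir().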
If all you want is to be able to print an appropriate invocation name with
error messages, then the best approach is to have main() save the
value of argv[0] in a global variable for use by the entire
program. While there is no guarantee whatsoever that the value in
argv[0] will be meaningful, it is the best option available in most
circumstances.
The most common reason people ask this question is in order to locate configuration files with their program. This is considered to be bad form; directories containing executables should contain nothing except executables, and administrative requirements often make it desirable for configuration files to be located on different filesystems to executables.
A less common, but more legitimate, reason to do this is to allow the program
to call exec() on itself; this is a method used (e.g. by
some versions of sendmail) to completely reinitialise the process
(e.g. if a daemon receives a SIGHUP).
The correct directory for this usually depends on the particular flavour of
Unix you're using; `/var/opt/PACKAGE', `/usr/local/lib',
`/usr/local/etc', or any of several other possibilities. User-specific
configuration files are usually hidden `dotfiles' under $HOME (e.g.
`$HOME/.exrc').
From the point of view of a package that is expected to be usable across a range of systems, this usually implies that the location of any sitewide configuration files will be a compiled-in default, possibly using a `--prefix' option on a configure script (Autoconf scripts do this). You might wish to allow this to be overridden at runtime by an environment variable. (If you're not using a configure script, then put the default in the Makefile as a `-D' option on compiles, or put it in a `config.h' header file, or something similar.)
User-specific configuration should be either a single dotfile under
$HOME, or, if you need multiple files, a dot-subdirectory. (Files
or directories whose names start with a dot are omitted from directory listings
by default.) Avoid creating multiple entries under $HOME, because
this can get very cluttered. Again, you can allow the user to override this
location with an environment variable. Programs should always behave sensibly if
they fail to find any per-user configuration.
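As an illustration of the per-user lookup, here is a sketch for an imaginary program. The environment variable MYPROG_CONFIG and the dotfile name `.myprogrc' are placeholders invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: work out where the per-user configuration file
 * of an imaginary program "myprog" should live.  Returns 0 with the
 * path in out, or -1 if no location could be determined. */
int user_config_path(char *out, size_t len)
{
    const char *override = getenv("MYPROG_CONFIG");
    const char *home;
    int n;

    if (override != NULL && override[0] != '\0')
        n = snprintf(out, len, "%s", override);       /* user's choice */
    else if ((home = getenv("HOME")) != NULL && home[0] != '\0')
        n = snprintf(out, len, "%s/.myprogrc", home); /* default dotfile */
    else
        return -1;

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

If this returns -1, or the file it names does not exist, the program should carry on with built-in defaults.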
Because it's not supposed to.
SIGHUP is a signal that means, by convention, "the terminal line
got hung up". It has nothing to do with parent processes, and is usually
generated by the tty driver (and delivered to the foreground process group).
However, as part of the session management system, there are exactly two
cases where SIGHUP is sent on the death of a process:
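  - When the process that dies is the session leader of a session that is
    attached to a terminal device, SIGHUP is sent to all processes in
    the foreground process group of that terminal device.
  - When the death of a process causes a process group to become orphaned,
    and one or more processes in the orphaned group are stopped, then
    SIGHUP and SIGCONT are sent to all members of the
    orphaned group. (An orphaned process group is one where no process in the
    group has a parent which is part of the same session, but not the same
    process group.)

There isn't a fully general approach to killing all descendants of a process. While you can determine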
the relationships between processes by parsing ps output, this is
unreliable in that it represents only a snapshot of the system.
However, if you're launching a subprocess that might spawn further subprocesses of its own, and you want to be able to kill the entire spawned job at one go, the solution is to put the subprocess into a new process group, and kill that process group if you need to.
The preferred function for creating process groups is setpgid().
Use this if possible rather than setpgrp() because the latter
differs between systems (on some systems `setpgrp();' is equivalent
to `setpgid(0,0);', on others, setpgrp() and
setpgid() are identical).
See the job-control example in the examples section.
Putting a subprocess into its own process group has a number of effects. In particular, unless you explicitly place the new process group in the foreground, it will be treated as a background job with these consequences:
  - it will be stopped with SIGTTIN if it attempts to read from
    the terminal
  - if tostop is set in the terminal modes, it will be stopped
    with SIGTTOU if it attempts to write to the terminal (attempting
    to change the terminal modes should also cause this, independently of the
    current setting of tostop)
  - it will not receive keyboard signals from the terminal (e.g.
    SIGINT or SIGQUIT)

In many applications input and output will be redirected anyway, so the most
significant effect will be the lack of keyboard signals. The parent application
should arrange to catch at least SIGINT and SIGQUIT
(and preferably SIGTERM as well) and clean up any background jobs
as necessary.