struct hostent *gethostbyname(const char *hostname);
The input to the function will be the name of the host whose address we want to resolve. The function returns a pointer to a structure hostent, whose definition is as follows:
struct hostent {
char* h_name; /* official name of host */
char** h_aliases; /* alias list */
int h_addrtype; /* host address type */
int h_length; /* length of address */
char** h_addr_list; /* list of addresses from name server */
#define h_addr h_addr_list[0] /* address, for backward compatibility */
};
Let's see what each field in the hostent structure means:
We normally have several functions (or macros) to perform 4 types of translations:
short host_port = 1234;
short net_port;
net_port = htons(host_port);
The other functions are used in a similar way.
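To make the translation concrete, here is a minimal sketch of a round trip between host and network byte order. The helper names port_to_wire() and port_from_wire() are made up for this illustration; only htons() and ntohs() are the real system functions.

```c
#include <arpa/inet.h>   /* htons(), ntohs() */
#include <stdint.h>      /* uint16_t */
#include <string.h>      /* memcpy() */

/* Hypothetical helper: store 'host_port' into 'wire' exactly as the */
/* two bytes would travel on the network (most significant first).   */
void port_to_wire(uint16_t host_port, unsigned char wire[2])
{
    uint16_t net_port = htons(host_port);   /* host -> network order */
    memcpy(wire, &net_port, sizeof(net_port));
}

/* Hypothetical helper: rebuild the host-order port from wire bytes. */
uint16_t port_from_wire(const unsigned char wire[2])
{
    uint16_t net_port;
    memcpy(&net_port, wire, sizeof(net_port));
    return ntohs(net_port);                 /* network -> host order */
}
```

For port 1234 (0x04D2) the wire bytes are 0x04 followed by 0xD2, whatever the byte order of the host machine is.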
struct sockaddr_in {
short int sin_family; /* Address family */
unsigned short sin_port; /* Port number */
struct in_addr sin_addr; /* Internet address */
/* Pad to size of `struct sockaddr'. */
/* Pad definition deleted */
};
The fields have the following meanings:
char* hostname; /* name part of the address */
short host_port; /* port part of the address */
struct hostent* hen; /* server's DNS entry */
struct sockaddr_in sa; /* address formation structure */
/* get information about the given host, using the system's */
/* default name resolution mechanism (DNS, NIS, /etc/hosts...). */
hen = gethostbyname(hostname);
if (!hen) {
/* gethostbyname() reports errors via h_errno, not errno */
herror("couldn't locate host entry");
exit(1);
}
/* create machine's Internet address structure */
/* first clear out the struct, to avoid garbage */
memset(&sa, 0, sizeof(sa));
/* Using Internet address family */
sa.sin_family = AF_INET;
/* copy port number in network byte order */
sa.sin_port = htons(host_port);
/* copy IP address into address struct */
memcpy(&sa.sin_addr.s_addr, hen->h_addr_list[0], hen->h_length);
Notes:
We will first describe what a socket is, and how it relates to normal
files, then explain what kinds of sockets exist on most Unix systems, how
they are created using the socket() system call, and finally, how they are
associated with a specific network connection, and how data is passed through
them.
Stream sockets are used for stream connections, i.e. connections that are established once and then carry an ordered, reliable stream of bytes for as long as they stay open. TCP connections use stream sockets.
Datagram sockets are used for short exchanges in which each packet is sent across the network on its own, without first establishing a connection. The UDP protocol uses such sockets, due to its connection-less nature.
Raw sockets are used to access low-level protocols directly, bypassing
the higher protocols. They are the means for a programmer to use the
IP protocol, or the physical layer of the network, directly. Raw
sockets can therefore be used to implement new protocols on top of the low-level
protocols. Naturally, they are out of our scope.
int socket(int address_family, int socket_type, int proto_family);
address_family defines the type of addresses we want this socket to use, and therefore defines what kind of network protocol the socket will use. We will concentrate on the Internet address family, because we want to write Internet applications.
socket_type could be one of the socket types we mentioned earlier, or any other socket type that exists on your system. We choose the socket type according to the kind of interaction (and type or protocol) we want to use.
proto_family selects which protocol we want the socket to use. We will usually leave this value as 0 (or the constant PF_UNSPEC on some systems), and let the system choose the most suitable protocol for us. As for the protocol itself, in the Internet address family, a socket type of SOCK_STREAM will cause the protocol type to be set to TCP. A socket type of SOCK_DGRAM (Datagram socket) will cause the protocol type to be set to UDP.
The socket system call returns a file descriptor which will be used to reference the socket in later requests by the application program. If the call fails, however (for example, due to lack of resources), the value returned will be negative (note that valid file descriptors are always non-negative integers).
As an example, suppose that we want to write a TCP
application. This application needs at least one socket in order to communicate
across the Internet, so it will contain a call such as this:
int s; /* descriptor of socket */
/* Internet address family, Stream socket */
s = socket(AF_INET, SOCK_STREAM, 0);
if (s < 0) {
perror("socket: allocation failed");
}
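As a small sketch of how the socket type argument carries through, the following helpers create an Internet socket of a requested type and then ask the system what type it actually is. The names open_inet_socket() and socket_type_of() are invented for this example; socket() and getsockopt() with SO_TYPE are the real calls.

```c
#include <sys/types.h>    /* standard system types */
#include <sys/socket.h>   /* socket(), getsockopt(), SO_TYPE */
#include <netinet/in.h>   /* Internet address family */

/* Hypothetical helper: create an Internet socket of the given type. */
/* Returns the descriptor, or a negative value on failure.           */
int open_inet_socket(int socket_type)
{
    /* protocol 0 lets the system pick TCP or UDP by socket type */
    return socket(AF_INET, socket_type, 0);
}

/* Hypothetical helper: ask the system which type socket 's' has.    */
/* Returns SOCK_STREAM, SOCK_DGRAM, ..., or -1 on error.             */
int socket_type_of(int s)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (getsockopt(s, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}
```

A caller would check the result of open_inet_socket() for a negative value, exactly as in the fragment above, before passing the descriptor on.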
Binding to a local address could be done either explicitly, using the bind() system call, or implicitly, when a connection is established. Binding to the remote address is done only when a connection is established. To bind a socket to a local address, we use the bind() system call, which is defined as follows:
int bind(int socket, struct sockaddr *address, int addrlen);
Note the usage of a different type of structure, namely struct sockaddr, than the one we used earlier (struct sockaddr_in). Why the sudden change? This is due to the generality of the socket interface: sockets could be used as endpoints for connections using different types of address families. Each address family needs different information, so they use different structures to form their addresses. Therefore, a generic socket address type, struct sockaddr, is defined in the system, and for each address family, a different variation of this structure is used. For those who know, this means that struct sockaddr_in, for example, is an overlay of struct sockaddr (i.e. it uses the same memory space, just divides it differently into fields).
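The overlay idea can be sketched in a few lines: an Internet address is filled in through its own type, and then inspected through the generic pointer type that calls such as bind() expect. The helper names family_of() and as_generic() are assumptions made for this illustration.

```c
#include <arpa/inet.h>    /* htons(), htonl() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>       /* memset() */
#include <sys/socket.h>   /* struct sockaddr */

/* Hypothetical helper: read the address family through the generic  */
/* pointer. Every address variation keeps the family where the       */
/* generic struct expects it, which is what makes the cast work.     */
int family_of(const struct sockaddr *address)
{
    return address->sa_family;
}

/* Hypothetical helper: fill in an Internet address (loopback, given */
/* port) and hand it back as a generic address pointer.              */
const struct sockaddr *as_generic(struct sockaddr_in *in, unsigned short port)
{
    memset(in, 0, sizeof(*in));
    in->sin_family = AF_INET;
    in->sin_port = htons(port);
    in->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    /* same memory, viewed through the generic type */
    return (const struct sockaddr *)in;
}
```

The cast copies nothing: the generic pointer and the sockaddr_in variable refer to the same memory.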
There are 4 possible variations of address binding that might be used when binding a socket in the Internet address family.
The first is binding the socket to a specific address, i.e. a specific IP number and a specific port. This is done when we know exactly where we want to receive messages. Actually this form is not used in simple servers, since usually these servers wish to accept connections to the machine, no matter which IP interface they came through.
The second form is binding the socket to a specific IP number, but letting the system choose an unused port number. This could be done when we don't need to use a well-known port.
The third form is binding the socket to a wild-card address called INADDR_ANY (by assigning it to the sockaddr_in variable), and to a specific port number. This is used in servers that are supposed to accept packets sent to this port on the local host, regardless of through which physical network interface the packet has arrived (remember that a host might have more than one IP address).
The last form is letting the system bind the socket to any local IP address
and to pick a port number by itself. This is done by not using the bind()
system call on the socket. The system will make the local bind when a connection
through the socket is established, i.e. along with the remote address binding.
This form of binding is usually used by clients, which care only about the
remote address (where they connect to) and don't need any specific local port
or local IP address. However, there are exceptions here too.
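As an illustration of the explicit forms, here is a sketch of a helper that binds a socket and then asks the system which port it ended up with. Passing port 0 corresponds to the second form (specific IP number, system-chosen port). The name bind_and_report() is hypothetical; bind() and getsockname() are the real calls.

```c
#include <arpa/inet.h>    /* htons(), htonl(), ntohs() */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <stdint.h>       /* uint32_t */
#include <string.h>       /* memset() */
#include <sys/socket.h>   /* bind(), getsockname() */
#include <sys/types.h>    /* standard system types */

/* Hypothetical helper: bind 's' to the given IP number and port,    */
/* both in host byte order (port 0 = let the system choose).         */
/* Returns the port actually bound (host order), or -1 on error.     */
int bind_and_report(int s, uint32_t host_ip, unsigned short host_port)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(host_port);
    sa.sin_addr.s_addr = htonl(host_ip);

    if (bind(s, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        return -1;
    /* ask the system which local address the socket now has */
    if (getsockname(s, (struct sockaddr *)&sa, &len) < 0)
        return -1;
    return ntohs(sa.sin_port);
}
```

Calling bind_and_report(s, INADDR_ANY, 5050) would instead give the third form, assuming port 5050 is free on the machine.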
int read(int socket, char *buffer, int buflen);
int write(int socket, char *buffer, int buflen);
int close(int socket);
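One point worth keeping in mind with write() is that it may accept fewer bytes than requested, so careful code loops until everything has been handed to the system. The following is a minimal sketch of such a loop; the name write_all() is an assumption for this example and not a system call.

```c
#include <errno.h>    /* errno, EINTR */
#include <unistd.h>   /* write() */

/* Hypothetical helper: keep calling write() until all 'len' bytes   */
/* of 'buffer' were written. Returns 0 on success, or -1 on error    */
/* (errno is left as set by write()).                                */
int write_all(int fd, const char *buffer, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t rc = write(fd, buffer + sent, len - sent);
        if (rc < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        sent += (size_t)rc;      /* partial write: send the rest */
    }
    return 0;
}
```

Since sockets are file descriptors, the same helper works on a pipe, a file or a connected socket.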
We will begin by showing the C code of a simple Client without user-interaction. This Client connects to the standard time server of a given host, reads the time, and prints it on the screen. Most (Unix) Internet hosts have a standard server called daytime, that awaits connections on the well-known port number 13, and when it receives a connection request, accepts it, writes the time to the Client, and closes the connection.
Let's see how the Client looks. Note the usage of a new system call,
connect(), which is used to establish a connection to a remote machine,
and will be further explained immediately following the program text.
#include <stdio.h> /* Basic I/O routines */
#include <stdlib.h> /* exit() */
#include <string.h> /* memset(), memcpy() */
#include <unistd.h> /* read(), close() */
#include <sys/types.h> /* standard system types */
#include <netinet/in.h> /* Internet address structures */
#include <sys/socket.h> /* socket interface functions */
#include <netdb.h> /* host to IP resolution */
#define HOSTNAMELEN 40 /* maximal host name length */
#define BUFLEN 1024 /* maximum response size */
#define PORT 13 /* port of daytime server */
int main(int argc, char *argv[])
{
int rc; /* system calls return value storage */
int s; /* socket descriptor */
char buf[BUFLEN+1]; /* buffer for the server's answer */
char* pc; /* pointer into the buffer */
struct sockaddr_in sa; /* Internet address struct */
struct hostent* hen; /* host-to-IP translation */
/* check there are enough parameters */
if (argc < 2) {
fprintf(stderr, "Missing host name\n");
exit (1);
}
/* Address resolution stage */
hen = gethostbyname(argv[1]);
if (!hen) {
/* gethostbyname() sets h_errno rather than errno, so use herror() */
herror("couldn't resolve host name");
exit(1);
}
/* initiate machine's Internet address structure */
/* first clear out the struct, to avoid garbage */
memset(&sa, 0, sizeof(sa));
/* Using Internet address family */
sa.sin_family = AF_INET;
/* copy port number in network byte order */
sa.sin_port = htons(PORT);
/* copy IP address into address struct */
memcpy(&sa.sin_addr.s_addr, hen->h_addr_list[0], hen->h_length);
/* allocate a free socket */
/* Internet address family, Stream socket */
s = socket(AF_INET, SOCK_STREAM, 0);
if (s < 0) {
perror("socket: allocation failed");
exit(1);
}
/* now connect to the remote server. the system will */
/* use the 4th binding method (see section 3) */
/* note the cast to a struct sockaddr pointer of the */
/* address of variable sa. */
rc = connect(s, (struct sockaddr *)&sa, sizeof(sa));
/* check there was no error */
if (rc < 0) {
perror("connect");
exit(1);
}
/* now that we are connected, start reading the socket */
/* till read() returns 0, meaning the server closed */
/* the connection (or a negative value, meaning an error). */
pc = buf;
while ((rc = read(s, pc, BUFLEN - (pc-buf))) > 0) {
pc += rc;
}
/* close the socket */
close(s);
/* pad a null character to the end of the result */
*pc = '\0';
/* print the result */
printf("Time: %s\n", buf);
/* and terminate */
return 0;
}
The complete source code for this client may be found in the daytime-client.c file.
The Client's code should be pretty easy to understand now. All we did was combine the features we have seen so far into one program. The only new feature introduced here is the connect() system call.
This system call is responsible for making the connection to the specified address of the remote machine, using the specified socket. Note that the address is being type-cast into the general address type, struct sockaddr, because this same system call is used to establish connections in various address families, not just the Internet address family. How will the system then know we want an Internet connection? The answer is given in the socket's information. If you remember, we specified this socket will be used in the Internet address family (AF_INET) when we created it.
Note also how the reading loop is performed. We are asking the system to read as much data as possible in the read() system call. However, the system might need several reads before it has consumed all the bytes sent by the server, which is why we used the while loop. Remember, never assume a read() system call will return the exact number of bytes you specified in the call. If less is available, the call will return quickly, and will not wait for the rest of the data. On the other hand, if no data is available, the call will block (not return) until data is available. Thus, when writing "Real" Clients and Servers, some measures have to be taken in order to avoid that blocking.
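The reading loop can be packaged as a small reusable function. This is only a sketch under the assumptions stated in its comments; the name read_until_eof() is invented here, and the function treats a negative return from read() as an error instead of advancing the buffer pointer by it.

```c
#include <errno.h>    /* errno, EINTR */
#include <unistd.h>   /* read() */

/* Hypothetical helper: read from 'fd' until the buffer is full or   */
/* the other side closes the connection. Returns the number of       */
/* bytes stored in 'buffer', or -1 on error.                         */
ssize_t read_until_eof(int fd, char *buffer, size_t buflen)
{
    size_t got = 0;

    while (got < buflen) {
        ssize_t rc = read(fd, buffer + got, buflen - got);
        if (rc == 0)
            break;               /* end of stream: peer closed */
        if (rc < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)rc;       /* short read: ask for the rest */
    }
    return (ssize_t)got;
}
```

Each call to read() may return only part of the data, so the function keeps asking until it sees the 0 that marks the end of the stream.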
We will not discuss right now Clients that read user input. This subject
will be deferred until we learn how to read information efficiently from
several input devices.
The "hello world" Server listens to a predefined port of our choice, and accepts incoming connections. It then writes the message "hello world" to the remote Client, and closes the connection. This will be done in an infinite loop, so we can serve a new Client after finishing with the current.
Note the introduction of two new system calls, listen() and
accept(). The listen() system call asks the system to listen
for new connections coming to our port. The accept() system call
is used to accept (how obvious) such incoming connections. Both system calls
will be explained further following the "hello world" Server's code.
#include <stdio.h> /* Basic I/O routines */
#include <stdlib.h> /* exit() */
#include <string.h> /* memset() */
#include <unistd.h> /* write(), close() */
#include <sys/types.h> /* standard system types */
#include <netinet/in.h> /* Internet address structures */
#include <sys/socket.h> /* socket interface functions */
#include <netdb.h> /* host to IP resolution */
#define PORT 5050 /* port of "hello world" server */
#define LINE "hello world" /* what to say to our clients */
int main(void)
{
int rc; /* system calls return value storage */
int s; /* socket descriptor */
int cs; /* new connection's socket descriptor */
struct sockaddr_in sa; /* Internet address struct */
struct sockaddr_in csa; /* client's address struct */
socklen_t size_csa; /* size of client's address struct */
/* initiate machine's Internet address structure */
/* first clear out the struct, to avoid garbage */
memset(&sa, 0, sizeof(sa));
/* Using Internet address family */
sa.sin_family = AF_INET;
/* copy port number in network byte order */
sa.sin_port = htons(PORT);
/* we will accept connections coming through any IP */
/* address that belongs to our host, using the */
/* INADDR_ANY wild-card. */
sa.sin_addr.s_addr = htonl(INADDR_ANY);
/* allocate a free socket */
/* Internet address family, Stream socket */
s = socket(AF_INET, SOCK_STREAM, 0);
if (s < 0) {
perror("socket: allocation failed");
exit(1);
}
/* bind the socket to the newly formed address */
rc = bind(s, (struct sockaddr *)&sa, sizeof(sa));
/* check there was no error */
if (rc < 0) {
perror("bind");
exit(1);
}
/* ask the system to listen for incoming connections */
/* to the address we just bound. specify that up to */
/* 5 pending connection requests will be queued by the */
/* system, if we are not directly awaiting them using */
/* the accept() system call, when they arrive. */
rc = listen(s, 5);
/* check there was no error */
if (rc < 0) {
perror("listen");
exit(1);
}
/* remember size for later usage */
size_csa = sizeof(csa);
/* enter an accept-write-close infinite loop */
while (1) {
/* the accept() system call will wait for a */
/* connection, and when one is established, a */
/* new socket will be created to handle it, and */
/* the csa variable will hold the address */
/* of the Client that just connected to us. */
/* the old socket, s, will still be available */
/* for future accept() statements. */
size_csa = sizeof(csa); /* accept() overwrites it, so reset it each time */
cs = accept(s, (struct sockaddr *)&csa, &size_csa);
/* check for errors. if any, enter accept mode again */
if (cs < 0)
continue;
/* ok, we got a new connection. do the job... */
/* (sizeof(LINE) - 1, so the terminating null is not sent) */
write(cs, LINE, sizeof(LINE) - 1);
/* now close the connection */
close(cs);
}
}
The complete source code for this server may be found in the hello-world-server.c file.
Look how little we had to add to the basic stuff in order to form our first server. The only two additions were the listen() and the accept() system calls. Let's examine them a little more.
If we want to serve incoming connections, we need to ask the system to listen on the specified port. If we don't do that, the remote Client will get a "connection refused" error. Once the system listens on the port, it could happen that more than one Client will ask for service simultaneously. We can tell the system how many Clients may "wait in line". This will be the second parameter to the listen() system call.
After issuing the listen() system call, we still need to actively accept incoming connections. This is done using the accept() system call. We tell it which socket is bound to the port we want to accept connections on, and give it the address of a variable in which the call will give us the address of the remote Client, once a connection is established. It will also update the size of the address, based on the address family used, in the variable whose address we pass as the third argument. We are not using the Client's address in our simple server, but other servers that might want to authenticate their Clients (or just to know where they are coming from), will use it.
Finally, the accept() system call returns a number of a new socket,
which is allocated for the new established connection. This gives us a socket
bound to the correct local and remote addresses, while not destroying the
binding of the original socket, that we can later use to accept new connections.
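To see listen() and accept() at work without a second machine, here is a sketch that runs both ends over the loopback address. All helper names (loopback_address(), listen_on_loopback(), connect_to_loopback()) are hypothetical, and the sketch assumes the loopback interface is available on the host.

```c
#include <arpa/inet.h>    /* htons(), htonl(), ntohs() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>       /* memset() */
#include <sys/socket.h>   /* socket(), bind(), listen(), connect() */
#include <sys/types.h>    /* standard system types */
#include <unistd.h>       /* close() */

/* Hypothetical helper: loopback address with a host-order port. */
static void loopback_address(struct sockaddr_in *sa, unsigned short port)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    sa->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Hypothetical helper: listen on a system-chosen loopback port.     */
/* Stores the port in *port; returns the socket, or -1 on error.     */
int listen_on_loopback(unsigned short *port)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
    loopback_address(&sa, 0);    /* port 0: system picks a free one */
    if (bind(s, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(s, 5) < 0 ||
        getsockname(s, (struct sockaddr *)&sa, &len) < 0) {
        close(s);
        return -1;
    }
    *port = ntohs(sa.sin_port);
    return s;
}

/* Hypothetical helper: connect a new socket to that loopback port.  */
/* Returns the connected socket, or -1 on error.                     */
int connect_to_loopback(unsigned short port)
{
    struct sockaddr_in sa;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
    loopback_address(&sa, port);
    if (connect(s, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

After listen_on_loopback() and connect_to_loopback() succeed, a call to accept() on the listening socket returns a new descriptor for the connection and fills in the Client's address, while the listening socket remains usable for further accept() calls.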
The first approach is using one process that awaits new connections, and one more process (or thread) for each Client already connected. This approach makes design quite easy, because the main process does not need to distinguish between Clients, and the sub-processes are each a single-Client server process, and hence easier to implement.
However, this approach wastes too many system resources (if child processes are used), and complicates inter-Client communication: If one Client wants to send a message to another through the server, this will require communication between two processes on the server, or locking mechanisms, if using multiple threads.
The second approach is using a single process for all tasks: waiting for new connections and accepting them, while handling open connections and messages that arrive through them. This approach uses fewer system resources, and simplifies inter-Client communication, although it makes the server process more complex.
Luckily, the Unix system provides a system call that makes these tasks
much easier to handle: the select() system call.
The select() system call puts the process to sleep until any of a given list of file descriptors (including sockets) is ready for reading, writing or is in an exceptional condition. When one of these things happens, the call returns, and notifies the process which file descriptors are waiting for service.
The select system call is defined as follows:
int select(int numfds,
fd_set *rfd,
fd_set *wfd,
fd_set *efd,
struct timeval *timeout);
We give select() 3 sets of file descriptors to check upon. The file descriptors in the rfd set will be checked to see whether they have data that can be read. The file descriptors in the wfd set will be checked to see whether we can write into any of them. The file descriptors in the efd set will be checked for exceptional conditions (you may safely ignore this set for now, since it requires a better understanding of the Internet protocols in order to be useful). Note that if we don't want to check one of the sets, we send a NULL pointer instead.
We also give select() a timeout value - if this amount of time passes before any of the file descriptors is ready, the call will terminate, returning 0 (no file descriptors are ready).
NOTE - We could use the select() system call to modify the Client so it could also accept user input, simply by telling it to select() on a set comprised of two descriptors: the standard input descriptor (descriptor number 0) and the communication socket (the one we allocated using the socket() system call). When the select() call returns, we will check which descriptor is ready: standard input, or our socket, and this way will know which of them needs service.
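The idea in the note can be sketched as a function that waits on two descriptors at once, with a timeout. The name wait_for_either() and its return-value convention are assumptions made for this example; in the Client, fd_a would be descriptor 0 and fd_b the communication socket.

```c
#include <stddef.h>       /* NULL */
#include <sys/select.h>   /* select(), fd_set, FD_* macros */
#include <sys/time.h>     /* struct timeval */
#include <unistd.h>       /* standard system calls */

/* Hypothetical helper: wait up to 'seconds' for either descriptor   */
/* to become readable. Returns the ready descriptor (fd_a wins if    */
/* both are ready), -1 on timeout, or -2 on error.                   */
int wait_for_either(int fd_a, int fd_b, int seconds)
{
    fd_set rfd;
    struct timeval timeout;
    int maxfd = (fd_a > fd_b) ? fd_a : fd_b;
    int rc;

    FD_ZERO(&rfd);
    FD_SET(fd_a, &rfd);
    FD_SET(fd_b, &rfd);
    timeout.tv_sec = seconds;
    timeout.tv_usec = 0;

    /* first argument: highest descriptor number in the sets, plus 1 */
    rc = select(maxfd + 1, &rfd, NULL, NULL, &timeout);
    if (rc < 0)
        return -2;
    if (rc == 0)
        return -1;               /* nothing became ready in time */
    if (FD_ISSET(fd_a, &rfd))
        return fd_a;
    return fd_b;
}
```

The sets and the timeout are rebuilt on every call, because select() modifies the sets it is given and, on some systems, the timeout as well.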
There are three more things we need to know in order to be able to use select. One - how do we know the highest number of a file descriptor a process may use on our system? Two - how do we prepare those sets? Three - when select returns, how do we know which descriptors are ready - and what they are ready for?
As for the first issue, we could use the getdtablesize() system call. It is defined as follows:
int getdtablesize();
This system call takes no arguments, and returns the size of the process's file descriptor table, i.e. the maximum number of file descriptors the process may have open (one more than the largest possible descriptor number). On modern systems, we could instead use the getrlimit() system call, using the RLIMIT_NOFILE parameter. Refer to the relevant manual page for more information.
As for the second issue, the system provides us with several macros to manipulate fd_set type variables.
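The standard macros are FD_ZERO (empty a set), FD_SET (add a descriptor), FD_CLR (remove a descriptor) and FD_ISSET (test membership). Here is a short sketch of how they fit together; the helpers make_pair_set() and count_members() are hypothetical names used only for this illustration.

```c
#include <sys/select.h>   /* fd_set, FD_ZERO, FD_SET, FD_CLR, FD_ISSET */

/* Hypothetical helper: make 'set' hold exactly two descriptors. */
void make_pair_set(fd_set *set, int fd_a, int fd_b)
{
    FD_ZERO(set);         /* always start from an empty set */
    FD_SET(fd_a, set);
    FD_SET(fd_b, set);
}

/* Hypothetical helper: count how many descriptors in the range      */
/* 0 .. numfds-1 are members of 'set'. numfds must not exceed        */
/* FD_SETSIZE, the capacity of an fd_set.                            */
int count_members(fd_set *set, int numfds)
{
    int i;
    int n = 0;

    for (i = 0; i < numfds; i++)
        if (FD_ISSET(i, set))
            n++;
    return n;
}
```

FD_ISSET is also how the third question is answered: after select() returns, each set contains only the descriptors that are ready, so testing a descriptor with FD_ISSET against the read set tells us whether it can be read.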
Here is the source code of a Multi-Client echo Server.
This Server accepts connections from several Clients simultaneously, and echoes
back to each Client any byte it sends to the Server. This is a service
similar to the one given by the Internet Echo service, which accepts incoming
connections on the well-known port 7. Compare the code given here to the algorithm
of a Multi-Client Server presented in the Client-Server model section.
The complete source code for this server may be found in the multi-client-echo-server.c file.
#include <stdio.h> /* Basic I/O routines */
#include <stdlib.h> /* exit() */
#include <string.h> /* memset() */
#include <sys/types.h> /* standard system types */
#include <netinet/in.h> /* Internet address structures */
#include <sys/socket.h> /* socket interface functions */
#include <netdb.h> /* host to IP resolution */
#include <sys/time.h> /* for timeout values */
#include <sys/select.h> /* select() and the fd_set macros */
#include <unistd.h> /* for table size calculations */
#define PORT 5060 /* port of our echo server */
#define BUFLEN 1024 /* buffer length */
int main(void)
{
int i; /* index counter for loop operations */
int rc; /* system calls return value storage */
int s; /* socket descriptor */
int cs; /* new connection's socket descriptor */
char buf[BUFLEN+1]; /* buffer for incoming data */
struct sockaddr_in sa; /* Internet address struct */
struct sockaddr_in csa; /* client's address struct */
socklen_t size_csa; /* size of client's address struct */
fd_set rfd; /* set of open sockets */
fd_set c_rfd; /* set of sockets waiting to be read */
int dsize; /* size of file descriptors table */
/* initiate machine's Internet address structure */
/* first clear out the struct, to avoid garbage */
memset(&sa, 0, sizeof(sa));
/* Using Internet address family */
sa.sin_family = AF_INET;
/* copy port number in network byte order */
sa.sin_port = htons(PORT);
/* we will accept connections coming through any IP */
/* address that belongs to our host, using the */
/* INADDR_ANY wild-card. */
sa.sin_addr.s_addr = htonl(INADDR_ANY);
/* allocate a free socket */
/* Internet address family, Stream socket */
s = socket(AF_INET, SOCK_STREAM, 0);
if (s < 0) {
perror("socket: allocation failed");
exit(1);
}
/* bind the socket to the newly formed address */
rc = bind(s, (struct sockaddr *)&sa, sizeof(sa));
/* check there was no error */
if (rc < 0) {
perror("bind");
exit(1);
}
/* ask the system to listen for incoming connections */
/* to the address we just bound. specify that up to */
/* 5 pending connection requests will be queued by the */
/* system, if we are not directly awaiting them using */
/* the accept() system call, when they arrive. */
rc = listen(s, 5);
/* check there was no error */
if (rc < 0) {
perror("listen");
exit(1);
}
/* remember size for later usage */
size_csa = sizeof(csa);
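/* calculate size of file descriptors table. an fd_set can only */
/* hold descriptors below FD_SETSIZE, so never scan beyond that. */
dsize = getdtablesize();
if (dsize > FD_SETSIZE)
dsize = FD_SETSIZE;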
/* close all file descriptors, except our communication socket */
/* this is done to avoid blocking on tty operations and such. */
for (i=0; i<dsize; i++)
if (i != s)
close(i);
/* we initially have only one socket open, */
/* to receive new incoming connections. */
FD_ZERO(&rfd);
FD_SET(s, &rfd);
/* enter a select-accept-echo infinite loop */
while (1) {
/* the select() system call waits until any of */
/* the file descriptors specified in the read, */
/* write and exception sets given to it, is */
/* ready to give data, send data, or is in an */
/* exceptional state, respectively. the call will */
/* wait for a given time before returning. in */
/* this case, the value is NULL, so it will */
/* not timeout. dsize specifies the size of the */
/* file descriptor table. */
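c_rfd = rfd;
rc = select(dsize, &c_rfd, NULL, NULL, NULL);
/* on failure (e.g. an interrupting signal) the sets are */
/* undefined, so skip this round and try again. */
if (rc < 0)
continue;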
/* if the 's' socket is ready for reading, it */
/* means that a new connection request arrived. */
if (FD_ISSET(s, &c_rfd)) {
/* accept the incoming connection */
size_csa = sizeof(csa); /* accept() overwrites it, so reset it each time */
cs = accept(s, (struct sockaddr *)&csa, &size_csa);
/* check for errors. if any, ignore new connection */
if (cs < 0)
continue;
/* add the new socket to the set of open sockets */
FD_SET(cs, &rfd);
/* and loop again */
continue;
}
/* check which sockets are ready for reading, */
/* and handle them with care. */
for (i=0; i<dsize; i++) {
if (i != s && FD_ISSET(i, &c_rfd)) {
/* read from the socket */
rc = read(i, buf, BUFLEN);
/* if client closed the connection, or an error occurred... */
if (rc <= 0) {
/* close the socket */
close(i);
FD_CLR(i, &rfd);
}
/* if there was data to read */
else {
/* echo it back to the client */
/* NOTE: we SHOULD have checked that */
/* indeed all data was written... */
write(i, buf, rc);
}
}
}
}
}
And remember: clients, and especially servers, are expected
to be robust creatures. Yet, the network is too shaky a ground to assume everything
will work smoothly. Expect the unexpected. Check the return value
of any system call you use, and act upon it. If a system call failed, try
to figure out why it failed (using the returned error code and possibly the
errno variable), and if you cannot write code to
bypass that kind of failure - at least give your users an error message
they can understand.
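As a closing illustration of that advice, here is a sketch of a read() wrapper that retries when a signal interrupts the call and prints a readable message for any other failure. The name checked_read() and the wording of the message are assumptions for this example.

```c
#include <errno.h>    /* errno, EINTR */
#include <stdio.h>    /* fprintf() */
#include <string.h>   /* strerror() */
#include <unistd.h>   /* read() */

/* Hypothetical helper: like read(), but retries after an            */
/* interrupting signal and reports other failures on stderr.         */
/* Returns what read() returned; errno is preserved on failure.      */
ssize_t checked_read(int fd, char *buffer, size_t buflen)
{
    ssize_t rc;
    int saved_errno;

    do {
        rc = read(fd, buffer, buflen);
    } while (rc < 0 && errno == EINTR);   /* bypassable failure */

    if (rc < 0) {
        /* fprintf() may change errno, so keep a copy for the caller */
        saved_errno = errno;
        fprintf(stderr, "read from descriptor %d failed: %s\n",
                fd, strerror(saved_errno));
        errno = saved_errno;
    }
    return rc;
}
```

The caller still has to act on the return value: 0 means the other side closed the connection, and a negative value means the descriptor should usually be closed and dropped.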